I’m seeing what looks like a regression of #19623 and/or a side effect of #27353 that causes provisioning on VMware to fail on one of our virtual networks. The change made for #27353 in 97e72f867 by @ezr-ondrej looks like the most proximate cause. Backing out that commit against Foreman 2.0.3 allows provisioning to succeed.
I think I have an idea of what’s going on. I’m not sure whether reverting 97e72f867 would break other things, and I don’t think a revert is the best fix, but I don’t currently know Foreman or Fog well enough to fix it in what I think is the correct way.
First, an underlying detail about our environment which I think is critical. Each vSphere dvportgroup object has a name, id, and _ref attribute. As far as I can tell from the issue reported in #27353, name isn’t a unique attribute. I think id and _ref are each intended to be unique among themselves, but they are provably not unique with respect to each other. That is, there cannot be two dvportgroups with id=dvportgroup-11, but there can be (and in our environment there is) one dvportgroup with id=dvportgroup-11 and a different one that has _ref=dvportgroup-11. Specifically, as captured from app/models/concerns/fog_extensions/vsphere/server.rb in select_nic()'s all_networks:
{
:name=>"8-Network-dvPortGroup",
:accessible=>true,
:id=>"dvportgroup-111024",
:vlanid=>8,
:virtualswitch=>"10GB-dvSwitch",
:datacenter=>"Our DC",
:_ref=>"dvportgroup-19"
}
{
:name=>"30-Network-dvPortGroup",
:accessible=>true,
:id=>"dvportgroup-19",
:vlanid=>30,
:virtualswitch=>"10GB-dvSwitch",
:datacenter=>"Our DC",
:_ref=>"dvportgroup-11"
}
It appears that the value Foreman passes into Fog in attributes[:interfaces] for Fog::Vsphere::Compute::Real::create_vm() eventually reaches Fog::Vsphere::Compute::Real::get_raw_network() in lib/fog/vsphere/requests/compute/get_network.rb. The name of that function's first parameter, ref_or_name, suggests it expects either a _ref attribute or the name, but not an id.
The change in 97e72f867 switched from passing the name attribute to passing the id (not the _ref) attribute. Even with 97e72f867, the each/do in parse_networks() in app/models/compute_resources/foreman/model/vmware.rb still finds the target network for us, since the name attribute matches. However, that change also causes parse_networks() to return the id instead of the name in args["interface_attributes"]["network"]. When that id eventually gets to get_raw_network(), it’s incorrectly matched against the _ref attribute, and it selects the wrong dvportgroup.
In our case, parse_networks() searches for “30-Network-dvPortGroup” and correctly resolves the dvportgroup with id=dvportgroup-19, _ref=dvportgroup-11. However, because get_raw_network() eventually receives an id that it treats as a ref, it incorrectly selects our “8-Network-dvPortGroup” (which is _ref=dvportgroup-19), and that’s game over.
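To make the collision concrete, here is a minimal standalone Ruby sketch. It does not call Fog at all: find_by_ref_or_name is a hypothetical stand-in for how I understand get_raw_network() to behave (try _ref first, then name), and the two hashes are trimmed copies of the ones captured above.

```ruby
# The two dvportgroups from all_networks, trimmed to the relevant keys.
NETWORKS = [
  { name: "8-Network-dvPortGroup",  id: "dvportgroup-111024", _ref: "dvportgroup-19" },
  { name: "30-Network-dvPortGroup", id: "dvportgroup-19",     _ref: "dvportgroup-11" }
].freeze

# Hypothetical stand-in for a ref-or-name lookup. This is NOT Fog's code,
# only an assumption about its matching order (_ref first, then name).
def find_by_ref_or_name(networks, ref_or_name)
  networks.find { |n| n[:_ref] == ref_or_name } ||
    networks.find { |n| n[:name] == ref_or_name }
end

# The network we actually want, resolved by name as parse_networks() does.
wanted = NETWORKS.find { |n| n[:name] == "30-Network-dvPortGroup" }

# Passing its id (the behavior after 97e72f867) matches another network's _ref.
by_id = find_by_ref_or_name(NETWORKS, wanted[:id])
puts by_id[:name]   # prints 8-Network-dvPortGroup

# Passing its name (the earlier behavior) finds the intended network.
by_name = find_by_ref_or_name(NETWORKS, wanted[:name])
puts by_name[:name] # prints 30-Network-dvPortGroup
```

If the real lookup behaves like this stand-in, any environment where one portgroup's id equals another portgroup's _ref would hit the same problem.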
At the point where parse_networks() is called, _ref doesn’t appear to be available. The networks property in app/models/compute_resources/foreman/model/vmware.rb doesn’t have it, and I’m not sure how to get one at that point. I think app/models/concerns/fog_extensions/vsphere/server.rb is calling the vSphere API directly (rather than calling through Fog) to get the list of networks for all_networks, but I don’t know how to emulate that approach in parse_networks().
It would seem that trading the matched id for a _ref would be the safest way to avoid any ambiguity on name while still matching on the correct attribute. Reverting the fix for #19623 at the same time would probably be necessary.
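As a rough sketch of what I mean by trading the id for a _ref: the helper below, ref_for_id, is hypothetical and not part of Foreman or Fog. It assumes a list of network hashes that carry both :id and :_ref, which is exactly the data I don’t yet know how to obtain inside parse_networks().

```ruby
# Hypothetical helper: translate a matched id into the _ref that a
# ref-or-name lookup expects. Assumes every hash has both :id and :_ref.
def ref_for_id(networks, id)
  matches = networks.select { |n| n[:id] == id }
  raise ArgumentError, "no network with id #{id}" if matches.empty?
  raise ArgumentError, "ambiguous id #{id}" if matches.size > 1

  matches.first[:_ref]
end

# Trimmed copies of the two dvportgroups from our environment.
networks = [
  { name: "8-Network-dvPortGroup",  id: "dvportgroup-111024", _ref: "dvportgroup-19" },
  { name: "30-Network-dvPortGroup", id: "dvportgroup-19",     _ref: "dvportgroup-11" }
]

puts ref_for_id(networks, "dvportgroup-19") # prints dvportgroup-11
```

Matching strictly on id here and handing only the _ref onward would keep the two namespaces from ever being compared against each other.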
It appears this works on the rest of our networks because their id and _ref are equal. I don’t know how this particular mismatch came about in our environment or whether it can be resolved in vSphere. In any case, it seems objectively wrong to take the ‘id’ attribute and pass it to a library function that states it expects a ‘ref’ attribute.
Could anyone point me in a direction that might retrieve the _ref attribute within parse_networks() so that it can pass the correct attribute?
Best regards,
Zac Bedell