Could not match network interface VMWare (again...)

I’m seeing what looks like a regression of #19623 and/or side effects of #27353 that results in provisioning for VMware to fail on one of our virtual networks. It looks like the change to address #27353 in 97e72f867 by @ezr-ondrej is the most proximate change leading to this issue. Backing out that commit against Foreman 2.0.3 allows provisioning to succeed.

I think I have an idea of what’s going on. I’m not sure if reverting 97e72f867 would break other/more things. I don’t think that’s the best fix, though I don’t currently have the knowledge of Foreman or Fog sufficient to fix it in what I think is the correct way.

First, an underlying detail about our environment which I think is critical. Each Vsphere dvportgroup object has a name, id, and _ref attribute. As far as I can tell from the issue reported in #27353, name isn’t a unique attribute. I think id and _ref are both intended to be unique with respect to themselves, but they are provably not unique with respect to each other. IE there cannot be two dvportgroups with id=dvportgroup-11, but there can be (and in our environment is) one dvportgroup with id=dvportgroup-11 and a different one that has _ref=dvportgroup-11. Specifically as captured from app/models/concerns/fog_extensions/vsphere/server.rb in select_nic()'s all_networks:

{
  :name=>"8-Network-dvPortGroup", 
  :accessible=>true, 
  :id=>"dvportgroup-111024", 
  :vlanid=>8, 
  :virtualswitch=>"10GB-dvSwitch", 
  :datacenter=>"Our DC", 
  :_ref=>"dvportgroup-19"
}

{
  :name=>"30-Network-dvPortGroup", 
  :accessible=>true, 
  :id=>"dvportgroup-19", 
  :vlanid=>30, 
  :virtualswitch=>"10GB-dvSwitch", 
  :datacenter=>"Our DC", 
  :_ref=>"dvportgroup-11"
}

It appears that the attribute passed into Fog from Foreman in attributes[:interfaces] to Fog::Vsphere::Compute::Real::create_vm() eventually gets passed to Fog::Vsphere::Compute::Real::get_raw_network() in lib/fog/vsphere/requests/compute/get_network.rb. The first parameter to that function, ref_or_name suggests the function expects either a _ref attribute or the name, but not an id.

The change in 97e72f867 switched from passing the name attribute to the id (not the _ref) attribute. Even with 97e72f867, the each/do in parse_networks() in app/models/compute_resources/foreman/model/vmware.rb still gets the target network since the name attribute matches and works for us. That change also causes parse_networks() to pass the id instead of the name as the return value in args[“interface_attributes”][“network”]. When that id eventually gets to get_raw_network(), it’s incorrectly matched against the _ref attribute, and it selects the wrong dvportgroup.

In our case, the search in parse_networks searches for “30-Network-dvPortGroup” and correctly resolves the dvpg with id=dvportgroup-19, _ref=dvportgroup-11. However because get_raw_network eventually receives the id it thinks is a ref, it incorrectly selects our “8-Network-dvPortGroup” (which is _ref=dvportgroup-19), and that’s game over.

At the point where parse_networks is called, _ref doesn’t appear to be available. The networks property in app/models/compute_resources/foreman/model/vmware.rb doesn’t have it, and I’m not sure how to get one at that point. I think app/models/concerns/fog_extensions/vsphere/server.rb is calling the Vsphere API directly (rather than calling through Fog) to get a list of networks for all_networks, but I don’t know how to emulate that approach in parse_networks.

It would seem that trading the matched id for a _ref would be the safest way to avoid any ambiguity on name while still matching on the correct attribute. Reverting the fix for #19623 at the same time would probably be necessary.

It appears this works on the rest of our networks because id and _ref are equal. I don’t know how this particular mis-match came about in our environment or how it can possibly be resolved in Vsphere. In any case, it seems objectively wrong to take the ‘id’ attribute and pass it to a library function that states it expects a ‘ref’ attribute.

Could anyone point me in a direction that might retrieve the _ref attribute within parse_networks to be able pass the correct attribute?

Best regards,
Zac Bedell

Hi,

thanks for breath takingly great debugging!
I belive it’s due to changes made to fog-vsphere with not so clear idea about the vsphere API (mainly my changes).

I belive in network apart of DistributedVirtualPortgroup the id == _ref, but for DVP no, thus it might get quite messy.

One solution would be to change the fetching by ref_or_name, as it tries to fetch by both key and _ref so if it first gets the network with matching key, it will win.

Or we can make fog to return id always == _ref like:


Do you think this could resolve the issue?

I don’t think I understand Fog well enough to understand that patch. I think the last bit of it as adding _ref as the id attribute in the return from list_networks? Assuming Foreman used that _ref returned as id and then eventually passed it back in as ref_or_name, I think that would do the trick and work for both of our use cases.

I see one of the verification steps for that patch failed. I’m not sure if I’m up for setting up enough of a build environment to build a local copy of that to test, but I might hack that into my local tree on our server and see how it goes.

I gave that patch a try, but it doesn’t look like it helped. It looks like usage of the dvPortGroup’s id (not _ref) is coming from further up the call chain in Foreman. At least in app/models/concerns/orchestration/compute.rb’s setCompute(), the interface_attributes[network] field contains the id, not the _ref. It looks like the id comes from merging attributes from the compute_resource object. That code isn’t VSphere specific, so I’m guessing the id attribute is what’s stored internally for the compute_resource / compute_profile.

If changing Fog::VSphere is an option, it looks like changing get_raw_network in lib/fog/vsphere/requests/compute/get_network.rb to search by id instead of _ref is the smallest change that would make this work. As far as I can tell, the only things in Foreman which eventually reach that function are already using either an id or a name, never a _ref. I have no idea what affect that might have on other clients of Fob::VSphere though?

I have no idea what it would take to move the usage of _ref instead of id up the chain of Foreman’s API & GUI. It doesn’t look trivial.

I did some more hacking on this, and I think I have a solution that works. I need to make changes to a clean copy of the code next week to make sure I didn’t accidentally change something I wasn’t expecting. Does this approach seem sound, @ezr-ondrej?

Modify fog-vsphere’s Fog::Vsphere::Compute::Network in lib/fog/vsphere/models/compute/network.rb to expose _ref. It’s already present in the underlying ActiveRecord, just not visible.

module Fog
  module Vsphere
    class Compute
      class Network < Fog::Model
        identity :id

        attribute :name
        attribute :datacenter
        attribute :accessible # reachable by at least one hypervisor
        attribute :virtualswitch
        attribute :vlanid
        attribute :_ref

        def to_s
          name
        end
      end
    end
  end
end

Then change app/models/compute_resources/foreman/model/vmware.rb to return that ref instead of the name or the id:

def parse_networks(args)
  args = args.deep_dup
  dc_networks = networks
  args["interfaces_attributes"]&.each do |key, interface|
    # Consolidate network to network id
    net = dc_networks.detect { |n| n.id == interface['network'] }
    net ||= dc_networks.detect { |n| n.name == interface['network'] }
    raise "Unknown Network ID: #{interface['network']}" if net.nil?
    interface["network"] = net._ref
    interface["virtualswitch"] = net.virtualswitch
  end
  args
end

As far as I can tell, that correctly creates the VM in the desired network based on passing the _ref into get_raw_network. When the later validation step occurs in app/models/concerns/fog_extensions/vsphere/server.rb, it appears everything in that section of the code is back to using id’s instead of _ref’s, but it still works out correctly.

I’d really love to see either _ref or id used exclusively across the board, but I don’t have enough understanding of why the bit in fog_extensions goes back to id’s. My hunch is the source of that id is going back to Foreman’s model of the network which is stored as id’s. I’m not sure if trying to introduce the same translation that occurs in parse_networks() makes sense at this point?

Any thoughts?

At any rate, I’ll get this packed up as a proper pull request next week.