RFC: Drop parsing NIC interfaces

lzap · March 20, 2020, 3:12pm

Hello,

Foreman for a long time is capable of parsing puppet facts creating NIC interfaces in Foreman database. This works quite reliably for standard ethernet interfaces, things gets clunky when it comes to bonds, teams, bridges, vlans or aliases. Most of this comes from the fact how limited puppet (facter) is with reporting those, it’s getting better in facter version 3.x or newer but it’s still not ideal.

When I was working on another issue in regard to the Puppet NIC parser (Foreman trigger not needed network interface updates on puppet facts upload) and I researched the code we have, I realized this needs either huge overhaul or things must change.

I wonder, how useful is this feature. We do rely on the NIC parsing capability in discovery, however this is something we could change. I would like to hear opinions because we’ve seen many complaints about this. Most of these are unfortunate bugs which lead to performance issues, database lockups or overall degraded performance of deployments when (due to bug) NIC interfaces accumulate up to thousands or even tens of thousands per host.

There are several options I can think of:

Get rid of puppet NIC parsing completely. We’d need to change how discovery checks-in, but using puppet is probably an overkill - what 90% of our users need is cpu, memory, disks, serial number, IP and MAC addresses. That’s something we can easily report ourselves and this is not reinventing wheel - just few data sent over network.
Radically simplify NIC parsing code and only support regular ethernet devices. Most of the bugs we face has something to do with complex heuristic we have in our codebase in order to guess virtual interfaces (bonds, vlans, aliases, bridges) from various device naming schemes or bits of data facter provides. If we get rid of this and keep only basic functionality, this would dramatically improve maintenance from our side, performance and robustness.
Keep everything as is. Fix another bug in this area. Sure, if you report this is something you cannot live without, we would do this as well. It’s just one of many.

Thanks for opinions.

ekohl · March 20, 2020, 3:47pm

I’ve always had an issue with this being called Puppet Facts. The project is facter and Puppet is just the method of how it gets there. It’s essentially facter --json | curl https://foreman.example.com/api/hosts/facts.

To be clear, Puppet is not present on the discovery image. It’s calling facter directly.

Big on inventing yet another fact reporting tool. Now that Facter 4 is out (which is again a Ruby gem), we can easily upgrade the discovery image from version 2, skipping 3 altogether. Expanding facter-ng/lib/resolvers/networking_linux_resolver.rb at master · puppetlabs/facter-ng · GitHub with information about bonds and vlans is much easier than trying to build an entire tool.

AFAIK we’ve never tried to make sure Facter provides sufficient information that we don’t need heuristics. In particular for discovery and new hosts this can be useful.

With all of that said, I think we should also consider the case where various interfaces don’t need to be managed. Tools like RHV (formerly RHEV) manage interfaces themselves and when reprovisioned by Foreman they only need sufficient connectivity for RHV to reconfigure it. There are probably other examples. Right now there’s mostly the global on/off setting for updating interfaces. There’s also a blacklist. Not sure if we can provide some more granularity, like setting a host parameter to skip interface fact importing on that host.

bagasse · March 20, 2020, 3:59pm

As reporter of the issue you mentioned (thanks a lot to take a look at it !), I looked at the code before reporting the issue, and yes, it seems IMHO complicated code to maintain (and fix) as is. The option 2 is ok for our usecases, as we only care about standard NICs but we don’t care about any virtual interfaces (except bond interfaces in some cases) reported by puppet. It may, in some cases, like hypervisors with a lot of VLANs, be counter-productive to list all of them in foreman, because these ifaces are not managed by foreman nor puppet at all.

Marek_Hulan · March 23, 2020, 9:00am

I also think we shouldn’t be building our facter. We’ve already discussed this at RFC: Foreman facter

It’s not just puppet/facter NIC parsing, all facters parsers are capable of this. I think this option should be stop parsing NICs completely. The main motivation was for detecting interfaces on baremetal, so that you can easily configure bonds/vlans on complicated setups of networking (e.g. in openstack - tenant, management and other networks).

I like this, however how do we reliably detect them?

I think this caused more headache than it reduced in last years, probably the worst option.

the number of if conditions reflects there are many possibilities. We can see the same thing in our orchestration code. It’s not simple. If we try to refactor or rewrite this code, we’ll end up with new set of if conditions.

Is there a way to reliable detect or distinguish between them?

I think we already realized that Foreman shouldn’t be the network management tool. If there’s a way to only detect non-virtual interfaces, I’d drop virtual interfaces parsing. If not, I’d even consider dropping the whole parsing besides for primary interface.

lzap · March 23, 2020, 9:19am

I like this idea the most. This goes hand in hand with discovery which is only relevant for provision_interface (= primary). One thing that we could loose is parsing BMC interfaces tho.

We could do a combination:

for primary interface parse everything and check the result as it must be “finite” amount of interfaces
for all other interfaces only parse physical (ethernet) or BMCs (no virtual parsing)

This puts me into the worse position, I need to fix the bug regardless and on top of that do an extra change described above. I believe we need to vastly improve testing of this, I propose to move these tests outside of host_test huge test case file and prepare real-world JSON files from the following hosts both for puppet 2, 3 (and 4?):

regular nic
bond
team
bridge
vlan
bond with vlan

With a test suite for that so we are sure for once and forever this works fine. We need to test updates too since this is the problem in the report (amount of NICs is increasing with every request).

TimoGoebel · March 23, 2020, 2:08pm

I believe this is the key to solving this. I believe we definitely need to support physical interfaces, BMC and bond interfaces. And probably Bridges and VLANs as well.
As I’ve said many times before, I believe Foreman should be the tool to manage your on-prem datacenter. And that means bare metal. And that means VLANs, Bonds, BMC and physical interfaces. If we like it or not.
As a user you want to be able to deploy a host on a bonded network and configure the system once (in Foreman) and then let the provisioning do the magic. And you want the state visible in Foreman (for reports/inventory and to change the correct interface, …).

Aside from the fact that the code may need an overhaul, let’s talk about the workflows this change would affect. How should users use Foreman?

Let me know if you need some fact samples if there aren’t suitable ones in FacterDB.

ekohl · March 23, 2020, 2:18pm

I’ve been wondering about using facterdb in our tests, but it’s not really useful for providing variations as we would like to. However, it does have various scripts for spinning up a machine, capturing facts with various versions and storing that.

lzap · March 23, 2020, 2:24pm

Agreed, the question is why do you need those NICs to be created. I can imagine when you want to provision unmanaged hosts - that’s a time saver. Then, let’s keep parsing of primary interface (including bond, bridge, vlan). I think this is what we agree on, so I am working on getting those example JSONs out of facter 2, 3 and 4 in an automated way (currently working on a script that gets me this).

But do you think users really need parsed NICs for all secondary (mostly unmanaged) NICs? That’s the question I want to solve now.

lzap · March 23, 2020, 2:25pm

I am currently working on such script, I already have lot of good output let me finish that and share it.

ekohl · March 23, 2020, 2:40pm

I think determining what’s the primary is even harder than adding all the info to facts because Linux doesn’t really have a concept of a primary NIC, other than a default gateway. IMHO we either do fact importing right or not at all. If you prefer deciding on network configs in Foreman (or some other tool), then importing is not relevant at all. If we can set that on the host(group) level (besides the global setting), then that might already be sufficient for most users.

lzap · March 23, 2020, 3:01pm

Good call. Hmmm. I guess it’s all or nothing then.

lzap · March 23, 2020, 3:14pm

I have finished with step 1: getting some real-world JSON output from all major facter versions:

Step 2: create some integration tests from this and find out what is incorrectly parsed.

lzap · March 23, 2020, 3:26pm

Now, here is the problem. A simple case - bridge.

Neither from facter 2.x nor facter 4.x you are able to tell if the bridge (br0) slave device is enp1s0 or enp7s0. Facter does not advertise relationships between NICs, we briefly touched extending it in another thread. What’s worse is that Facter 4.x does not even include slave devices, so there is less information available.

The same applies to all master-slave relationships like VLAN, bonds or teams and combinations.

All we do these days is “heuristic”, some device names like eth0.13 implies this “could” be probably a VLAN with ID 13. But that’s wild guessing.

lzap · March 23, 2020, 3:29pm

My proposal then: let’s drop parsing alltogether and write a custom fact that adds those relationships, only if this fact is found then we have enough info to reconstruct the NICs. I suggest to start small and expand on top of this.

bagasse · March 23, 2020, 3:37pm

No, I don’t think. Facter (v3.11) reports bond like any other virtual or physical interface like following example (another oVirt HV, with vlan on bond and where XXX is an oVirt VM network, and em1,em2 are physical ifaces and bond slaves):

  "XXX": {
    "mac": "11:6a:1a:db:d1:45",
    "mtu": 1496
  },
  "bond0": {
    "mac": "11:6a:1a:db:d1:45",
    "mtu": 1500
  },
  "bond0.42": {
    "mac": "11:6a:1a:db:d1:45",
    "mtu": 1500
  },
  "em1": {
    "mac": "11:6a:1a:db:d1:45",
    "mtu": 1500
  },
  "em2": {
    "mac": "11:6a:1a:db:d1:46",
    "mtu": 1500
  },

Maybe rather than split network interface facts reporting by physical/virtual; because it seems it is complicated to know which one is physical/virtual; maybe reporting only interface with assigned IPs (all ifaces with IPs ? only ifaces with IPs corresponding to a network already present in foreman ?) will be more reliable compromise ?

I know it’s a little off topic but, if we keep the ‘netwok ifaces facts importer’ functionality (with added tests around it like @lzap said) , maybe it will be nice to be able to whitelist network ifaces to import/update instead of blacklisting (like it’s now possible in settings). It is sometime easier to manage (eg, in my case a whitelist that will be something like ‘eth[0-9].* bond[0-9].* p[0-9]p[0-9].* em[0-9].* ovitmgmt […]’). Or instead of an array of blacklist/whitelist, just a bare regex string that will permit the user to do what he want ?

ekohl · March 23, 2020, 3:42pm

Why a custom fact instead of submitting it upstream to facter?

TimoGoebel · March 23, 2020, 3:59pm

I think that’s a fair point, if we do not get the data we can’t process it. How do the other config management tools handle this?

lzap · March 24, 2020, 8:17am

I’d rather start with a custom fact that can be of course contributed upstream once it turns out reliable. We will need it anyway for older versions.

I believe the best in business is Ubuntu MAAS which is based around IPMI/Moonshot/KVM. It sends out architecture and MAC address and pairs that with IPMI. It’s a different approach than ours since we boot OS installers while MAAS is solely image-based provisioning approach:

github.com

canonical/maas/blob/51b9712c201c15d0425ed86f015133d4a6bb0cf9/src/metadataserver/user_data/templates/snippets/maas_enlist.sh

#!/bin/bash -e
#
#    maas-enlist: MAAS Enlistment Tool
#
#    Copyright (C) 2014-2018 Canonical Ltd.
#
#    Authors: Andres Rodriguez <andres.rodriguez@canonical.com>
#
#    This program is free software: you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation, version 3 of the License.
#
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.
#
#    You should have received a copy of the GNU General Public License
#    along with this program.  If not, see <http://www.gnu.org/licenses/>.

This file has been truncated. show original

I am not aware of other config management tools which parse NICs, show me.

TimoGoebel · March 24, 2020, 8:48am

Thanks, I meant Salt, Ansible, …

lzap · March 24, 2020, 8:56am

Why would those tools parse network interfaces just like we do?