RFC: Turning off Update subnet/domain from facts for managed interfaces

Hello,

I deal with various orchestration problems pretty regularly, and today I found out that Foreman breaks orchestration via the Update subnet/domain from facts feature.

When a managed interface with DHCP/DNS orchestration is updated from facts, no orchestration is actually triggered. I do not think this is a bug: triggering orchestration via Puppet uploads would be performance suicide. However, the moment the Subnet, Domain or IP address is updated, the inventory database is out of sync with reality. All subsequent actions on that host will ultimately fail with conflicts and hard-to-troubleshoot errors that have only problematic workarounds.

I have thought for a very long time that this was a bad design - provisioning and inventory information should be kept separate. Networking information is not the only thing causing conflicts; the same story applies to the Operating System (e.g. the recent CentOS 8 vs CentOS Stream problem). However, I do not think it is the right time to reengineer this from scratch.

I would like to explore the options we have to solve this, because dealing with orchestration errors is very time consuming for both users and us. One solution that comes to mind is radical but makes sense: when a fact would update the Subnet, Domain or IP on a managed interface, Foreman would simply refuse to do so, even when these settings are turned on. We would advise users to uncheck the managed flag if they want to override what the host was provisioned with.
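To make the proposed rule concrete, here is a minimal sketch in Python. It is not Foreman code: the `Interface` class, the `apply_facts` function and the `update_from_facts` flag are hypothetical names for illustration, and facts are assumed to arrive as a plain dict.

```python
from dataclasses import dataclass


# Hypothetical stand-in for an interface record; not the real Foreman model.
@dataclass
class Interface:
    managed: bool
    ip: str
    subnet: str
    domain: str


GUARDED_FIELDS = ("ip", "subnet", "domain")


def apply_facts(nic, facts, update_from_facts=True):
    """Apply reported values to nic and return the field names that were skipped."""
    skipped = []
    for name in GUARDED_FIELDS:
        reported = facts.get(name)
        if reported is None or reported == getattr(nic, name):
            continue  # nothing reported, or already in sync
        if nic.managed or not update_from_facts:
            # Managed NICs keep their provisioned values, regardless of the setting.
            skipped.append(name)
        else:
            setattr(nic, name, reported)
    return skipped
```

With this shape, a managed NIC is never modified by a fact upload, while an unmanaged NIC behaves as it does today when the setting is on.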

From the user perspective, we could present this via a new Host Status field (no idea for a name yet):

  • OK - host subnet/domain/IP is in sync with facts
  • IP out of sync - change it manually or change NIC to unmanaged
  • Subnet out of sync - change it manually or change NIC to unmanaged
  • Domain out of sync - change it manually or change NIC to unmanaged
  • Unknown - no relevant facts were reported
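The statuses above could be derived roughly as follows. This is only a sketch in Python, not Foreman code: `NetworkSyncStatus` and `network_sync_status` are hypothetical names, stored values and facts are assumed to be plain dicts, and reporting only the first mismatch (IP, then subnet, then domain) is an assumption of the sketch.

```python
from enum import Enum


class NetworkSyncStatus(Enum):
    OK = "OK"
    IP_OUT_OF_SYNC = "IP out of sync"
    SUBNET_OUT_OF_SYNC = "Subnet out of sync"
    DOMAIN_OUT_OF_SYNC = "Domain out of sync"
    UNKNOWN = "Unknown"


# Order decides which mismatch is reported first (an assumption of this sketch).
_CHECKS = (
    ("ip", NetworkSyncStatus.IP_OUT_OF_SYNC),
    ("subnet", NetworkSyncStatus.SUBNET_OUT_OF_SYNC),
    ("domain", NetworkSyncStatus.DOMAIN_OUT_OF_SYNC),
)


def network_sync_status(stored, facts):
    """Compare stored interface values with reported facts."""
    reported = {key: facts[key] for key, _ in _CHECKS if facts.get(key) is not None}
    if not reported:
        return NetworkSyncStatus.UNKNOWN  # no relevant facts were reported
    for key, status in _CHECKS:
        if key in reported and reported[key] != stored.get(key):
            return status
    return NetworkSyncStatus.OK
```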

These statuses could nicely explain what just happened and what users need to do in order to fix the issue. We would keep the current Administer - Settings for users who want to ignore information from facts for all hosts as well, but the default behavior for managed NICs would be to ignore the changes and only update the overall Host status.

This feels like a good compromise. What was very often seen as “mystery inventory changes” would now be well defined and visible through the UI and API/CLI. We would not affect unmanaged hosts - users who like to use Foreman as a plain inventory would see no difference. Only users with managed hosts would benefit, from better usability and no orchestration errors.

Bump, anyone? Do I take this as “yes, remove these”? :slight_smile:

@ekohl @tbrisker @Marek_Hulan

If we want to fix it properly, we need to split the reported and user-defined values. That way we could allow the user to “apply reported” after a review. The status is perhaps a good indicator, but I’m not sure I’d like to see a warning or error when these two values differ. It may not be a warning at all. At the same time, if it’s just another OK status of the host, I would never notice. Host status shouldn’t be used to mitigate the design problem we have; users rely on it in their monitoring of Foreman health.

I think the right direction would be to clone IP, MAC, subnet_id, domain_id, operating_system_id and similar to a ReportedData facet and start building on that.

Or we go the extra mile and move all the data to two facets:

  • reported data facet as described by Marek
  • desired state facet / provisioning facet to store user provided desired state data

The host model could then just redirect to either of the facets. Might be the best way to keep the API stable. Thoughts on that?
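
A rough sketch of the redirect idea, in Python rather than Foreman's own code. The class names (`ReportedFacet`, `DesiredFacet`, `Host`) are hypothetical, and the rule "managed hosts read the desired facet, unmanaged hosts read the reported facet" is an assumption made only for this illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReportedFacet:
    """Values as last reported by facts (hypothetical)."""
    ip: Optional[str] = None
    subnet: Optional[str] = None
    domain: Optional[str] = None


@dataclass
class DesiredFacet:
    """Values the user asked for at provisioning time (hypothetical)."""
    ip: Optional[str] = None
    subnet: Optional[str] = None
    domain: Optional[str] = None


@dataclass
class Host:
    managed: bool
    reported: ReportedFacet = field(default_factory=ReportedFacet)
    desired: DesiredFacet = field(default_factory=DesiredFacet)

    def _facet(self):
        # Assumed rule: managed hosts expose desired state, others reported state.
        return self.desired if self.managed else self.reported

    # Existing attribute names stay on the host, so callers do not change.
    @property
    def ip(self):
        return self._facet().ip

    @property
    def subnet(self):
        return self._facet().subnet

    @property
    def domain(self):
        return self._facet().domain
```

Callers keep reading `host.ip` as before, which is what would keep the external API stable while the storage is split.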

Why would we need to duplicate any kind of data? We have the facts available in the database. The change is essentially about when we do the database update: instead of doing it immediately, we would refactor this so it is done after user review.

All we need is one or more flags to indicate that a change is pending review.
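
As a sketch of what that could look like (Python for illustration only, not Foreman code; the `pending_review` key and both function names are hypothetical, and hosts and facts are modelled as plain dicts): the reported values are read straight from the facts at review time, and the host only stores which fields are awaiting review.

```python
REVIEWABLE_FIELDS = ("ip", "subnet", "domain")


def flag_pending(host, facts):
    """Record which fields differ from the facts, without copying the values."""
    host["pending_review"] = sorted(
        name
        for name in REVIEWABLE_FIELDS
        if name in facts and facts[name] != host.get(name)
    )
    return host["pending_review"]


def apply_reviewed(host, facts, approved):
    """Apply the reported value for each approved field, then refresh the flags."""
    for name in flag_pending(host, facts):
        if name in approved:
            host[name] = facts[name]
    return flag_pending(host, facts)
```

A fact upload would call only `flag_pending`; the database update itself happens in `apply_reviewed`, after the user has approved specific fields.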