RFC: Turning off auto updates from facts for NICs and OS

Hello,

I have to deal with various orchestration problems pretty regularly, and today I found out that Foreman breaks orchestration via the "Update subnet/domain from facts" feature.

When a managed interface with DHCP/DNS orchestration is updated from facts, no orchestration is actually triggered. I do not think this is a bug: triggering orchestration on every Puppet fact upload would be performance suicide. However, the moment a Subnet, Domain or IP address is updated this way, the inventory database is out of sync with reality. All subsequent actions on that host will ultimately fail with conflicts and hard-to-troubleshoot errors that only have problematic workarounds.

I have thought for a very long time that this is a bad design: provisioning and inventory information should be kept separate. Networking information is not the only source of conflicts; the same applies to the Operating System (e.g. the recent CentOS 8 vs CentOS Stream problem). However, I do not think it is the right time to reengineer this from scratch.

I would like to explore the options we have to solve this, because dealing with orchestration errors is very time consuming for both users and us. One solution that comes to my mind is radical, but it makes sense: when a fact would update the Subnet, Domain or IP on a managed interface, Foreman would simply refuse to perform the update, even when these settings are turned on. We would advise users to uncheck the managed flag if they want to override what the host was provisioned with.

From the user perspective, we could present this via a new Host Status field (no idea for a name yet):

  • OK - host subnet/domain/IP is in sync with facts
  • IP out of sync - change it manually or change NIC to unmanaged
  • Subnet out of sync - change it manually or change NIC to unmanaged
  • Domain out of sync - change it manually or change NIC to unmanaged
  • Unknown - no relevant facts were reported

These statuses could nicely explain what just happened and what users need to do in order to fix the issue. We would keep the current Administer - Settings for users who want to ignore information from facts for all hosts as well, but the default behavior for managed NICs would be to ignore the changes and only update the overall Host status.
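
The proposed behavior could be sketched roughly as follows. This is only an illustration in Python, not Foreman code: the `Nic` class, the `apply_facts` function, the fact keys and the order in which mismatches are reported are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class NicFactStatus(Enum):
    # Labels taken from the status list above; the enum itself is hypothetical.
    OK = "OK"
    IP_OUT_OF_SYNC = "IP out of sync"
    SUBNET_OUT_OF_SYNC = "Subnet out of sync"
    DOMAIN_OUT_OF_SYNC = "Domain out of sync"
    UNKNOWN = "Unknown"


@dataclass
class Nic:
    # Hypothetical stand-in for an interface record.
    managed: bool
    ip: Optional[str] = None
    subnet: Optional[str] = None
    domain: Optional[str] = None


def apply_facts(nic, facts):
    """Return the status for `nic`; only unmanaged NICs are overwritten."""
    relevant = {k: facts[k] for k in ("ip", "subnet", "domain") if facts.get(k)}
    if not relevant:
        return NicFactStatus.UNKNOWN
    if not nic.managed:
        # Plain inventory use: keep today's behavior and accept the facts.
        for attr, value in relevant.items():
            setattr(nic, attr, value)
        return NicFactStatus.OK
    # Managed NIC: never touch the record, only report the first mismatch.
    checks = (
        ("ip", NicFactStatus.IP_OUT_OF_SYNC),
        ("subnet", NicFactStatus.SUBNET_OUT_OF_SYNC),
        ("domain", NicFactStatus.DOMAIN_OUT_OF_SYNC),
    )
    for attr, status in checks:
        if attr in relevant and getattr(nic, attr) != relevant[attr]:
            return status
    return NicFactStatus.OK
```

For example, `apply_facts(Nic(managed=True, ip="10.0.0.5"), {"ip": "10.0.1.9"})` leaves the stored IP alone and returns the "IP out of sync" status, while the same call on an unmanaged NIC overwrites the IP.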

This feels like a good compromise. What was very often seen as “mystery inventory changes” would become well defined and visible through the UI and API/CLI. We would not affect unmanaged hosts: users who like to use Foreman as a plain inventory would see no difference. Only users with managed hosts would be affected, and they would benefit from better usability and no orchestration errors.


Bump, anyone? Do I take this as “yes, remove these”? :slight_smile:

@ekohl @tbrisker @Marek_Hulan

If we want to fix it properly, we need to split the reported and user-defined values. That way we could allow the user to “apply reported” after a review. The status is perhaps a good indicator, but I’m not sure I’d like to see a warning or error when these two values differ. It may not be a warning at all. At the same time, if it were just another variant of the OK status, I would never notice it. Host status shouldn’t be used to mitigate the design problem we have; users rely on it in their monitoring of Foreman health.

I think the right direction would be to clone IP, MAC, subnet_id, domain_id, operating_system_id and similar attributes to a ReportedData facet and start building on that.

Or we go the extra mile and move all the data to two facets:

  • reported data facet as described by Marek
  • desired state facet / provisioning facet to store user provided desired state data

The host model could then just redirect to either of the facets. That might be the best way to keep the API stable. Thoughts on that?
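
As a rough illustration of the "host redirects to a facet" idea, here is a minimal Python sketch. Nothing in it is Foreman code: `HostData`, `Host` and the rule "use the desired value if set, otherwise fall back to the reported one" are assumptions made for the example, and both facets share one class only to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Optional

# Attributes named earlier in the thread as candidates for the facets.
FACET_ATTRS = ("ip", "mac", "subnet_id", "domain_id", "operating_system_id")


@dataclass
class HostData:
    # Hypothetical shape shared by the reported and the desired-state facet.
    ip: Optional[str] = None
    mac: Optional[str] = None
    subnet_id: Optional[int] = None
    domain_id: Optional[int] = None
    operating_system_id: Optional[int] = None


class Host:
    def __init__(self, name, reported=None, desired=None):
        self.name = name
        self.reported = reported or HostData()  # written by fact imports only
        self.desired = desired or HostData()    # written by users only

    def __getattr__(self, attr):
        # Called only when normal lookup fails, so host.ip and friends keep
        # working for existing callers while the data lives in the facets.
        if attr in FACET_ATTRS:
            wanted = getattr(self.desired, attr)
            return wanted if wanted is not None else getattr(self.reported, attr)
        raise AttributeError(attr)
```

With this layout a fact upload can freely rewrite `host.reported` without ever changing what `host.ip` returns for a host whose IP was set by the user.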

Why would we need to duplicate any kind of data? We have the facts available in the database. The change is essentially about when we do the database update: instead of doing it immediately, we would refactor this so that it is done after user review.

All we need is a flag(s) to indicate that there is a change (review) pending.

I would like to bring this to our attention once more as I was digging and resolving another problem related to this. If this list of settings we currently have is not a sign of a bad design, I don’t know what else is:

  • Administer - Settings - Facts - Update hostgroup from facts
  • Administer - Settings - Facts - Update subnets from facts
  • Administer - Settings - Facts - Update environment from facts
  • Administer - Settings - Facts - Ignore facts for operating system
  • Administer - Settings - Facts - Ignore facts for domain

There is no need to store a separate copy of the reported data in the database at all; we just need a flag indicating that a change is pending. When a user opens a host with such a flag, Foreman can easily fetch its facts, perform an explicit comparison, show the diff and ask for confirmation on the fly. The same would apply to a mass action, but that would run in a controlled way (as a background task), so that if it goes wild it can be cancelled and investigated.
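
To make the idea concrete, here is a minimal Python sketch of "flag on import, diff on demand". It is not Foreman code: hosts and facts are plain dictionaries, and `fact_drift`, `import_facts`, `remediate`, the `drift_pending` key and the list of tracked attributes are all hypothetical names chosen for the example.

```python
TRACKED = ("ip", "subnet", "domain", "operatingsystem")


def fact_drift(host, facts, tracked=TRACKED):
    """Return {attr: (stored, reported)} for tracked attributes that differ."""
    return {
        attr: (host.get(attr), facts[attr])
        for attr in tracked
        if attr in facts and facts[attr] != host.get(attr)
    }


def import_facts(host, facts):
    """On fact upload: never modify the host, only raise or clear the flag."""
    host["drift_pending"] = bool(fact_drift(host, facts))


def remediate(host, facts, confirm):
    """Compute the diff on the fly and apply it only if `confirm` accepts it."""
    diff = fact_drift(host, facts)
    if diff and confirm(diff):
        for attr, (_stored, reported) in diff.items():
            host[attr] = reported
    # The flag stays set when the user declines, so the drift remains visible.
    host["drift_pending"] = bool(fact_drift(host, facts))
    return diff
```

A single-host action would call `remediate` with a callback that shows the diff to the user; a mass action or a scheduled task could call it in a loop with a callback that always accepts.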

In other words, unless there are any objections I will go ahead and write a patch:

  • New flag indicating that there is a fact drift for a host (one per fact type: Puppet, Ansible, RHSM).
  • New action for single host to remediate the problem (from arbitrary fact type).
  • New mass-action for multiple hosts to remediate the problem (background job).
  • A rake task to do the same so people can run this regularly.
  • Removal of all the above settings.

Relevant read: RFC: Store parsed facts on clients

@lzap Any news on this front? :slight_smile:

Nope.