Foreman has always been an open platform for multiple configuration systems. Most of these report something we generally refer to as “facts”. Today, Foreman stores all facts in a single heap and uses a “type” column to differentiate the fact source (STI, single table inheritance). This approach works well if you only use one fact source, e.g. Ansible or Puppet, but the moment you install Katello, RHSM joins the bunch and things get funky. You end up seeing the same data presented via two different paths with slightly different results.
We currently store facts in SQL normal form, which can quickly get out of control when the infrastructure contains servers that have many NICs or drives, or when you use Linux bridging for VMs or containers. In these cases, fact tables can grow and consume precious SQL resources, and we have learned that many users don’t even use facts.
We have introduced fact name blacklisting, which prevents some of the offenders from being stored, but as we kept adding more and more facts to the list, I realized that we should be whitelisting instead of blacklisting. Thus the proposal.
Meet CFM: Common Fact Model
I would like to find the common denominator for the major configuration management systems (Puppet and Ansible) as well as Red Hat Subscription Manager facts. We would build a list of facts which are always enabled by default, and I would create a transformation API with implementations for Puppet, Ansible and RHSM to store those facts in a common way.
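To make the transformation idea more concrete, here is a minimal sketch in Python (Foreman itself is written in Ruby, so this is only an illustration of the shape, not proposed code). The common fact names (`kernel_release`, `memory_bytes`), the `MAPPINGS` table and the `to_common` function are all hypothetical. The source fact names and their units are my assumptions about what each tool reports and would need to be verified during the research phase.

```python
from typing import Any, Callable, Dict, Tuple

Converter = Callable[[Any], Any]


def identity(value: Any) -> Any:
    return value


def kib_to_bytes(value: Any) -> int:
    # Assumes the source reports an integer number of KiB.
    return int(value) * 1024


def mib_to_bytes(value: Any) -> int:
    # Assumes the source reports MiB, possibly as a decimal string.
    return int(float(value) * 1024 * 1024)


# Hypothetical mapping: common fact name -> (source fact name, converter).
# Source names and units are assumptions, not a verified list.
MAPPINGS: Dict[str, Dict[str, Tuple[str, Converter]]] = {
    "puppet": {
        "kernel_release": ("kernelrelease", identity),
        "memory_bytes": ("memorysize_mb", mib_to_bytes),
    },
    "ansible": {
        "kernel_release": ("ansible_kernel", identity),
        "memory_bytes": ("ansible_memtotal_mb", mib_to_bytes),
    },
    "rhsm": {
        "kernel_release": ("uname.release", identity),
        "memory_bytes": ("memory.memtotal", kib_to_bytes),
    },
}


def to_common(source: str, raw_facts: Dict[str, Any]) -> Dict[str, Any]:
    """Translate one source's raw facts into the common model.

    Facts without a mapping are simply not part of the result.
    """
    common: Dict[str, Any] = {}
    for common_name, (source_name, convert) in MAPPINGS[source].items():
        if source_name in raw_facts:
            common[common_name] = convert(raw_facts[source_name])
    return common
```

With a table like this, adding a new fact source would mean adding one more entry to `MAPPINGS` rather than touching the storage code.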
Every single fact should be researched and carefully evaluated because we need to make sure values have some important properties:
- Format. This is the most obvious one; a good example is the OS or kernel version. Some sources report “5.7.0-0.rc2.1.fc33.x86_64”, others just “5.7.0”, and others include the release, like “5.7.0-rc2”. If there is a sensible common unit (e.g. seconds, days, hours, bytes), the code must convert to it. For unique things like the kernel or OS version, there must be some heuristics to extract the most important bits.
- Stability. Volatile facts like Puppet’s uptime put an unwanted load on Foreman for no reason, as they change with every single fact upload. Forms like “42 hours” or even “56473612854 seconds since epoch” are wrong; we need to report the boot time as a UTC timestamp since the epoch and convert it to human-readable time when presenting the fact.
- Importance. If we have an idea of what is most important to you, Foreman could store volatile facts separately, for example only in Redis, or just as a simple text/JSON object in the SQL database, which is much cheaper to store than facts in normal form.
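The format and stability points above can be sketched roughly as follows. This is a Python illustration under my own assumptions, not a proposed implementation: the function names are hypothetical, the kernel heuristic simply keeps the leading `major.minor.patch` numbers, and the boot time is rounded to a one-minute granularity so that small clock drift between uploads does not produce a new value.

```python
import re
from typing import Optional

# Leading "major.minor[.patch]" part of a kernel version string.
_KERNEL_RE = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?")


def normalize_kernel_version(raw: str) -> Optional[str]:
    """Reduce a kernel version string to "major.minor.patch".

    Returns None if the string does not start with a version number.
    """
    match = _KERNEL_RE.match(raw.strip())
    if match is None:
        return None
    major, minor, patch = match.groups()
    return f"{major}.{minor}.{patch or '0'}"


def boot_time_epoch(uptime_seconds: float,
                    reported_at_epoch: float,
                    granularity: int = 60) -> int:
    """Turn a volatile uptime into a mostly stable boot timestamp.

    The result is seconds since the Unix epoch (UTC), rounded to the
    nearest multiple of `granularity` seconds to absorb small drift.
    """
    boot = reported_at_epoch - uptime_seconds
    return int(round(boot / granularity) * granularity)
```

All three version strings from the example reduce to “5.7.0” here, and two uploads from the same boot normally yield the same boot timestamp. Rounding does not remove the problem entirely: a value sitting close to a rounding boundary can still flip between two neighbouring timestamps.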
We would like to continue storing CFM (the important facts) in the current SQL normal form, which is very user friendly when it comes to searching. Users would still be able to use the search capabilities in the UI, CLI, API and inventory. However, volatile facts (actually, all the rest) would just be stored in a way that only allows retrieving them for an individual host, all of them at once. Also, no transformations would be applied to them, so one could fetch the original (unchanged) facts from all systems.
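A rough sketch of that split, again in Python and purely illustrative: the whitelist contents, the `split_facts` function and the choice of a JSON string as the per-host blob are all hypothetical. Whitelisted common facts would go on to the searchable normal-form tables, while the untouched raw facts would be serialized as one document per host.

```python
import json
from typing import Any, Dict, Tuple

# Hypothetical whitelist of common fact names kept in normal form.
COMMON_FACT_WHITELIST = frozenset({"kernel_release", "memory_bytes", "boot_time"})


def split_facts(common_facts: Dict[str, Any],
                raw_facts: Dict[str, Any]) -> Tuple[Dict[str, Any], str]:
    """Split one fact upload into a searchable part and a raw blob.

    Returns (searchable, raw_blob):
    - searchable: whitelisted common facts, destined for normal form
    - raw_blob: the original facts as one JSON string, stored per host
      and only retrievable as a whole
    """
    searchable = {
        name: value
        for name, value in common_facts.items()
        if name in COMMON_FACT_WHITELIST
    }
    # No transformation is applied to the raw facts; sort_keys only makes
    # the serialized form deterministic for identical input.
    raw_blob = json.dumps(raw_facts, sort_keys=True)
    return searchable, raw_blob
```

Because the blob is never queried by fact name, it does not need one row per fact, which is where the storage savings would come from.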
Please help me to understand which facts are the most important for your workflows and fill in this simple form. I am asking for a list of facts which you think are important, in the sense that you want them to be presented in the UI/CLI or you absolutely cannot live without them when using Foreman as a configuration management inventory source.
If you use multiple technologies (Puppet, Ansible, RHSM), please fill it in multiple times, once for each.
Thanks