RFC: Dropping nested (structured) facts

Hello,

The Foreman facts feature currently stores facts in an actual tree structure in the DB. The class responsible for this is called StructuredFactImporter and is used by Puppet and RHSM (perhaps more). The problem with this approach is that fact importing is very costly: Foreman needs to analyze fact names one by one, ensure all nodes exist and create or update them. When fetching facts, it also needs to compose all the fact keys again.

The only advantage seems to be that users can drill down by node on the Fact index page across multiple hosts. I think this is not very useful, if not actually a bad user experience, as it is quite slow to reach leaf nodes in the UI. I would like to propose dropping the tree structure and storing facts in a plain way: key - value. So instead of this:

dmi::
  bios::
    release_date: 07/08/2009
    vendor: Lenovo

we would simply store:

dmi::bios::release_date: 07/08/2009
dmi::bios::vendor: Lenovo
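
To make the transformation concrete, here is a minimal sketch in Python (not Foreman's actual Ruby importer) of how a nested fact hash could be flattened into plain key - value pairs with the :: separator. The function name flatten_facts and the choice to store every value as a string are assumptions made for illustration only.

```python
from typing import Any, Dict

SEPARATOR = "::"


def flatten_facts(nested: Dict[str, Any], prefix: str = "") -> Dict[str, str]:
    """Flatten a nested fact hash into {full_key: value} pairs (hypothetical helper)."""
    flat: Dict[str, str] = {}
    for name, value in nested.items():
        # Build the full key by joining the parent path and the node name.
        key = f"{prefix}{SEPARATOR}{name}" if prefix else name
        if isinstance(value, dict):
            # Inner node: recurse, nothing is stored for the node itself.
            flat.update(flatten_facts(value, key))
        else:
            # Leaf node: store the value as a string.
            flat[key] = str(value)
    return flat


nested_facts = {"dmi": {"bios": {"release_date": "07/08/2009", "vendor": "Lenovo"}}}
flat_facts = flatten_facts(nested_facts)
for key, value in flat_facts.items():
    print(f"{key}: {value}")
# dmi::bios::release_date: 07/08/2009
# dmi::bios::vendor: Lenovo
```

Only leaf nodes end up as rows, so there are no intermediate nodes to ensure or update during import.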

From the user perspective, very little would change. Searching only supports full leaf nodes, so search queries would remain the same, e.g. dmi::bios::vendor = "Lenovo". Non-structured sources (e.g. Ansible) would remain unchanged. The only visible difference would be a redesigned Facts index page where, instead of a drill-down table, we would simply present a list.

Why I am proposing this should be obvious by now: by simplifying the design, we could dramatically improve fact import performance. Once we ensure the host is a valid host stored in the database (currently we allow unsaved host Active Record instances), we could insert and update facts more efficiently. Breaking up nodes would no longer be necessary and we could also explore some more advanced PostgreSQL features like UPSERT to optimize fact imports even further.

I propose to keep the :: (four dots) separator. I know that a simple dot might look easier to type; however, we do not want to break searching, and a dot is also quite a bad candidate because it has already caused issues previously (e.g. interface names with VLAN tags or aliases: eth0.1).
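
A tiny Python illustration of the ambiguity: the fact names below are made up for the example, only the eth0.1 interface name comes from the case above.

```python
# Hypothetical fact keys; only the "eth0.1" interface name is from the post.
key_colons = "net::interface::eth0.1::mtu"
key_dots = "net.interface.eth0.1.mtu"

parts_colons = key_colons.split("::")
parts_dots = key_dots.split(".")

print(parts_colons)  # ['net', 'interface', 'eth0.1', 'mtu']
print(parts_dots)    # ['net', 'interface', 'eth0', '1', 'mtu']
```

With a dot separator the interface name is split in two, so the original path can no longer be recovered from the key alone.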

This would also open the door to another level of optimization. I believe that breaking up facts into FactName and FactValue tables is not the best solution. A single table could be used to store all facts: ID, HOST_ID, FACT_NAME, FACT_VALUE. This SQL de-normalization could possibly be more efficient for updates (and less efficient for reads across many hosts); however, this is definitely not material for the first version and I would need to research this more. But we can keep an eye on this for the future.
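
As a rough sketch of how the single-table idea and UPSERT could fit together, the Python example below uses the standard sqlite3 module as a stand-in for PostgreSQL (both accept the INSERT ... ON CONFLICT ... DO UPDATE form; SQLite needs version 3.24 or newer). The table name facts, the unique constraint on (host_id, fact_name) and the helper upsert_facts are assumptions for illustration, not an agreed schema.

```python
import sqlite3

# In-memory SQLite database standing in for the Foreman PostgreSQL DB.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE facts (
        id INTEGER PRIMARY KEY,
        host_id INTEGER NOT NULL,
        fact_name TEXT NOT NULL,
        fact_value TEXT,
        UNIQUE (host_id, fact_name)  -- assumed: one row per host and fact name
    )
    """
)


def upsert_facts(conn, host_id, facts):
    """Insert new facts and update existing ones in a single statement per row."""
    rows = [(host_id, name, value) for name, value in facts.items()]
    conn.executemany(
        """
        INSERT INTO facts (host_id, fact_name, fact_value)
        VALUES (?, ?, ?)
        ON CONFLICT (host_id, fact_name)
        DO UPDATE SET fact_value = excluded.fact_value
        """,
        rows,
    )
    conn.commit()


# First import creates both rows, second import only changes the vendor value.
upsert_facts(conn, 1, {"dmi::bios::vendor": "Lenovo", "dmi::bios::release_date": "07/08/2009"})
upsert_facts(conn, 1, {"dmi::bios::vendor": "LENOVO"})

rows = conn.execute(
    "SELECT fact_name, fact_value FROM facts WHERE host_id = 1 ORDER BY fact_name"
).fetchall()
print(rows)
```

Note that this sketch does not remove facts that disappeared from a host; that would still need a separate DELETE for names missing from the latest import.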

One final idea is also quite interesting: offloading fact parsing to a smart proxy. Since import is the only way data in the Foreman DB can change, the smart proxy could effectively perform fact filtering (we do drop some unwanted facts) and it could also maintain a cache of “last seen” facts for a particular host. Then the calculation of what needs to be updated, created and deleted could be done completely on the smart proxy, so it could automatically drop those facts which are unchanged and Foreman would have less work to process.
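
The calculation itself is simple once facts are flat. Below is a minimal Python sketch of what the proxy-side diff could look like; FactDelta and diff_facts are hypothetical names, the cache is just a dict here, and all fact names except dmi::bios::vendor are invented for the example.

```python
from typing import Dict, List, NamedTuple


class FactDelta(NamedTuple):
    created: Dict[str, str]   # facts not seen before
    updated: Dict[str, str]   # facts whose value changed
    deleted: List[str]        # fact names that disappeared


def diff_facts(last_seen: Dict[str, str], incoming: Dict[str, str]) -> FactDelta:
    """Compare the cached facts of a host with a fresh upload (hypothetical helper)."""
    created = {k: v for k, v in incoming.items() if k not in last_seen}
    updated = {
        k: v for k, v in incoming.items() if k in last_seen and last_seen[k] != v
    }
    deleted = sorted(k for k in last_seen if k not in incoming)
    return FactDelta(created, updated, deleted)


last_seen = {"dmi::bios::vendor": "Lenovo", "uptime": "3 days", "old_fact": "x"}
incoming = {"dmi::bios::vendor": "Lenovo", "uptime": "4 days", "kernel": "5.14"}

delta = diff_facts(last_seen, incoming)
print(delta.created)  # {'kernel': '5.14'}
print(delta.updated)  # {'uptime': '4 days'}
print(delta.deleted)  # ['old_fact']
```

The unchanged dmi::bios::vendor fact appears in none of the three sets, so it would never be sent to Foreman.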

If you have any concerns about the removal of the tree structure, speak up now. Cheers!
