Can you describe the final design you want to achieve? For me these are the flaws we would like to solve:
- Unreadable code (several objects calling each other, unclear responsibilities)
- Cleaner flow of data, ideally a Ruby hash should be passed along for the whole processing until it reaches the final class that stores it in the DB
- Ability to normalize facts for host facets (OS names, architecture names)
- Flexible and clean way for plugins to override or add new parsers without method overloading or monkey patching
- Ability to call some parsers later (from a rake task/background job) - useful for NIC processing which is quite slow but do not implement this in the first phase just make the design allows this.
- Decreasing amount of facts stored in FactName and FactValue tables (aka Common Fact Model)
- Storing all facts as a raw JSON in Postgres for ENC or templates.
Here are some support use cases:
- As a dev, I want a clean codebase with an easy-to-understand code flow.
- As a puppet/ansible/rhsm user, I want to avoid each technology to overwrite OS or host facets with different value (normalized values).
- As a puppet/ansible/rhsm user, I want fact import to be fast.
- As a user with hypervisors with many NICs, I still want to be able to filter out network names.
- As a user with hypervisors with many NICs, I maybe want to defer NIC processing to later.
I suggest you continue in what you are doing so far - refactoring the code without any semantic changes into the final design that is clear to understand. Only then start working on improving things like normalizing or storing facts in JSON in the DB.
From reading your post, I am not entirely sure how things would look like. Instead of the spaghetti code we currently have, I would expect a chain of calls with a clean interface. I think we agree that the pipeline should be simple so I suggest something like:
FactParser.process(raw_facts, foreman_facts)
where raw_facts is the input (immutable) Ruby hash and foreman_facts is a hash that is empty in the beginning and parsers can store values into it. It can be used later on in the chain to be stored.
Storing of facts should probably happen in a different class, something like:
FactSerializer.process(raw_facts, foreman_facts, host)
We would need several implementations of serializers:
- NormalFactSerializer - stores all raw_facts in FactName/Value tables as we do today (we have a filtering and some limiting mechanisms - we should keep this for now)
- ForemanFactSerlializer - stores foreman_facts as extra facts under “foreman” namespace, so things like “foreman::os_name” or “foreman::os_version” are available. This gives plugin authors the flexibility to add their own normalized facts across technologies (RHSM, Ansible, Puppet, Chef, Salt) without implementing host facet.
- NICFactSerlializer - performs detection of NICs and stores them for a host.
- HostAttributesFactSerializer - performs storing of host facets (are these called attributes? not sure)
- RawFactSerializer - later on this would store all raw_facts unchanged in a SQL TEXT table. We would return that in
@host.facts_hash
call so even after we decide to cherry pick facts to bring number of facts in FactName/Value tables down, users in templates or ENC would see them all. We need to store all fact types separately so this would be a joined table per host with a type column.
Note that normal FactParser
classes (or whatever you name them) do not have access to anything else than input hash and output hash. They should not be doing any ActiveRecord, this is intended. Also, discovery needs to be able to override NICFactSerializer via as you mention a hook or something to change suggested primary interface.
As for the background processing, we discussed we could process some facts (NICs) just before user attempts to open a page with a host with some “dirty” flag, however, Marek had a good point that you want to be able to search for hosts by IP or MAC, which can be out of date. So we would either need some background processing or simply telling our users to use facts search.
I still believe that updating host data from facts is a bad thing and we should not be doing that at all. This is irrelevant for ENC, facts are used for ENC and not Foreman host information. So if this was on me, I would simply implement a helper text on a host result page saying “Did not find what you want? Try searching via facts as Foreman data can be out of date.” Pulling info from facts should be an action explicitly triggered by user (sure it can be a background process because it will take long for all hosts perhaps).
We would still need to do immediate fact parsing for new hosts which become unmanaged hosts. We need to parse everything immediately for these hosts, background processing or and kind of deferred parsing can be only applied to existing hosts (as more likely there are no changes for them).