RFC: Storing reported data / Compute Resource Data as source for facts

Hi guys,

I’d like to sum up a quick brainstorming session Marek and I had in a PR that extends the reported data facet.

The PR extends Foreman’s fact parser infrastructure so we can store normalized information in a separate host facet. I believe this is a great foundation for more features. We can make the data searchable, display it in the UI, use it in templates, give users an overview of how many subscriptions they need, and so on.
We can also distinguish between data gathered from facts and data describing how a host was initially created.

I tossed in the idea of also using data from compute resources (or IPMI?) to fill in that information. When Foreman creates a VM, it does not record the amount of available memory, for example.
Especially if we show the information in cards (PatternFly cards, which I can totally imagine), it would be great to have the data available as early as possible (before a fact import).
We could also extend this to supersede virt-who, which would be a great usability improvement, to be honest. Users would just need to configure a compute resource, and that connection would be used to gather relevant data about the VMs.

So, my questions are:

  • What do you think about this?
  • What information would be useful?
  • Do you see more use cases where this could come in handy?



Thanks @TimoGoebel for taking the time to write it up here. I believe we should limit ourselves to information that is relevant to our domain and avoid building a generic system information gatherer or unifier. So for me, good candidates are pieces of information that people very often ask for in reports or filter by. Also, with this concept we need to expect latency and data that never gets cleaned up. For example, the amount of RAM or the host uptime can be out of date if we haven’t received fresh data for a while.
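A minimal sketch of the staleness point, assuming every reported value is stored together with the time it was last received. `ReportedValue`, `is_stale` and the 24-hour `MAX_AGE` threshold are hypothetical names for illustration, not existing Foreman code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: how long a reported value is trusted before it is flagged stale.
MAX_AGE = timedelta(hours=24)


@dataclass
class ReportedValue:
    value: object
    reported_at: datetime  # when a source last sent this value

    def is_stale(self, now: datetime, max_age: timedelta = MAX_AGE) -> bool:
        """True if no fresh data arrived within max_age."""
        return now - self.reported_at > max_age


now = datetime(2020, 1, 10, tzinfo=timezone.utc)
ram = ReportedValue(8192, now - timedelta(days=2))
print(ram.is_stale(now))
```

The UI or a report could then show such values greyed out instead of silently presenting two-day-old RAM figures as current.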

One thing @ekohl raised during that discussion was whether we should also record the source of each piece of information. If we have several sources with different accuracy, the stored values could flip-flop.
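One way to use a recorded source against flip-flopping, sketched under assumptions: each attribute keeps the source that set it, and a report from a less trusted source does not overwrite it. The source names, `SOURCE_PRIORITY` ordering and `merge` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical priorities: a higher number means a more trusted source.
SOURCE_PRIORITY = {"facts": 1, "compute_resource": 2}


@dataclass
class Reported:
    value: object
    source: str


def merge(current: Dict[str, Reported], attr: str, value: object, source: str) -> None:
    """Store value for attr unless a more trusted source already provided it.

    Keeping the source next to the value means a lower-priority source can
    no longer overwrite a higher-priority one, so two sources reporting
    slightly different numbers do not make the stored value alternate.
    """
    existing = current.get(attr)
    if existing is None or SOURCE_PRIORITY[source] >= SOURCE_PRIORITY[existing.source]:
        current[attr] = Reported(value, source)


data: Dict[str, Reported] = {}
merge(data, "memory_mb", 8192, "compute_resource")
merge(data, "memory_mb", 7980, "facts")  # ignored: less trusted source
print(data["memory_mb"])
```

With a strict priority rule a lower-priority source can never correct a value once a better one has reported, so in practice this would probably need to be combined with a freshness check.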

I really like the idea of replacing virt-who. I guess if we can store information about which hypervisor the VM is running on, we’re good. Virt-who has some advanced features, such as partial updates (in the case of VMware), but it is also quite complicated to configure properly.

We should also consider dropping host updates from facts, or at least turning them into a proper pipeline:

  • Receive raw facts
  • Store raw facts as JSON
  • Run the correct FactParser
  • Store parsed facts in reported data facet
  • Optionally run HostUpdater to update host properties

This would introduce some duplication, but every storage step serves a different purpose. I like the idea of separating the desired state from the reported state, because it allows reporting mismatches. The raw facts are still useful in templates and for arbitrary searching.
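The five steps listed above could look roughly like this. It is a sketch only: the in-memory stores, the `example` origin, its raw keys and `import_facts` are hypothetical stand-ins for Foreman’s real tables, fact parsers and HostUpdater.

```python
import json
from typing import Callable, Dict, Optional

# Hypothetical in-memory stores standing in for the real tables/facet.
raw_fact_store: Dict[str, str] = {}
reported_data_store: Dict[str, dict] = {}


def parse_example_facts(raw: dict) -> dict:
    """Hypothetical parser: normalizes a few raw facts into reported data."""
    return {"memory_mb": int(raw["mem_mb"]), "cpus": int(raw["cpu_count"])}


# Hypothetical registry mapping a fact origin to its parser.
PARSERS: Dict[str, Callable[[dict], dict]] = {"example": parse_example_facts}


def import_facts(host: str, origin: str, raw: dict,
                 host_updater: Optional[Callable[[str, dict], None]] = None) -> dict:
    # 1. Receive raw facts (the `raw` argument) and
    # 2. store them as JSON so templates and arbitrary searching keep working.
    raw_fact_store[host] = json.dumps(raw)
    # 3. Run the parser that matches the fact origin.
    parsed = PARSERS[origin](raw)
    # 4. Store the normalized result as reported data.
    reported_data_store[host] = parsed
    # 5. Optionally update host properties. Skipping this leaves the desired
    #    state untouched, so mismatches can be reported instead.
    if host_updater is not None:
        host_updater(host, parsed)
    return parsed


print(import_facts("web01", "example", {"mem_mb": "4096", "cpu_count": "2"}))
```

Making the updater an optional callback keeps step 5 independent, which is what allows turning it off entirely.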

The question with mismatches is: Does a user really care about this? What is the desired way to resolve a mismatch?

That depends. At a previous employer, we had customers who paid for (virtual) servers with certain specifications, but every now and then they’d need some additional performance. Getting a report of those servers, rather than updating the database, would be useful.
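For the scenario above, the core of such a report is a comparison of desired and reported values per host. A minimal sketch, assuming both states are available as plain attribute dictionaries; `find_mismatches` is a hypothetical helper, not an existing Foreman API.

```python
from typing import Dict, List, Tuple


def find_mismatches(desired: Dict[str, object],
                    reported: Dict[str, object]) -> List[Tuple[str, object, object]]:
    """Return (attribute, desired, reported) for every attribute present in
    both states whose values differ. Attributes missing on either side are
    skipped because there is nothing to compare yet."""
    return [
        (attr, desired[attr], reported[attr])
        for attr in sorted(desired.keys() & reported.keys())
        if desired[attr] != reported[attr]
    ]


# A server sold with 4 GB that was temporarily scaled up to 8 GB.
print(find_mismatches({"memory_mb": 4096, "cpus": 2},
                      {"memory_mb": 8192, "cpus": 2}))
```

This only reports the difference and leaves both records untouched; deciding which side wins stays with the user.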

Luckily, we now have Report Templates, which are a perfect match for this use case.

When it comes to syncing, I don’t think we can say which one is correct (from a Foreman perspective), so IMHO the best we can do is provide an easy way to correct in either direction: changing the desired spec based on the reported data, or scaling the VM down to the desired spec. The former is easy; the latter might need some thought. Perhaps it needs scheduling.

What does this feature do exactly? How would this feature still be relevant when the virt-who functionality would be part of Foreman?

If I recall correctly, virt-who can listen for changes on VMware and consume less data (diffs only), which is then passed to Candlepin. Perhaps @Justin_Sherrill could explain it better. I assume this new feature would regularly scan the VMs on a compute resource and hence deal with all VMs every time. We could potentially implement it similarly, but I was under the assumption that we’d initiate the request regularly from the Foreman side, instead of attaching to VMware and listening for changes.

One quick note: getting a feed of changes from VMware is pretty easy to implement. The harder part is syncing that data with a local database. I played with it a little last year.
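For the polling approach discussed above, the sync step boils down to comparing a full snapshot from the compute resource with what is stored locally. A rough sketch, assuming VMs are keyed by an identifier such as a UUID; `VmInfo`, `diff_inventory` and `apply_sync` are hypothetical, and a real implementation would write to the database instead of a dict.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class VmInfo:
    # Hypothetical subset of what a compute resource could report per VM.
    hypervisor: str
    memory_mb: int


@dataclass
class SyncResult:
    added: List[str]
    removed: List[str]
    changed: List[str]


def diff_inventory(local: Dict[str, VmInfo], remote: Dict[str, VmInfo]) -> SyncResult:
    """Compare the locally stored VMs with a fresh snapshot from the compute resource."""
    added = sorted(remote.keys() - local.keys())
    removed = sorted(local.keys() - remote.keys())
    changed = sorted(uuid for uuid in local.keys() & remote.keys()
                     if local[uuid] != remote[uuid])
    return SyncResult(added, removed, changed)


def apply_sync(local: Dict[str, VmInfo], remote: Dict[str, VmInfo]) -> SyncResult:
    """Bring the local store in line with the snapshot and report what changed."""
    result = diff_inventory(local, remote)
    for uuid in result.removed:
        del local[uuid]
    for uuid in result.added + result.changed:
        local[uuid] = remote[uuid]
    return result


local = {"a": VmInfo("hv1", 1024), "b": VmInfo("hv1", 2048)}
remote = {"b": VmInfo("hv2", 2048), "c": VmInfo("hv1", 512)}
print(apply_sync(local, remote))  # VM "b" migrated to another hypervisor
```

A change feed would deliver individual updates instead of a full snapshot, but the resulting added/removed/changed records could be applied the same way.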