RFC: Common Fact Model

lzap · May 26, 2020, 12:40pm

Yes, I know these exist, as you already pointed out. I believe we should still store common facts and adopt whitelisting. This is how I see the three levels of facts:

Core facts. The ones you have described, stored with a proper type. We would probably implement only a small number of them (probably up to a dozen) over time. These are stored in a facet, in their respective database fields, so fast searching and even sorting are possible.
Common facts. Those I have described: whitelisted and carefully selected facts, which would include the core facts but add more, such as operating system, kernel version and similar. My wild guess is we would have two or three dozen of them over time. We should still transform them into reasonable units and formats (e.g. kernel version to X.Y.Z). These would be stored in our fact name and value tables, and are therefore a little more expensive to update, but searchable.
Arbitrary facts. All the rest would be stored separately, in a JSON or text blob, which is fast to update but cannot be searched via indices (only by a table scan). We would keep a copy from every client, e.g. Puppet, Ansible or RHSM.
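
To make the three levels a bit more concrete, here is a rough sketch of how an incoming fact report could be split. It is only an illustration: the fact names, the whitelists and the `split_facts` and `normalize_kernel_version` helpers are all made up for this example and are not existing code.

```python
import json
import re

# Hypothetical whitelists; the real fact names would be chosen by the project.
CORE_FACTS = {"memory_mb": int, "cpu_count": int}    # typed database fields
COMMON_FACTS = {"operating_system", "kernel_version"}  # fact name/value tables


def normalize_kernel_version(raw):
    """Reduce a kernel release such as '5.6.13-300.fc32.x86_64' to X.Y.Z."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", raw)
    if match is None:
        return raw  # leave unrecognized formats untouched
    major, minor, patch = match.groups()
    return f"{major}.{minor}.{patch or 0}"


# Hypothetical per-fact transformations into a reasonable unit or format.
NORMALIZERS = {"kernel_version": normalize_kernel_version}


def split_facts(reported):
    """Split one client's report into (core, common, blob).

    core   -> typed values for dedicated database fields
    common -> whitelisted name/value pairs (includes the core facts)
    blob   -> everything else, serialized as JSON text
    """
    core, common, rest = {}, {}, {}
    for name, value in reported.items():
        if name in NORMALIZERS:
            value = NORMALIZERS[name](value)
        if name in CORE_FACTS:
            core[name] = CORE_FACTS[name](value)
            common[name] = str(core[name])
        elif name in COMMON_FACTS:
            common[name] = str(value)
        else:
            rest[name] = value
    return core, common, json.dumps(rest, sort_keys=True)
```

In this sketch a blob would be produced per reporting client (Puppet, Ansible, RHSM), so updating it is a single write, while only the small whitelisted set touches the indexed tables.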

See, my proposal does not try to reinvent the wheel here. I am trying to improve what we already have and to provide a nice out-of-the-box experience, even on deployments where thousands of clients all report via multiple channels.