RFC: Optimized reports storage

Thanks for the writeup @lzap, a few comments/questions inline:

Changing all existing integration points for report importing will be difficult. In fact, we are still maintaining the /reports endpoint because that’s where older versions of ansible where configured to send their reports; I expect that is true for additional sources.

Do all sources provide a JSON output?
Also, do we want to enable searching inside the json? if so it may make sense to store a jsonb with proper index.

Why is this needed? Do we actually get a significant benefit from using a bit field for the statuses over using strings (or some sort of tags if we want to have multiple ones)?

Which sources should be handled in plugins and which should be handled in core? Right now we have puppet in core and plan to keep it that way. Perhaps this is something that should be all in core? I.E. Foreman knows out of the box how to handle reports from multiple sources, without requiring additional plugins (which usually adds more functionality and complexity besides report processing) to set up.

Some of these seem very similar to statuses, could you elaborate on how keywords are different and why we need both?

with 10k hosts reporting in every 30 minutes (the default for puppet), this means over 3 million reports per week. Even if we limit search to a week, we should never have a search use full table scan, but rather make sure all searchable fields are properly indexed.

Input format must not change, therefore this is only matter of an “alias”, single controller can handle multiple URLs.

Actually, I said JSON but I did not mean it. The body column is arbitrary text, plugins can use whatever fits their needs. As long as it’s text.

I did my research, JSON/JSONB types are actually uncompressed and slower than arbitrary text. JSON searching is slow overall and something I want specifically to avoid in my design.

Numbers have some benefits over strings, yeah. You can compare them numerically :wink: Granted this status number cannot be indexed so it’s always a table scan. Which is not a huge problem, keywords (described below) and host ID can be used to make the scan set small enough.

In my previous proposals, I was trying to find a solution using “unnamed” columns like t1 or t2 and similar where every plugin would store their own data. StatusCalculator is build around Puppet: applied restarted failed failed_restarts skipped pending. It currently uses 5 bytes of 8 bytes available for bigint, maximum value currently is 63.

If you think it is the time to get rid of this weird thing, I am more than happy to kill status calculator. One way would be creating INT columns for selected (or all?) statuses naming them in more generic way, perhaps something like: success, warning, error, skipped. Each plugin would map its own meaning.

Alternatively, these could be come keywords too: PuppetHasFaiulres2, PuppetHasFaiulres3-10 (sort of intervals, we don’t want to create a keyword for every single number).

The reason is why I propose to keep the status calculator is the code is there, it’s supported in scoped_search so we would not break existing queries. I want just to bump amount of bits from 6 to 10 so whole 64bits of bigint is utilized (1024 possible values per status).

I’d say just puppet until it’s extracted? There will be “the default” implementation that will store the input “as is” and present it as a plain text on the view page. Type: unknown.

Keywords are for searching only, they have an index so database can quickly pull reports with given keywords. As I said, statuses are numbers, allows you to search for exact amount of failures. Secondly, status column allows rendering of index page/table - each report has amount of statuses shown:

[root@sat68 ~]# hammer report list


---|---------------|---------------------|---------|-----------|--------|------------------|---------|--------
ID | HOST          | LAST REPORT         | APPLIED | RESTARTED | FAILED | RESTART FAILURES | SKIPPED | PENDING
---|---------------|---------------------|---------|-----------|--------|------------------|---------|--------
1  | sat68.nat.lan | 2020/09/29 09:44:01 | 0       | 0         | 0      | 0                | 0       | 0      
---|---------------|---------------------|---------|-----------|--------|------------------|---------|--------

Granted, this can be only stored in body, that will probably make index page little bit slower but since we only render finite amount of reports on a page or hammer list, it’s not big deal.

Every update to a table with 10 indices ultimately leads to 10 index updates which is a relatively expensive operation. With 10k hosts reporting every 30 minutes, database does a lot of work. Now, compare to how often you search for a report? Once a day? Tens times a day? My point is, this whole design is optimized for report storing, not searching. Thus my ultimate goal is to have as little indices as possible

Having that said, there is the concept of keywords which are indeed indexed and will give instant results. Users will be able to quickly find reports that has failures, errors, whatever, in case of OpenSCAP even failed rules. Depends on what plugin authors creates.

Then there’s scoped search for status which is a table scan over bitmap today, I propose to keep that behavior (or remove it completely). This does not change.

Finally, today reports allow search for exact line in messages or sources. You have to search for the whole line, e.g. “Applied catalog in 0.39 seconds” only after that it’s a hash + join. If you try to do SQL LIKE search it’s again full table scan today. And this proposal does not change that - my goal is definitely not to introduce some fulltext search for reports. This is the only “downgrade” in my proposal, which is not a problem - you never search for that whole line and if you really really need this, you can do a tablescan over body with reported_at limit.

I am putting together some notes from Tomer’s response and my discussion with Marek. Here are some open questions:

  • Do we want to kill StatusCalculator (the bitfield, status column)? It is clunky indeed, do users really need to make queries like “all reports that has 4 or more errors”? Isn’t just “reports with errors” enough?
  • The “version” column is probably not needed - version can be stored in the body itself if needed.
  • Reports should contain proxy_id reference from what proxy the report came in. This is probably the only reason why OpenSCAP plugin would need to extend the model, this can be useful for other plugins as well.
  • There could be a generic set of keywords without “Puppet” or “Ansible” prefixes: HasFailure or HasWarning that would search across types. There is no way to enforce plugins to create those keywords, so I think this could be actually confusing for users not finding all records in the future.
  • To what degree we can break search syntax? The purpose of this refactoring| is to vastly improve performance by storing reports in a different way, it will change for sure. Here is a list of hopefully all possible search conditions:
Old condition New condition Indices used Comment
host = xyz host = xyz yes
taxonomy taxonomy yes Both organization and location untouched
message = “Applied catalog in 0.39 seconds” - yes This will not work anymore (*)
source = "//blah/blah/blah.pp/ - yes This will not work anymore (*)
environment = xyz ? yes Although I don’t see this reference in schema anymore, is this something that should work?
hostgroup = xyz ? yes Although I don’t see this reference in schema anymore, is this something that should work?
applied > 1 keyword = PuppetHasApplied no Only if we choose to keep StatusCalculator or replace it with regular columns. The same for restarted, failed, skipped, pending etc.

(*) - we can either make it an error, which I prefer, or modify code in a way that his performs tablescan instead

For the record, here are statuses (metrics) which are available for the three major report types:

Type Metrics
Puppet applied restarted failed failed_restarts skipped pending
Ansible ok change unreachable failed skipped rescued ignored
OpenSCAP passed othered* failed

*othered - one of: pass fail error unknown notapplicable notchecked notselected informational fixed

Finding a common terms is a challenge without confusing the user, keeping the bitmask might be a good idea.

As much as I would like to do a small patch(es) and remain plugins compatibility, this is not possible and I need to create a brand new model that can co-exist with the current Report and ConfigReport. STI in the new design does not make sense, a report will be simply a body (text) with a type and classes that handles import/view actions.

I will start with ReportTranscript model name for now, if you have a better idea let me know. I want to be able to work on the PR(s) alongside of the current model. There will be modernized API, completely new UI and CLI around and the mentioned /report and /config_report endpoints will eventually be refactored to drop old models and make use of the new ones.

I want to keep StatusCalculator which has some drawbacks but on the other hand it allows plugins to work with their own sets of statuses. For searching there will be keywords anyway. Version column is not needed, I will store proxy_id in the new model.

@tbrisker can you show me where do we have this? I can’t find anything relevant in routes.rb.

For the record, I am creating a tracking redmine ticket for this task. I am gonna start working on this pretty soon:

https://projects.theforeman.org/issues/31142

Do you think it is a good time to deprecate this endpoint for the next release and remove it in the release after when I deliver the refactoring?

Perhaps we can even include a deprecation in 2.2, even if we only remove it in 2.4.

Unfortunately, that end point has been deprecated for several years already. However older versions of ansible (until 2.10 if i’m not mistaken) ship with a callback pointing to it. The same was also true for the chef handler, and potentially people who’ve installed our puppet enc script many years ago. That is the reason I didn’t remove it with the rest of the /reports api endpoints in

I will keep the endpoint then and fix Ansible.

Ansible is fixed already since https://github.com/ansible/ansible/pull/64955 but that only affects ansible 2.10 and newer. Unfortunately many distros ship older versions by default - for example, epel7 has 2.9 and debian 10 has 2.7, so we won’t be able to drop this endpoint until they go eol, which is several years away.

1 Like

Note that ansible callback using this API now lives in theforeman.foreman ansible collection and we’ll start deploying it by default soon (if we don’t already). We don’t have to worry about that much.

For the record, the code is incubating in the following repositories:

Warning, both plugins are in very early stage of development, you can’t even start them there are just some tests and initial code. However the README describes the current format with some examples.

2 Likes

For the record, to further optimize we will store association between report and keyword indirectly via integer array which is extremely fast compared to plain join table. This will speedup reports processing even further:

https://lukas.zapletalovi.com/2021/08/denormalizing-postgres-join-tables.html

Is it necessary to adapt foreman_salt / foreman_chef or other plugins because of the new foreman_host_reports plugin?

1 Like

Hello,

yes, take a look on both plugins. What you need is to implement a smart proxy “processor” which is simply JSON-in JSON-out type of class that performs transformations (if any are needed) or filtering of unnecessary content.

Then reports are stored as JSON encoded string in a new table, there you need to implement a custom show screen for the report. You can now customize the screen as you like, it does not need to be puppet-centric. We are currently working on the pages, we will be refactoring them quite a lot so if you want to start I suggest to start with the smart proxy first to give us more time to work on the UI parts. Ideal timing would be:

  • smart proxy plugin changes - now it’s essentially ready (I am working on packaging)
  • foreman plugin changes - in a month we should have more stable UI

You can easily test it today even without Puppet or Ansible - I am actually working on fixtures that will enable you to upload “test reports” via curl script: https://github.com/theforeman/smart_proxy_host_reports/pull/5 it woudl be great if you could prepare same examples for Salt/Chef so we can also test these kinds of reports or discuss the UI with our UX specialist which is planned soon.

The plan is to ship all code (we call them formats) in both plugins, we wanted to avoid doing “plugins of plugins”. The problem is extremely easy to understand and codebase is actually pretty small. Let me know if you need any help.

Edit: We also plan to create facts endpoint in the SP plugin, it is weird to configure Puppet and Ansible to send reports to smart proxy and facts to Foreman directly. The endpoint will simply pass facts as-is to Foreman, in the future we can add some fact filtering (unwanted facts). This is planned but the format of facts is the same so no code needed from your side just be ready for this (e.g. configuration).

2 Likes

I’m now confused. In Migrating Reports to the new format in Foreman 3.2+ - #42 by Ron_Lavi it is mentioned, that the Host Reports plugin development was stopped.
What exactly does this mean? Should the plugins still move forward to implement the smart proxy processor; etc?

Hey @Bernhard_Suttner,
sorry for not updating here as well,
As @lzap mentioned at the beginning of the post, the goal was to speed up the import of the reports in Foreman.

However, due to capacity issues, we stopped the new Host Reports plugin development for now,

so in short, you shouldn’t update plugins to use it at the moment.

2 Likes