Hey,
we are working on redesigning configuration reports as well as OpenSCAP reports provided by OpenSCAP plugin for better performance. In both cases the biggest problem were join tables between report and message/source tables containing report text. I came to conclusion that we need to eliminate those, however, they are still useful for searching in some cases:
- Puppet - search for a report with a particular (failed) resource
- OpenSCAP - search for a (failed) rule
In my original proposal, I wanted to optimize for fast import and since there are not many searches compared to amount of database inserts/deletions, I was thinking that tablescan will be good enough. For configuration reports, smaller join table to keywords
is planned but we plan to keep explicit things to put into this table (e.g. for puppet only failed resource names). For OpenSCAP, the plan is to rename sources
to rules
and put all rules there.
During the day of learning last week, I researched that Ruby on Rails do support natively PostgreSQL array column feature with GIN index.
This looked like a promising solution for both new configuration reports as well as OpenSCAP plugin refactoring. Both would still benefit from association (report-keyword, report-rule) but if this is done indirectly using integer array and searching is fast enough, this would make report importing even faster and more efficient.
I did some performance testing and results are more than promising, almost hard to believe. In short, storing keyword IDs or rule IDs in an integer array is
- significantly more efficient in terms of storage size (3.7 GB vs 0.74 GB)
- much faster to query (60ms vs 9ms)
- was faster for updates (26 seconds vs 2 seconds to import 100 reports)
If there are no concerns, I would like to implement this type and index in both foreman_host_reports and foreman_openscap plugins. The only blocker could be too old PostgreSQL version for Debian deployments, but this feature has been in PostgreSQL for many years (8.3 according to documentation).