Proposal: Utilize PostgreSQL arrays and GIN indexes

Hey,

we are working on redesigning configuration reports as well as OpenSCAP reports provided by OpenSCAP plugin for better performance. In both cases the biggest problem were join tables between report and message/source tables containing report text. I came to conclusion that we need to eliminate those, however, they are still useful for searching in some cases:

  • Puppet - search for a report with a particular (failed) resource
  • OpenSCAP - search for a (failed) rule

In my original proposal, I wanted to optimize for fast import and since there are not many searches compared to amount of database inserts/deletions, I was thinking that tablescan will be good enough. For configuration reports, smaller join table to keywords is planned but we plan to keep explicit things to put into this table (e.g. for puppet only failed resource names). For OpenSCAP, the plan is to rename sources to rules and put all rules there.

During the day of learning last week, I researched that Ruby on Rails do support natively PostgreSQL array column feature with GIN index.

This looked like a promising solution for both new configuration reports as well as OpenSCAP plugin refactoring. Both would still benefit from association (report-keyword, report-rule) but if this is done indirectly using integer array and searching is fast enough, this would make report importing even faster and more efficient.

I did some performance testing and results are more than promising, almost hard to believe. In short, storing keyword IDs or rule IDs in an integer array is

  • significantly more efficient in terms of storage size (3.7 GB vs 0.74 GB)
  • much faster to query (60ms vs 9ms)
  • was faster for updates (26 seconds vs 2 seconds to import 100 reports)

https://lukas.zapletalovi.com/2021/08/denormalizing-postgres-join-tables.html

If there are no concerns, I would like to implement this type and index in both foreman_host_reports and foreman_openscap plugins. The only blocker could be too old PostgreSQL version for Debian deployments, but this feature has been in PostgreSQL for many years (8.3 according to documentation).

1 Like

We have PostgreSQL 10 on Ubuntu 18.04. Then there’s version 11 on Debian 10. See Deprecation plans for EL7, Debian 10 and Ubuntu 18.04 for my thoughts on the plan to drop these, but it sounds like this shouldn’t affect you really.

2 Likes

Thanks for confirmation. Numbers really looks promising and I think we can adopt the same technique for facts too if it confirms to be good.