Foreman Large DB Tables Maintenance

Lang_Jason · February 6, 2018, 2:56pm

So, for larger foreman installations, it seems certain table(s) are not covered by the normal maintenance rake tasks. I can clear out reports no problem, as well as sessions, using built-in rake tasks.

However, these tables seem to just grow, and grow, and grow. On our install (11,000 agents checking in every 30 minutes) they are each millions or tens of millions of records, and id rather have some of it “age out” but they seem to go back infinitely:

messages
sources
logs
audits

“How” Can i safely trim these? Can I?

Looking at messages specifically, it looks like report run details…
Some have timestamps similar to: 2017-11-30 22:12:15.414902385 -0500

I find this odd, because I use the foreman rake task to delete reports older than 30 days. 11/30 was 3 months ago? And the report it was a part of is long gone…Would anything built in (rake task) clean these up? Am I OK to truncate this table occasionally? Any downsides to truncating it?

Looking at sources specifically, I’m not positive what i’m looking at, but i see a ton of values which are puppet file resources, and digests which are hashes. Would anything built in (rake task) clean these up? Am I OK to truncate this table occasionally? Any downsides to truncating it?

Looking at logs specifically, This looks like the association (foreign key/referential integrity) table between reports, sources, audits, in the format:

id, source_id, message_id, report_id, level_id, created_at, updated_at, result

‘10701’, ‘1094’, ‘1121’, ‘458181’, ‘3’, ‘2015-09-17 17:42:24’, ‘2015-09-17 17:42:24’, NULL

Specifically looking at the above record from my table, its from September 2017. But again, we clear all reports after 30 days - the report ID referenced no longer exists in foreman (i searched for it).

Looking at “audits” it seems to be all audit records visible within foreman. However, this too, goes back to the beginning of our install 3+ years ago, and is millions and millions of records. I’d like to only keep audits for, say, 3 months, or 6 months, or 1 year, but this doesn’t look configurable. Is there any downside to truncating the audits table? I know from doing upgrades that many tasks seem to do selects against it - particularly for the db:seed tasks

Thanks for anyone who can help me understand how to properly maintain these tables, I’m hoping to keep my DB size under control, as with these tables it just seems to grow indefinitely.

~Jason

lzap · February 7, 2018, 10:51am

We have few rake tasks which are executed from cron which expire messages, sources and logs. Audits are not being wiped out at the moment, we are currently working on that, it’s on our radar.

Those rake tasks can do the job for you, but if you have loaded instance they can be slow as they attempt to delete all the records in one transactions which can cause requests to be paused or even dropped. What you can do:

make a backup of your postgres database
recover it on safe server
try to simply delete from “reports” table, we have cascade delete and it should drop logs, sources, messages as well
deleting audits should be easy that will work
use created_at or updated_at to prevent deletion of new entries (both reports and audits have them)

We need to consolidate logs table at some point, breaking the report into individual lines and source shas is good for searching, but how many times you search for individual log or sha across reports? The price we need to pay is IMHO too high. We mostly use this to filter our errors/warnings when showing a single report, this can be done on the client. We should be able to count number of lines, errors, warnings during report upload and store it as numbers, the whole report could be just text. This will dramatically improve both performance and database/indices size.