Foreman Taxable_Taxonomies shrink/maintain

Problem:
Foreman 1.24.3 - attempting to migrate from MySQL to PostgreSQL using the documented “prod2dev” method in the manual.
To speed things along, I am truncating these tables:
fact_values
fact_names
logs
messages
sources
reports
audits

Now I’m at “taxable_taxonomies”, which in an environment with 1200 servers has 3.4 million rows…
This table alone is taking approximately 2-2.5 hours to migrate - we are hoping to move this along faster if possible.
Can this table be truncated, and will it safely regenerate?
Can it be cleaned up or maintained, similar to reports?

I’ll admit - I’m not 100% sure what this table is or what it is used for.

Looking at it, at minimum 1.76 million of my entries are of taxable_type “Audited::Audit”.
Could these be deleted manually prior to cutting over (since I’m truncating the audits table anyway as I migrate to Postgres)? Are there any others that might fall into this realm as well, to try and speed this up?
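For reference, a quick group-by shows where the rows come from (plain SQL; works the same on MySQL and Postgres):

    -- breakdown of taxable_taxonomies rows by the resource type they map
    SELECT taxable_type, COUNT(*) AS row_count
      FROM taxable_taxonomies
     GROUP BY taxable_type
     ORDER BY row_count DESC;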

Thanks in advance!

I “think” I solved this myself - though I invite others to tell me I’m wrong.

Looking at the audits:expire rake task, it appears to implicitly remove the taxable_taxonomies entries when the audits are expired: foreman/audits.rake at 3347fa49d500964f0209122d8d36c920d1feafcc · theforeman/foreman · GitHub
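Conceptually (this is not the actual rake code, which goes through ActiveRecord - just a MySQL-flavored sketch of the effect), the expiry boils down to:

    -- expire audits older than the retention window...
    DELETE FROM audits
     WHERE created_at < NOW() - INTERVAL 7 DAY;
    -- ...then drop the taxonomy mappings that pointed at them
    DELETE FROM taxable_taxonomies
     WHERE taxable_type = 'Audited::Audit'
       AND taxable_id NOT IN (SELECT id FROM audits);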

In my case, we only keep 7 days of audits to begin with due to size, and we are truncating the table altogether to speed up the migration (7 days is nearly 1 million audit logs to process). All of the records of type Audited::Audit in taxable_taxonomies are 2-3+ years old. I believe I either:

  • Cleaned up the audit table several times without using the rake task (likely before it existed, or for other reasons), or
  • Had the rake task crash, causing some records to stick around in taxable_taxonomies

In any case, it looks like cleaning them all up via a delete loop, 150k rows at a time, worked fine. The entire migration process now takes < 3 minutes and the data “looks” good.
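The batched delete was essentially the following (a sketch using MySQL’s DELETE ... LIMIT, re-run until it reports zero rows affected; the 150k batch size is just the one mentioned above):

    -- remove stale audit mappings in small chunks; repeat until 0 rows affected
    DELETE FROM taxable_taxonomies
     WHERE taxable_type = 'Audited::Audit'
     LIMIT 150000;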

Hoping to complete the migration to PostgreSQL in the next week or two for this environment, then move on to my largest environment (19k servers). Woot!


Great to hear that you figured it out!
That table is used for the mapping of all resources to taxonomies (locations/organizations), and indeed audits are also associated with taxonomies. If in the past you cleaned up the audits manually it is possible that this table was missed.
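To illustrate the mapping, something like this query lists the taxonomies a resource is tied to (Domain and the id are just example values - any resource using the Taxonomix concern shows up here):

    -- which locations/organizations is this resource mapped to?
    SELECT t.type, t.name
      FROM taxable_taxonomies tt
      JOIN taxonomies t ON t.id = tt.taxonomy_id
     WHERE tt.taxable_type = 'Domain'
       AND tt.taxable_id = 42;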
I do wonder, though, how you get to a million audits per week with just 1200 servers. What are some of the common audited changes? That might indicate some other issue you could address to improve the performance of Foreman (or a bug that we can fix).

Hey @tbrisker - it def might be a bug, and it def might be “we did it stupid” - I haven’t been able to figure it out yet beyond an (obviously hacky) workaround.

All of our environments are “clusters”, “overbuilt to sustain the environment with 66% of the nodes offline”. This smaller 1200-node cluster has:

  • 4 dedicated “Foreman” instances that “we” use to do webui and api stuff
  • 6 dedicated “puppetmaster with foreman” instances that the servers use

At the end of the day we have puppetcode that creates a symlink on all of our foreman infra as follows:

[jlang1@fmnapnph1 ~]$ sudo ls -al /etc/puppetlabs/puppet/ssl/certs | grep host
lrwxrwxrwx. 1 root root 61 Mar 31 16:08 host.pem -> /etc/puppetlabs/puppet/ssl/certs/fmnapnph1.my.fqdn.com.pem

We have the puppet CA configured to automatically add all needed Subject Alternative Names as SANs on the “FQDN” cert it deploys above, and all of our /etc/foreman yaml config files are configured similar to the following:
[jlang1@fmnapnph1 ~]$ sudo cat /etc/foreman/settings.yaml | grep host
:websockets_ssl_key: /etc/puppetlabs/puppet/ssl/private_keys/host.pem
:websockets_ssl_cert: /etc/puppetlabs/puppet/ssl/certs/host.pem
:ssl_certificate: /etc/puppetlabs/puppet/ssl/certs/host.pem
:ssl_priv_key: /etc/puppetlabs/puppet/ssl/private_keys/host.pem

This configuration ends up with “tons” of audit logs of the nodes trying to step on each other’s config(s). Entries look like the below:
Updated Setting: unattended_url
Previous: https://fmnpmdvl1.myfqdn.com
New: https://fmnapdvl3.myfqdn.com

Updated Setting: foreman_url
Previous: https://fmnpmdvl1.myfqdn.com
New: https://fmnapdvl3.myfqdn.com

Updated Setting: unattended_url
Previous: https://fmnpmdvl3.myfqdn.com
New: https://fmnapdvl2.myfqdn.com

Updated Setting: foreman_url
Previous: https://fmnpmdvl3.myfqdn.com
New: https://fmnapdvl2.myfqdn.com

Updated Setting: unattended_url
Previous: https://fmnpmdvl2.myfqdn.com
New: https://fmnapdvl1.myfqdn.com

Updated Setting: foreman_url
Previous: https://fmnpmdvl2.myfqdn.com
New: https://fmnapdvl1.myfqdn.com

… And repeat non-stop, many, many times per minute.

I have an “audit” cleanup SQL job to remove these that runs pretty often (every 5 minutes); otherwise searching audits is impossible without a custom bookmark to filter them out, and the audits view in the UI simply times out on loading. Honestly, it’s super slow regardless, but totally unusable with the added “bloat”.
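A simplified sketch of that kind of cleanup (assuming the stock audited-gem columns auditable_type and audited_changes; matching on the serialized audited_changes text is part of what makes it hacky):

    -- drop the setting-churn audit spam
    DELETE FROM audits
     WHERE auditable_type = 'Setting'
       AND (audited_changes LIKE '%foreman_url%'
         OR audited_changes LIKE '%unattended_url%');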
This cleanup is where I went wrong - I wasn’t also cleaning up the associated items in taxable_taxonomies. I’m guessing I can do some ID referential lookup magic to do it, or maybe there’s a way to redo the config so that in a clustered environment these stupid audit spam messages go away?
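Something like this multi-table delete is the kind of “referential lookup magic” I mean (a sketch, untested; MySQL join-delete syntax):

    -- remove mappings whose audit row no longer exists
    DELETE tt
      FROM taxable_taxonomies tt
      LEFT JOIN audits a ON a.id = tt.taxable_id
     WHERE tt.taxable_type = 'Audited::Audit'
       AND a.id IS NULL;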

This at least should be greatly improved in 2.5 with https://projects.theforeman.org/issues/30053

Specifically, settings audits shouldn’t be taxable, but perhaps that has changed since 1.24.

I wonder why they keep getting updated. The default value is set to the host’s fqdn (which would cause the different instances to modify it), but that should only happen once when the service starts, not constantly. Additionally, if you set this to a non-default value (perhaps the LB fqdn?) they shouldn’t try to change it.

Regarding your last statement about a “non-default value”:

That’s the kicker: both of those settings “are” set in Foreman to my LB FQDNs. Additionally, the Foreman settings themselves never “actually” change - they just constantly log that they “are” changing…

Hmm, that’s quite odd… I’m wondering if that’s some bug that has been fixed since 1.24 or if that would still be an issue even in 2.5.
I did find Bug #28203: Audits permantly changing with foreman in cluster deployment - Foreman, which seems quite similar to your issue though; if puppet also manages your foremans, that could be triggering a similar case?
Another option to try out is to change the audited call in the Setting model so that :default is excluded as well:

audited :except => [:default, :name, :description, :category, :settings_type, :full_name, :encrypted], :on => [:update]

Interesting… We “do” use the puppet modules to configure and maintain our foreman instances. We also “do” set the fqdn in that puppetcode on each host. I’ve never thought to correlate the messages to puppet runs (which we run at a souped-up every 5 minutes on our foreman infra, compared to the rest of the environment) - but that might fit for sure…

I’m going to finish our PostgreSQL migration and get it out of the way, then revisit using the “vanity URLs” in the puppetcode for the fqdn argument instead. If that doesn’t solve it, I’ll try the custom setting.rb change you suggested above to see if that helps. I’ll report back here!

Thanks again for the insight, information, and expertise!
