This is an older thread, but I wanted to add my opinion, as I was linked to it recently from here: Foreman-db-manage-rake usage?
I currently have:
18 Puppet masters (smart proxy) with the Foreman role installed
12 Foreman-only "UI" servers (no smart proxy)
6 "PXE" smart proxies (DHCP/TFTP) with nothing else installed.
This is definitely "a bit overkill", but it suffices to fill our need for full resiliency and full capacity across multiple (3 total) datacenters for 12,000 nodes running a lot of Puppet code (800 or so Puppet classes), and we add approximately 1-2k more nodes every month.
My last few upgrades have been a nightmare due to the above RPM "issue". From previous experience, I know that while Foreman "can" run in mixed mode with some roles upgraded and others not, with as many nodes (and API scripts/automation jobs/stuff running) as we have, I quickly hit our MySQL database's "max error count", at which point the database starts blocking connections. I therefore tend to take everything offline during the upgrade until everything is completely upgraded. I could raise the error count, but at some point that's a losing battle versus just upgrading all at once, and it has no bearing on the overall "the same tasks take forever to run hundreds of times" argument below.
My first Foreman node takes about 2 hours: with multiple plugins being upgraded, the RPM installers loop through the "migrate/seed/apipie cache" cycle 5-6 times. I just watch the logs and watch the database keep running the same.freaking.queries in a 20-25 minute cycle.
Then I get to do this again, and again, and again. At some point I just do the remainder in parallel, which overloads the DB and "fails" the seeds and migrates. I don't generally care, because it's already all done… but it's still "scary" to be deliberately banking on errors to speed things up. The last few don't error, though, and I get to sit around and wait for them to finish, about 3-4 hours later.
My whole upgrade now ends up being 13 hours of literally staring at nothing while I wait for the same migrate/seed/apipie cache process to run hundreds of times.
30 "Foreman" servers × 6 runs per server for each plugin = 180 repetitive "runs" of migrate/seed/apipie cache… (while I bash my head against the wall, dying of boredom)
Generally speaking, my 2c (I'm not a developer, and I'm fine with this being disregarded):
We need supported way(s) to clear large tables. Reports has a rake task, as does sessions, but logs and audits don't. Much of the seed tasks' time goes to a ton of SELECT statements against the audit table. I'd love to truncate it, but I'm pretty sure that would cause issues. The same goes for logs and a few other tables that just seem to grow forever. The seed tasks are simply slow with a giant audit table; if I could manage those tables, the seed jobs themselves might finish faster and speed things up significantly.
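To make the ask concrete, here is a minimal pure-Ruby sketch of what a hypothetical `audits:expire`-style cleanup could look like (the task name and `expire_audits` helper are my assumptions, not real Foreman code): age-based deletion done in batches, so a huge table gets trimmed without one giant transaction.

```ruby
require 'date'

# Hypothetical sketch: expire audit rows older than `days`, in batches.
# `audits` stands in for the Audit table as an array of row hashes; the
# real thing would use something like Audit.where(...).delete_all per batch.
def expire_audits(audits, days: 90, batch_size: 1000)
  cutoff = Date.today - days
  expired = audits.select { |a| a[:created_at] < cutoff }
  expired.each_slice(batch_size) do |batch|
    # Small batches keep each delete fast and avoid long table locks.
    batch.each { |row| audits.delete(row) }
  end
  expired.size # number of rows removed
end
```

With something like this supported upstream, the audit table could be kept to a bounded size on a schedule, instead of growing until the seed's scans crawl.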
The seed and migrate rake tasks themselves should be augmented. Upon success of each step (each migrate "step" and each seed "step" as currently defined in the code), the step and the exact Foreman version (1.15.6 or similar) should be recorded in a new, dedicated table in the database, and only if the step completes without error. The rake tasks could then be further augmented to check that table prior to execution: if the specific task has already been run for the specific Foreman version, skip it! This would easily allow as many RPM runs per plugin, across as many servers, as needed. The seed and migrate steps would still be called, but would exit after a few seconds if everything had already completed.
With the above, installing a new plugin with new migrate/seed tasks would still run them as expected (once) and record them in the database for future use. Install it on multiple servers and the remaining servers "skip" the migrate/seed runs (provided they did not fail).
This would then necessitate the seed and migrate rake tasks also supporting a "force" parameter/argument to allow manually re-running the whole shebang in case of some unforeseen issue…
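The skip-if-recorded idea above can be sketched in a few lines of plain Ruby. Everything here is hypothetical (the `SeedTracker` class, `run_step` name, and the in-memory `Set` standing in for the proposed DB table); it just shows the bookkeeping: key on (Foreman version, step), record only on success, skip on later calls unless forced.

```ruby
require 'set'

# Hypothetical sketch of the proposed "skip if already run" bookkeeping.
# In a real implementation @done would be a dedicated database table
# keyed on (foreman_version, step_name), so every server sees it.
class SeedTracker
  def initialize
    @done = Set.new
  end

  # Run the block for (version, step) once; later calls skip it
  # unless force: true. Success is only recorded if no exception raised.
  def run_step(version, step, force: false)
    key = [version, step]
    return :skipped if @done.include?(key) && !force
    yield                 # the actual migrate/seed work
    @done << key          # record completion on success
    :ran
  end
end
```

Usage would look like `tracker.run_step("1.15.6", "db:seed") { ... }`: the first server (or first RPM loop) does the work, every subsequent invocation returns `:skipped` in a few milliseconds, and `force: true` covers the manual re-run case.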
This would allow you to keep the RPM running the rake tasks, while adding the intelligence needed for migrate/seed to properly handle plugin-specific changes, as well as runs across multiple servers, intelligently (and much, much quicker)…
Again, there may be factors here I'm not thinking of, but at the end of the day, making the database authoritative for which specific tasks need to run/rerun seems best, as it solves the multi-plugin loops as well as the multi-server loops.