Apologies, apparently I jumped the gun a bit and had to lock down the repository while I get one last go-ahead from my boss to share the program. I will update the post as soon as it’s available again.
That being said, it is written so that you run a schedule-creator that iterates through the Foreman hosts and creates a cron file listing each host for a given month, with each host being given its own time to update/reboot.
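As a rough illustration of what one generated entry could look like (this is not the actual program, which is not public yet), here is a minimal sketch that turns a host and its scheduled time into a line in the `/etc/cron.d` format. The command path `/usr/local/bin/patch-host`, the `root` user and the function name `cron_line` are all made up for the example.

```python
from datetime import datetime


def cron_line(host, when, command="/usr/local/bin/patch-host", user="root"):
    """Build one /etc/cron.d-style entry for a single host.

    The command path and user are hypothetical placeholders.
    Fields: minute hour day-of-month month day-of-week user command
    """
    return f"{when.minute} {when.hour} {when.day} {when.month} * {user} {command} {host}"


# One entry per host for the given month.
example_line = cron_line("web01", datetime(2024, 1, 9, 3, 15))
print(example_line)
```

Pinning day-of-month and month (and leaving day-of-week as `*`) makes the entry fire once in that month, which fits a file that is regenerated per month.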
By default the time params can be set to ‘random’, in which case the scheduler creates an entry in its config file with a random weekday, week, hour and minute (stored so that the reboot times stay consistent each month).
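A minimal sketch of how such a stored random slot could be picked once and then mapped to a concrete date each month. The field names, the 1 to 4 range for `week` (so the day always exists in every month) and both function names are assumptions for illustration, not the program's real config format.

```python
import calendar
import random
from datetime import datetime


def pick_slot(rng):
    """Pick a random slot once; it would then be saved in the config."""
    return {
        "weekday": rng.randint(0, 6),  # 0 = Monday ... 6 = Sunday
        "week": rng.randint(1, 4),     # 1st..4th occurrence in the month (assumed range)
        "hour": rng.randint(0, 23),
        "minute": rng.randint(0, 59),
    }


def slot_to_datetime(slot, year, month):
    """Resolve a stored slot to the matching date in a given month."""
    first_weekday, _ = calendar.monthrange(year, month)  # weekday of the 1st
    offset = (slot["weekday"] - first_weekday) % 7
    day = 1 + offset + 7 * (slot["week"] - 1)
    return datetime(year, month, day, slot["hour"], slot["minute"])


random_slot = pick_slot(random.Random())

# A fixed slot: second Tuesday of the month at 03:30.
fixed_slot = {"weekday": 1, "week": 2, "hour": 3, "minute": 30}
january = slot_to_datetime(fixed_slot, 2024, 1)
february = slot_to_datetime(fixed_slot, 2024, 2)
```

Because only the slot is stored, the same host lands on the same weekday, week and time every month even though the calendar date changes.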
If multiple hosts in a group happen to hit the same date/time, it will offset one of them by 15 minutes, repeating until there isn’t a collision within the group.
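The collision handling could be sketched roughly like this, assuming the group's schedule is a simple mapping of host name to `datetime`. The function name and the choice to process hosts in sorted order (so the result is deterministic) are assumptions, not necessarily how the program does it.

```python
from datetime import datetime, timedelta

OFFSET = timedelta(minutes=15)


def resolve_collisions(slots):
    """Shift colliding hosts by 15 minutes until every time in the group is unique.

    slots: dict mapping host name -> scheduled datetime, for one host group.
    """
    taken = set()
    resolved = {}
    for host in sorted(slots):
        when = slots[host]
        while when in taken:  # keep shifting until the time is free
            when += OFFSET
        taken.add(when)
        resolved[host] = when
    return resolved


group = {
    "web01": datetime(2024, 1, 9, 3, 0),
    "web02": datetime(2024, 1, 9, 3, 0),
    "web03": datetime(2024, 1, 9, 3, 15),
}
schedule = resolve_collisions(group)
```

Here `web02` collides with `web01` and moves to 03:15, which in turn pushes `web03` to 03:30.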
When the cronjob triggers for a given host, the program starts by gathering some info and asks a monitoring system (currently only check_mk is implemented) whether the host in question or any host in its group has checks in Critical state. If one is found, the process spits out an error and calls it quits for that job (this was decided internally so that the update/reboot can’t completely cripple an already compromised service).
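The gate itself can be sketched as below. This deliberately leaves out how the states are fetched from check_mk; it assumes they have already been collected into a dict of host to service states, and the `"CRIT"` label, the function names and the error text are all placeholders.

```python
CRITICAL = "CRIT"  # assumed label for a critical check state


def find_critical(group_states):
    """Return (host, service) pairs in Critical state across the whole group."""
    return [
        (host, service)
        for host, services in sorted(group_states.items())
        for service, state in sorted(services.items())
        if state == CRITICAL
    ]


def preflight(target, group_states):
    """Abort the job for `target` if anything in its group is Critical."""
    critical = find_critical(group_states)
    if critical:
        details = ", ".join(f"{host}/{service}" for host, service in critical)
        raise RuntimeError(f"skipping {target}: critical checks in group: {details}")


group_states = {
    "web01": {"http": "OK", "disk": "WARN"},
    "web02": {"http": "CRIT", "disk": "OK"},
}
try:
    preflight("web01", group_states)
    outcome = "proceed"
except RuntimeError as err:
    outcome = str(err)
print(outcome)
```

Note that `web01` itself is healthy here, but the job is still skipped because another member of its group is already in trouble.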
If everything checks out, a downtime is created in the monitoring system (for 30 minutes by default), and the program starts updating and reboots once the updates finish. It can run pre- and post-tasks for both update and reboot if required, and both update and reboot can be enabled/disabled separately on a per-host basis. When the reboot job finishes, the downtime gets removed, and if everything went smoothly, no one notices anything (the best kind of automation).
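The order of operations could be sketched like this. `FakeMonitor` is a stand-in for the monitoring system, the task names are invented, and the tasks are plain callables; none of this reflects the program's real interfaces. The sketch also assumes that a failing task simply raises and leaves the downtime in place until it expires, which may not match the real behaviour.

```python
class FakeMonitor:
    """Stand-in for the monitoring system; only tracks active downtimes."""

    def __init__(self):
        self.active = {}

    def create_downtime(self, host, minutes):
        self.active[host] = minutes

    def remove_downtime(self, host):
        self.active.pop(host, None)


def run_job(host, monitor, tasks, update=True, reboot=True, downtime_minutes=30):
    """Run the enabled phases inside a downtime and return the tasks that ran."""
    monitor.create_downtime(host, downtime_minutes)
    order = []
    if update:
        order += ["pre_update", "update", "post_update"]
    if reboot:
        order += ["pre_reboot", "reboot", "post_reboot"]
    done = []
    for name in order:
        task = tasks.get(name)  # pre/post tasks are optional
        if task is not None:
            task(host)
            done.append(name)
    monitor.remove_downtime(host)  # only reached if no task raised
    return done


monitor = FakeMonitor()
tasks = {
    "update": lambda host: None,
    "reboot": lambda host: None,
    "post_reboot": lambda host: None,
}
full_run = run_job("web01", monitor, tasks)
update_only = run_job("web01", monitor, tasks, reboot=False)
```

With the example tasks, a full run executes `update`, `reboot` and `post_reboot` in that order, and disabling the reboot for the host leaves only `update`.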
There are still some things that I’m fixing and improving, but all in all it seems to be working fairly well. We have been running it for all non-production hosts since September and it has been mostly stable since then. After a few more fixes we intend to let this thing loose on all machines in Foreman (currently around 190 and constantly growing).