RFC: The Future of Periodic Tasks
Background
Foreman relies on certain routine actions being performed periodically in order to work as expected. These are currently implemented with a mix of external cron jobs triggering rake tasks and two flavors of in-application periodic tasks. While this model worked reasonably well in the past, the move to container-based deployments offers us an opportunity to rethink things.
Note: When this text mentions a “task”, it refers to a more general and abstract unit of work; it should not be read as either a rake task or an instance of ForemanTasks::Task.
Current State of Periodic Task Scheduling
Currently, periodic tasks can be divided into two main categories: externally scheduled and internally scheduled.
Externally Scheduled Tasks
The externally scheduled tasks are implemented in two different ways. The older, more widely used approach relies on cron, while the newer approach leverages systemd timers.
Flavors of External Scheduling
- Plain Old Cron (and foreman-rake):
  - Many traditional tasks are executed via cron using the foreman-rake wrapper. This includes core tasks like db:sessions:clear, reporting tasks (reports:{daily,weekly,monthly}), various plugin tasks (foreman_tasks:cleanup), as well as some tasks coming from smart proxy plugins (rubygem-smart_proxy_openscap).
- Systemd Timers:
  - A more modern alternative, currently only used in Satellite deployments.
Properties of External Scheduling
- Individual tasks are completely independent: a failure in one does not affect the others in any way.
- Users can freely modify, disable, or reschedule individual tasks.
- Tasks can be easily executed on demand by the user.
- They can execute anything, including processes that are not part of the Rails application.
- Rake tasks are slow to start (each one adds at least 20 seconds) because the entire application environment has to be loaded before the task runs. On the other hand, all memory is released when the task exits.
Internally Scheduled Tasks
These tasks are managed directly within the application, which offers better visibility and lower resource usage.
Flavors of Internal Scheduling
Like the externally scheduled tasks, internally scheduled ones come in two flavors. Some are implemented using Recurring Logics from foreman-tasks, while others chain ActiveJob jobs in a more ad-hoc fashion. Both ultimately rely on Dynflow to schedule work for execution in the future.
Foreman tasks Recurring Logics (RLs)
Dynflow supports scheduling one-off actions for execution in the future. foreman-tasks builds on top of that by providing several constructs that allow Dynflow actions to be executed periodically. Recurring logics store both the configuration (i.e. how often to run) and the state.
The root action class needs to include the RecurringAction module. The RL acts as a persistent store for scheduling metadata (cronline, iteration count, limits). Scheduling of the next iteration relies on execution plan hooks in the underlying Dynflow engine, and each iteration belongs to a task group associated with the relevant RL.
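To make the fragility of this design concrete, here is a minimal in-memory model of “the next iteration is the source of truth”, assuming a fixed interval in place of a cronline. The names (SketchRecurringLogic, DelayedPlan, execute_due) are hypothetical and do not correspond to foreman-tasks or Dynflow APIs; the only point is that each executed iteration plans its successor, so losing the single pending iteration ends the recurrence.

```ruby
# Hypothetical model, NOT foreman-tasks/Dynflow code: each executed iteration
# plans its successor, mirroring the execution plan hook described above.

# A pending future run, standing in for a Dynflow delayed plan.
DelayedPlan = Struct.new(:run_at, :cancelled)

class SketchRecurringLogic
  attr_reader :iteration, :next_plan

  def initialize(interval, max_iteration: nil)
    @interval = interval           # seconds between runs (stands in for a cronline)
    @max_iteration = max_iteration # optional limit, like an RL iteration limit
    @iteration = 0
    @next_plan = nil
  end

  # Plan the first iteration.
  def start(now)
    @next_plan = DelayedPlan.new(now + @interval, false)
  end

  # Execute the pending iteration if it is due. Only an iteration that actually
  # runs plans the next one, so the pending plan is the only record of the future.
  def execute_due(now)
    plan = @next_plan
    return if plan.nil? || plan.run_at > now

    @next_plan = nil
    return if plan.cancelled # cancelled: no hook runs, nothing gets replanned

    @iteration += 1
    return if @max_iteration && @iteration >= @max_iteration

    @next_plan = DelayedPlan.new(plan.run_at + @interval, false)
  end
end

# Usage: one successful iteration, then the pending one is cancelled.
rl = SketchRecurringLogic.new(3600)
rl.start(0)          # first iteration planned for t=3600
rl.execute_due(3600) # runs iteration 1 and plans iteration 2 for t=7200
rl.next_plan.cancelled = true
rl.execute_due(7200) # the cancelled plan is dropped and nothing replaces it
# rl.next_plan is now nil: the recurrence is gone until someone recreates it.
```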
Several system periodic tasks (for example, Red Hat Lightspeed client status aging and Check for long running tasks) are implemented this way. Recurring logics are also used directly by users, to manage Sync Plans and recurring remote execution jobs.
Ad-hoc using Chained Active Jobs (System only):
The ActiveJob (AJ) job is registered in an initializer and, upon completion, schedules its own next iteration. There is no explicit link between iterations. This is the primary tool in environments where foreman-tasks is not available (i.e. vanilla Foreman).
This is used for various notification and cleanup jobs such as Host lifecycle support expiration notification, Clean up StoredValues, and manifest expiration warnings.
The downside is that users cannot configure this kind of periodic task in any meaningful way.
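The chaining pattern can be illustrated with a small simulation. The in-memory queue, the SketchCleanupJob class and the boot helper are hypothetical stand-ins for the job backend, a real job and the initializer; they are not ActiveJob or Dynflow APIs. The sketch shows why nothing links the iterations: if the single queued next run disappears, only re-running the initializer (i.e. an application restart) brings the chain back.

```ruby
# Hypothetical simulation of a self-rescheduling ("chained") job.
# SketchQueue stands in for the job backend; it is not an ActiveJob API.
class SketchQueue
  Entry = Struct.new(:run_at, :job)

  def initialize
    @entries = []
  end

  def enqueue(job, run_at)
    @entries << Entry.new(run_at, job)
  end

  def pending_jobs
    @entries.map(&:job)
  end

  # Simulates the queued next iteration being cancelled or lost.
  def clear
    @entries.clear
  end

  # Perform every job due at or before +now+, earliest first.
  def drain(now)
    due, @entries = @entries.partition { |entry| entry.run_at <= now }
    due.sort_by(&:run_at).each { |entry| entry.job.perform(self, now) }
  end
end

class SketchCleanupJob
  INTERVAL = 12 * 3600 # seconds; illustrative value

  class << self
    attr_accessor :runs
  end
  self.runs = 0

  def perform(queue, now)
    self.class.runs += 1
    # ... the periodic work itself would happen here ...
    # Chain: the job enqueues its own successor once it has completed.
    queue.enqueue(SketchCleanupJob.new, now + INTERVAL)
  end
end

# Stand-in for the initializer that seeds the chain at application boot.
# Skipping the seed when one is already queued is an assumption of this sketch.
def boot(queue, now)
  return if queue.pending_jobs.any? { |job| job.is_a?(SketchCleanupJob) }

  queue.enqueue(SketchCleanupJob.new, now)
end

queue = SketchQueue.new
boot(queue, 0)
queue.drain(0)           # first run; the next one is queued for t=43200
queue.clear              # the queued next iteration is lost
queue.drain(10 * 86_400) # nothing runs for ten days
boot(queue, 10 * 86_400) # only a restart re-seeds the chain
queue.drain(10 * 86_400)
```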
Properties of Internal Scheduling
- Relatively cheap to create and execute compared to the overhead of external foreman-rake tasks.
- The status and individual invocations leave a paper trail in the form of tasks (RL only) and logs (both cases).
- No clear, system-wide distinction between system-level and user-configured tasks.
- Fragility:
  - RL only: If the scheduled “next iteration” is cancelled or fails to be scheduled, the recurrence can be broken for good.
  - AJ only: If the scheduled “next iteration” is cancelled or fails to be scheduled, the recurrence stays broken until the application is restarted.
  - AJ only: Active Job-based recurrences schedule the next iteration only after the current one completes, so they do not account for the task’s run time; a long-running task makes the schedule drift.
- Because the scheduled tasks are executed by the same background processing engine as any other tasks, relying on internal scheduling doesn’t bring in any extra dependencies, but can make periodic tasks compete for worker slots with other, possibly non-periodic tasks.
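The drift listed under Fragility is easy to quantify. Assuming a daily job that takes 10 minutes and is rescheduled relative to its completion, the following sketch compares its start times with a schedule derived from the cadence alone (as a cronline would be). Both numbers are illustrative assumptions, not measurements.

```ruby
# Illustrative numbers only: a daily job that takes 10 minutes per run.
DAY_MINUTES = 24 * 60
RUN_MINUTES = 10

# Chained: the next run is scheduled when the current one finishes.
def chained_starts(count)
  starts = [0]
  (count - 1).times { starts << starts.last + RUN_MINUTES + DAY_MINUTES }
  starts
end

# Anchored: every start is derived from the schedule itself.
def anchored_starts(count)
  Array.new(count) { |i| i * DAY_MINUTES }
end

# Minutes by which each chained run starts later than its scheduled slot.
drift = chained_starts(7).zip(anchored_starts(7)).map { |chained, anchored| chained - anchored }
puts drift.inspect # prints [0, 10, 20, 30, 40, 50, 60]
```

After one week the chained job starts an hour later than intended, and the offset keeps growing by the run time on every iteration.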
Post Redmine 38956 Era (The Near Future)
To avoid needing a cron container in container-based deployments, the short-term decision is to use systemd timers on the container host. Rather than defining every periodic task as its own timer, there will be four anchor rake tasks, one per cadence (hourly, daily, weekly, monthly), and the individual rake tasks will attach to them. Systemd timers will be provided only for these four anchor rake tasks.
This will reduce the number of timers from ~13 to 4. It should also reduce the overall runtime, since the application environment no longer has to be loaded separately for each task, while the individual tasks can still be run on demand.
This comes at the cost of reduced isolation, as a misbehaving task has the potential to block or even completely prevent others within the same group from running.
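The anchor arrangement can be sketched with plain Rake. The task names below (sketch_anchor:daily, sketch:*) are placeholders, since this RFC does not name the anchor tasks, and the RAN array only records what executed. Attaching an individual task to an anchor is modeled as making it a prerequisite via Rake::Task#enhance; the sketch also shows the isolation trade-off, because Rake stops invoking the remaining prerequisites once one of them raises.

```ruby
require 'rake'

# Placeholder names: the RFC does not specify the anchor task names.
RAN = []

namespace :sketch_anchor do
  %w[hourly daily weekly monthly].each do |cadence|
    desc "Run every periodic task attached to the #{cadence} anchor"
    task cadence
  end
end

# Stand-ins for individual tasks (in reality e.g. reports:daily).
task('sketch:sessions_clear') { RAN << :sessions_clear }
task('sketch:broken') { raise 'simulated failure' }
task('sketch:reports_daily') { RAN << :reports_daily }

# Attaching a task to an anchor: it becomes one of the anchor's prerequisites,
# which Rake invokes in order within a single process.
Rake::Task['sketch_anchor:daily'].enhance(
  %w[sketch:sessions_clear sketch:broken sketch:reports_daily]
)

begin
  Rake::Task['sketch_anchor:daily'].invoke
rescue RuntimeError => e
  warn "daily anchor aborted: #{e.message}"
end
after_anchor = RAN.dup # only :sessions_clear ran; reports_daily was skipped

# Individual tasks can still be run on demand.
Rake::Task['sketch:reports_daily'].invoke
```

Running every attached task in one process is what saves the repeated environment loading, and it is also why one failing or hanging task affects the rest of its group.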
Open Questions
- How will completely external processes, such as foreman-reports and smart-proxy scripts, integrate into this consolidated rake task structure?
Proposal
The ultimate goal is to bring all periodic task management directly into the application and settle on a single way of achieving it. To accomplish this, we would need to:
- Implement all periodic tasks within the application as action classes.
- Convert all Active Job-chaining based tasks to rely on Foreman-Tasks Recurring Logics.
  - Note: This might imply merging foreman-tasks (or a selected subset of it) into Foreman core, or having Foreman core depend on foreman-tasks.
- Migrate periodic tasks from the older approaches to the new one; it is not desirable to maintain two ways of achieving the same thing.
Required Changes in Foreman-Tasks
To support this unification, the foreman-tasks framework needs several enhancements to achieve feature parity with current approaches and to generally improve the user experience:
- Add the ability to edit Recurring Logics (change the end date, the iteration limit, and the interval).
  - Note: This is currently possible for Sync Plans, but only through a workaround.
- Implement a mechanism to trigger a task defined as an RL on demand without affecting its overall scheduling cycle.
- Add the ability to mark RLs and their associated tasks as “system” tasks in the code, allowing users to distinguish them from user RLs or filter them out altogether.
  - Note: In the past we’ve had a similar request for individual tasks (SAT-21985).
- Preserve the exact configuration details when an RL is configured. Currently, the user-friendly configuration (e.g. “Run daily”) is converted to a cronline and only the cronline is stored.
- Reworked Recurrence Mechanism:
  - The current mechanism, where the “next iteration” is the source of truth, is too fragile, as cancelling the next iteration breaks the entire recurrence.
  - One possible solution would be to introduce a single delayed plan (in Dynflow’s terminology) to act as a template, from which individual iterations would be cloned. This would also allow a periodic check to ensure that every RL always has its next iteration correctly planned, and would make it possible to run an RL on demand without affecting its schedule.
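The template idea can be sketched as follows, under the simplifying assumptions that the schedule is a fixed interval from an anchor time and that “cloning” means creating a new iteration record carrying the template's action. TemplateRecurringLogic, Iteration, reconcile, run_now and cancel_next are hypothetical names, not foreman-tasks or Dynflow APIs. Because the schedule is computed from the RL itself, cancelling one iteration skips only that slot, a periodic reconcile replans a missing iteration, and an on-demand run is an extra clone that leaves the scheduled one untouched. The system flag only illustrates the proposed “system” marking.

```ruby
# Hypothetical sketch of template-based recurrence; not foreman-tasks/Dynflow code.
# Times are integer seconds; a fixed interval stands in for the stored schedule.
Iteration = Struct.new(:action, :run_at, :on_demand)

class TemplateRecurringLogic
  attr_reader :planned, :history

  def initialize(template_action, interval, anchor, system: false)
    @template = template_action # what every iteration runs; never executed itself
    @interval = interval
    @anchor = anchor            # time of the first scheduled slot
    @system = system            # hypothetical "system RL" marker
    @skipped_through = nil      # latest slot skipped by a cancellation
    @planned = []               # cloned iterations waiting to run
    @history = []               # [run_at, action, on_demand] of executed iterations
  end

  def system?
    @system
  end

  # First scheduled slot strictly after +time+, derived from the schedule alone.
  def next_slot(time)
    return @anchor if time < @anchor

    @anchor + ((time - @anchor) / @interval + 1) * @interval
  end

  # Periodic check: ensure exactly one scheduled (non on-demand) iteration exists.
  def reconcile(now)
    return if @planned.any? { |it| !it.on_demand }

    after = [now, @skipped_through || now].max
    @planned << Iteration.new(@template, next_slot(after), false)
  end

  # Cancelling the next scheduled iteration skips that slot only.
  def cancel_next
    it = @planned.find { |i| !i.on_demand }
    return unless it

    @planned.delete(it)
    @skipped_through = it.run_at
  end

  # On-demand run: an extra clone that leaves the scheduled iteration alone.
  def run_now(now)
    @planned << Iteration.new(@template, now, true)
  end

  def execute_due(now)
    due, @planned = @planned.partition { |it| it.run_at <= now }
    due.sort_by(&:run_at).each { |it| @history << [it.run_at, it.action, it.on_demand] }
    reconcile(now)
  end
end

rl = TemplateRecurringLogic.new(:cleanup, 3600, 3600, system: true)
rl.reconcile(0)      # plans the scheduled iteration for t=3600
rl.run_now(100)      # on-demand clone at t=100
rl.execute_due(100)  # runs only the on-demand clone; t=3600 stays planned
rl.cancel_next       # skip the t=3600 slot
rl.reconcile(200)    # replans the following slot, t=7200
rl.execute_due(7200) # runs the t=7200 iteration and plans t=10800
```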
Optional Enhancements
- Recurring logic grouping: The short-term solution groups periodic tasks into categories by cadence. If users get used to this grouping, removing it a couple of releases later would not be ideal.
- Splay Time: Add the ability to configure a random “splay” (or offset) time for Recurring Logics, which is useful for tasks that should run across a time window rather than all at the exact same minute (e.g., syncing with cloud services).
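A splay could be applied whenever an iteration is planned. The sketch below assumes the offset is derived from a hash of the RL's identifier, so each RL keeps a stable, pseudo-random position within the window instead of drawing a new random value on every run; that choice, the function names and the 15-minute window are all assumptions made for illustration.

```ruby
require 'digest'

# Hypothetical helper: a stable offset in [0, splay_seconds) for a given RL.
# Hashing the identifier (instead of calling rand on every run) keeps each RL at
# the same position in the window while still spreading different RLs apart.
def splay_offset(rl_id, splay_seconds)
  return 0 unless splay_seconds.positive?

  Digest::SHA256.hexdigest(rl_id.to_s).to_i(16) % splay_seconds
end

# Planned start time of an iteration once the splay is applied.
def splayed_run_at(scheduled_at, rl_id, splay_seconds)
  scheduled_at + splay_offset(rl_id, splay_seconds)
end

window = 15 * 60          # 15-minute splay window (illustrative)
scheduled = 1_700_000_000 # a scheduled slot, in epoch seconds
%w[rl-1 rl-2 rl-3].each do |id|
  puts "#{id}: +#{splay_offset(id, window)}s -> #{splayed_run_at(scheduled, id, window)}"
end
```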
Moving towards a unified, in-application model further decouples the application from the underlying operating system. Administrators can then manage a larger part of the application from within the application itself, and the environmental overhead is reduced.
Open Questions
- What is the documentation impact? How are current periodic tasks documented, if at all?