Hey folks! Katello has been on my mind lately - particularly some of the parts that I tended to gravitate toward (or implement entirely) when I was on the team full-time. I have some ideas for real improvements in those areas, focused on reducing complexity.
The component I’ve got my sights on is the Katello Event Daemon, which was introduced (admittedly after the fact) in another RFC several years ago. It manages a few other Katello subsystems by spawning threads inside the Puma process and ensuring they are started and stay running. Today it enables the operation of:
- Katello Agent and its messaging in and out of Qpid
- Candlepin Events receiving messages from Candlepin’s embedded Artemis message broker
- Katello Event Queue which is a “simple” mechanism enabling deduplication, scheduling, and retrying of certain actions
This RFC is about removing the Katello Event Daemon completely.
Since Katello Agent is finally being removed, only the last two items would be managed by the daemon. Can they be moved elsewhere to eliminate the additional complexity KED adds? I think so.
Candlepin Events
This is an easy one. My proposal is to add an internal API to Katello and stand up a new, (very) small systemd service that connects to the Artemis broker within Candlepin - just as Katello does now from inside Puma. The service will not handle events beyond forwarding them to the internal API, so it stays very lightweight.
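To make the shape of that forwarder concrete, here is a minimal sketch. The internal API endpoint path is a hypothetical placeholder (the real route is yet to be designed), and the broker subscription is shown only in comments since it needs the stomp gem:

```ruby
require 'net/http'
require 'uri'

# Build the HTTP request the forwarder would send to Katello's internal API.
# The event body is passed through untouched - the service does no parsing
# or business logic of its own.
def build_forward_request(base_url, raw_event)
  # Hypothetical endpoint path; the real internal API route is TBD.
  uri = URI.join(base_url, '/katello/api/v2/internal/candlepin_events')
  req = Net::HTTP::Post.new(uri)
  req['Content-Type'] = 'application/json'
  req.body = raw_event
  [uri, req]
end

# In the real systemd service, a STOMP subscription to Candlepin's embedded
# Artemis broker would drive this, roughly (requires the stomp gem):
#
#   client.subscribe('event.default') do |msg|
#     uri, req = build_forward_request('https://localhost', msg.body)
#     Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
#   end
```

Because the service only shuttles bytes from the broker to an HTTP endpoint, it carries no Katello state and can be restarted freely by systemd.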
Katello Event Queue
This is more nuanced because it has certain requirements (dedup, rescheduling, retrying) and a greater number of potential solutions.
The most basic solution would also use the internal API, with its own endpoint called by another small external systemd service that polls every few seconds. No business logic here - just a place to trigger the queue draining from. In practice this would generate a lot of log noise; to cut that down, the service could first check the database and only call the drain endpoint when there is actually something queued. I like this option for its simplicity and its minimal (zero) disturbance to the Event Queue system.
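The database check could be a single pure decision, something like the sketch below. The `:process_after` field name is illustrative of a scheduled-event column, not necessarily Katello's actual schema:

```ruby
# Decide whether the poller should call the (hypothetical) drain endpoint:
# only when at least one queued event exists whose scheduled time has arrived.
def should_drain?(pending_events, now: Time.now)
  pending_events.any? { |e| e[:process_after].nil? || e[:process_after] <= now }
end

# The service loop would then be roughly:
#
#   loop do
#     trigger_drain_via_api if should_drain?(query_event_queue_table)
#     sleep POLL_INTERVAL
#   end
```

With that guard, the API (and its access log) is only touched when there is real work, so a quiet system produces no noise at all.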
Another solution I’m fond of is reworking the handful of events that run via the Event Queue into Sidekiq workers. The advantages there would be: no ‘new’ external service, and actual removal of the “Katello Event Queue” construct. Out of the box, or with a plugin like sidekiq-unique-jobs, all of the dedup, reschedule, and retry requirements can be addressed. I think this is a big win, but I can’t yet say where Katello’s queue would run. Sidekiq running in Dynflow? Adding a separate Sidekiq process just for this would increase memory requirements, which is a concern of mine.
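For a sense of what that rework might look like: the runnable part below is just the dedup-key idea (duplicate events for the same object collapse to one lock key), while the Sidekiq wiring is commented out since it needs the sidekiq and sidekiq-unique-jobs gems. Class, queue, and event names here are hypothetical, not Katello's actual ones:

```ruby
# Duplicate events of the same type for the same object should map to the
# same uniqueness key, so a lock-based dedup plugin can collapse them.
def unique_event_key(event_type, object_id)
  "katello:event:#{event_type}:#{object_id}"
end

# With sidekiq and sidekiq-unique-jobs installed, a queue entry could become
# a worker along these lines:
#
#   class GenerateHostApplicabilityJob
#     include Sidekiq::Worker
#     sidekiq_options queue: :katello_events,
#                     lock: :until_executed,  # dedup via sidekiq-unique-jobs
#                     retry: 5                # retrying via Sidekiq itself
#
#     def perform(host_id)
#       # event handler logic moves here
#     end
#   end
#
# Rescheduling replaces the queue's "process this later" behavior:
#
#   GenerateHostApplicabilityJob.perform_in(30, host_id)
```

The appeal is that dedup, retry, and scheduling all become declarative options on the worker instead of bespoke Event Queue code - at the cost of the open question above about where the Sidekiq process itself lives.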
I’ve got some of the above working to a high degree; a recent in-depth technical discussion on this PR ultimately brought the focus back here.
Please share your thoughts here and let’s see if any of this can become reality.