Problem:
I accidentally deleted the long-running Katello tasks (Monitor event Queue and Listen on Candlepin Events) and am unable to get them back. I tried restarting foreman-tasks, restarting Foreman and rebooting the system to no avail.
I know the tasks are recreated under certain conditions, but I’m unsure which conditions are checked.
Is there a way to force recreation or can I safely just wait for them to reappear?
Expected outcome:
The tasks return
Foreman and Proxy versions:
1.20.3 (I know this version is dusty)
Thanks for your reply. That was actually the first thing I tried. We also stop and start Foreman every night for an offline backup, yet the tasks have not reappeared so far. I can schedule an additional restart after business hours today if needed, though.
How did you delete the tasks? That might point to how to revive them. Something to try is navigating to /foreman_tasks/dynflow/worlds, clicking "check and invalidate", and then, for good measure, running systemctl restart dynflowd.
Afterwards, see whether you have an execution plan for those tasks under /foreman_tasks/dynflow/
Thanks for the tips, will try that today. As an info update beforehand: I messed up the search expression for foreman-rake foreman_tasks:cleanup and accidentally killed all tasks except the ones I wanted m(
I will report back once I have had a chance to check your suggestions.
I tried your suggestions, but without success. "Check and invalidate" reported all worlds as valid. Restarting dynflowd did not change a thing (I also restarted foreman-tasks after I saw nothing had changed, just to be sure).
In dynflow, I can see a lot of scheduled tasks with the label “Dynflow::ActiveJob::QueueAdapters::JobWrapper”, but I have no idea what those might be. Besides that, there are only Repo Sync jobs scheduled (and one that is running according to dynflow, but I cannot see that from the Foreman UI).
I learned that there’s a bug in the version of Dynflow packaged with Katello 3.9 which will prevent those tasks from recovering. The good news is that upgrading to 3.10 should bring them back.
You should know that these tasks are moved out of Dynflow in Katello 3.14 and this would never be a problem in that release. If you’re interested in going to 3.14 just remember to not skip any versions.
Thanks for the info. Sad news for me.
We have been trying to upgrade to 1.22/3.12 for quite some time now, and the update to 3.14 would be a long way off. Sadly, our organization finds new reasons to deny that change every time we schedule a date for that upgrade…
What are the exact implications of those tasks not running? I assume they are important for something, but could never find out.
Those tasks deal with processing messages that come from the Candlepin backend to keep Katello in sync as well as handling things like errata applicability calculation. They are vital to the function of Katello.
Sorry to hear that upgrading is not exactly feasible. I’ll try to track down some steps for recovery.
A bit of background. The tasking engine uses a special table called dynflow_coordinator_records to ensure certain things are run only once, such as the ListenOnCandlepinEvents and EventQueue::Monitor tasks. Sadly, there was a bug in the version shipped with Katello 3.9 which made it possible for records in that table to become orphaned and block the tasks from spawning again.
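To illustrate the mechanism described above, here is a minimal toy model in Python. It is not Dynflow's code or API: the ToyCoordinator class, try_spawn and the plan ids are hypothetical names invented for this sketch. It only shows how a lock record left behind by a deleted execution plan can keep a run-only-once action from starting again until the record is removed.

```python
class ToyCoordinator:
    """Toy stand-in for the dynflow_coordinator_records table (hypothetical)."""

    def __init__(self):
        # lock id -> owner id, loosely mirroring the id and owner_id columns
        self.records = {}

    def acquire(self, lock_id, owner_id):
        # A singleton lock can only be taken when no record with that id exists.
        if lock_id in self.records:
            return False
        self.records[lock_id] = owner_id
        return True

    def release(self, lock_id):
        self.records.pop(lock_id, None)


def try_spawn(coordinator, action, plan_id):
    """Return True if the run-only-once action could be started."""
    return coordinator.acquire(
        f"singleton-action:{action}", f"execution-plan:{plan_id}"
    )


coordinator = ToyCoordinator()
action = "Actions::Katello::EventQueue::Monitor"

# The first start succeeds and leaves a lock record behind.
started = try_spawn(coordinator, action, "plan-1")

# Assume plan-1 is deleted without its lock being released: the record is now
# orphaned, and every later attempt to spawn the same action is refused.
blocked = not try_spawn(coordinator, action, "plan-2")

# Removing the orphaned record lets the action start again.
coordinator.release(f"singleton-action:{action}")
recovered = try_spawn(coordinator, action, "plan-3")

print(started, blocked, recovered)
```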
Before we try to prune any records, could you check something for me? I’d need you to:
1. Stop the dynflowd service
2. Log into the database using psql
3. Run select * from dynflow_coordinator_records where id like 'singleton-action:%';
4. Start the dynflowd service again
If step 3 returns any records, we will most likely need to clean them up. If it returns none, this is a different issue and we’ll need to figure out what’s going on.
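For anyone who wants to see what the filter in step 3 matches before touching a real system, here is a small sketch using Python's sqlite3 module with an in-memory table. It is only a stand-in: the real table lives in Foreman's PostgreSQL database and is queried through psql, and the sample rows below are made up for illustration.

```python
import sqlite3

# In-memory SQLite table standing in for the real PostgreSQL table; only the
# columns visible in psql output for this table are modelled.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dynflow_coordinator_records "
    "(id TEXT, class TEXT, owner_id TEXT, data TEXT)"
)
conn.executemany(
    "INSERT INTO dynflow_coordinator_records VALUES (?, ?, ?, ?)",
    [
        # Hypothetical sample rows, not taken from a real installation.
        ("singleton-action:Actions::Example::First",
         "Dynflow::Coordinator::SingletonActionLock",
         "execution-plan:hypothetical-1", "{}"),
        ("some-other-record", "Hypothetical::OtherRecord",
         "world:hypothetical", "{}"),
    ],
)

# Same filter as step 3: only ids starting with 'singleton-action:' match.
rows = conn.execute(
    "SELECT id, owner_id FROM dynflow_coordinator_records "
    "WHERE id LIKE 'singleton-action:%'"
).fetchall()

for record_id, owner_id in rows:
    print(record_id, "held by", owner_id)
conn.close()
```

Only the singleton lock row is printed; the other record is left out by the prefix filter.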
I can confirm there are records returned from step 3:
foreman=# select * from dynflow_coordinator_records where id like 'singleton-action:%';
id | class | owner_id | data
--------------------------------------------------------------+-------------------------------------------+-----------------------------------------------------+---------------------------------------------
singleton-action:Actions::Candlepin::ListenOnCandlepinEvents | Dynflow::Coordinator::SingletonActionLock | execution-plan:b0b5bbb4-0f59-4c0d-94f4-db7fc423adce | {"class":"Dynflow::Coordinator::SingletonActionLock","owner_id":"execution-plan:b0b5bbb4-0f59-4c0d-94f4-db7fc423adce","execution_plan_id":"b0b5bbb4-0f59-4c0d-94f4-db7fc423adce","id":"singleton-action:Actions::Candlepin::ListenOnCandlepinEvents"}
singleton-action:Actions::Katello::EventQueue::Monitor | Dynflow::Coordinator::SingletonActionLock | execution-plan:046d677c-129b-464e-8314-3761dddde666 | {"class":"Dynflow::Coordinator::SingletonActionLock","owner_id":"execution-plan:046d677c-129b-464e-8314-3761dddde666","execution_plan_id":"046d677c-129b-464e-8314-3761dddde666","id":"singleton-action:Actions::Katello::EventQueue::Monitor"}
That’s what I thought. Now if we remove those, the tasks should show up again. The steps are the same, just replace select * with delete, i.e. run delete from dynflow_coordinator_records where id like 'singleton-action:%'; with dynflowd stopped, and you should be good to go. And we already have a backup in case anything goes south.
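As a cautious variation on the delete, one can check how many rows the statement removed before committing. The sketch below shows the idea with Python's sqlite3 module and an in-memory table; it is a toy stand-in for the real PostgreSQL table, the expected count of 2 comes from the two records shown in this thread, and the third row is a hypothetical record added only to show that unrelated rows survive.

```python
import sqlite3

# Toy in-memory table; the real one lives in Foreman's PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dynflow_coordinator_records (id TEXT, owner_id TEXT)")
conn.executemany(
    "INSERT INTO dynflow_coordinator_records VALUES (?, ?)",
    [
        ("singleton-action:Actions::Candlepin::ListenOnCandlepinEvents",
         "execution-plan:b0b5bbb4-0f59-4c0d-94f4-db7fc423adce"),
        ("singleton-action:Actions::Katello::EventQueue::Monitor",
         "execution-plan:046d677c-129b-464e-8314-3761dddde666"),
        # Hypothetical unrelated record that must not be deleted.
        ("other-record:hypothetical", "world:hypothetical"),
    ],
)
conn.commit()

EXPECTED_ROWS = 2  # the two singleton-action records seen in the select

# The DELETE runs inside an implicit transaction until commit or rollback.
cursor = conn.execute(
    "DELETE FROM dynflow_coordinator_records WHERE id LIKE 'singleton-action:%'"
)
deleted = cursor.rowcount

if deleted == EXPECTED_ROWS:
    conn.commit()    # exactly the rows we meant to remove
else:
    conn.rollback()  # something unexpected matched; undo and investigate

remaining = [row[0] for row in conn.execute(
    "SELECT id FROM dynflow_coordinator_records"
)]
print(deleted, remaining)
conn.close()
```

In psql the same effect can be had by wrapping the delete in begin; and commit; (or rollback; if the reported row count looks wrong).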
Thanks for your help @aruzicka, @Jonathon_Turel and @John_Mitsch.
The tasks reappeared and everything seems to be working again.
We will still continue to push for the updates to happen as soon as possible. Cannot wait for those tasks to disappear in 3.14