Refresh RollingCV Repo task hangs, causing repos to not up

Problem:

Sometimes the task “Refresh RollingCV Repo repository…” hangs, leaving it in state planned, with progress at 100% and result pending. Example here:

This causes the repositories to not synchronize when running scheduled jobs at night:

I first noticed this behavior on Foreman 3.17 + Katello, and the issue is still present on Foreman 3.18.1 + Katello.

The Foreman GUI says there are no tasks running or paused, but I’ve noticed that “Refresh RollingCV Repo repository…” runs multiple times and only one instance gets stuck; the others finish in a couple of seconds without errors. It happens on RollingCVs with custom repositories, Red Hat repositories, or both.
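Since the GUI counters for running and paused tasks do not show these tasks, the symptom can be written down as a filter over task records. Below is a minimal sketch in Python, assuming the tasks have been exported as dictionaries. The field names (`label`, `state`, `result`, `progress`) and the sample records are assumptions for illustration, not confirmed Foreman output.

```python
# Hypothetical task records; field names and values are assumptions for illustration.
tasks = [
    {"id": "a1", "label": "Refresh RollingCV Repo", "state": "stopped", "result": "success", "progress": 1.0},
    {"id": "b2", "label": "Refresh RollingCV Repo", "state": "planned", "result": "pending", "progress": 1.0},
    {"id": "c3", "label": "Synchronize repository", "state": "running", "result": "pending", "progress": 0.4},
]


def find_stuck(tasks):
    """Return tasks that report 100% progress but never left planned/pending."""
    return [
        t for t in tasks
        if t["state"] == "planned"
        and t["result"] == "pending"
        and t["progress"] >= 1.0
    ]


stuck_ids = [t["id"] for t in find_stuck(tasks)]
print(stuck_ids)
```

A task that is genuinely still running (like `c3` in the sample) is not flagged, because its progress is below 100%.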

Performing a restart with the command “foreman-maintain service restart” causes all those stuck tasks to finish with an error, so technically it “cleans” the queue.

If I remember correctly, this issue was not present on Foreman 3.16, or at least I was not aware of it (on that version we had completely different problems where the repos wouldn’t sync at all :D).

Expected outcome:

Finish the task “Refresh RollingCV Repo repository…” without getting stuck

Foreman and Proxy versions:

Foreman 3.18.1 + Katello 4.20

Foreman and Proxy plugin versions:

Distribution and version:

RHEL 9.7

Other relevant data:

Is it always the same Repo refresh that is stuck? Can you click the Dynflow Console button on the paused task and add a screenshot, so we can see which step in the task gets stuck?

Hello

according to Dynflow Console, the task is in state stopped/success.

I’ve double-checked to make sure I clicked the right task, and I did. That means Dynflow thinks it is done, but Foreman says it’s not.

I currently have 2 out of 5 tasks in a stuck state (5 tasks with the same name but different IDs). Here is a screenshot of the task whose Dynflow Console view I captured.

Try switching over from the “Run” tab to the “Finalize” tab; it is possible the failed or unfinished action is hiding there.

I am afraid this won’t be helpful :confused: There is only one entry, and it says success.

And just in case, here it is unwrapped

And a response to the previous question I forgot to answer: I am not sure whether this happens on a specific set of repositories or is “random”, because some repos have gotten stuck more than once and others only once so far. For example, the Postgres repo got stuck only once, but the RHEL/EPEL repos got stuck multiple times; it’s just always a different version.

I am speculating, but based on the symptoms it sounds like everything that is supposed to happen happens, except that the task does not jump to finished success. I have some distant memory of having observed something similar with tasks that essentially performed a noop. I believe we fixed that by pulling the check if the task needs to do something outside of the task and only planning it if something needs doing. That way the noop tasks that were getting stuck never got planned in the first place. I wonder if this might be a similar case.
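To illustrate the kind of fix I mean, here is a minimal sketch in Python of "check first, plan only if needed". Everything in it is hypothetical: the `needs_refresh` check, the version fields, and the repo names are made up for illustration and are not how Katello actually decides this.

```python
def needs_refresh(repo):
    """Hypothetical check: refresh only when the published version lags the synced one."""
    return repo["published_version"] != repo["synced_version"]


def plan_refresh_tasks(repos):
    """Plan a refresh task only for repos that actually have work to do.

    The check runs before planning, so no-op tasks never enter the queue.
    """
    return [{"action": "refresh", "repo": r["name"]} for r in repos if needs_refresh(r)]


# Made-up sample data for illustration only.
repos = [
    {"name": "rhel9-baseos", "synced_version": 12, "published_version": 11},
    {"name": "epel9", "synced_version": 7, "published_version": 7},  # nothing to do
]

planned = plan_refresh_tasks(repos)
print(planned)
```

The point of the design is that a task which would do nothing is never created, so it cannot end up stuck.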

So you think this might be a bug in Katello or something? Is there anything else we could provide to help find out what is wrong? I did not install this Foreman myself, but it should be a standard installation without anything special. The only non-standard thing is that we access the internet through a corporate proxy; we’ve had other problems with it before, but so far we have handled them all on the proxy side.

I think (low confidence) that it might be a race condition in the tasking system where successfully completed tasks don’t update their state to “completed success” for some reason.

I can think of two things you could investigate that could narrow things down:

  1. Do you know which action on your side triggers the affected tasks? Is it a sync, or something else? This might help us narrow down the codepath that was taken.
  2. Are your rolling CV repositories updated even though the relevant task hangs or not? As in you sync new content, does that new content appear in the rolling CV repo even though the task hangs? This would tell us if the action itself is broken, or if it is (in my view more likely) the tasking system. As in all Task actions complete successfully and correctly, but for some reason the task is not updated to “completed success”.
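The second check boils down to a small decision table. Here is a rough sketch in Python; the function name, arguments, and result strings are only my shorthand for the reasoning above, not anything from Foreman or Katello.

```python
def triage(content_updated, task_finished):
    """Map the two observations to the component that looks suspicious."""
    if task_finished:
        return "no hang observed"
    if content_updated:
        return "tasking system: work completed but task state was not updated"
    return "action: the refresh itself did not complete"


print(triage(content_updated=True, task_finished=False))
```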

We have a sync plan that happens every night at 01:00 to sync all repositories.

As for the second question, I wasn’t sure how to check this. I’ve checked the repositories inside the RollingCV, and the sync state says the repo got synced:

This is the repo on which the task hung last night.

Then I checked the errata. There was an advisory about Java from yesterday, so on a system connected to the affected CV I checked which Java package is available to it. The newest one was the version mentioned in the Red Hat security advisory, so that should mean the RollingCV itself is synced and updated. The only thing is that there is always more than one “Refresh RollingCV Repo repository” task (usually 4 or 5), and I don’t know whether that could affect something: whether every task updates a part of the repository, or what the reason for multiple tasks is.