I’m also having this issue on a fresh install of Foreman 2.4 and Katello 4.0. I’m running Katello on Oracle Linux 8 and I’ve added the Oracle Linux repos for OL6/7/8 as well as some other repos and I triggered them all to sync at the same time.
I tried syncing a large number of repositories and now I’ve got a task that’s waiting for Pulp to start. Looking at the history of this post, I can see that Pulp had 4 workers + resource-manager running. But according to the API (https://hostname/pulp/api/v3/workers/) there are 25 workers: 5 are active and 20 have last_heartbeats from 4+ days ago (when things died).
The task list has 1 task, the hung one, which is assigned to one of the dead workers. Once I cancelled that task, things started moving again.
I did see entries in the logs about missing workers that match up with the IDs in my API /workers/ list.
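The check described above (a waiting task whose worker is no longer online) can be sketched roughly like this. It is only an illustration: it assumes you have already loaded the JSON from /pulp/api/v3/status/ and from `pulp task list --state waiting` into Python, and the function name `tasks_on_offline_workers` is made up for this example. The sample data is trimmed from my outputs.

```python
def tasks_on_offline_workers(online_workers, waiting_tasks):
    """Return waiting tasks assigned to a worker that is not currently online."""
    online_hrefs = {worker["pulp_href"] for worker in online_workers}
    return [
        task
        for task in waiting_tasks
        # Tasks with no worker assigned yet are not considered stuck here.
        if task["worker"] is not None and task["worker"] not in online_hrefs
    ]


# Trimmed sample data: one online worker and the one waiting task.
online_workers = [
    {"pulp_href": "/pulp/api/v3/workers/f7bd8f4c-3f14-4443-9924-2b1cabe48f19/"},
]
waiting_tasks = [
    {
        "pulp_href": "/pulp/api/v3/tasks/c4352eca-cf08-47ff-aa84-4c03b6c7cd48/",
        "worker": "/pulp/api/v3/workers/361e0bc2-79fd-481c-86ec-72881b090a74/",
    },
]

stuck = tasks_on_offline_workers(online_workers, waiting_tasks)
for task in stuck:
    print("stuck:", task["pulp_href"], "on", task["worker"])
```

Anything this prints is a candidate for cancelling; it does not cancel anything by itself.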
I’ll add all my outputs too in case it helps someone else:
/pulp/api/v3/status/ output
# curl "https://`hostname`/pulp/api/v3/status/" --cert /etc/pki/katello/certs/pulp-client.crt --key /etc/pki/katello/private/pulp-client.key | jq
{
"versions": [
{
"component": "pulpcore",
"version": "3.9.1"
},
{
"component": "pulp_rpm",
"version": "3.10.0"
},
{
"component": "pulp_file",
"version": "1.5.0"
},
{
"component": "pulp_deb",
"version": "2.9.2"
},
{
"component": "pulp_container",
"version": "2.2.2"
},
{
"component": "pulp_certguard",
"version": "1.1.0"
}
],
"online_workers": [
{
"pulp_created": "2021-06-17T23:25:41.745736Z",
"pulp_href": "/pulp/api/v3/workers/f7bd8f4c-3f14-4443-9924-2b1cabe48f19/",
"name": "1402@<servername>",
"last_heartbeat": "2021-06-21T03:43:53.646216Z"
},
{
"pulp_created": "2021-06-17T23:25:42.326468Z",
"pulp_href": "/pulp/api/v3/workers/83f67f6d-d547-4627-836f-99e69ac7b437/",
"name": "1408@<servername>",
"last_heartbeat": "2021-06-21T03:44:07.504398Z"
},
{
"pulp_created": "2021-06-17T23:25:42.933978Z",
"pulp_href": "/pulp/api/v3/workers/8e881a23-cf2b-4556-909c-a75f72f34c45/",
"name": "1401@<servername>",
"last_heartbeat": "2021-06-21T03:42:40.659272Z"
},
{
"pulp_created": "2021-06-15T10:51:04.127965Z",
"pulp_href": "/pulp/api/v3/workers/9590f209-11ae-42b0-b9a9-67d48022596b/",
"name": "resource-manager",
"last_heartbeat": "2021-06-21T03:42:40.058114Z"
},
{
"pulp_created": "2021-06-17T23:25:42.929995Z",
"pulp_href": "/pulp/api/v3/workers/9b2825df-16e3-493e-a0c7-6b0e30c431fa/",
"name": "1407@<servername>",
"last_heartbeat": "2021-06-21T03:43:54.148199Z"
}
],
"online_content_apps": [
{
"name": "1955@<servername>",
"last_heartbeat": "2021-06-21T03:44:06.263164Z"
},
{
"name": "1936@<servername>",
"last_heartbeat": "2021-06-21T03:44:08.705915Z"
},
{
"name": "1952@<servername>",
"last_heartbeat": "2021-06-21T03:44:09.400131Z"
},
{
"name": "1950@<servername>",
"last_heartbeat": "2021-06-21T03:44:09.357857Z"
},
{
"name": "1943@<servername>",
"last_heartbeat": "2021-06-21T03:44:09.414391Z"
},
{
"name": "1962@<servername>",
"last_heartbeat": "2021-06-21T03:44:04.110343Z"
},
{
"name": "1956@<servername>",
"last_heartbeat": "2021-06-21T03:44:09.356084Z"
},
{
"name": "1958@<servername>",
"last_heartbeat": "2021-06-21T03:44:04.110480Z"
},
{
"name": "1959@<servername>",
"last_heartbeat": "2021-06-21T03:44:09.411785Z"
}
],
"database_connection": {
"connected": true
},
"redis_connection": {
"connected": true
},
"storage": {
"total": 321961070592,
"used": 19971796992,
"free": 301989273600
}
}
pulpcore-manager shell output
# sudo -u pulp PULP_SETTINGS='/etc/pulp/settings.py' DJANGO_SETTINGS_MODULE='pulpcore.app.settings' pulpcore-manager shell <<EOF
from pulpcore.app.models import Worker
workers = [w.name for w in Worker.objects.online_workers()]
for rwork in workers:
    if rwork in workers:
        print(f'Worker {rwork}')
EOF
Worker 1402@<servername>
Worker 1408@<servername>
Worker 1401@<servername>
Worker 1407@<servername>
Worker resource-manager
/pulp/api/v3/workers/ output
# curl https://`hostname`/pulp/api/v3/workers/ --cert /etc/pki/katello/certs/pulp-client.crt --key /etc/pki/katello/private/pulp-client.key | jq
{
"count": 25,
"next": null,
"previous": null,
"results": [
{
"pulp_created": "2021-06-17T23:25:42.933978Z",
"pulp_href": "/pulp/api/v3/workers/8e881a23-cf2b-4556-909c-a75f72f34c45/",
"name": "1401@<servername>",
"last_heartbeat": "2021-06-21T03:30:32.876435Z"
},
{
"pulp_created": "2021-06-17T23:25:42.929995Z",
"pulp_href": "/pulp/api/v3/workers/9b2825df-16e3-493e-a0c7-6b0e30c431fa/",
"name": "1407@<servername>",
"last_heartbeat": "2021-06-21T03:30:32.274900Z"
},
{
"pulp_created": "2021-06-17T23:25:42.326468Z",
"pulp_href": "/pulp/api/v3/workers/83f67f6d-d547-4627-836f-99e69ac7b437/",
"name": "1408@<servername>",
"last_heartbeat": "2021-06-21T03:30:45.523285Z"
},
{
"pulp_created": "2021-06-17T23:25:41.745736Z",
"pulp_href": "/pulp/api/v3/workers/f7bd8f4c-3f14-4443-9924-2b1cabe48f19/",
"name": "1402@<servername>",
"last_heartbeat": "2021-06-21T03:30:31.773271Z"
},
{
"pulp_created": "2021-06-17T04:45:53.460706Z",
"pulp_href": "/pulp/api/v3/workers/63bc74cc-ac55-42d8-8485-7e9312aef588/",
"name": "87641@<servername>",
"last_heartbeat": "2021-06-17T23:23:53.921023Z"
},
{
"pulp_created": "2021-06-17T04:45:51.516344Z",
"pulp_href": "/pulp/api/v3/workers/361e0bc2-79fd-481c-86ec-72881b090a74/",
"name": "87558@<servername>",
"last_heartbeat": "2021-06-17T23:29:02.317408Z"
},
{
"pulp_created": "2021-06-17T04:45:51.470894Z",
"pulp_href": "/pulp/api/v3/workers/f820d466-eeab-4bca-b424-f65eada5919b/",
"name": "87591@<servername>",
"last_heartbeat": "2021-06-17T23:23:53.946154Z"
},
{
"pulp_created": "2021-06-17T04:45:51.385979Z",
"pulp_href": "/pulp/api/v3/workers/6c7cb505-46f6-45cb-b364-ccdf0e8dbe9c/",
"name": "87567@<servername>",
"last_heartbeat": "2021-06-17T23:23:53.938188Z"
},
{
"pulp_created": "2021-06-16T23:29:51.450305Z",
"pulp_href": "/pulp/api/v3/workers/341351fe-2801-4bc3-b7fb-b805ef5fc0c1/",
"name": "69639@<servername>",
"last_heartbeat": "2021-06-17T04:45:50.113995Z"
},
{
"pulp_created": "2021-06-16T23:28:55.317166Z",
"pulp_href": "/pulp/api/v3/workers/46f1252d-3674-46a1-9895-cb431c28d4db/",
"name": "69501@<servername>",
"last_heartbeat": "2021-06-17T04:45:50.071885Z"
},
{
"pulp_created": "2021-06-16T23:28:54.390401Z",
"pulp_href": "/pulp/api/v3/workers/317098bb-a71f-4e5f-b400-6cbfb454d4d3/",
"name": "69495@<servername>",
"last_heartbeat": "2021-06-17T04:45:50.028776Z"
},
{
"pulp_created": "2021-06-16T23:28:45.296031Z",
"pulp_href": "/pulp/api/v3/workers/26b68f40-de97-49a8-82a5-a54d896abbe2/",
"name": "69469@<servername>",
"last_heartbeat": "2021-06-17T04:45:49.998275Z"
},
{
"pulp_created": "2021-06-15T11:51:43.238418Z",
"pulp_href": "/pulp/api/v3/workers/19fc28bd-fd42-479e-ac6c-9baa8fe43074/",
"name": "1444@<servername>",
"last_heartbeat": "2021-06-16T23:32:06.182504Z"
},
{
"pulp_created": "2021-06-15T11:51:43.116338Z",
"pulp_href": "/pulp/api/v3/workers/b1a39420-bc73-4fbb-9e3b-0d501b59748c/",
"name": "1459@<servername>",
"last_heartbeat": "2021-06-16T23:32:15.442657Z"
},
{
"pulp_created": "2021-06-15T11:51:43.073757Z",
"pulp_href": "/pulp/api/v3/workers/078fbcce-8bc4-4430-b679-a19d5bf0bb34/",
"name": "1451@<servername>",
"last_heartbeat": "2021-06-16T23:32:15.170233Z"
},
{
"pulp_created": "2021-06-15T11:51:42.988822Z",
"pulp_href": "/pulp/api/v3/workers/743bef47-7d2b-4987-af60-2e83d0807cb7/",
"name": "1448@<servername>",
"last_heartbeat": "2021-06-16T23:33:12.355166Z"
},
{
"pulp_created": "2021-06-15T10:55:59.257146Z",
"pulp_href": "/pulp/api/v3/workers/b21e0ec0-9f59-447f-a75f-dbba5faa1199/",
"name": "25247@<servername>",
"last_heartbeat": "2021-06-15T11:51:42.895031Z"
},
{
"pulp_created": "2021-06-15T10:55:57.836954Z",
"pulp_href": "/pulp/api/v3/workers/fb0965e3-f1e6-4e62-8366-d571f711c49a/",
"name": "25240@<servername>",
"last_heartbeat": "2021-06-15T11:22:08.912543Z"
},
{
"pulp_created": "2021-06-15T10:55:56.111694Z",
"pulp_href": "/pulp/api/v3/workers/11576ae6-8ed0-4fc7-88fa-b39dcc064694/",
"name": "25230@<servername>",
"last_heartbeat": "2021-06-15T11:51:42.875015Z"
},
{
"pulp_created": "2021-06-15T10:55:54.378224Z",
"pulp_href": "/pulp/api/v3/workers/bacb1c40-620e-4b17-8097-b19b1534d1ca/",
"name": "25225@<servername>",
"last_heartbeat": "2021-06-15T11:22:08.854539Z"
},
{
"pulp_created": "2021-06-15T10:51:04.127965Z",
"pulp_href": "/pulp/api/v3/workers/9590f209-11ae-42b0-b9a9-67d48022596b/",
"name": "resource-manager",
"last_heartbeat": "2021-06-21T03:30:39.100613Z"
},
{
"pulp_created": "2021-06-15T10:50:51.726137Z",
"pulp_href": "/pulp/api/v3/workers/1fe78503-8037-47c5-adfc-cf2343d129fb/",
"name": "22859@<servername>",
"last_heartbeat": "2021-06-15T10:59:15.207717Z"
},
{
"pulp_created": "2021-06-15T10:50:50.228626Z",
"pulp_href": "/pulp/api/v3/workers/22e44eb6-c2c5-4436-b7c0-ad155cacf37f/",
"name": "22770@<servername>",
"last_heartbeat": "2021-06-15T10:59:15.193882Z"
},
{
"pulp_created": "2021-06-15T10:50:48.372229Z",
"pulp_href": "/pulp/api/v3/workers/5530928d-8df7-424c-a50a-89c2c2bdd3ac/",
"name": "22684@<servername>",
"last_heartbeat": "2021-06-15T10:59:15.177090Z"
},
{
"pulp_created": "2021-06-15T10:50:46.895867Z",
"pulp_href": "/pulp/api/v3/workers/5d7ff8ba-1d17-49e9-9ff7-efc4653eed6e/",
"name": "22599@<servername>",
"last_heartbeat": "2021-06-15T10:59:15.166665Z"
}
]
}
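To pick the dead entries out of a /workers/ listing like the one above without reading every timestamp, something along these lines might help. It is a sketch under assumptions: the 5 minute cut-off is an arbitrary value chosen for illustration (not Pulp's own worker timeout), the names `parse_ts` and `split_workers` are hypothetical, and the sample is just three entries copied from the output.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(value):
    """Parse a Pulp timestamp such as 2021-06-21T03:30:32.876435Z as UTC."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def split_workers(workers, now, max_age=timedelta(minutes=5)):
    """Split worker names into (fresh, stale) by heartbeat age.

    max_age is an assumed threshold for this example only.
    """
    fresh, stale = [], []
    for worker in workers:
        age = now - parse_ts(worker["last_heartbeat"])
        (fresh if age <= max_age else stale).append(worker["name"])
    return fresh, stale


sample = [
    {"name": "1401@<servername>", "last_heartbeat": "2021-06-21T03:30:32.876435Z"},
    {"name": "87558@<servername>", "last_heartbeat": "2021-06-17T23:29:02.317408Z"},
    {"name": "resource-manager", "last_heartbeat": "2021-06-21T03:30:39.100613Z"},
]
# Fixed "now" so the example is reproducible; use datetime.now(timezone.utc) for real.
now = parse_ts("2021-06-21T03:31:00.000000Z")
fresh, stale = split_workers(sample, now)
print("fresh:", fresh)
print("stale:", stale)
```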
pulp task list --state waiting output
# pulp task list --state waiting
[
{
"pulp_href": "/pulp/api/v3/tasks/c4352eca-cf08-47ff-aa84-4c03b6c7cd48/",
"pulp_created": "2021-06-17T23:32:31.291011Z",
"state": "waiting",
"name": "pulp_rpm.app.tasks.publishing.publish",
"logging_cid": "45bdee2ce16448b394addf08d079b3d7",
"started_at": null,
"finished_at": null,
"error": null,
"worker": "/pulp/api/v3/workers/361e0bc2-79fd-481c-86ec-72881b090a74/",
"parent_task": null,
"child_tasks": [],
"task_group": null,
"progress_reports": [],
"created_resources": [],
"reserved_resources_record": [
"/pulp/api/v3/repositories/rpm/rpm/2749521d-75bf-4e85-adfc-e6d1fd497606/"
]
}
]
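For anyone who wants to script the cancel step rather than do it by hand, here is a minimal sketch. It assumes the Pulp 3 tasks endpoint accepts a PATCH with {"state": "canceled"} (which is how the REST API documents task cancellation) and that the same client cert and key used for the curl calls above are allowed to do it. The host katello.example.com and both function names are made up for illustration, and this is not necessarily how the task was cancelled here.

```python
import json
import ssl
import urllib.request


def build_cancel_request(base_url, task_href):
    """Build (but do not send) a PATCH request asking Pulp to cancel a task."""
    body = json.dumps({"state": "canceled"}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + task_href,
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )


def send_with_client_cert(request, cert_file, key_file):
    """Send the request using a client certificate and return the decoded JSON."""
    context = ssl.create_default_context()
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    with urllib.request.urlopen(request, context=context) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical host name; the task href is the waiting task from the output above.
req = build_cancel_request(
    "https://katello.example.com",
    "/pulp/api/v3/tasks/c4352eca-cf08-47ff-aa84-4c03b6c7cd48/",
)
print(req.get_method(), req.full_url)
```

Nothing is sent until `send_with_client_cert(req, "/etc/pki/katello/certs/pulp-client.crt", "/etc/pki/katello/private/pulp-client.key")` is called, so the request can be inspected first.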
“worker named X is missing” output
# grep "is missing" messages-20210620
Jun 15 20:59:15 server pulpcore-worker-1[25225]: pulp [None]: pulpcore.tasking.services.worker_watcher:ERROR: The worker named 22599@<servername> is missing. Canceling the tasks in its queue.
Jun 15 20:59:15 server pulpcore-worker-1[25225]: pulp [None]: pulpcore.tasking.services.worker_watcher:ERROR: The worker named 22684@<servername> is missing. Canceling the tasks in its queue.
Jun 15 20:59:15 server pulpcore-worker-1[25225]: pulp [None]: pulpcore.tasking.services.worker_watcher:ERROR: The worker named 22770@<servername> is missing. Canceling the tasks in its queue.
Jun 15 20:59:15 server pulpcore-worker-1[25225]: pulp [None]: pulpcore.tasking.services.worker_watcher:ERROR: The worker named 22859@<servername> is missing. Canceling the tasks in its queue.
Jun 15 21:51:42 server pulpcore-resource-manager[1447]: pulp [None]: pulpcore.tasking.services.worker_watcher:ERROR: The worker named 25230@<servername> is missing. Canceling the tasks in its queue.
Jun 15 21:51:42 server pulpcore-resource-manager[1447]: pulp [None]: pulpcore.tasking.services.worker_watcher:ERROR: The worker named 25247@<servername> is missing. Canceling the tasks in its queue.
Jun 17 09:32:05 server pulpcore-worker-4[69469]: pulp [None]: pulpcore.tasking.services.worker_watcher:ERROR: The worker named 1444@<servername> is missing. Canceling the tasks in its queue.
Jun 17 09:32:14 server pulpcore-worker-2[69495]: pulp [None]: pulpcore.tasking.services.worker_watcher:ERROR: The worker named 1451@<servername> is missing. Canceling the tasks in its queue.
Jun 17 09:32:15 server pulpcore-worker-2[69495]: pulp [None]: pulpcore.tasking.services.worker_watcher:ERROR: The worker named 1459@<servername> is missing. Canceling the tasks in its queue.
Jun 17 09:33:11 server pulpcore-worker-1[69639]: pulp [None]: pulpcore.tasking.services.worker_watcher:ERROR: The worker named 1448@<servername> is missing. Canceling the tasks in its queue.
Jun 18 09:29:02 server pulpcore-worker-2[1402]: pulp [None]: pulpcore.tasking.services.worker_watcher:ERROR: The worker named 87558@<servername> is missing. Canceling the tasks in its queue.
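Matching these log lines against the /workers/ list by eye gets tedious, so here is a rough sketch of doing it in Python. The regular expression is based only on the message format shown above, the function name `missing_worker_names` is hypothetical, and `api_names` stands in for the worker names you would pull from the API output.

```python
import re

# Matches the worker_watcher message format seen in the log lines above.
MISSING_RE = re.compile(r"The worker named (\S+) is missing")


def missing_worker_names(lines):
    """Return worker names reported missing, in first-seen order, without duplicates."""
    names = []
    for line in lines:
        match = MISSING_RE.search(line)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


log_lines = [
    "Jun 15 20:59:15 server pulpcore-worker-1[25225]: pulp [None]: "
    "pulpcore.tasking.services.worker_watcher:ERROR: The worker named "
    "22599@<servername> is missing. Canceling the tasks in its queue.",
    "Jun 18 09:29:02 server pulpcore-worker-2[1402]: pulp [None]: "
    "pulpcore.tasking.services.worker_watcher:ERROR: The worker named "
    "87558@<servername> is missing. Canceling the tasks in its queue.",
]
# Stand-in for the names returned by /pulp/api/v3/workers/.
api_names = {"22599@<servername>", "87558@<servername>", "1401@<servername>"}

names = missing_worker_names(log_lines)
for name in names:
    status = "in API list" if name in api_names else "not in API list"
    print(name, "->", status)
```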