Pulp tasks stuck as Waiting, Unstarted, Incomplete

Problem:
The short version: We see Pulp tasks (sync and deferred_download) stuck as Waiting, Unstarted, Incomplete which do not generate different log messages than those running correct and I ran out of ideas for further troubleshooting.

The long version: When I arrived yesterday at the costumer I was told tasks for Repository synchronization are not running. So I started by canceling the job which held lock and afterwards other jobs started but alos got stuck and I reached the point where some jobs were not stopping when I tried to cancel them. All of them being Pulp tasks I had a look into the task list via pulp-admin which showed more than 2400 jobs waiting but canceling them via pulp-admin did not move the tasks forward in Foreman. So I canceled them via foreman-rake console like described in https://access.redhat.com/solutions/1554783, also they were not paused but running. Everything seemed fine afterwards but then I recognized again a tasks hanging when Sync plan started Repository sync. I found a knowledge base entry which told me that a restart of the pulp services is required after cancelling jobs via foreman-rake console. Restarting the services brought back again more than 2000 tasks in Pulp but most of them where executed and ended successful and those stuck were gone after restarting the services again. But again with the next sync plan I got new pulp tasks are stuck also sometimes some of the type deferred_download which is not used here and the feature is not prepared at all. Log entries look similar for tasks executed successfully and stuck, only difference I recognized is those stuck report success faster. Tasks in foreman seem fine now but we fear future problems caused by the stuck tasks. During the troubleshooting I also found qdrouterd not running because of the package upgrade which was resolved by upgrading the puppet-modules.

Expected outcome:
No stuck tasks and a reliable Repository synchronization.

Foreman and Proxy versions:
1.15.6

Foreman and Proxy plugin versions:
bastion 5.1.1
foreman-tasks 0.9.6
foreman_blossom 0.0.1
foreman_bootdisk 9.0.0
foreman_dhcp_browser 0.0.7
foreman_discovery 9.1.3
foreman_dlm 0.0.1
foreman_docker 3.2.1
foreman_expire_hosts 3.0.0
foreman_gaffer_tape 0.1.0
foreman_graphql 0.0.1
foreman_host_extra_validator 0.0.4
foreman_memcache 0.0.6
foreman_monitoring 0.1.3
foreman_omaha 0.1.0
foreman_remote_execution 1.3.7
foreman_rescue 0.0.1
foreman_snapshot_management 1.3.0
foreman_spacewalk 0.0.1
foreman_templates 5.0.1
foreman_userdata 0.0.1
foreman_virt_who_configure 0.1.5
foreman_wreckingball 0.0.1
katello 3.4.5
puppetdb_foreman 3.1.2

Other relevant data:

# pulp-admin tasks list
+----------------------------------------------------------------------+
                                 Tasks
+----------------------------------------------------------------------+

Operations:  deferred_download
Resources:   
State:       Waiting
Start Time:  Unstarted
Finish Time: Incomplete
Task Id:     7b32536f-beb0-41c3-8301-50cf9b4587e4

Operations:  deferred_download
Resources:   
State:       Waiting
Start Time:  Unstarted
Finish Time: Incomplete
Task Id:     1889785c-4720-4b7a-827c-6e92d54483d3

Operations:  deferred_download
Resources:   
State:       Waiting
Start Time:  Unstarted
Finish Time: Incomplete
Task Id:     821e4a34-3287-4cb4-9754-8de2e0a5be9c

Operations:  deferred_download
Resources:   
State:       Waiting
Start Time:  Unstarted
Finish Time: Incomplete
Task Id:     6c8bab84-cacb-4ea3-a0dd-1f0023d80ed9

Operations:  deferred_download
Resources:   
State:       Waiting
Start Time:  Unstarted
Finish Time: Incomplete
Task Id:     7dc5ae11-c765-4bb8-ac12-726385f17a0e

Operations:  sync
Resources:   e6165bc9-18e4-4716-8022-284a11a3077a (repository)
State:       Waiting
Start Time:  Unstarted
Finish Time: Incomplete
Task Id:     4bae8a69-4499-4915-a344-ddef47d49c67
# grep -A15 "Sending due task download_deferred_content" /var/log/messages
Jun  6 13:10:40 infra-pulp-01 pulp: celery.beat:INFO: Scheduler: Sending due task download_deferred_content (pulp.server.controllers.repository.queue_download_deferred)
Jun  6 13:10:40 infra-pulp-01 pulp: celery.worker.strategy:INFO: Received task: pulp.server.controllers.repository.queue_download_deferred[0e1076d8-d34f-49d6-a5ea-2398ce1df776]
Jun  6 13:10:40 infra-pulp-01 pulp: qpid.messaging:WARNING: [0e1076d8] process forked, child must not use parent qpid.messaging
Jun  6 13:10:40 infra-pulp-01 pulp: kombu.transport.qpid:INFO: [0e1076d8] Connected to qpid with SASL mechanism ANONYMOUS
Jun  6 13:10:40 infra-pulp-01 pulp: celery.worker.strategy:INFO: Received task: pulp.server.controllers.repository.download_deferred[7002d321-6d87-42e8-995d-724331e2f1e2]
Jun  6 13:10:40 infra-pulp-01 pulp: py.warnings:WARNING: [0e1076d8] (67183-33696) /usr/lib/python2.7/site-packages/pulp/server/db/model/__init__.py:552: DeprecationWarning: update is deprecated. Use replace_one, update_one or update_many instead.
Jun  6 13:10:40 infra-pulp-01 pulp: py.warnings:WARNING: [0e1076d8] (67183-33696)   TaskStatus._get_collection().update({'task_id': task_id}, update, upsert=True)
Jun  6 13:10:40 infra-pulp-01 pulp: py.warnings:WARNING: [0e1076d8] (67183-33696)
Jun  6 13:10:40 infra-pulp-01 pulp: celery.worker.job:INFO: Task pulp.server.controllers.repository.queue_download_deferred[0e1076d8-d34f-49d6-a5ea-2398ce1df776] succeeded in 0.0883674379438s: None
Jun  6 13:10:41 infra-pulp-01 pulp: py.warnings:WARNING: [7002d321] (67257-27360) /usr/lib/python2.7/site-packages/mongoengine/document.py:367: DeprecationWarning: update is deprecated. Use replace_one, update_one or update_many instead.
Jun  6 13:10:41 infra-pulp-01 pulp: py.warnings:WARNING: [7002d321] (67257-27360)   upsert=upsert, **write_concern)
Jun  6 13:10:41 infra-pulp-01 pulp: py.warnings:WARNING: [7002d321] (67257-27360)
Jun  6 13:10:41 infra-pulp-01 pulp: celery.worker.job:INFO: Task pulp.server.controllers.repository.download_deferred[7002d321-6d87-42e8-995d-724331e2f1e2] succeeded in 1.026209109s: None
Jun  6 13:11:36 infra-pulp-01 sshd[69495]: Connection closed by 172.23.35.20 port 34876 [preauth]
Jun  6 13:12:36 infra-pulp-01 sshd[69697]: Connection closed by 172.23.35.20 port 39002 [preauth]
Jun  6 13:13:36 infra-pulp-01 sshd[69898]: Connection closed by 172.23.35.20 port 42934 [preauth]
--
Jun  6 13:20:40 infra-pulp-01 pulp: celery.beat:INFO: Scheduler: Sending due task download_deferred_content (pulp.server.controllers.repository.queue_download_deferred)
Jun  6 13:20:40 infra-pulp-01 pulp: celery.worker.strategy:INFO: Received task: pulp.server.controllers.repository.queue_download_deferred[7aa14a98-16d3-4979-801e-26d676487b34]
Jun  6 13:20:40 infra-pulp-01 pulp: qpid.messaging:WARNING: [7aa14a98] process forked, child must not use parent qpid.messaging
Jun  6 13:20:40 infra-pulp-01 pulp: kombu.transport.qpid:INFO: [7aa14a98] Connected to qpid with SASL mechanism ANONYMOUS
Jun  6 13:20:40 infra-pulp-01 pulp: celery.worker.strategy:INFO: Received task: pulp.server.controllers.repository.download_deferred[0e88452f-8a86-4231-a914-1e04b607fbeb]
Jun  6 13:20:40 infra-pulp-01 pulp: py.warnings:WARNING: [7aa14a98] (67270-75488) /usr/lib/python2.7/site-packages/pulp/server/db/model/__init__.py:552: DeprecationWarning: update is deprecated. Use replace_one, update_one or update_many instead.
Jun  6 13:20:40 infra-pulp-01 pulp: py.warnings:WARNING: [7aa14a98] (67270-75488)   TaskStatus._get_collection().update({'task_id': task_id}, update, upsert=True)
Jun  6 13:20:40 infra-pulp-01 pulp: py.warnings:WARNING: [7aa14a98] (67270-75488)
Jun  6 13:20:40 infra-pulp-01 pulp: celery.worker.job:INFO: Task pulp.server.controllers.repository.queue_download_deferred[7aa14a98-16d3-4979-801e-26d676487b34] succeeded in 0.0637582479976s: None
Jun  6 13:20:40 infra-pulp-01 pulp: py.warnings:WARNING: [0e88452f] (67193-32448) /usr/lib/python2.7/site-packages/mongoengine/document.py:367: DeprecationWarning: update is deprecated. Use replace_one, update_one or update_many instead.
Jun  6 13:20:40 infra-pulp-01 pulp: py.warnings:WARNING: [0e88452f] (67193-32448)   upsert=upsert, **write_concern)
Jun  6 13:20:40 infra-pulp-01 pulp: py.warnings:WARNING: [0e88452f] (67193-32448)
Jun  6 13:20:40 infra-pulp-01 pulp: celery.worker.job:INFO: Task pulp.server.controllers.repository.download_deferred[0e88452f-8a86-4231-a914-1e04b607fbeb] succeeded in 0.0421735139098s: None
Jun  6 13:20:53 infra-pulp-01 pulp: celery.worker.consumer:WARNING: (16090-19072) consumer: Connection to broker lost. Trying to re-establish the connection...
Jun  6 13:20:53 infra-pulp-01 pulp: celery.worker.consumer:WARNING: (16093-27360) consumer: Connection to broker lost. Trying to re-establish the connection...
Jun  6 13:20:53 infra-pulp-01 pulp: celery.worker.consumer:WARNING: (16090-19072) Traceback (most recent call last):
--
Jun  6 13:30:40 infra-pulp-01 pulp: celery.beat:INFO: Scheduler: Sending due task download_deferred_content (pulp.server.controllers.repository.queue_download_deferred)
Jun  6 13:30:40 infra-pulp-01 pulp: kombu.transport.qpid:INFO: Connected to qpid with SASL mechanism ANONYMOUS
Jun  6 13:30:40 infra-pulp-01 pulp: celery.worker.strategy:INFO: Received task: pulp.server.controllers.repository.queue_download_deferred[25d6b36c-8020-49f6-9af1-d383ee8759de]
Jun  6 13:30:40 infra-pulp-01 pulp: qpid.messaging:WARNING: [25d6b36c] process forked, child must not use parent qpid.messaging
Jun  6 13:30:40 infra-pulp-01 pulp: kombu.transport.qpid:INFO: [25d6b36c] Connected to qpid with SASL mechanism ANONYMOUS
Jun  6 13:30:40 infra-pulp-01 pulp: celery.worker.strategy:INFO: Received task: pulp.server.controllers.repository.download_deferred[2ec6dfd9-f207-458d-890e-2c0d77fe9e0b]
Jun  6 13:30:40 infra-pulp-01 pulp: py.warnings:WARNING: [25d6b36c] (71514-75488) /usr/lib/python2.7/site-packages/pulp/server/db/model/__init__.py:552: DeprecationWarning: update is deprecated. Use replace_one, update_one or update_many instead.
Jun  6 13:30:40 infra-pulp-01 pulp: py.warnings:WARNING: [25d6b36c] (71514-75488)   TaskStatus._get_collection().update({'task_id': task_id}, update, upsert=True)
Jun  6 13:30:40 infra-pulp-01 pulp: py.warnings:WARNING: [25d6b36c] (71514-75488)
Jun  6 13:30:40 infra-pulp-01 pulp: py.warnings:WARNING: [25d6b36c] (71514-75488) /usr/lib64/python2.7/site-packages/pymongo/topology.py:74: UserWarning: MongoClient opened before fork. Create MongoClient with connect=False, or create client after forking. See PyMongo's documentation for details: http://api.mongodb.org/python/current/faq.html#using-pymongo-with-multiprocessing>
Jun  6 13:30:40 infra-pulp-01 pulp: py.warnings:WARNING: [25d6b36c] (71514-75488)   "MongoClient opened before fork. Create MongoClient "
Jun  6 13:30:40 infra-pulp-01 pulp: py.warnings:WARNING: [25d6b36c] (71514-75488)
Jun  6 13:30:40 infra-pulp-01 pulp: py.warnings:WARNING: [2ec6dfd9] (71515-32448) /usr/lib64/python2.7/site-packages/pymongo/topology.py:74: UserWarning: MongoClient opened before fork. Create MongoClient with connect=False, or create client after forking. See PyMongo's documentation for details: http://api.mongodb.org/python/current/faq.html#using-pymongo-with-multiprocessing>
Jun  6 13:30:40 infra-pulp-01 pulp: py.warnings:WARNING: [2ec6dfd9] (71515-32448)   "MongoClient opened before fork. Create MongoClient "
Jun  6 13:30:40 infra-pulp-01 pulp: py.warnings:WARNING: [2ec6dfd9] (71515-32448)
Jun  6 13:30:40 infra-pulp-01 pulp: celery.worker.job:INFO: Task pulp.server.controllers.repository.queue_download_deferred[25d6b36c-8020-49f6-9af1-d383ee8759de] succeeded in 0.0703553811181s: None
--
Jun  6 13:40:40 infra-pulp-01 pulp: celery.beat:INFO: Scheduler: Sending due task download_deferred_content (pulp.server.controllers.repository.queue_download_deferred)
Jun  6 13:40:40 infra-pulp-01 pulp: celery.worker.strategy:INFO: Received task: pulp.server.controllers.repository.queue_download_deferred[6cf81064-b373-4a46-aa27-d633e541d8be]
Jun  6 13:40:40 infra-pulp-01 pulp: qpid.messaging:WARNING: [6cf81064] process forked, child must not use parent qpid.messaging
Jun  6 13:40:40 infra-pulp-01 pulp: kombu.transport.qpid:INFO: [6cf81064] Connected to qpid with SASL mechanism ANONYMOUS
Jun  6 13:40:40 infra-pulp-01 pulp: celery.worker.strategy:INFO: Received task: pulp.server.controllers.repository.download_deferred[8dc78c50-58a4-48d3-a02f-2258b738ba0e]
Jun  6 13:40:40 infra-pulp-01 pulp: py.warnings:WARNING: [6cf81064] (67179-19072) /usr/lib/python2.7/site-packages/pulp/server/db/model/__init__.py:552: DeprecationWarning: update is deprecated. Use replace_one, update_one or update_many instead.
Jun  6 13:40:40 infra-pulp-01 pulp: py.warnings:WARNING: [6cf81064] (67179-19072)   TaskStatus._get_collection().update({'task_id': task_id}, update, upsert=True)
Jun  6 13:40:40 infra-pulp-01 pulp: py.warnings:WARNING: [6cf81064] (67179-19072)
Jun  6 13:40:40 infra-pulp-01 pulp: py.warnings:WARNING: [6cf81064] (67179-19072) /usr/lib64/python2.7/site-packages/pymongo/topology.py:74: UserWarning: MongoClient opened before fork. Create MongoClient with connect=False, or create client after forking. See PyMongo's documentation for details: http://api.mongodb.org/python/current/faq.html#using-pymongo-with-multiprocessing>
Jun  6 13:40:40 infra-pulp-01 pulp: py.warnings:WARNING: [6cf81064] (67179-19072)   "MongoClient opened before fork. Create MongoClient "
Jun  6 13:40:40 infra-pulp-01 pulp: py.warnings:WARNING: [6cf81064] (67179-19072)
Jun  6 13:40:40 infra-pulp-01 pulp: py.warnings:WARNING: [8dc78c50] (69249-33696) /usr/lib64/python2.7/site-packages/pymongo/topology.py:74: UserWarning: MongoClient opened before fork. Create MongoClient with connect=False, or create client after forking. See PyMongo's documentation for details: http://api.mongodb.org/python/current/faq.html#using-pymongo-with-multiprocessing>
Jun  6 13:40:40 infra-pulp-01 pulp: py.warnings:WARNING: [8dc78c50] (69249-33696)   "MongoClient opened before fork. Create MongoClient "
Jun  6 13:40:40 infra-pulp-01 pulp: py.warnings:WARNING: [8dc78c50] (69249-33696)
Jun  6 13:40:40 infra-pulp-01 pulp: celery.worker.job:INFO: Task pulp.server.controllers.repository.queue_download_deferred[6cf81064-b373-4a46-aa27-d633e541d8be] succeeded in 0.0736988519784s: None
Jun  6 13:40:40 infra-pulp-01 pulp: py.warnings:WARNING: [8dc78c50] (69249-33696) /usr/lib/python2.7/site-packages/mongoengine/document.py:367: DeprecationWarning: update is deprecated. Use replace_one, update_one or update_many instead.
--
Jun  6 13:52:53 infra-pulp-01 pulp: celery.beat:INFO: Scheduler: Sending due task download_deferred_content (pulp.server.controllers.repository.queue_download_deferred)
Jun  6 13:52:53 infra-pulp-01 pulp: celery.worker.strategy:INFO: Received task: pulp.server.controllers.repository.queue_download_deferred[92ce9b49-0326-4f19-b896-d0151719611b]
Jun  6 13:52:53 infra-pulp-01 pulp: kombu.transport.qpid:INFO: [92ce9b49] Connected to qpid with SASL mechanism ANONYMOUS
Jun  6 13:52:53 infra-pulp-01 pulp: celery.worker.strategy:INFO: Received task: pulp.server.controllers.repository.download_deferred[5f12b182-089d-4a78-92f5-21c5992a8124]
Jun  6 13:52:53 infra-pulp-01 pulp: py.warnings:WARNING: [92ce9b49] (77278-93792) /usr/lib/python2.7/site-packages/pulp/server/db/model/__init__.py:552: DeprecationWarning: update is deprecated. Use replace_one, update_one or update_many instead.
Jun  6 13:52:53 infra-pulp-01 pulp: py.warnings:WARNING: [92ce9b49] (77278-93792)   TaskStatus._get_collection().update({'task_id': task_id}, update, upsert=True)
Jun  6 13:52:53 infra-pulp-01 pulp: py.warnings:WARNING: [92ce9b49] (77278-93792)
Jun  6 13:52:53 infra-pulp-01 pulp: py.warnings:WARNING: [92ce9b49] (77278-93792) /usr/lib64/python2.7/site-packages/pymongo/topology.py:74: UserWarning: MongoClient opened before fork. Create MongoClient with connect=False, or create client after forking. See PyMongo's documentation for details: http://api.mongodb.org/python/current/faq.html#using-pymongo-with-multiprocessing>
Jun  6 13:52:53 infra-pulp-01 pulp: py.warnings:WARNING: [92ce9b49] (77278-93792)   "MongoClient opened before fork. Create MongoClient "
Jun  6 13:52:53 infra-pulp-01 pulp: py.warnings:WARNING: [92ce9b49] (77278-93792)
Jun  6 13:52:53 infra-pulp-01 pulp: py.warnings:WARNING: [5f12b182] (77283-56000) /usr/lib64/python2.7/site-packages/pymongo/topology.py:74: UserWarning: MongoClient opened before fork. Create MongoClient with connect=False, or create client after forking. See PyMongo's documentation for details: http://api.mongodb.org/python/current/faq.html#using-pymongo-with-multiprocessing>
Jun  6 13:52:53 infra-pulp-01 pulp: py.warnings:WARNING: [5f12b182] (77283-56000)   "MongoClient opened before fork. Create MongoClient "
Jun  6 13:52:53 infra-pulp-01 pulp: py.warnings:WARNING: [5f12b182] (77283-56000)
Jun  6 13:52:53 infra-pulp-01 pulp: celery.worker.job:INFO: Task pulp.server.controllers.repository.queue_download_deferred[92ce9b49-0326-4f19-b896-d0151719611b] succeeded in 0.0618985157926s: None
Jun  6 13:52:53 infra-pulp-01 pulp: py.warnings:WARNING: [5f12b182] (77283-56000) /usr/lib/python2.7/site-packages/mongoengine/queryset/base.py:461: DeprecationWarning: update is deprecated. Use replace_one, update_one or update_many instead.
Jun  6 13:52:53 infra-pulp-01 pulp: py.warnings:WARNING: [5f12b182] (77283-56000)   upsert=upsert, **write_concern)
--
Jun  6 14:10:45 infra-pulp-01 pulp: celery.beat:INFO: Scheduler: Sending due task download_deferred_content (pulp.server.controllers.repository.queue_download_deferred)
Jun  6 14:10:45 infra-pulp-01 pulp: celery.worker.strategy:INFO: Received task: pulp.server.controllers.repository.queue_download_deferred[65bf9ffb-4ee2-481b-8d78-2a73056b1f4a]
Jun  6 14:10:45 infra-pulp-01 pulp: qpid.messaging:WARNING: [65bf9ffb] process forked, child must not use parent qpid.messaging
Jun  6 14:10:45 infra-pulp-01 pulp: kombu.transport.qpid:INFO: [65bf9ffb] Connected to qpid with SASL mechanism ANONYMOUS
Jun  6 14:10:45 infra-pulp-01 pulp: celery.worker.strategy:INFO: Received task: pulp.server.controllers.repository.download_deferred[604f1b7b-998e-4c72-b1b4-d7e3b5ab360a]
Jun  6 14:10:45 infra-pulp-01 pulp: py.warnings:WARNING: [65bf9ffb] (82262-93792) /usr/lib/python2.7/site-packages/pulp/server/db/model/__init__.py:552: DeprecationWarning: update is deprecated. Use replace_one, update_one or update_many instead.
Jun  6 14:10:45 infra-pulp-01 pulp: py.warnings:WARNING: [65bf9ffb] (82262-93792)   TaskStatus._get_collection().update({'task_id': task_id}, update, upsert=True)
Jun  6 14:10:45 infra-pulp-01 pulp: py.warnings:WARNING: [65bf9ffb] (82262-93792)
Jun  6 14:10:45 infra-pulp-01 pulp: celery.worker.job:INFO: Task pulp.server.controllers.repository.queue_download_deferred[65bf9ffb-4ee2-481b-8d78-2a73056b1f4a] succeeded in 0.0588236940093s: None
Jun  6 14:10:46 infra-pulp-01 pulp: py.warnings:WARNING: [604f1b7b] (82225-56000) /usr/lib/python2.7/site-packages/mongoengine/document.py:367: DeprecationWarning: update is deprecated. Use replace_one, update_one or update_many instead.
Jun  6 14:10:46 infra-pulp-01 pulp: py.warnings:WARNING: [604f1b7b] (82225-56000)   upsert=upsert, **write_concern)
Jun  6 14:10:46 infra-pulp-01 pulp: py.warnings:WARNING: [604f1b7b] (82225-56000)
Jun  6 14:10:46 infra-pulp-01 pulp: celery.worker.job:INFO: Task pulp.server.controllers.repository.download_deferred[604f1b7b-998e-4c72-b1b4-d7e3b5ab360a] succeeded in 1.02632880304s: None
Jun  6 14:11:30 infra-pulp-01 sshd[84721]: Connection closed by 172.23.35.20 port 60078 [preauth]

I found yesterday in the evening the following bug report filed against pulp which is already fixed in newer versions:
https://pulp.plan.io/issues/2979

So I commented the line in /etc/default/pulp_workers:

#PULP_MAX_TASKS_PER_CHILD=2

After canceling the stuck tasks and restart the services it now runs fine for 15 hours without tasks getting stuck. I will now change the puppet code to keep this setting and observe it for a longer time, but I hope this is really the fix.

From the pulp discussion it sounds, that https://github.com/pulp/pulp/pull/3177 should fix the issue. It’s part of 2.14 and 2.15 - and should be therefore part of Katello 3.6