Foreman-Katello not running repo syncs

Hi All,
I've recently noticed that our FK server has not been running any repo
sync-jobs for the last few months. I've spent quite a while attempting to
debug this without success, but recently I think I may have stumbled upon
the issue and wanted to check if there is any way to fix it without
removing and re-adding all the repo-sync schedules. Manually running
any/all of the syncs is successful every time.

I ran -

"grep pulp /var/log/messages | grep repository". This produced
(summarised) -

Aug 1 01:05:51 XXXXX pulp: celery.beat:INFO: Scheduler: Sending due task
download_deferred_content
(pulp.server.controllers.repository.queue_download_deferred)
Aug 1 01:05:51 XXXXX pulp: celery.worker.strategy:INFO: Received task:
pulp.server.controllers.repository.queue_download_deferred[d49dc059-cd10-4f59-b066-eb1516f12a8a]
Aug 1 01:05:51 XXXXX pulp: celery.worker.strategy:INFO: Received task:
pulp.server.controllers.repository.download_deferred[f24f4c50-1a3d-4480-8a58-0ad9998ab8f1]
Aug 1 01:05:51 XXXXX pulp: celery.worker.job:INFO: Task
pulp.server.controllers.repository.queue_download_deferred[d49dc059-cd10-4f59-b066-eb1516f12a8a]
succeeded in 0.0414881289471s: None
Aug 1 01:05:51 XXXXX pulp: celery.worker.job:INFO: Task
pulp.server.controllers.repository.download_deferred[f24f4c50-1a3d-4480-8a58-0ad9998ab8f1]
succeeded in 0.0459181109909s: None
[snip]

  • This suggests to me that the syncs should run but are deferred for
    whatever reason. So I added -

task "list_active_sync_schedules" to /usr/share/foreman/Rakefile as per
Sync Schedule Removal · GitHub and ran it.

It produced some interesting output. I've added the full output as an
attachment, but to summarise, running -

"awk -F, '{print $4","$5","$10","$11}'
foreman-rake_list_active_sync_schedules.txt"

produced…

"first_run"=>"2016-07-09T00:00:00Z", "total_run_count"=>31,
"enabled"=>true, "last_run_at"=>"2020-11-30T09:44:02Z"
"first_run"=>"2016-07-09T00:00:00Z", "total_run_count"=>31,
"enabled"=>true, "last_run_at"=>"2020-11-30T09:44:02Z"
"first_run"=>"2016-07-09T00:00:00Z", "total_run_count"=>31,
"enabled"=>true, "last_run_at"=>"2020-11-30T09:44:02Z"

…and so on for each repo.

This appears to suggest that none of the syncs will run again until after
30th November 2020!

I checked the date/time on the server and it is correct. A check on shell
histories on the system to see if anyone had previously altered the system
date revealed nothing. I know there were movements to add an NTP appliance
to the site around that time that could have potentially caused the system
date to jump into the future, but cant confirm that.

So my question is, "is there any way I can reset the "last_run_at" field"
in FK to allow the syncs to resume without removing and re-adding them"?

I have the output of a "foreman-debug" that I can attach if requested,

Thanks in advance for any/all information.

Regards,

Craig.

System Information:
Foreman 1.11.4

Installed Packages

  • candlepin-0.9.54.6-1.el7.noarch
  • candlepin-selinux-0.9.54.6-1.el7.noarch
  • foreman-installer-katello-3.0.2-1.el7.noarch
  • katello-3.0.0-9.el7.noarch
  • katello-agent-2.7.0-1.el7.noarch
  • katello-agent-fact-plugin-2.7.0-1.el7.noarch
  • katello-ca-consumer-lmorea4vapp01.ecpplatform.net-1.0-1.noarch
  • katello-certs-tools-2.4.0-1.el7.noarch
  • katello-client-bootstrap-1.1.0-1.el7.noarch
  • katello-common-3.0.0-9.el7.noarch
  • katello-debug-3.0.0-9.el7.noarch
  • katello-default-ca-1.0-1.noarch
  • katello-installer-base-3.0.2-1.el7.noarch
  • katello-repos-3.0.0-3.el7.noarch
  • katello-selinux-3.0.1-1.el7.noarch
  • katello-server-ca-1.0-1.noarch
  • katello-service-3.0.0-9.el7.noarch
  • libqpid-dispatch-0.4-11.el7.x86_64
  • XXXXXt-qpid-broker-1.0-1.noarch
  • XXXXX-qpid-client-cert-1.0-1.noarch
  • XXXXX-qpid-router-client-1.0-1.noarch
  • XXXXX-qpid-router-server-1.0-1.noarch
  • pulp-admin-client-2.8.6-1.el7.noarch
  • pulp-client-1.0-1.noarch
  • pulp-docker-plugins-2.0.2-1.el7.noarch
  • pulp-katello-1.0-1.el7.noarch
  • pulp-puppet-plugins-2.8.6-1.el7.noarch
  • pulp-puppet-tools-2.8.6-1.el7.noarch
  • pulp-rpm-admin-extensions-2.8.6-1.el7.noarch
  • pulp-rpm-handlers-2.8.6-1.el7.noarch
  • pulp-rpm-plugins-2.8.6-1.el7.noarch
  • pulp-selinux-2.8.6-1.el7.noarch
  • pulp-server-2.8.6-1.el7.noarch
  • python-gofer-qpid-2.7.6-1.el7.noarch
  • python-isodate-0.5.0-4.pulp.el7.noarch
  • python-kombu-3.0.33-5.pulp.el7.noarch
  • python-pulp-agent-lib-2.8.6-1.el7.noarch
  • python-pulp-bindings-2.8.6-1.el7.noarch
  • python-pulp-client-lib-2.8.6-1.el7.noarch
  • python-pulp-common-2.8.6-1.el7.noarch
  • python-pulp-docker-common-2.0.2-1.el7.noarch
  • python-pulp-oid_validation-2.8.6-1.el7.noarch
  • python-pulp-puppet-common-2.8.6-1.el7.noarch
  • python-pulp-repoauth-2.8.6-1.el7.noarch
  • python-pulp-rpm-common-2.8.6-1.el7.noarch
  • python-pulp-streamer-2.8.6-1.el7.noarch
  • python-qpid-0.30-9.el7.noarch
  • python-qpid-proton-0.9-13.el7.x86_64
  • python-qpid-qmf-0.30-5.el7.x86_64
  • qpid-cpp-client-0.30-8.el7.x86_64
  • qpid-cpp-client-devel-0.30-8.el7.x86_64
  • qpid-cpp-server-0.30-8.el7.x86_64
  • qpid-cpp-server-linearstore-0.30-8.el7.x86_64
  • qpid-dispatch-router-0.4-11.el7.x86_64
  • qpid-proton-c-0.9-13.el7.x86_64
  • qpid-qmf-0.30-5.el7.x86_64
  • qpid-tools-0.30-4.el7.noarch
  • rubygem-smart_proxy_pulp-1.3.0-1.el7.noarch
  • tfm-rubygem-hammer_cli_katello-0.0.26-1.el7.noarch
  • tfm-rubygem-katello-3.0.2-1.el7.noarch
  • tfm-rubygem-qpid_messaging-0.30.0-7.el7.x86_64

foreman-rake_list_active_sync_schedules.txt (446 Bytes)

··· =================
Hey Craig,

Did you end up coming across a solution for this? We have run into similar issues with our Katello server, and are struggling to resolve (or even really locate where) the problem(s).

Cheers.


··· On Thursday, August 3, 2017 at 1:41:33 AM UTC+12, craig.c...@gmail.com wrote:

Hi All,
I've recently noticed that our FK server has not been running any repo
sync-jobs for the last few months. I've spent quite a while attempting to
debug this without success, but recently I think I may have stumbled upon
the issue and wanted to check if there is any way to fix it without
removing and re-adding all the repo-sync schedules. Manually running
any/all of the syncs is successful every time.

I ran -

"grep pulp /var/log/messages | grep repository". This produced
(summarised) -

Aug 1 01:05:51 XXXXX pulp: celery.beat:INFO: Scheduler: Sending due task
download_deferred_content
(pulp.server.controllers.repository.queue_download_deferred)
Aug 1 01:05:51 XXXXX pulp: celery.worker.strategy:INFO: Received task:
pulp.server.controllers.repository.queue_download_deferred[d49dc059-cd10-4f59-b066-eb1516f12a8a]
Aug 1 01:05:51 XXXXX pulp: celery.worker.strategy:INFO: Received task:
pulp.server.controllers.repository.download_deferred[f24f4c50-1a3d-4480-8a58-0ad9998ab8f1]
Aug 1 01:05:51 XXXXX pulp: celery.worker.job:INFO: Task
pulp.server.controllers.repository.queue_download_deferred[d49dc059-cd10-4f59-b066-eb1516f12a8a]
succeeded in 0.0414881289471s: None
Aug 1 01:05:51 XXXXX pulp: celery.worker.job:INFO: Task
pulp.server.controllers.repository.download_deferred[f24f4c50-1a3d-4480-8a58-0ad9998ab8f1]
succeeded in 0.0459181109909s: None
[snip]

- This suggests to me that the syncs should run but are deferred for
whatever reason. So I added -

task "list_active_sync_schedules" to /usr/share/foreman/Rakefile as per
https://gist.github.com/ehelms/52405ed3e75ec54ed91f and ran it.

It produced some interesting output. I've added the full output as an
attachment, but to summarise, running -

"awk -F, '{print $4","$5","$10","$11}'
foreman-rake_list_active_sync_schedules.txt"

produced...

"first_run"=>"2016-07-09T00:00:00Z", "total_run_count"=>31,
"enabled"=>true, "last_run_at"=>"2020-11-30T09:44:02Z"
"first_run"=>"2016-07-09T00:00:00Z", "total_run_count"=>31,
"enabled"=>true, "last_run_at"=>"2020-11-30T09:44:02Z"
"first_run"=>"2016-07-09T00:00:00Z", "total_run_count"=>31,
"enabled"=>true, "last_run_at"=>"2020-11-30T09:44:02Z"

...and so on for each repo.

This appears to suggest that none of the syncs will run again until after
30th November 2020!

I checked the date/time on the server and it is correct. A check on shell
histories on the system to see if anyone had previously altered the system
date revealed nothing. I know there were movements to add an NTP appliance
to the site around that time that could have potentially caused the system
date to jump into the future, but cant confirm that.

So my question is, "is there any way I can reset the "last_run_at" field"
in FK to allow the syncs to resume without removing and re-adding them"?

I have the output of a "foreman-debug" that I can attach if requested,

Thanks in advance for any/all information.

Regards,

Craig.

System Information:
Foreman 1.11.4

Installed Packages

- candlepin-0.9.54.6-1.el7.noarch
- candlepin-selinux-0.9.54.6-1.el7.noarch
- foreman-installer-katello-3.0.2-1.el7.noarch
- katello-3.0.0-9.el7.noarch
- katello-agent-2.7.0-1.el7.noarch
- katello-agent-fact-plugin-2.7.0-1.el7.noarch
- katello-ca-consumer-lmorea4vapp01.ecpplatform.net-1.0-1.noarch
- katello-certs-tools-2.4.0-1.el7.noarch
- katello-client-bootstrap-1.1.0-1.el7.noarch
- katello-common-3.0.0-9.el7.noarch
- katello-debug-3.0.0-9.el7.noarch
- katello-default-ca-1.0-1.noarch
- katello-installer-base-3.0.2-1.el7.noarch
- katello-repos-3.0.0-3.el7.noarch
- katello-selinux-3.0.1-1.el7.noarch
- katello-server-ca-1.0-1.noarch
- katello-service-3.0.0-9.el7.noarch
- libqpid-dispatch-0.4-11.el7.x86_64
- XXXXXt-qpid-broker-1.0-1.noarch
- XXXXX-qpid-client-cert-1.0-1.noarch
- XXXXX-qpid-router-client-1.0-1.noarch
- XXXXX-qpid-router-server-1.0-1.noarch
- pulp-admin-client-2.8.6-1.el7.noarch
- pulp-client-1.0-1.noarch
- pulp-docker-plugins-2.0.2-1.el7.noarch
- pulp-katello-1.0-1.el7.noarch
- pulp-puppet-plugins-2.8.6-1.el7.noarch
- pulp-puppet-tools-2.8.6-1.el7.noarch
- pulp-rpm-admin-extensions-2.8.6-1.el7.noarch
- pulp-rpm-handlers-2.8.6-1.el7.noarch
- pulp-rpm-plugins-2.8.6-1.el7.noarch
- pulp-selinux-2.8.6-1.el7.noarch
- pulp-server-2.8.6-1.el7.noarch
- python-gofer-qpid-2.7.6-1.el7.noarch
- python-isodate-0.5.0-4.pulp.el7.noarch
- python-kombu-3.0.33-5.pulp.el7.noarch
- python-pulp-agent-lib-2.8.6-1.el7.noarch
- python-pulp-bindings-2.8.6-1.el7.noarch
- python-pulp-client-lib-2.8.6-1.el7.noarch
- python-pulp-common-2.8.6-1.el7.noarch
- python-pulp-docker-common-2.0.2-1.el7.noarch
- python-pulp-oid_validation-2.8.6-1.el7.noarch
- python-pulp-puppet-common-2.8.6-1.el7.noarch
- python-pulp-repoauth-2.8.6-1.el7.noarch
- python-pulp-rpm-common-2.8.6-1.el7.noarch
- python-pulp-streamer-2.8.6-1.el7.noarch
- python-qpid-0.30-9.el7.noarch
- python-qpid-proton-0.9-13.el7.x86_64
- python-qpid-qmf-0.30-5.el7.x86_64
- qpid-cpp-client-0.30-8.el7.x86_64
- qpid-cpp-client-devel-0.30-8.el7.x86_64
- qpid-cpp-server-0.30-8.el7.x86_64
- qpid-cpp-server-linearstore-0.30-8.el7.x86_64
- qpid-dispatch-router-0.4-11.el7.x86_64
- qpid-proton-c-0.9-13.el7.x86_64
- qpid-qmf-0.30-5.el7.x86_64
- qpid-tools-0.30-4.el7.noarch
- rubygem-smart_proxy_pulp-1.3.0-1.el7.noarch
- tfm-rubygem-hammer_cli_katello-0.0.26-1.el7.noarch
- tfm-rubygem-katello-3.0.2-1.el7.noarch
- tfm-rubygem-qpid_messaging-0.30.0-7.el7.x86_64

=================

We seem to have resolved this issue. I'm not exactly sure what went wrong, but the thinking is that a future-queued task got stuck somehow and prevented candlepin/event queue monitor from picking up any queued tasks.

Ran *pulp-admin tasks list *which resulted in a large number of tasks similar to this:

Operations: publish
Resources: --name-removed-- (repository)
State: Waiting
Start Time: Unstarted
Finish Time: Incomplete
Task Id: 08ec755f-d68a-49ee-a3d7-9efd8b570a07

The assumption was made that these tasks are related to the sync plan that we have set up for all the repos. There were various 'Operation' types including syncand content_applicability_regeneration. The assumption was made as none of these tasks appeared in the web gui.

Ran a script to grep the Task Id of all tasks and cancel them with *pulp-admin tasks cancel --task-id=task_id_number_here *then restarted all services with *katello-service restart*

Seemed to have done the trick, syncing is working fine now.

Still in the dark as to why this may have happened, especially as we were unable to find any error logs related.


··· On Monday, November 27, 2017 at 4:33:27 PM UTC+13, Jasper Connery wrote:

Hey Craig,

Did you end up coming across a solution for this? We have run into similar
issues with our Katello server, and are struggling to resolve (or even
really locate where) the problem(s).

Cheers.

On Thursday, August 3, 2017 at 1:41:33 AM UTC+12, craig.c...@gmail.com > wrote:

Hi All,
I've recently noticed that our FK server has not been running any repo
sync-jobs for the last few months. I've spent quite a while attempting to
debug this without success, but recently I think I may have stumbled upon
the issue and wanted to check if there is any way to fix it without
removing and re-adding all the repo-sync schedules. Manually running
any/all of the syncs is successful every time.

I ran -

"grep pulp /var/log/messages | grep repository". This produced
(summarised) -

Aug 1 01:05:51 XXXXX pulp: celery.beat:INFO: Scheduler: Sending due task
download_deferred_content
(pulp.server.controllers.repository.queue_download_deferred)
Aug 1 01:05:51 XXXXX pulp: celery.worker.strategy:INFO: Received task:
pulp.server.controllers.repository.queue_download_deferred[d49dc059-cd10-4f59-b066-eb1516f12a8a]
Aug 1 01:05:51 XXXXX pulp: celery.worker.strategy:INFO: Received task:
pulp.server.controllers.repository.download_deferred[f24f4c50-1a3d-4480-8a58-0ad9998ab8f1]
Aug 1 01:05:51 XXXXX pulp: celery.worker.job:INFO: Task
pulp.server.controllers.repository.queue_download_deferred[d49dc059-cd10-4f59-b066-eb1516f12a8a]
succeeded in 0.0414881289471s: None
Aug 1 01:05:51 XXXXX pulp: celery.worker.job:INFO: Task
pulp.server.controllers.repository.download_deferred[f24f4c50-1a3d-4480-8a58-0ad9998ab8f1]
succeeded in 0.0459181109909s: None
[snip]

- This suggests to me that the syncs should run but are deferred for
whatever reason. So I added -

task "list_active_sync_schedules" to /usr/share/foreman/Rakefile as per
https://gist.github.com/ehelms/52405ed3e75ec54ed91f and ran it.

It produced some interesting output. I've added the full output as an
attachment, but to summarise, running -

"awk -F, '{print $4","$5","$10","$11}'
foreman-rake_list_active_sync_schedules.txt"

produced...

"first_run"=>"2016-07-09T00:00:00Z", "total_run_count"=>31,
"enabled"=>true, "last_run_at"=>"2020-11-30T09:44:02Z"
"first_run"=>"2016-07-09T00:00:00Z", "total_run_count"=>31,
"enabled"=>true, "last_run_at"=>"2020-11-30T09:44:02Z"
"first_run"=>"2016-07-09T00:00:00Z", "total_run_count"=>31,
"enabled"=>true, "last_run_at"=>"2020-11-30T09:44:02Z"

...and so on for each repo.

This appears to suggest that none of the syncs will run again until after
30th November 2020!

I checked the date/time on the server and it is correct. A check on
shell histories on the system to see if anyone had previously altered the
system date revealed nothing. I know there were movements to add an NTP
appliance to the site around that time that could have potentially caused
the system date to jump into the future, but cant confirm that.

So my question is, "is there any way I can reset the "last_run_at" field"
in FK to allow the syncs to resume without removing and re-adding them"?

I have the output of a "foreman-debug" that I can attach if requested,

Thanks in advance for any/all information.

Regards,

Craig.

System Information:
Foreman 1.11.4

Installed Packages

- candlepin-0.9.54.6-1.el7.noarch
- candlepin-selinux-0.9.54.6-1.el7.noarch
- foreman-installer-katello-3.0.2-1.el7.noarch
- katello-3.0.0-9.el7.noarch
- katello-agent-2.7.0-1.el7.noarch
- katello-agent-fact-plugin-2.7.0-1.el7.noarch
- katello-ca-consumer-lmorea4vapp01.ecpplatform.net-1.0-1.noarch
- katello-certs-tools-2.4.0-1.el7.noarch
- katello-client-bootstrap-1.1.0-1.el7.noarch
- katello-common-3.0.0-9.el7.noarch
- katello-debug-3.0.0-9.el7.noarch
- katello-default-ca-1.0-1.noarch
- katello-installer-base-3.0.2-1.el7.noarch
- katello-repos-3.0.0-3.el7.noarch
- katello-selinux-3.0.1-1.el7.noarch
- katello-server-ca-1.0-1.noarch
- katello-service-3.0.0-9.el7.noarch
- libqpid-dispatch-0.4-11.el7.x86_64
- XXXXXt-qpid-broker-1.0-1.noarch
- XXXXX-qpid-client-cert-1.0-1.noarch
- XXXXX-qpid-router-client-1.0-1.noarch
- XXXXX-qpid-router-server-1.0-1.noarch
- pulp-admin-client-2.8.6-1.el7.noarch
- pulp-client-1.0-1.noarch
- pulp-docker-plugins-2.0.2-1.el7.noarch
- pulp-katello-1.0-1.el7.noarch
- pulp-puppet-plugins-2.8.6-1.el7.noarch
- pulp-puppet-tools-2.8.6-1.el7.noarch
- pulp-rpm-admin-extensions-2.8.6-1.el7.noarch
- pulp-rpm-handlers-2.8.6-1.el7.noarch
- pulp-rpm-plugins-2.8.6-1.el7.noarch
- pulp-selinux-2.8.6-1.el7.noarch
- pulp-server-2.8.6-1.el7.noarch
- python-gofer-qpid-2.7.6-1.el7.noarch
- python-isodate-0.5.0-4.pulp.el7.noarch
- python-kombu-3.0.33-5.pulp.el7.noarch
- python-pulp-agent-lib-2.8.6-1.el7.noarch
- python-pulp-bindings-2.8.6-1.el7.noarch
- python-pulp-client-lib-2.8.6-1.el7.noarch
- python-pulp-common-2.8.6-1.el7.noarch
- python-pulp-docker-common-2.0.2-1.el7.noarch
- python-pulp-oid_validation-2.8.6-1.el7.noarch
- python-pulp-puppet-common-2.8.6-1.el7.noarch
- python-pulp-repoauth-2.8.6-1.el7.noarch
- python-pulp-rpm-common-2.8.6-1.el7.noarch
- python-pulp-streamer-2.8.6-1.el7.noarch
- python-qpid-0.30-9.el7.noarch
- python-qpid-proton-0.9-13.el7.x86_64
- python-qpid-qmf-0.30-5.el7.x86_64
- qpid-cpp-client-0.30-8.el7.x86_64
- qpid-cpp-client-devel-0.30-8.el7.x86_64
- qpid-cpp-server-0.30-8.el7.x86_64
- qpid-cpp-server-linearstore-0.30-8.el7.x86_64
- qpid-dispatch-router-0.4-11.el7.x86_64
- qpid-proton-c-0.9-13.el7.x86_64
- qpid-qmf-0.30-5.el7.x86_64
- qpid-tools-0.30-4.el7.noarch
- rubygem-smart_proxy_pulp-1.3.0-1.el7.noarch
- tfm-rubygem-hammer_cli_katello-0.0.26-1.el7.noarch
- tfm-rubygem-katello-3.0.2-1.el7.noarch
- tfm-rubygem-qpid_messaging-0.30.0-7.el7.x86_64

=================