Recurring Logic next occurrence is older than last occurrence

Problem:
We have several Recurring Logics (RLs) that automatically run commands on multiple host_collections.
After this morning's run, several of the RLs show a Next occurrence that is older than the Last occurrence:

$ hammer recurring-logic info --id 24 --fields 'Id,Last occurrence,Next occurrence'
ID:              24
Last occurrence: 2021-11-24 08:30:12 UTC
Next occurrence: 2021-11-24 08:30:00 UTC

$ hammer recurring-logic info --id 26 --fields 'Id,Last occurrence,Next occurrence'
ID:              26
Last occurrence: 2021-11-24 08:15:11 UTC
Next occurrence: 2021-11-24 08:15:00 UTC

Expected outcome:
Something like this:

$ hammer recurring-logic info --id 24 --fields 'Id,Last occurrence,Next occurrence'
ID:              24
Last occurrence: 2021-11-24 08:30:12 UTC
Next occurrence: 2021-11-25 08:30:00 UTC

$ hammer recurring-logic info --id 26 --fields 'Id,Last occurrence,Next occurrence'
ID:              26
Last occurrence: 2021-11-24 08:15:11 UTC
Next occurrence: 2021-11-25 08:15:00 UTC

$ hammer recurring-logic info --id 27 --fields 'Id,Last occurrence,Next occurrence'
ID:              27
Last occurrence: 2021-11-24 08:30:13 UTC
Next occurrence: 2021-11-24 08:30:00 UTC

Foreman and Proxy versions:
Foreman version: 3.0.1

Foreman and Proxy plugin versions:
Plugins:

  • foreman-tasks 5.1.1
  • foreman_puppet 1.0.5
  • foreman_remote_execution 4.7.0
  • katello 4.2.1

Distribution and version:
CentOS Linux release 7.9.2009 (Core)

Other relevant data:
We had this issue about a month ago and ended up disabling the old RLs and creating new ones.

Database info:

foreman=# select * from foreman_tasks_recurring_logics where id in (24,26,27);
 id |    cron_line    | end_time | max_iteration | iteration | task_group_id | state  | triggering_id | purpose 
----+-----------------+----------+---------------+-----------+---------------+--------+---------------+---------
 24 | 30 08 * * *     |          |               |        40 |           974 | active |           110 | 
 27 | 30 08 * * *     |          |               |        40 |           980 | active |           113 | 
 26 | 15 8 2-31/2 * * |          |               |        20 |           978 | active |           112 | 
(3 rows)
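For reference, this is roughly how a next occurrence is derived from a cron line like `15 8 2-31/2 * *` (minute 15, hour 8, every even day of the month). This is a minimal pure-Python sketch of the scheduling arithmetic only, not Foreman's actual implementation (foreman-tasks/Dynflow has its own cron handling in Ruby):

```python
from datetime import datetime, timedelta

def next_occurrence(last, minute, hour, days_of_month):
    """Return the first time strictly after `last` matching the cron fields.

    Only minute/hour/day-of-month are modeled; a real cron parser also
    handles months, weekdays, ranges and step expressions.
    """
    candidate = last.replace(second=0, microsecond=0) + timedelta(minutes=1)
    while not (candidate.minute == minute
               and candidate.hour == hour
               and candidate.day in days_of_month):
        candidate += timedelta(minutes=1)
    return candidate

# RL 26 runs "15 8 2-31/2 * *": 08:15 on even days of the month.
last = datetime(2021, 11, 24, 8, 15, 11)   # the "Last occurrence" above
nxt = next_occurrence(last, 15, 8, set(range(2, 32, 2)))
print(nxt)  # 2021-11-26 08:15:00
```

Computed this way, the next occurrence is always strictly after the last one, which is why the values reported above point at something going wrong when the next run is scheduled.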

foreman=# select * from foreman_tasks_triggerings where id in (select triggering_id from foreman_tasks_recurring_logics where id in (24,26,27));
 id  |   mode    |          start_at          | start_before 
-----+-----------+----------------------------+--------------
 110 | recurring | 2021-10-15 15:36:09.369358 | 
 112 | recurring | 2021-10-15 15:44:36.952605 | 
 113 | recurring | 2021-10-15 15:46:01.406246 | 
(3 rows)

We also double-checked that all of this morning's tasks finished successfully.

$ hammer task info --id c20ed7f6-eed9-4593-9efa-7410d07359c8 # RL 24
ID:          c20ed7f6-eed9-4593-9efa-7410d07359c8
Action:      Run hosts job: Run yum update -y
State:       stopped
Result:      success
Started at:  2021/11/24 08:30:12
Ended at:    2021/11/24 08:30:23
Owner:       ams
Task errors:

$ hammer task info --id 3d0bf172-b5e7-4e56-a20a-c957ca6dd437 # RL 26
ID:          3d0bf172-b5e7-4e56-a20a-c957ca6dd437
Action:      Run hosts job: Run yum update -y; needs-restarting -r; if [ "$(echo $?)" -eq "1" ]; then shutdown -r 1 ; fi
State:       stopped
Result:      success
Started at:  2021/11/24 08:15:11
Ended at:    2021/11/24 08:15:22
Owner:       ams
Task errors:

Although none of the tasks failed, the execution of RL 26 did not actually run on any host, even though we made no changes to the host_collection:

Description                                                                      | Search Query  | Status    | Succeeded | Failed | Pending | Total hosts | Start
Run yum update -y; needs-restarting -r; if [ "$(echo $?)" -eq "1" ]; then shu... | host_collect... | succeeded | 0         | 0      | 0       | 0           | about 10 hours ago
Run yum update -y; needs-restarting -r; if [ "$(echo $?)" -eq "1" ]; then shu... | host_collect... | failed    | 6         | 1      | 0       | 7           | 2 days ago
Run yum update -y; needs-restarting -r; if [ "$(echo $?)" -eq "1" ]; then shu... | host_collect... | failed    | 6         | 1      | 0       | 7           | 4 days ago
Run yum update -y; needs-restarting -r; if [ "$(echo $?)" -eq "1" ]; then shu... | host_collect... | succeeded | 6         | 0      | 0       | 6           | 6 days ago
hammer host-collection info --id 4
Id:          4
Name:        auto-update-and-reboot-even-days
Limit:       None
Description: 
Total Hosts: 7

Hi,
since task c20ed7f6-eed9-4593-9efa-7410d07359c8 finished, RL 24 should have a new scheduled task. Does it? If not, are there any errors from around that time in production.log?

Hi,

I don’t think I have a scheduled task for any of those RLs, because a day has gone by and neither the Last occurrence nor the Next occurrence has changed.

I got these messages in production.log:

$ grep 2021-11-24T08 /var/log/foreman/production.log| grep d53e7f2d
2021-11-24T08:30:12 [I|bac|d53e7f2d] Task {label: Actions::RemoteExecution::RunHostsJob, id: c20ed7f6-eed9-4593-9efa-7410d07359c8, execution_plan_id: b096bad9-b0a0-4c2f-93da-899dab8a98c2} state changed: planning 
2021-11-24T08:30:13 [I|bac|d53e7f2d] Task {label: Actions::RemoteExecution::RunHostsJob, id: c20ed7f6-eed9-4593-9efa-7410d07359c8, execution_plan_id: b096bad9-b0a0-4c2f-93da-899dab8a98c2} state changed: planned 
2021-11-24T08:30:13 [E|dyn|d53e7f2d] Failed to run hook 'trigger_repeat' for action 'Actions::RemoteExecution::RunHostsJob'
2021-11-24T08:30:13 [I|bac|d53e7f2d] Task {label: Actions::RemoteExecution::RunHostsJob, id: c20ed7f6-eed9-4593-9efa-7410d07359c8, execution_plan_id: b096bad9-b0a0-4c2f-93da-899dab8a98c2} state changed: running 
2021-11-24T08:30:13 [W|app|d53e7f2d] Scoped order is ignored, it's forced to be batch order.
2021-11-24T08:30:13 [W|app|d53e7f2d] Scoped order is ignored, it's forced to be batch order.
2021-11-24T08:30:23 [I|bac|d53e7f2d] Task {label: Actions::RemoteExecution::RunHostsJob, id: c20ed7f6-eed9-4593-9efa-7410d07359c8, execution_plan_id: b096bad9-b0a0-4c2f-93da-899dab8a98c2} state changed: stopped  result: success
2021-11-24T08:30:23 [I|bac|d53e7f2d] Task {label: Actions::RemoteExecution::RunHostsJob, id: c20ed7f6-eed9-4593-9efa-7410d07359c8, execution_plan_id: b096bad9-b0a0-4c2f-93da-899dab8a98c2} state changed: stopped  result: success

For all the other RLs the logs are similar (the only changes are the timestamps and ids).

This is it. It failed to prepare the next run. Sadly it doesn’t say why. Anything in the output of journalctl -u dynflow-sidekiq@\*?
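For context on what "failed to prepare the next run" means: when a recurring task is planned, a post-plan hook (`trigger_repeat`) computes the next occurrence from the cron line and schedules a new task for it. If that hook raises, the current task still runs to completion, but nothing advances the schedule. A rough Python sketch of that control flow (the real code is Ruby in foreman-tasks/Dynflow; the names here are illustrative, not the actual API):

```python
class RecurringLogic:
    """Illustrative model of a recurring logic advancing its schedule."""

    def __init__(self, cron_line, next_occurrence):
        self.cron_line = cron_line
        self.next_occurrence = next_occurrence
        self.iteration = 0

    def trigger_repeat(self, schedule_task, compute_next):
        # Hook fired when the current task is planned. If anything in here
        # raises, the exception is logged ("Failed to run hook
        # 'trigger_repeat'") and the next run is simply never scheduled.
        try:
            self.next_occurrence = compute_next(self.cron_line, self.next_occurrence)
            schedule_task(self.next_occurrence)
            self.iteration += 1
            return True
        except Exception as exc:
            print(f"Failed to run hook 'trigger_repeat': {exc}")
            return False
```

This matches the symptom in the thread: the task itself finishes with `result: success`, but Next occurrence stays behind Last occurrence because the hook that would advance it failed.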

Hi,

For the 24th I’ve got this:

Nov 24 02:00:37 updates.example.com systemd[1]: Stopping Foreman jobs daemon - worker-hosts-queue-1 on sidekiq...
Nov 24 02:00:37 updates.example.com systemd[1]: Stopping Foreman jobs daemon - worker-1 on sidekiq...
Nov 24 02:00:39 updates.example.com systemd[1]: Stopped Foreman jobs daemon - worker-hosts-queue-1 on sidekiq.
Nov 24 02:00:40 updates.example.com systemd[1]: Stopped Foreman jobs daemon - worker-1 on sidekiq.
Nov 24 02:00:40 updates.example.com systemd[1]: Stopping Foreman jobs daemon - orchestrator on sidekiq...
Nov 24 02:00:42 updates.example.com systemd[1]: Stopped Foreman jobs daemon - orchestrator on sidekiq.
Nov 24 02:01:55 updates.example.com systemd[1]: Starting Foreman jobs daemon - orchestrator on sidekiq...
Nov 24 02:02:01 updates.example.com dynflow-sidekiq@orchestrator[70090]: 2021-11-24T02:02:01.645Z 70090 TID-1izi INFO: GitLab reliable fetch activated!
Nov 24 02:02:01 updates.example.com dynflow-sidekiq@orchestrator[70090]: 2021-11-24T02:02:01.706Z 70090 TID-1izm INFO: Booting Sidekiq 5.2.7 with redis options {:id=>"Sidekiq-server-PID-70090", :url=>"redis://localhost:6379/0"}
Nov 24 02:02:16 updates.example.com dynflow-sidekiq@orchestrator[70090]: /opt/theforeman/tfm/root/usr/share/gems/gems/foreman_puppet-1.0.4/lib/foreman_puppet/register.rb:141: warning: already initialized constant Foreman::Plugin::RbacSupport::AUTO_EXTENDED_ROLES
Nov 24 02:02:16 updates.example.com dynflow-sidekiq@orchestrator[70090]: /usr/share/foreman/app/registries/foreman/plugin/rbac_support.rb:5: warning: previous definition of AUTO_EXTENDED_ROLES was here
Nov 24 02:02:34 updates.example.com systemd[1]: Started Foreman jobs daemon - orchestrator on sidekiq.
Nov 24 02:02:34 updates.example.com systemd[1]: Starting Foreman jobs daemon - worker-1 on sidekiq...
Nov 24 02:02:34 updates.example.com systemd[1]: Starting Foreman jobs daemon - worker-hosts-queue-1 on sidekiq...
Nov 24 02:02:36 updates.example.com dynflow-sidekiq@worker-1[71156]: 2021-11-24T02:02:36.025Z 71156 TID-1i74 INFO: GitLab reliable fetch activated!
Nov 24 02:02:36 updates.example.com dynflow-sidekiq@worker-1[71156]: 2021-11-24T02:02:36.026Z 71156 TID-1i70 INFO: Booting Sidekiq 5.2.7 with redis options {:id=>"Sidekiq-server-PID-71156", :url=>"redis://localhost:6379/0"}
Nov 24 02:02:36 updates.example.com dynflow-sidekiq@worker-hosts-queue-1[71169]: 2021-11-24T02:02:36.070Z 71169 TID-1i1h INFO: GitLab reliable fetch activated!
Nov 24 02:02:36 updates.example.com dynflow-sidekiq@worker-hosts-queue-1[71169]: 2021-11-24T02:02:36.071Z 71169 TID-1i21 INFO: Booting Sidekiq 5.2.7 with redis options {:id=>"Sidekiq-server-PID-71169", :url=>"redis://localhost:6379/0"}
Nov 24 02:02:44 updates.example.com dynflow-sidekiq@worker-1[71156]: /opt/theforeman/tfm/root/usr/share/gems/gems/foreman_puppet-1.0.4/lib/foreman_puppet/register.rb:141: warning: already initialized constant Foreman::Plugin::RbacSupport::AUTO_EXTENDED_ROLES
Nov 24 02:02:44 updates.example.com dynflow-sidekiq@worker-1[71156]: /usr/share/foreman/app/registries/foreman/plugin/rbac_support.rb:5: warning: previous definition of AUTO_EXTENDED_ROLES was here
Nov 24 02:02:44 updates.example.com dynflow-sidekiq@worker-hosts-queue-1[71169]: /opt/theforeman/tfm/root/usr/share/gems/gems/foreman_puppet-1.0.4/lib/foreman_puppet/register.rb:141: warning: already initialized constant Foreman::Plugin::RbacSupport::AUTO_EXTENDED_ROLES
Nov 24 02:02:44 updates.example.com dynflow-sidekiq@worker-hosts-queue-1[71169]: /usr/share/foreman/app/registries/foreman/plugin/rbac_support.rb:5: warning: previous definition of AUTO_EXTENDED_ROLES was here
Nov 24 02:02:58 updates.example.com systemd[1]: Started Foreman jobs daemon - worker-1 on sidekiq.
Nov 24 02:02:58 updates.example.com systemd[1]: Started Foreman jobs daemon - worker-hosts-queue-1 on sidekiq.
Nov 24 17:06:14 updates.example.com systemd[1]: Stopping Foreman jobs daemon - worker-hosts-queue-1 on sidekiq...
Nov 24 17:06:16 updates.example.com systemd[1]: Stopped Foreman jobs daemon - worker-hosts-queue-1 on sidekiq.
Nov 24 17:06:16 updates.example.com systemd[1]: Starting Foreman jobs daemon - worker-hosts-queue-1 on sidekiq...
Nov 24 17:06:18 updates.example.com dynflow-sidekiq@worker-hosts-queue-1[16987]: 2021-11-24T17:06:18.543Z 16987 TID-dsf INFO: GitLab reliable fetch activated!
Nov 24 17:06:18 updates.example.com dynflow-sidekiq@worker-hosts-queue-1[16987]: 2021-11-24T17:06:18.544Z 16987 TID-dtf INFO: Booting Sidekiq 5.2.7 with redis options {:id=>"Sidekiq-server-PID-16987", :url=>"redis://localhost:6379/0"}
Nov 24 17:06:27 updates.example.com dynflow-sidekiq@worker-hosts-queue-1[16987]: /opt/theforeman/tfm/root/usr/share/gems/gems/foreman_puppet-1.0.5/lib/foreman_puppet/register.rb:141: warning: already initialized constant Foreman::Plugin::RbacSupport::AUTO_EXTENDED_ROLES
Nov 24 17:06:27 updates.example.com dynflow-sidekiq@worker-hosts-queue-1[16987]: /usr/share/foreman/app/registries/foreman/plugin/rbac_support.rb:5: warning: previous definition of AUTO_EXTENDED_ROLES was here
Nov 24 17:06:41 updates.example.com systemd[1]: Started Foreman jobs daemon - worker-hosts-queue-1 on sidekiq.
Nov 24 17:07:09 updates.example.com systemd[1]: Stopping Foreman jobs daemon - orchestrator on sidekiq...
Nov 24 17:07:10 updates.example.com systemd[1]: Stopped Foreman jobs daemon - orchestrator on sidekiq.
Nov 24 17:07:10 updates.example.com systemd[1]: Starting Foreman jobs daemon - orchestrator on sidekiq...
Nov 24 17:07:12 updates.example.com dynflow-sidekiq@orchestrator[17477]: 2021-11-24T17:07:12.632Z 17477 TID-de9 INFO: GitLab reliable fetch activated!
Nov 24 17:07:12 updates.example.com dynflow-sidekiq@orchestrator[17477]: 2021-11-24T17:07:12.633Z 17477 TID-df1 INFO: Booting Sidekiq 5.2.7 with redis options {:id=>"Sidekiq-server-PID-17477", :url=>"redis://localhost:6379/0"}
Nov 24 17:07:21 updates.example.com dynflow-sidekiq@orchestrator[17477]: /opt/theforeman/tfm/root/usr/share/gems/gems/foreman_puppet-1.0.5/lib/foreman_puppet/register.rb:141: warning: already initialized constant Foreman::Plugin::RbacSupport::AUTO_EXTENDED_ROLES
Nov 24 17:07:21 updates.example.com dynflow-sidekiq@orchestrator[17477]: /usr/share/foreman/app/registries/foreman/plugin/rbac_support.rb:5: warning: previous definition of AUTO_EXTENDED_ROLES was here
Nov 24 17:07:36 updates.example.com systemd[1]: Started Foreman jobs daemon - orchestrator on sidekiq.
Nov 24 17:07:37 updates.example.com systemd[1]: Stopping Foreman jobs daemon - worker-1 on sidekiq...
Nov 24 17:07:38 updates.example.com systemd[1]: Stopped Foreman jobs daemon - worker-1 on sidekiq.
Nov 24 17:07:38 updates.example.com systemd[1]: Starting Foreman jobs daemon - worker-1 on sidekiq...
Nov 24 17:07:40 updates.example.com dynflow-sidekiq@worker-1[17616]: 2021-11-24T17:07:40.449Z 17616 TID-dac INFO: GitLab reliable fetch activated!
Nov 24 17:07:40 updates.example.com dynflow-sidekiq@worker-1[17616]: 2021-11-24T17:07:40.451Z 17616 TID-dbs INFO: Booting Sidekiq 5.2.7 with redis options {:id=>"Sidekiq-server-PID-17616", :url=>"redis://localhost:6379/0"}
Nov 24 17:07:49 updates.example.com dynflow-sidekiq@worker-1[17616]: /opt/theforeman/tfm/root/usr/share/gems/gems/foreman_puppet-1.0.5/lib/foreman_puppet/register.rb:141: warning: already initialized constant Foreman::Plugin::RbacSupport::AUTO_EXTENDED_ROLES
Nov 24 17:07:49 updates.example.com dynflow-sidekiq@worker-1[17616]: /usr/share/foreman/app/registries/foreman/plugin/rbac_support.rb:5: warning: previous definition of AUTO_EXTENDED_ROLES was here
Nov 24 17:08:02 updates.example.com systemd[1]: Started Foreman jobs daemon - worker-1 on sidekiq.

Every day at 2 AM we stop Foreman to execute a foreman-maintain backup offline ...
The stop/start after 5 PM happened because, while trying to debug this issue, we noticed there were some pending updates; we installed them and then ran foreman-installer:

    Updated tfm-rubygem-foreman-tasks-5.1.0-1.fm3_0.el7.noarch  @foreman-plugins
    Update                            5.1.1-1.fm3_0.el7.noarch  @foreman-plugins
    Updated tfm-rubygem-foreman_puppet-1.0.4-1.fm3_0.el7.noarch @foreman-plugins
    Update                             1.0.5-1.fm3_0.el7.noarch @foreman-plugins
    Updated tfm-rubygem-hammer_cli-3.0.0-1.el7.noarch           @foreman
    Update                         3.0.1-1.el7.noarch           @foreman

Sadly that doesn’t really say anything.

Looking at the code, more details about the failure are logged only as debug information. Could you bump your log level to debug and run with it until you hit the issue again?

Hello,
We enabled the debug logs and created new RLs, and we are having RLs fail again - even one that was running just fine.
Full log here: foreman_rl25_failing - Pastebin.com

The problem appears to be here:

2021-12-03T08:15:04 [I|bac|24061581] Task {label: Actions::RemoteExecution::RunHostsJob, id: 12569eb3-34ed-4e12-ab6e-c9989a8630ba, execution_plan_id: 4c73ca6b-799a-4632-95a7-7bd63b7b6925} state changed: planned 
2021-12-03T08:15:04 [E|dyn|24061581] Failed to run hook 'trigger_repeat' for action 'Actions::RemoteExecution::RunHostsJob'
2021-12-03T08:15:04 [D|dyn|24061581] Validation failed: Job category can't be blank (ActiveRecord::RecordInvalid)

The debug message right after the trigger_repeat failure complains about the Job category being blank, but it is not blank:

hammer job-invocation info --id 1073
---
ID: 1073
Description: Run yum update -y; needs-restarting -r; if [ "$(echo $?)" -eq "1" ];
then shutdown -r 1 ; fi
Status: succeeded
Success: 0
Failed: 0
Pending: 0
Missing: 0
Total: 0
Start: 2021-12-03 08:15:00 UTC
Randomized ordering: false
Job Category: Commands
Mode: recurring
Cron line: 15 8 1-31/2 * *
Recurring logic ID: 25
Hosts: []

Nor was it blank the previous time it executed successfully:

hammer job-invocation info --id 1072
---
ID: 1072
Description: Run yum update -y; needs-restarting -r; if [ "$(echo $?)" -eq "1" ];
then shutdown -r 1 ; fi
Status: failed
Success: 4
Failed: 2
Pending: 0
Missing: 0
Total: 6
Start: 2021-12-01 08:15:00 UTC
Randomized ordering: false
Job Category: Commands
Mode: recurring
Cron line: 15 8 1-31/2 * *
Recurring logic ID: 25
Hosts:
- Name: <my host 00>
- Name: <my host 01>
- Name: <my host 02>
- Name: <my host 03>
- Name: <my host 04>
- Name: <my host 05>

Interesting. There are a couple of scenarios where this might be expected:

  1. The user who originally started the job loses the permissions to view the template being run
  2. The template is moved to a different category
  3. The template (or the user) is moved to a different org/loc so the user who originally started the job cannot see the template anymore from the org/loc they originally started the job from

Has any of that happened recently?
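The mechanics behind those scenarios, roughly: when the next invocation is prepared, the job template is resolved again under the permissions and org/loc context of the user who originally scheduled the job, and the job category is copied from the resolved template. If the template is no longer visible, the category comes back empty and the new record fails validation. A hypothetical Python sketch of that dependency (the real logic lives in foreman_remote_execution, in Ruby; these names are made up for illustration):

```python
class ValidationError(Exception):
    pass

def build_next_invocation(visible_templates, template_name):
    """Build the next job invocation, re-resolving the template as the
    user who originally scheduled the job would see it."""
    template = visible_templates.get(template_name)  # None if no longer visible
    invocation = {
        "template": template_name,
        # The category is copied from the resolved template, so an
        # invisible template yields a blank category here.
        "job_category": template["job_category"] if template else None,
    }
    if not invocation["job_category"]:
        raise ValidationError("Job category can't be blank")
    return invocation
```

The sketch shows why template visibility and "Job category can't be blank" are connected, even though (as it turns out below) none of the three listed scenarios applied in this particular case.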

Hello @aruzicka

  1. All the RLs were created with my own user and the permissions were not changed. Should we have been using the admin account instead of our own accounts? My user is part of the foreman-admins group
  2. We did not move the template
  3. We did not move the templates or the user across orgs/locations

The only change we made recently - I don’t recall exactly when, but it was in the last couple of months - was adding a new Organization for testing. We plan to add our customer machines to their own organizations.
But the test organization does not even have hosts, and the RL problems we are seeing are in our main Organization.
So right now we have 3 Org:

  • Default: not in use
  • OurCompany: in use; all our machines are added here, and it’s the one that has the host collections and the RLs
  • OurCompany-TST: not in use; it has 3 Products but no hosts or RLs.

I am running into the same issue with Foreman 3.5.1. Exact same scenario: the job runs regularly, until the next scheduled occurrence happens during an offline backup.

Here is the recurring logic for my job. The backup starts at 3:30 AM, and finishes at approximately 6 AM or so.

The job says “Evaluated at: 2023-03-07 06:13:30 -0800” (matches the last occurrence time) but it did not run at that time.

ID:                4
Cron line:         15 * * * *
Action:            Run hosts job: Update Satellite reports and enable Ansible bootstrapping
Last occurrence:   2023-03-07 06:13:30 -0800
Next occurrence:   2023-03-07 04:15:00 -0800
Current iteration: 303
Iteration limit:   Unlimited
Repeat until:      Unlimited
State:             Active
Purpose:
Task count:        303

That seems to be a different issue from what @mindo was describing. There was an issue that could cause tasks that should get executed during downtime to not run and therefore not trigger the next repetition. It should be resolved in dynflow-1.6.10.
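To illustrate the downtime scenario (this is my reading of the pre-1.6.10 behavior, not the actual Dynflow code): the next run is stored as a delayed execution plan, and a periodic checker starts plans whose start time has passed. If a plan becomes due while the services are stopped, the first poll after restart must still pick it up; if it is skipped instead, its own trigger_repeat never fires and the recurring chain stops advancing. A minimal simulation:

```python
from datetime import datetime

def run_due(plans, now, executed):
    """One poll of a delayed-plan checker: execute every pending plan whose
    start_at has passed, even if it passed long ago (e.g. while the service
    was stopped for an offline backup)."""
    for plan in plans:
        if plan["state"] == "pending" and plan["start_at"] <= now:
            plan["state"] = "executed"
            executed.append(plan["id"])

# A run scheduled for 04:15 during a 03:30-06:00 backup window:
plans = [{"id": 4, "start_at": datetime(2023, 3, 7, 4, 15), "state": "pending"}]
executed = []
# The first poll after the services come back (~06:13) still picks it up;
# dropping it here instead would leave the chain stuck, as in the report.
run_due(plans, datetime(2023, 3, 7, 6, 13), executed)
print(executed)  # [4]
```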

@aruzicka

That’s good to know - thank you. You are right in how you described the issue.

Maybe I was misunderstanding what @mindo said, but it seemed like that, too, was caused by a downtime; he also mentioned the offline backup.

Regardless - thanks!