Remote Execution job runs the command on all servers simultaneously, even when set to distribute over X seconds

SlickNetAaron · October 18, 2019, 11:22pm

Problem: I’ve been using Foreman and Remote Execution to patch and reboot many machines over a staggered time-span. Since I upgraded to Foreman 1.23.0, jobs run on all hosts almost immediately.

When viewing the Job invocation status as the job is running, it shows most of the hosts as Pending, but I can actually click on the host name and see the console output. The timestamp on the console output proves the job ran on all hosts almost immediately.

The job status correctly shows the schedule to be “Set to distribute over: 900 seconds”.

Expected outcome: Job would execute on hosts in a staggered fashion, timed evenly over the specified time-span.

Foreman and Proxy versions: Foreman 1.23.0

Foreman and Proxy plugin versions: Remote Execution 1.8.2

Other relevant data:
Example job invocation:

hammer job-invocation create --time-span 600 --job-template "Run Command - SSH Default" --inputs command="date; ls -la" --search-query "name ^ ( [25 hosts truncated] )" --async

I started the job and all 25 hosts completed faster than I could load the page, yet every host but the first showed status: pending. The output of the date command shows they all executed about 20 seconds after I initiated the job or 1 min 20 seconds.
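One way to check the date output for staggering is to compare the gaps between consecutive start times with the expected interval. The following is a minimal Python sketch, assuming each host's start time has been collected as epoch seconds (for example by running `date +%s` in the job). The function names and the sample values are made up for illustration and are not part of Foreman or Remote Execution.

```python
def start_gaps(epoch_seconds):
    """Return the gaps in seconds between consecutive start times."""
    ordered = sorted(epoch_seconds)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def looks_staggered(epoch_seconds, interval, tolerance=5):
    """True if every gap is within `tolerance` seconds of `interval`.

    With fewer than two timestamps there are no gaps, so this returns True.
    """
    return all(abs(gap - interval) <= tolerance
               for gap in start_gaps(epoch_seconds))


# Made-up sample values, not taken from the job above.
burst = [1000, 1001, 1001, 1003]      # everything started at once
staggered = [1000, 1024, 1048, 1072]  # one start every 24 seconds

print(looks_staggered(burst, 24))
print(looks_staggered(staggered, 24))
```

The tolerance is an arbitrary choice to absorb small scheduling delays; tighten or loosen it as needed.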

The job task clearly shows the time-span, and the task status has everything pending, but the job actually ran. This is all running from the locally installed proxy on the same server.

{
  "job_invocation": {
    "id": 436,
    "name": "Commands",
    "description": "Run date; ls -la"
  },
  "concurrency_control": {
    "time": {
      "tickets": null,
      "free": 1,
      "meta": {
        "interval": 24,
        "time_span": 600
      }
    }
  },
  "job_category": "Commands",
  "job_invocation_id": 436,
  "current_request_id": null,
  "current_timezone": "Central Time (US & Canada)",
  "current_user_id": 4,
  "current_organization_id": null,
  "current_location_id": null
}
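The interval of 24 in the task input above is consistent with dividing the 600-second time-span by the 25 hosts. A minimal Python sketch of the evenly spaced start offsets this would imply, assuming that is how the interval is derived (the function name is hypothetical and not part of Foreman):

```python
def expected_start_offsets(time_span, host_count):
    """Return evenly spaced start offsets in seconds, one per host.

    Assumption: interval = time_span / host_count, which matches the
    interval of 24 shown for a time_span of 600 and 25 hosts.
    """
    if host_count <= 0:
        raise ValueError("host_count must be positive")
    interval = time_span / host_count
    return [i * interval for i in range(host_count)]


offsets = expected_start_offsets(600, 25)
print(offsets[:3], offsets[-1])
```

Under this assumption the last host would start 576 seconds after the first, rather than within the first minute or two.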

Raw output:

{
  "planned_count": 25,
  "cancelled_count": 0,
  "total_count": 25,
  "failed_count": 0,
  "pending_count": 24,
  "success_count": 1
}

I’ve searched the logs and enabled debug logging, and I’m not finding any posts describing a similar issue. Any idea how I can troubleshoot this?

Thank you!
Aaron

aruzicka · October 21, 2019, 9:19am

Hi,
it looks like you’ve found a bug. I filed it as an issue[1] in our issue tracker. In theory, disabling batch triggering in Settings > Foreman Tasks could be a workaround.

[1] - Bug #28095: Concurrency control doesn't work when batch planning is enabled - foreman-tasks - Foreman

SlickNetAaron · October 21, 2019, 2:16pm

Cheers! You nailed it!

I can also confirm that disabling foreman_tasks_proxy_batch_trigger works around the issue and properly staggers execution over the time-span. Luckily I don’t have enough hosts to worry about the performance hit.

Thank you!

SlickNetAaron · January 29, 2020, 2:59pm

Thanks @aruzicka

I see this was fixed and slated for foreman-tasks-1.0.2

What I don’t understand is how to update individual components. Do different versions of plugins get pinned to specific Foreman releases?

Looking at GitHub https://github.com/theforeman/foreman-tasks it seems like my foreman-tasks 0.16.2 is quite a few releases behind.

Thanks again!
Aaron

Dirk · January 29, 2020, 3:23pm

Updates are handled via packaging. Releasing a new version of a gem requires a pull request in the foreman-packaging repository against rpm/devel or deb/devel to get it into nightly. To make it available in a stable branch, the change has to be cherry-picked to that branch. This ensures that only compatible versions are released for stable versions.

As a user, you do not update individual components on your own; instead you simply get them during a package update using yum or apt.

If a component is behind, Foreman is typically behind as well.

SlickNetAaron · January 29, 2020, 3:27pm

Thanks for the info @Dirk!