"Proxy task gone missing from the smart proxy" if job takes more than 10m to finish

Problem:
Running a standard job of type script just executing the sleep command. After 10m the job fails with message “Actions::ProxyAction::ProxyActionMissing: Proxy task gone missing from the smart proxy”. Although foreman marks the job has failed the script continues to run and I’m able to follow the execution in /var/tmp/foreman-ssh-cmd-<id>/output file.

The 10m (600 secs) come from the “POLL_INTERVAL = 600” constant in watch_delegated_proxy_sub_tasks.rb. I tried to do some debugging and discovered that the condition is triggered in the function process_task_results because the state of the parent job is returning as stopped (“state”=>“stopped”, “result”=>“success”), hence the job is considered missing. I don’t find the corresponding parent_task_id in the foreman_tasks_tasks table.
I’m trying to follow the code but I’m hoping someone here that is more knowledgeable about both ruby and the foreman code might help get to a conclusion a lot quicker.

Expected outcome:
Job to finish.

Foreman and Proxy versions:
foreman-proxy-3.9.1-1.el8

Foreman and Proxy plugin versions:
rubygem-foreman-tasks-9.0.1-1.fm3_9.el8

Distribution and version:
RHEL 8.9

Thanks!

Hi,
you’re not the first to run into this, see Failed tasks to update packages - proxy task gone missing - #20 by m-bucher for more details.

Well done on the analysis, you got pretty far in there

Thanks, that fix works for me as well!

@aruzicka - Just wanted to let you know that I am also hitting this issue, even with your fix in place. It seems to occur when using Concurrency level limited - so maybe something to do with it waiting, Im not sure. Im not really seeing anything in the logs at all that would provide any additional information.

Possibly related:

Is that specific to salt or are you getting this no matter if you use salt/ansible/script?

That thread is three years old. I’d be surprised if it was the same thing.

That thread is three years old. I’d be surprised if it was the same thing.

lol, ya me too.

Is that specific to salt or are you getting this no matter if you use salt/ansible/script?

We are only running salt commands, but I cant envision that the underlying command itself would be the issue? But Im no expert. We dont use ansible or ssh/scripts. And I think all the salt stuff just piggy backs off dynflow/remex/foreman-proxy?
Whats so curious to me is it just randomly happens, yet if I query dynflow the task is certainly there.
I could potentially setup some hosts with ssh and try to throw scripts through it.

production.log (server 2, this is a load balanced environment):

[root@10-222-206-158 jsparrow]# grep b1b11dff /var/log/foreman/production.log
2025-01-24T14:03:12 [I|app|b1b11dff] Started POST "/api/job_invocations" for 10.222.218.3 at 2025-01-24 14:03:12 +0000
2025-01-24T14:03:12 [I|app|b1b11dff] Processing by Api::V2::JobInvocationsController#create as JSON
2025-01-24T14:03:12 [I|app|b1b11dff]   Parameters: {"job_invocation"=>{"job_template_id"=>172, "targeting_type"=>"static_query", "randomized_ordering"=>nil, "inputs"=>{"state"=>"ssnc-foreman-saltcontrol"}, "ssh_user"=>nil, "ssh"=>{"effective_user"=>"root"}, "recurrence"=>nil, "scheduling"=>nil, "concurrency_control"=>{"concurrency_level"=>"4"}, "bookmark_id"=>nil, "search_query"=>"name ^ (10-213-48-11.ssnc-corp.cloud, 10-222-203-207.ssnc-corp.cloud, 10-222-45-187.ssnc-corp.cloud, 10-234-32-23.ssnc-corp.cloud, 10-234-61-47.ssnc-corp.cloud)", "description_format"=>"Salt: state.apply", "execution_timeout_interval"=>500, "time_to_pickup"=>""}, "apiv"=>"v2"}
2025-01-24T14:03:12 [I|aud|b1b11dff] JobInvocation (515) create event on job_category Salt
2025-01-24T14:03:12 [I|aud|b1b11dff] JobInvocation (515) create event on description Salt: state.apply
2025-01-24T14:03:12 [I|aud|b1b11dff] JobInvocation (515) create event on concurrency_level 4
2025-01-24T14:03:12 [I|aud|b1b11dff] JobInvocation (515) create event on execution_timeout_interval 500
2025-01-24T14:03:12 [I|aud|b1b11dff] JobInvocation (515) create event on password [redacted]
2025-01-24T14:03:12 [I|aud|b1b11dff] JobInvocation (515) create event on key_passphrase
2025-01-24T14:03:12 [I|aud|b1b11dff] JobInvocation (515) create event on remote_execution_feature_id
2025-01-24T14:03:12 [I|aud|b1b11dff] JobInvocation (515) create event on effective_user_password
2025-01-24T14:03:12 [I|aud|b1b11dff] JobInvocation (515) create event on ssh_user
2025-01-24T14:03:12 [I|aud|b1b11dff] JobInvocation (515) create event on time_to_pickup
2025-01-24T14:03:12 [I|bac|b1b11dff] Task {label: , execution_plan_id: a8122b1d-8609-434f-865a-61e3437ea9d4} state changed: pending
2025-01-24T14:03:12 [I|bac|b1b11dff] Task {label: Actions::RemoteExecution::RunHostsJob, id: 4743b44c-b39c-4a14-8695-c18f3283a17c, execution_plan_id: a8122b1d-8609-434f-865a-61e3437ea9d4} state changed: planning
2025-01-24T14:03:12 [I|bac|b1b11dff] Task {label: Actions::RemoteExecution::RunHostsJob, id: 4743b44c-b39c-4a14-8695-c18f3283a17c, execution_plan_id: a8122b1d-8609-434f-865a-61e3437ea9d4} state changed: planned
2025-01-24T14:03:12 [I|bac|b1b11dff] Task {label: Actions::RemoteExecution::RunHostsJob, id: 4743b44c-b39c-4a14-8695-c18f3283a17c, execution_plan_id: a8122b1d-8609-434f-865a-61e3437ea9d4} state changed: running
2025-01-24T14:03:12 [I|app|b1b11dff]   Rendered /usr/share/gems/gems/foreman_remote_execution-14.0.2/app/views/api/v2/job_invocations/create.json.rabl (Duration: 25.7ms | Allocations: 12859)
2025-01-24T14:03:12 [I|app|b1b11dff] Completed 201 Created in 266ms (Views: 27.6ms | ActiveRecord: 61.9ms | Allocations: 81674)
2025-01-24T14:03:12 [I|bac|b1b11dff] Task {label: , execution_plan_id: 1b2166d4-ea1c-4481-a77a-b62f8ad6e217} state changed: pending
2025-01-24T14:03:12 [I|bac|b1b11dff] Task {label: Actions::RemoteExecution::RunHostJob, id: 93ea47de-d47c-4ee8-8d55-c508156214a0, execution_plan_id: 1b2166d4-ea1c-4481-a77a-b62f8ad6e217} state changed: planning
2025-01-24T14:03:12 [I|bac|b1b11dff] Task {label: Actions::RemoteExecution::RunHostJob, id: 93ea47de-d47c-4ee8-8d55-c508156214a0, execution_plan_id: 1b2166d4-ea1c-4481-a77a-b62f8ad6e217} state changed: planned
2025-01-24T14:03:12 [I|bac|b1b11dff] Task {label: , execution_plan_id: d4e4d679-29e6-445c-b115-cd077f624b79} state changed: pending
2025-01-24T14:03:12 [I|bac|b1b11dff] Task {label: Actions::RemoteExecution::RunHostJob, id: 3a57096c-7f6a-492f-ad2d-9e1204c759de, execution_plan_id: d4e4d679-29e6-445c-b115-cd077f624b79} state changed: planning
2025-01-24T14:03:12 [I|bac|b1b11dff] Task {label: Actions::RemoteExecution::RunHostJob, id: 93ea47de-d47c-4ee8-8d55-c508156214a0, execution_plan_id: 1b2166d4-ea1c-4481-a77a-b62f8ad6e217} state changed: running
2025-01-24T14:03:13 [I|bac|b1b11dff] Task {label: Actions::RemoteExecution::RunHostJob, id: 3a57096c-7f6a-492f-ad2d-9e1204c759de, execution_plan_id: d4e4d679-29e6-445c-b115-cd077f624b79} state changed: planned
2025-01-24T14:03:13 [I|bac|b1b11dff] Task {label: , execution_plan_id: 1ba6a87a-872a-4f0d-b2a8-808a4e429e3a} state changed: pending
2025-01-24T14:03:13 [I|bac|b1b11dff] Task {label: Actions::RemoteExecution::RunHostJob, id: 3a57096c-7f6a-492f-ad2d-9e1204c759de, execution_plan_id: d4e4d679-29e6-445c-b115-cd077f624b79} state changed: running
2025-01-24T14:03:13 [I|bac|b1b11dff] Task {label: Actions::RemoteExecution::RunHostJob, id: 41523cc0-3a00-495f-bcb7-94a8c28d5dfa, execution_plan_id: 1ba6a87a-872a-4f0d-b2a8-808a4e429e3a} state changed: planning
2025-01-24T14:03:13 [I|bac|b1b11dff] Task {label: Actions::RemoteExecution::RunHostJob, id: 41523cc0-3a00-495f-bcb7-94a8c28d5dfa, execution_plan_id: 1ba6a87a-872a-4f0d-b2a8-808a4e429e3a} state changed: planned
2025-01-24T14:03:13 [I|bac|b1b11dff] Task {label: , execution_plan_id: ac996ea6-3236-4790-9d4a-8ca3f4ef9d01} state changed: pending
2025-01-24T14:03:13 [I|bac|b1b11dff] Task {label: Actions::RemoteExecution::RunHostJob, id: 41523cc0-3a00-495f-bcb7-94a8c28d5dfa, execution_plan_id: 1ba6a87a-872a-4f0d-b2a8-808a4e429e3a} state changed: running
2025-01-24T14:03:13 [I|bac|b1b11dff] Task {label: Actions::RemoteExecution::RunHostJob, id: fe025b84-a951-4f90-beef-97c4e9fec79b, execution_plan_id: ac996ea6-3236-4790-9d4a-8ca3f4ef9d01} state changed: planning
2025-01-24T14:03:13 [I|bac|b1b11dff] Task {label: Actions::RemoteExecution::RunHostJob, id: fe025b84-a951-4f90-beef-97c4e9fec79b, execution_plan_id: ac996ea6-3236-4790-9d4a-8ca3f4ef9d01} state changed: planned
2025-01-24T14:03:13 [I|bac|b1b11dff] Task {label: Actions::RemoteExecution::RunHostJob, id: fe025b84-a951-4f90-beef-97c4e9fec79b, execution_plan_id: ac996ea6-3236-4790-9d4a-8ca3f4ef9d01} state changed: running
2025-01-24T14:03:16 [I|bac|b1b11dff] Event delivered by request 5d607c96-0f2f-471d-bf09-da8b2887b0c9
2025-01-24T14:03:16 [I|bac|b1b11dff] Task {label: Actions::RemoteExecution::RunHostJob, id: fe025b84-a951-4f90-beef-97c4e9fec79b, execution_plan_id: ac996ea6-3236-4790-9d4a-8ca3f4ef9d01} state changed: stopped  result: error
2025-01-24T14:03:28 [I|bac|b1b11dff] Task {label: , execution_plan_id: 6268c0e3-d477-45be-a2c8-e9e7b0f11dd1} state changed: pending
2025-01-24T14:03:28 [I|bac|b1b11dff] Task {label: Actions::RemoteExecution::RunHostJob, id: 8b05b98a-f672-4581-8711-12b045790498, execution_plan_id: 6268c0e3-d477-45be-a2c8-e9e7b0f11dd1} state changed: planning
2025-01-24T14:03:28 [I|bac|b1b11dff] Task {label: Actions::RemoteExecution::RunHostJob, id: 8b05b98a-f672-4581-8711-12b045790498, execution_plan_id: 6268c0e3-d477-45be-a2c8-e9e7b0f11dd1} state changed: planned
2025-01-24T14:03:28 [I|bac|b1b11dff] Task {label: Actions::RemoteExecution::RunHostJob, id: 8b05b98a-f672-4581-8711-12b045790498, execution_plan_id: 6268c0e3-d477-45be-a2c8-e9e7b0f11dd1} state changed: running
2025-01-24T14:03:30 [E|bac|b1b11dff] Job execution failed
2025-01-24T14:03:31 [I|bac|b1b11dff] Task {label: Actions::RemoteExecution::RunHostJob, id: 8b05b98a-f672-4581-8711-12b045790498, execution_plan_id: 6268c0e3-d477-45be-a2c8-e9e7b0f11dd1} state changed: stopped  result: error
2025-01-24T14:03:40 [I|bac|b1b11dff] Event delivered by request dc0cf7db-77c5-4f27-b30f-bc1640579a2c
2025-01-24T14:03:40 [E|bac|b1b11dff] Job execution failed
2025-01-24T14:03:40 [I|bac|b1b11dff] Task {label: Actions::RemoteExecution::RunHostJob, id: 93ea47de-d47c-4ee8-8d55-c508156214a0, execution_plan_id: 1b2166d4-ea1c-4481-a77a-b62f8ad6e217} state changed: stopped  result: error
2025-01-24T14:13:12 [E|bac|b1b11dff] Proxy task gone missing from the smart proxy (Actions::ProxyAction::ProxyActionMissing)
 b1b11dff |
2025-01-24T14:13:12 [I|bac|b1b11dff] Task {label: Actions::RemoteExecution::RunHostJob, id: 3a57096c-7f6a-492f-ad2d-9e1204c759de, execution_plan_id: d4e4d679-29e6-445c-b115-cd077f624b79} state changed: stopped  result: error
2025-01-24T14:13:12 [I|bac|b1b11dff] Task {label: Actions::RemoteExecution::RunHostJob, id: 41523cc0-3a00-495f-bcb7-94a8c28d5dfa, execution_plan_id: 1ba6a87a-872a-4f0d-b2a8-808a4e429e3a} state changed: stopped  result: error
2025-01-24T14:13:14 [E|bac|b1b11dff] A sub task failed (Dynflow::Action::V2::WithSubPlans::SubtaskFailedException)
 b1b11dff |
2025-01-24T14:13:14 [I|bac|b1b11dff] Task {label: Actions::RemoteExecution::RunHostsJob, id: 4743b44c-b39c-4a14-8695-c18f3283a17c, execution_plan_id: a8122b1d-8609-434f-865a-61e3437ea9d4} state changed: stopped  result: warning

foreman-proxy for this job:

root@10-222-45-187:kc-saltmom-01:[/etc/foreman-proxy/settings.d]:
$ grep d4e4d679-29e6-445c-b115-cd077f624b79 /var/log/foreman-proxy/proxy.log
2025-01-24T14:11:11 1ba135ab [I] Started GET /dynflow/tasks/d4e4d679-29e6-445c-b115-cd077f624b79/status
2025-01-24T14:11:11 1ba135ab [I] Finished GET /dynflow/tasks/d4e4d679-29e6-445c-b115-cd077f624b79/status with 404 (2.47 ms)
2025-01-24T14:11:13 cc59fd2a [I] Started GET /dynflow/tasks/d4e4d679-29e6-445c-b115-cd077f624b79/status
2025-01-24T14:11:13 cc59fd2a [I] Finished GET /dynflow/tasks/d4e4d679-29e6-445c-b115-cd077f624b79/status with 404 (1.17 ms)
2025-01-24T14:11:14 8189b618 [I] Started GET /dynflow/tasks/d4e4d679-29e6-445c-b115-cd077f624b79/status
2025-01-24T14:11:14 8189b618 [I] Finished GET /dynflow/tasks/d4e4d679-29e6-445c-b115-cd077f624b79/status with 404 (1.34 ms)
2025-01-24T14:11:16 41cfd991 [I] Started GET /dynflow/tasks/d4e4d679-29e6-445c-b115-cd077f624b79/status
2025-01-24T14:11:16 41cfd991 [I] Finished GET /dynflow/tasks/d4e4d679-29e6-445c-b115-cd077f624b79/status with 404 (1.09 ms)
2025-01-24T14:11:18 bc8cc65a [I] Started GET /dynflow/tasks/d4e4d679-29e6-445c-b115-cd077f624b79/status
2025-01-24T14:11:18 bc8cc65a [I] Finished GET /dynflow/tasks/d4e4d679-29e6-445c-b115-cd077f624b79/status with 404 (1.17 ms)
repeated....

Do you want me to start a new thread for this?

It does, but there’s way too many subtle differences that can add up in the end.

Please do

Continued here - GET /dynflow/tasks/<taskid>/status with 404 & Proxy task gone missing from the smart proxy