Problem:
Running a remote execution job, the jobs that run on one of the smart proxy return messages like this
Initialization error: RestClient::NotFound - 404 Not Found
Initialization error: RestClient::NotFound - 404 Not Found
Initialization error: RestClient::NotFound - 404 Not Found
Initialization error: RestClient::NotFound - 404 Not FoundError loading data from proxy: NoMethodError - undefined method `code' for "404 Not Found":String
Did you mean? encode
Looking at the Smart Proxy logs, I see messages like this Finished GET /dynflow/tasks/status with 404
Expected outcome:
Job should run normally and output should be shown in console
Foreman and Proxy versions:
Foreman 3.1.0 and after upgrading the affected smart proxy is 3.1.1, but this looked like it was happening even on 3.1.0
2022-01-27T09:28:48 [D] Rack::Handler::WEBrick is invoked.
2022-01-27T09:28:48 49c831df [I] Started GET /dynflow/tasks/count state=running
2022-01-27T09:28:48 49c831df [D] verifying remote client foreman against trusted_hosts [“foreman”, “foreman-proxy-mdw”]
2022-01-27T09:28:48 49c831df [I] Finished GET /dynflow/tasks/count with 200 (1.03 ms)
2022-01-27T09:28:48 [D] close: foreman:55984
2022-01-27T09:28:49 [D] accept: foreman:56002
2022-01-27T09:28:49 [D] accept: foreman:56004
2022-01-27T09:28:49 [D] Rack::Handler::WEBrick is invoked.
2022-01-27T09:28:49 49c831df [I] Started GET /dynflow/tasks/status
2022-01-27T09:28:49 49c831df [D] verifying remote client foreman against trusted_hosts [“foreman”, “foreman-mdw”]
2022-01-27T09:28:49 49c831df [I] Finished GET /dynflow/tasks/status with 404 (0.52 ms)
2022-01-27T09:28:49 [D] Rack::Handler::WEBrick is invoked.
2022-01-27T09:28:49 49c831df [I] Started GET /dynflow/tasks/status
2022-01-27T09:28:49 49c831df [D] verifying remote client foreman against trusted_hosts [“foreman”, “foreman-mdw”]
2022-01-27T09:28:49 49c831df [I] Finished GET /dynflow/tasks/status with 404 (0.48 ms)
2022-01-27T09:28:49 [D] close: foreman:56002
2022-01-27T09:28:49 [D] close: foreman:56004
If the logs are complete then there is no POST to /dynflow/tasks/launch, which would mean the job is never triggered on the proxy in question. Do you have several proxies behind a load balancer by any chance?
I only have 2 smart proxies, the “main” one where foreman runs itself, and a “remote” smart proxy in another location. It seems like only jobs that use the remote smart proxy are affected by this. The regular foreman smart proxy works fine.
Also it seems like this only happens when I run a job with multiple hosts, and Foreman is trying to do some load balancing with both of the smart proxies. Only the hosts that use the remote proxy return the 404 error.
If I run a job on a single host that uses the remote proxy, then it works fine. Almost like the parent task is having trouble being passed down when there are multiple hosts using the remote smart proxy
Could you bump log level to debug on both proxies and run a job with many hosts so they go through both proxies and then upload logs from both proxies so we cant cross reference them and establish a timeline?
2022-02-08T05:02:42 [W|bac|4ac20229] Could not trigger task on the smart proxy: undefined method `[]' for nil:NilClass
Which leads us here[1]. The failure seems to happen before it tries to reach out to the remote proxy, but there aren’t all that many places where it could go wrong and I can’t really see where a rogue nil could sneak in within the normal operating conditions. We’d definitely need better logging in this area, would you be willing to try a patch[2]? I’m failing to reproduce it locally
Correct, although I usually apply whole patches instead of changing things by hand.
cd /opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-5.2.1
# Commit the source from rpm into a git repository so we can go back if things go south
git init
git add .
git commit -m rpm
git tag rpm
# Normally it would be just a matter of
# curl -L https://github.com/theforeman/foreman-tasks/pull/671.patch | git am
# but the PR doesn't apply cleanly against 5.2.1
# Here's a 5.2.1 compatible version of the same fix
curl -L https://github.com/theforeman/foreman-tasks/compare/5.2.x...adamruzicka:3.1/error-logging.patch | git am
# Restart all services which load that code
systemctl restart foreman 'dynflow-sidekiq@*'
# If things go awry we can just
# git reset --hard rpm
# rm -rf .git
It fails when processing a response from the remote proxy, but the remote proxy shows no sign of receiving the request and responding to it.
So either it hits some other host/service, but it still replies somewhat successfully, or it hits the right proxy but for some reason the proxy doesn’t log it.
Refreshing features on the remote proxy works as expected? What versions of smart proxy plugins do you have on both the proxies?
Yes, refreshing features works on both smart proxies. I tried to bump up the version of the remote proxy to 3.1.1 but same issue was still occurring on older 3.1.0 version
And yea, remote proxy will work if its just a single host. Only has an issue when there is more than one host.
@aruzicka I’m also affected by this issue. I’m using Foreman 3.1.1 with updated smart proxies on CentOS7. I’m experiencing roughly the same issue as the OP. I’m using the internal capsule of Foreman as well as a external smart proxy. When I run a bulk task with 5-10 hosts included in the remote execution job, there’s a few that will execute successfully but the remaining will result in the below error
Initialization error: RestClient::NotFound - 404 Not Found
Initialization error: RestClient::NotFound - 404 Not Found
Initialization error: RestClient::NotFound - 404 Not Found
Initialization error: RestClient::NotFound - 404 Not FoundError loading data from proxy: NoMethodError - undefined method `code' for "404 Not Found":String
Did you mean? encode
foreman-proxy
2022-02-08T20:47:00 ac60d09f [I] Started GET /dynflow/tasks/status
2022-02-08T20:47:00 ac60d09f [I] Finished GET /dynflow/tasks/status with 404 (0.63 ms)
2022-02-08T20:47:15 ac60d09f [I] Started GET /dynflow/tasks/status
2022-02-08T20:47:15 ac60d09f [I] Finished GET /dynflow/tasks/status with 404 (0.67 ms)
2022-02-08T20:47:15 ac60d09f [I] Started GET /dynflow/tasks/status
2022-02-08T20:47:15 ac60d09f [I] Finished GET /dynflow/tasks/status with 404 (0.53 ms)
2022-02-08T20:47:30 ac60d09f [I] Started GET /dynflow/tasks/status
2022-02-08T20:47:30 ac60d09f [I] Finished GET /dynflow/tasks/status with 404 (0.47 ms)
2022-02-08T20:47:30 ac60d09f [I] Started GET /dynflow/tasks/status
2022-02-08T20:47:30 ac60d09f [I] Finished GET /dynflow/tasks/status with 404 (0.47 ms)
2022-02-08T20:47:45 ac60d09f [I] Started GET /dynflow/tasks/status
2022-02-08T20:47:45 ac60d09f [I] Finished GET /dynflow/tasks/status with 404 (0.49 ms)
2022-02-08T20:47:45 ac60d09f [I] Started GET /dynflow/tasks/status
2022-02-08T20:47:45 ac60d09f [I] Finished GET /dynflow/tasks/status with 404 (0.45 ms)
Sorry about the delay, I haven’t forgotten about this, I was just out sick for a couple of days. Now that I’m back, I managed to put together a reproducer and once I had it, fixing it was easy.
It is being tracked as [1], with patches being opened for nightly[2] and 3.1-compatible[3].
I still seem to be plagued by this issue. I’m running Foreman 3.2.0, same version on the smart proxies … this issue is sort of preventing me from moving forward with the integration of Foreman in the environment I’m working in … I decided to try to avoid this issue by using a single smart proxy, but I still seem to have the same issues as before. I can get away with running a batch job of 15 or so hosts, but if trying to run a job on every server in the environment (currently only 170 or so hosts), about 60% of the hosts are failing like this:
31 :Error loading data from proxy: RestClient::NotFound - 404 Not Found
32: Error loading data from proxy: RestClient::NotFound - 404 Not Found
33: Error loading data from proxy: RestClient::NotFound - 404 Not Found
34: Error loading data from proxy: RestClient::NotFound - 404 Not Found
35: Error loading data from proxy: RestClient::NotFound - 404 Not Found
36: Error initializing command: RuntimeError - Unable to create directory on remote system /var/tmp/foreman-ssh-cmd-f8e1b393-d3c2-4d60-9775-b7c48ea2cfb8: exit code: 255
37:
38:
Exit status: EXCEPTION
The ssh error at the bottom is bizarre because if I run a ssh job on the individual hosts that are producing that error outside of a large batch job, I’m able to successfully perform remote execution using the same exact smart proxy. Has this change made it into the 3.2.0 release yet, or am I looking at a different issue?
The fix landed in foreman-tasks-6.0.1, which is in 3.2 repos so if you’re on that version, I’d be inclined to say what you’re hitting is a different issue. Could you start a new thread and attach logs (/var/log/foreman/production.log from foreman, /var/log/foreman-proxy/proxy.log from the proxy, /var/log/secure from one of the target hosts)?
So the updates with the foreman-tasks fixes wasn’t included in the initial release of 3.2.0 which confused me a bit, it wasn’t included into the repository until April 7th or something like that. I performed a update in my environment for foreman and it’s proxies and the issue now seems to be resolved and is reporting more clear errors if something were to go wrong with remote executions. Appreciate the help, if in fact I do have more issues relating to this I’ll let you know.
2025-01-24T13:41:04 776caca3 [I] Started GET /dynflow/tasks/9dfe1c7b-de67-4f9f-8498-c7b0b033bf5c/status
2025-01-24T13:41:04 776caca3 [I] Finished GET /dynflow/tasks/9dfe1c7b-de67-4f9f-8498-c7b0b033bf5c/status with 404 (2.52 ms)
2025-01-24T13:41:06 532c5c6c [I] Started GET /dynflow/tasks/9dfe1c7b-de67-4f9f-8498-c7b0b033bf5c/status
2025-01-24T13:41:06 532c5c6c [I] Finished GET /dynflow/tasks/9dfe1c7b-de67-4f9f-8498-c7b0b033bf5c/status with 404 (1.13 ms)
2025-01-24T13:41:07 a4b43d84 [I] Started GET /dynflow/tasks/9dfe1c7b-de67-4f9f-8498-c7b0b033bf5c/status
Interestingly the task is in dynflow just sitting there.
134:
Error loading data from proxy: RestClient::NotFound - 404 Not Found
135:
Error loading data from proxy: RestClient::NotFound - 404 Not Found
136:
Actions::ProxyAction::ProxyActionMissing: Proxy task gone missing from the smart proxy
We can rerun it and sometimes it’ll just work again.
Again, the production logs dont seem to have much, definitely not seeing the error above of: Could not trigger task on the smart proxy
Any assistance would be appreciated, so we can make sure tasks/remote jobs run.