Exception on some Tasks that run Ansible Roles

simon · May 21, 2021, 8:36am

Problem:
I finally got around to updating my Foreman instance from 1.22 to 2.4. Since then, a few servers fail the task with an exception during my scheduled Ansible run, which runs every evening on about 50 hosts.

If I click through the failed tasks I get the following errors on the few hosts that fail:
ERF42-1582 [Foreman::Exception]: The smart proxy task 1bda65c8-60cc-41f5-9310-37fb3c0b743e failed.
Failed to initialize: Foreman::Exception - ERF42-4055 [Foreman::Exception]: ERF42-8492 [Foreman::Exception]: The smart proxy task 298369ad-5bae-4e33-a7d6-4701ccda7f5c failed.

So far I haven’t seen a pattern in which hosts get this error.

Expected outcome:
No errors due to Foreman Exceptions

Foreman and Proxy versions:
Foreman: 2.4.0
Foreman-Proxy: 2.4.0
(both on the same host)
Foreman and Proxy plugin versions:
Foreman:
foreman-tasks 4.0.1
foreman_ansible 6.2.0
foreman_bootdisk 17.0.2
foreman_discovery 17.0.0
foreman_hooks 0.3.17
foreman_remote_execution 4.3.0
foreman_templates 9.0.2
Foreman Proxy:
Ansible 3.0.1
Discovery 1.0.5
Dynflow 0.3.0
Registration 2.4.0
SSH 0.3.1
Distribution and version:
CentOS 7.9
Other relevant data:
Errors from the foreman task

aruzicka · May 24, 2021, 7:43am

When you run an Ansible job, the actual execution happens on a smart proxy. So Foreman just tells the proxy to execute the job and then keeps tabs on it in an ok/not-ok fashion. The error you’re getting means the execution on the proxy failed. You will need to check /var/log/foreman-proxy/proxy.log and /var/log/foreman-proxy/smart_proxy_dynflow_core.log to see what is actually going on.
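For anyone following along, here is a rough sketch of how the two proxy logs could be searched for a failed task. It assumes the task ID from the error message shows up verbatim in the log lines, which may not be the case on every setup. The helper names `matching_lines` and `search_logs` are made up for this example and are not part of Foreman.

```python
from pathlib import Path

# Paths taken from the reply above; adjust if your installation differs.
PROXY_LOGS = [
    "/var/log/foreman-proxy/proxy.log",
    "/var/log/foreman-proxy/smart_proxy_dynflow_core.log",
]


def matching_lines(lines, needle):
    """Return (line_number, text) pairs for lines containing `needle`."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if needle in line
    ]


def search_logs(paths, needle):
    """Search each existing log file and map path -> matching lines."""
    results = {}
    for path in map(Path, paths):
        if not path.is_file():
            continue  # skip logs that do not exist on this host
        with path.open(errors="replace") as handle:
            results[str(path)] = matching_lines(handle, needle)
    return results


if __name__ == "__main__":
    # Task ID copied from the error message in the original post.
    # Assumption: the proxy writes this ID into its log lines.
    task_id = "1bda65c8-60cc-41f5-9310-37fb3c0b743e"
    for path, hits in search_logs(PROXY_LOGS, task_id).items():
        for number, text in hits:
            print(f"{path}:{number}: {text}")
```

If the ID does not appear in the logs, searching for a plain string such as `ERROR` or the failing host name with the same helper may be a more useful starting point.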

simon · May 25, 2021, 8:55am

Hi, thanks for the reply. I’ve looked into the logs, and it seems to me that the proxy sometimes throws the error 413 Payload Too Large. I’ve also uploaded some logs, so you can look at them yourself if you want: Link to Logs. Any idea how to fix this on my side? I could maybe increase the SSLRenegBufferSize in the Apache config.

aruzicka · May 25, 2021, 2:19pm

This is the first time I’m seeing this. Could you also post a bit of /var/log/foreman/production.log and /var/log/httpd/foreman* so we can pinpoint whether it is Apache or Rails that decides the payload is too large?
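One way to narrow this down is to count 413 responses in each set of logs and compare. The sketch below assumes the Apache access logs use the common/combined layout (status code right after the quoted request) and that Rails writes its usual `Completed <status> ...` line to production.log. Both are assumptions about the log format, and the function names are made up for this example. If Apache logs 413s but Rails never completes a request with 413, that would point at Apache rejecting the request before it reaches Rails.

```python
import glob
import os
import re

# Assumed Apache common/combined layout: ... "REQUEST" STATUS BYTES ...
ACCESS_STATUS = re.compile(r'" (\d{3}) ')
# Assumed Rails completion line, e.g. "Completed 200 OK in 12ms".
RAILS_STATUS = re.compile(r"Completed (\d{3}) ")


def count_status(lines, pattern, status="413"):
    """Count lines whose first status match equals `status`."""
    total = 0
    for line in lines:
        match = pattern.search(line)
        if match and match.group(1) == status:
            total += 1
    return total


def count_in_files(file_glob, pattern, status="413"):
    """Sum matches over every regular file matched by `file_glob`."""
    total = 0
    for path in glob.glob(file_glob):
        if not os.path.isfile(path):
            continue  # ignore directories that happen to match
        with open(path, errors="replace") as handle:
            total += count_status(handle, pattern, status)
    return total


if __name__ == "__main__":
    # Log locations as named in the reply above.
    apache = count_in_files("/var/log/httpd/foreman*", ACCESS_STATUS)
    rails = count_in_files("/var/log/foreman/production.log", RAILS_STATUS)
    print(f"413 responses in Apache logs: {apache}")
    print(f"413 responses completed by Rails: {rails}")
```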

simon · May 25, 2021, 3:43pm

Here are some further logs: LINK

I remembered that before the update I had the 413 Errors too but I “fixed it” with the following lines in the /etc/httpd/conf.d/05-foreman-ssl.conf. I think it has something to do with the ansible-callback maybe.

  <LocationMatch /api>
    SSLRenegBufferSize 231072
  </LocationMatch>

I’ve added those lines again; let’s see if my scheduled run still has some failing hosts tomorrow.

simon · May 27, 2021, 10:00am

Well, the edit in the httpd conf only helped partially. I now get fewer 413 errors at the end of Ansible plays, but they haven’t disappeared completely. I also figured out that it is always the same hosts that fail with that exception on my daily run. The Ansible role they run fails, but that should not result in an exception; when the hosts are run individually, there is no exception. I’ve collected some more logs from the tasks, but the errors look to be the same: Link. Also, here are some more logs from the httpd/foreman production log: Link. I had some errors uploading the production log, so I cut out just the error. That was IMO the only interesting thing in that log.