Ansible: Always get Failed status for jobs and roles

DaniuS · September 16, 2020, 1:47pm

Problem:
Good day, Guys!

I’ve been using Foreman for a couple of months.
In Foreman I use Ansible roles and Ansible playbooks as well:
=> Host => Schedule remote job => Ansible playbook => Ansible - Run playbook

Each time I do this, the job status shows “Failed”,
even if the Ansible role/playbook works well.

Could you please help me solve this issue?

Also, guys, is there any way to run playbooks that are already present as yml files on the same server (e.g. located in the ~/Ansible/ folder)?

Expected outcome:
Fix problem with Ansible jobs status.

Foreman and Proxy versions:
Version 1.24.3 © 2009-2020 Paul Kelly and Ohad Levy
Instance 1ff98334-1042-49a4-ae5c-dc141c65ccf5

Foreman and Proxy plugin versions:
Ansible - Version 3.0.1
DHCP - Version 1.24.3
Dynflow - Version 0.2.4
HTTPBoot - Version 1.24.3
SSH - Version 0.3.0
TFTP - Version 1.24.3

Distribution and version:
CentOS Linux release 7.8.2003 (Core)

Other relevant data:

aruzicka · September 17, 2020, 6:18am

Hi,
When you try running a playbook, there’s a table with hosts at the bottom of the page. If you click on the host’s name, it will take you to a page showing the output of the job on that host. It is really hard to guess what went wrong just from what you said, but that page should give you more insight into it.

DaniuS · September 17, 2020, 8:56am

Hi,

According to the job output, everything is OK.
The tasks run correctly, but the job status is always the same…

Is there any way to capture detailed logs for such tasks,
so I can investigate and fix this?

aruzicka · September 17, 2020, 9:22am

Ah, for some reason I thought that playbooks/roles run fine outside of foreman.

In the job details there’s a “job task” button. On the page it takes you to, click “sub tasks”, then pick a single task and look around. Hopefully you’ll find something useful there.

On a side note, does it take like 10 minutes for the job to run even if it does almost nothing?

DaniuS · September 17, 2020, 9:34am

No, jobs work well. Quite fast.

DaniuS · September 17, 2020, 9:44am

I will run a simple task and take a screenshot of it. Hope it helps.

DaniuS · October 1, 2020, 2:10pm

Guys,

I want to provide a small update on the problem with the Ansible job status:
I have the same problem with Ansible roles as well.

A new Ansible role was created and assigned to a host.
Then, in the host menu, I ran the “Run Ansible Role” command.
In the output window everything is OK (Failed=0, Skipped=0).

But in the job status, after several minutes, I get “Run Ansible Role Failed”.

Also, I get the following message:
Failed to initialize: NoMethodError - undefined method `’ for nil:NilClass
There is also a bunch of errors in the job status.

Could you please help me solve this problem?

aruzicka · October 1, 2020, 2:52pm

Any more details around this? A stacktrace maybe? Where does it come from?

None of the other screenshots really say anything, as this error is exactly the same every time running something on any host in the job fails.

DaniuS · October 1, 2020, 3:02pm

Unfortunately, I don’t understand clearly.
I go to Monitor=>Jobs=>Press on job description=>Press on hostname (see below):

DaniuS · October 1, 2020, 3:10pm

Also, I’ve found description of similar problem here:
https://projects.theforeman.org/issues/29028

According to this issue, the problem was fixed in foreman-tasks-1.1.0.
As I understand it, that version is for Foreman 2.0+, but I use 1.24.3 and have a different version of foreman-tasks…

aruzicka · October 1, 2020, 3:15pm

From the issue you linked:

> About 10 minutes later, the RunHostJob should failed with the above error.

From what you stated earlier:

> No, jobs work well. Quite fast.

So which is it?

> I go to Monitor=>Jobs=>Press on job description=>Press on hostname (see below):

In there, press task details, then press dynflow console and try clicking around in there. Also, production.log from around that time could be useful.

DaniuS · October 1, 2020, 3:39pm

In my case, the role was applied quickly (I saw the progress in the output window),
but yes, the job information in the status updates slowly.

Dynflow - see below:

Sure, will check production.log

aruzicka · October 1, 2020, 3:54pm

Ah, so that’s what I originally had in mind.

It is usually a misconfiguration, either a wrong hostname or wrong SSL certs in /etc/smart_proxy_dynflow_core/settings.yml.

What is happening is:

1. A job is run
2. Job gets delegated to the smart proxy and smart proxy dynflow core
3. Smart proxy dynflow core runs the job (runs the actual ansible command)
4. When the job is done, smart proxy dynflow core tries to call back to foreman and this request fails
5. After 10 or so minutes, foreman checks the status on the smart proxy, sees the task there is failed and fails the job
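
Since the callback target used when the job finishes comes from /etc/smart_proxy_dynflow_core/settings.yml, a quick first check is to print the URL configured there and see whether it points at a loopback address. A minimal shell sketch; the helper names are made up for illustration, and the `:foreman_url:` key is an assumption about the file layout that should be checked against your own settings file:

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, and the `:foreman_url:` key
# is an assumption about how the settings file is laid out.

# Print the callback URL configured in the given settings file, without quotes.
# Commented-out lines are skipped because their first field is "#".
foreman_url_from_settings() {
  awk '$1 == ":foreman_url:" { print $2 }' "$1" | tr -d "'\""
}

# Succeed if the URL points at a loopback address (rough substring match).
points_at_loopback() {
  case "$1" in
    *://localhost*|*://127.0.0.1*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (path taken from the post above):
#   url=$(foreman_url_from_settings /etc/smart_proxy_dynflow_core/settings.yml)
#   points_at_loopback "$url" && echo "callback goes to loopback: $url"
```

If the printed URL is a loopback address, the callback can only succeed when Foreman itself accepts requests there, which is worth ruling out before looking at certificates.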

Please note that the issue you linked is only a symptom. Even if you had that patch, the jobs would still fail at step 4; you would just get a different error after 10 minutes.

Also, isn’t 1.24.3 already EOL?

DaniuS · October 1, 2020, 5:07pm

Looks like I’ve found one issue.
Smart-Proxy tries to connect to localhost:3000, but gets “Connection refused”… just because nothing is listening on this port:

Failed to open TCP connection to kh0dl1000000075.dtc.dish.corp:3000 (Connection refused - connect(2) for "kh0dl1000000075.dtc.dish.corp" port 3000) (Errno::ECONNREFUSED)

netstat -ant | grep 3000
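
Note that `grep 3000` also matches ports such as 30001 and connections whose remote side uses port 3000. A slightly stricter sketch that only looks at listening sockets, assuming the usual column layout of `netstat -ant` from net-tools (the function name is made up here):

```shell
#!/usr/bin/env bash
# Succeed if the netstat output on stdin has a LISTEN socket on the given port.
# Assumes `netstat -ant` columns: proto, recv-q, send-q, local, foreign, state.
is_listening() {
  awk -v port="$1" '
    $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END { exit !found }
  '
}

# Usage:
#   netstat -ant | is_listening 3000 && echo "something listens on :3000"
```

The match is anchored to the end of the local address column, so only an exact port number in LISTEN state counts.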

Could you please tell me which service should listen on port :3000 and how to enable it?

DaniuS · October 1, 2020, 5:22pm

I’ve found that two services are disabled:

foreman-cockpit.service
foreman.service

As I understand - Foreman-Cockpit listens on :3000 and it was enabled.
Do I need to enable foreman.service as well?

DaniuS · October 1, 2020, 5:41pm

Problem with port :3000 was solved.
But there is still no progress with the job status.
Currently I get the following error messages:

403 Forbidden (RestClient::Forbidden), during the following:
127.0.0.1 - - [01/Oct/2020:20:30:46 EEST] "GET /tasks/542d07ff-26a8-4a10-861c-4465f9bde13c/status? HTTP/1.1" 200 7316
403 Forbidden (RestClient::Forbidden)

aruzicka · October 2, 2020, 8:21am

> Problem with port :3000 was solved.

How?

> Smart-Proxy tries to connect to localhost:3000, but gets “Connection refused”… just because nothing is listening on this port:

localhost:3000 is the default; most likely it is not configured to fit your environment. If it is a production deployment, then it should be the FQDN of the Foreman machine and port 443.

> Currently I get the following error messages:

Where?

DaniuS · October 2, 2020, 8:31am

Good day!

The problem with port :3000 was solved by starting foreman.service:
systemctl enable --now foreman.service
I see “403 Forbidden (RestClient::Forbidden)” in the following log:
/var/log/foreman-proxy/smart_proxy_dynflow_core.log
I will try updating /etc/smart_proxy_dynflow_core/settings.yml with FQDN:443 and test again.

DaniuS · October 2, 2020, 9:09am

I’ve changed the settings in /etc/smart_proxy_dynflow_core/settings.yml to FQDN:443 and ran the task again.
Now I get the following in /var/log/foreman-proxy/smart_proxy_dynflow_core.log right after the task starts:
127.0.0.1 - - [02/Oct/2020:11:58:25 EEST] "POST /tasks/launch? HTTP/1.1" 200 110
127.0.0.1 - - [02/Oct/2020:11:58:26 EEST] "GET /tasks/7a857992-c4e4-4255-a763-f76c2e106f83/status? HTTP/1.1" 200 6216
127.0.0.1 - - [02/Oct/2020:11:58:26 EEST] "GET /tasks/7a857992-c4e4-4255-a763-f76c2e106f83/status? HTTP/1.1" 200 6216
127.0.0.1 - - [02/Oct/2020:11:58:28 EEST] "GET /tasks/7a857992-c4e4-4255-a763-f76c2e106f83/status? HTTP/1.1" 200 6663
127.0.0.1 - - [02/Oct/2020:11:58:29 EEST] "GET /tasks/7a857992-c4e4-4255-a763-f76c2e106f83/status? HTTP/1.1" 200 6768
127.0.0.1 - - [02/Oct/2020:11:58:30 EEST] "GET /tasks/7a857992-c4e4-4255-a763-f76c2e106f83/status? HTTP/1.1" 200 7158
127.0.0.1 - - [02/Oct/2020:11:58:31 EEST] "GET /tasks/7a857992-c4e4-4255-a763-f76c2e106f83/status? HTTP/1.1" 200 7307
127.0.0.1 - - [02/Oct/2020:11:58:33 EEST] "GET /tasks/7a857992-c4e4-4255-a763-f76c2e106f83/status? HTTP/1.1" 200 7307
127.0.0.1 - - [02/Oct/2020:11:58:34 EEST] "GET /tasks/7a857992-c4e4-4255-a763-f76c2e106f83/status? HTTP/1.1" 200 7307
400 Bad Request (RestClient::BadRequest)
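
To see at a glance which errors the log contains between all the status polls, the RestClient and Errno lines can be counted by type. A small shell sketch; the function name is hypothetical, and the pattern only covers the two error families quoted in this thread:

```shell
#!/usr/bin/env bash
# Count RestClient::* and Errno::* error names in a log file.
# Prints one "count name" pair per line, sorted by error name.
count_callback_errors() {
  grep -oE '(RestClient|Errno)::[A-Za-z0-9_]+' "$1" | sort | uniq -c | awk '{ print $1, $2 }'
}

# Usage (log path from this post):
#   count_callback_errors /var/log/foreman-proxy/smart_proxy_dynflow_core.log
```

Running it before and after a settings change should show whether the error type changed, for example from Forbidden to BadRequest.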

aruzicka · October 2, 2020, 9:12am

On 1.24 you shouldn’t have that running.

Guess you’ll have to check production.log, it might tell you what was wrong with the request.

Also, how did you deploy this instance? Usually the installer sets everything up so that it works out of the box.