HTTPS Error 502 - Bad Gateway during YUM update remote execution job

Problem:
This is a fresh installation on which I have done some tuning for Tomcat and Apache, but I think I’m still missing something.

When running a remote execution job on only 128 hosts, about half of them fail with an [Errno 14] HTTPS Error 502 - Bad Gateway error; they can’t reach various repositories.

Expected outcome:
All systems should be able to connect and pull repository metadata and packages for installation.

Foreman and Proxy versions:
Foreman 3.18.1

Distribution and version:
CentOS 7.9

Other relevant data:
https://KATELLO_SERVER/pulp/repos/ORG/Library/centos-7-x86_64-cv/custom/Extra_Packages_for_Enterprise_Linux/Extra_Packages_for_Enterprise_Linux_7_x86_64/repodata/repomd.xml: [Errno 14] HTTPS Error 502 - Bad Gateway

Server specs and Katello tuning profile:
32G RAM
8 vCPUs
Katello medium profile


Have you tried accessing the repository through a client manually? I am not aware of any Apache tuning that’s needed to make it work. The installer should have deployed the config for you.

Yes, I can run a ‘yum -y update’ from the local client just fine. I can also run the remote job in chunks of 20-25 hosts at a time, which works fine.
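(For what it’s worth, a way to test a single repository URL outside of yum is to curl it directly with the client’s entitlement certificate. The CA and certificate paths below are the usual ones on a Katello-registered EL7 client and ENTITLEMENT_ID is a placeholder; check the sslclientcert/sslclientkey entries in /etc/yum.repos.d/redhat.repo for the real filenames.)

# the failing URL from the yum error above
REPO_URL='https://KATELLO_SERVER/pulp/repos/ORG/Library/centos-7-x86_64-cv/custom/Extra_Packages_for_Enterprise_Linux/Extra_Packages_for_Enterprise_Linux_7_x86_64/repodata/repomd.xml'
curl --cacert /etc/rhsm/ca/katello-server-ca.pem \
     --cert /etc/pki/entitlement/ENTITLEMENT_ID.pem \
     --key /etc/pki/entitlement/ENTITLEMENT_ID-key.pem \
     -s -o /dev/null -w '%{http_code}\n' "$REPO_URL"

A 200 means the request made it through Apache to the Pulp content app; a 502 reproduces the error yum reports.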

It seems like something in Pulp, Tomcat, or Apache is limiting how many systems can access the repos at the same time.

I have been watching the load on the Katello application server during these jobs, and there seem to be plenty of resources available from a CPU and memory point of view.
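To hammer the content endpoint without scheduling a full REX job, a rough sketch is to fire a burst of parallel requests at the same URL (reusing the curl command and REPO_URL from above; 50 is an arbitrary burst size) and count the response codes:

for i in $(seq 1 50); do
  curl --cacert /etc/rhsm/ca/katello-server-ca.pem \
       --cert /etc/pki/entitlement/ENTITLEMENT_ID.pem \
       --key /etc/pki/entitlement/ENTITLEMENT_ID-key.pem \
       -s -o /dev/null -w '%{http_code}\n' "$REPO_URL" &
done | sort | uniq -c

If a chunk of these come back as 502 while a single request returns 200, that points at a concurrency limit in front of Pulp rather than at the clients.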

@remote_execution - Do we have suggested limits on the batch size of hosts to run remote execution on?

I found lots of error messages like the following for the failed hosts in /etc/httpd/logs/foreman-ssl_error_ssl.log.

[Thu Feb 11 14:34:37.496074 2021] [proxy_http:error] [pid 19786] (104)Connection reset by peer: [client IPADDRESS:47162] AH01102: error reading status line from remote server httpd-UDS:0
[Thu Feb 11 14:34:37.496145 2021] [proxy:error] [pid 19786] [client IPADDRESS:47162] AH00898: Error reading from remote server returned by /pulp/repos/ORG/Library/rhel-7-server-x86_64-cv/content/dist/rhel/server/7/7Server/x86_64/extras/os/repodata/repomd.xml
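(Side note for anyone comparing runs: a quick way to gauge how widespread the failures are is to count the proxy errors and the distinct clients behind them. The log path is the one above, and the IP extraction assumes IPv4 addresses.)

grep -c 'AH01102' /etc/httpd/logs/foreman-ssl_error_ssl.log
grep 'AH01102' /etc/httpd/logs/foreman-ssl_error_ssl.log | grep -o 'client [^]]*' | cut -d: -f1 | sort -u | wc -l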

It strongly depends on the Smart Proxy and the machine it runs on, but in general the default of 100 works fine :man_shrugging:
This doesn’t seem like a REX issue, though; it looks more like Pulp being throttled by something.

Also note that REX sends jobs in batches, but if the Smart Proxy has enough resources it will run multiple batches in parallel, so the concurrent connection count to the Pulp repos will not equal the batch size; it will probably be higher.
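If you want to cap the concurrency explicitly for a test, it can be set per invocation from the job wizard or via hammer. The template name, command, and search query below are only examples; check hammer job-invocation create --help on your version for the exact options:

hammer job-invocation create \
  --job-template 'Run Command - SSH Default' \
  --inputs command='yum -y update' \
  --search-query 'os ~ CentOS' \
  --concurrency-level 25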

Just to test, I ran the same patching job on 128 clients with the concurrency set to 25, and it still failed on 4 clients, all with different repositories.

I believe you are right. It seems something is throttling Pulp to serving only 20 or so clients at a time. I haven’t had this issue before; in my old environment I was able to send 300+ at a time, which the normal concurrency level was able to manage just fine.

@Cassius_Clay: Are you running pulp3 on the box? We have a couple of fixes that have gone in to tune this further, and it looks like you’re hitting those issues.

You could try the following on your system:

sudo vi /etc/systemd/system/pulpcore-content.service

In the line

ExecStart=/usr/libexec/pulpcore/gunicorn pulpcore.content:server --worker-class 'aiohttp.GunicornWebWorker' -w 2 --access-logfile -

change the number of workers from 2 to 8 with -w 8, then:

sudo systemctl daemon-reload
sudo systemctl restart pulpcore-content.service
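(If you would rather not edit the shipped unit file directly, a systemd drop-in does the same thing and survives updates to the unit file. The empty ExecStart= line is required to clear the original definition before replacing it.)

sudo systemctl edit pulpcore-content.service

and add:

[Service]
ExecStart=
ExecStart=/usr/libexec/pulpcore/gunicorn pulpcore.content:server --worker-class 'aiohttp.GunicornWebWorker' -w 8 --access-logfile -

then:

sudo systemctl restart pulpcore-content.service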

Yes, it appears that I am using Pulp3.

Backend System Status

Component          Status   Message
candlepin          OK
candlepin_auth     OK
foreman_tasks      OK
katello_events     OK       845 Processed, 0 Failed
candlepin_events   OK       7652 Processed, 0 Failed
pulp3              OK
pulp               OK
pulp_auth          OK

I have updated the systemd file as noted and will give it a test today.
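(To confirm the new worker count takes effect after the restart: the gunicorn master and its workers all show pulpcore.content in their command line, so with -w 8 the second command below should print 9, one master plus eight workers. Exact process listings can vary a bit.)

systemctl status pulpcore-content.service
ps -ef | grep 'pulpcore.content' | grep -v grep | wc -l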

Thank you for your reply @sajha

@sajha that setting seemed to help.

I ran a ‘yum check-update’ across 284 clients that already had their yum cache cleared, and only 6 errored with the same ‘HTTPS Error 502 - Bad Gateway’ message.

I did not try to change the concurrency and just went with whatever the default is, so they ran as they queued.