Foreman web server is under heavy load and response is very slow

Problem:

  1. We have around 2,600 systems registered to Foreman through multiple smart proxies (around 11 proxies), and this host count will grow. As we increased the host count from 1200 to 1800, we started to see the web server crashing very often, and we fixed that by using the values below in '/etc/httpd/conf.modules.d/prefork.conf' and '/etc/httpd/conf.d/passenger.conf':

sh-4.2# egrep 'ServerLimit|MaxClients|StartServers' /etc/httpd/conf.modules.d/prefork.conf
StartServers 10
ServerLimit 1024
MaxClients 1024
MaxRequestWorkers 1024
sh-4.2#


sh-4.2# egrep -i 'PassengerMaxPoolSize|queue' /etc/httpd/conf.d/passenger.conf
PassengerMaxPoolSize 24
PassengerMaxRequestQueueSize 2000
sh-4.2#


  2. However, we are still seeing slowness in the Foreman server, e.g. when accessing the GUI, during package uploads, etc. Below is a snippet from the top command:

top - 19:13:59 up 1 day, 5:53, 5 users, load average: 5.07, 5.31, 5.42
Tasks: 1386 total, 2 running, 1209 sleeping, 0 stopped, 0 zombie
%Cpu(s): 23.3 us, 4.1 sy, 0.0 ni, 67.9 id, 2.2 wa, 0.0 hi, 2.5 si, 0.0 st
KiB Mem : 65950736 total, 842380 free, 20661416 used, 44446940 buff/cache
KiB Swap: 33554428 total, 33551600 free, 2828 used. 43884216 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
39464 tomcat 20 0 13.9g 3.6g 19800 S 96.0 5.7 107:19.73 java
4625 foreman 20 0 1468212 524700 7028 S 54.1 0.8 2:58.03 ruby
3601 foreman 20 0 1477428 522896 7072 S 53.8 0.8 3:03.86 ruby
124394 foreman 20 0 1473844 537216 8580 S 53.5 0.8 3:44.54 ruby
124776 foreman 20 0 1464628 517812 6924 S 53.1 0.8 3:38.38 ruby
128789 foreman 20 0 1470260 519040 6924 S 53.1 0.8 3:29.02 ruby
42388 foreman 20 0 1203508 322652 6804 S 52.8 0.5 0:04.69 ruby
39664 root 20 0 1182544 61048 5952 S 12.5 0.1 15:24.11 PassengerHelper
42424 postgres 20 0 774836 32648 28816 D 5.6 0.0 0:00.48 postgres
4658 postgres 20 0 777788 168984 162632 D 5.3 0.3 0:17.55 postgres
124429 postgres 20 0 778324 273996 267232 S 5.3 0.4 0:22.27 postgres
124806 postgres 20 0 778572 281520 274500 S 5.3 0.4 0:22.20 postgres
128822 postgres 20 0 777620 172972 167032 D 5.3 0.3 0:20.52 postgres
3634 postgres 20 0 777644 170436 164656 S 5.0 0.3 0:18.07 postgres
62884 postgres 20 0 782948 137144 127344 S 3.0 0.2 0:07.22 postgres
23871 postgres 20 0 783048 90952 79940 S 2.3 0.1 0:01.79 postgres
52748 postgres 20 0 782228 133832 122708 S 1.3 0.2 0:06.21 postgres
23872 postgres 20 0 782772 87900 77788 S 1.0 0.1 0:01.72 postgres
39107 mongodb 20 0 4779268 3.4g 22612 S 1.0 5.5 4:07.86 mongod
50229 postgres 20 0 779936 135092 126516 S 1.0 0.2 0:07.18 postgres
39213 apache 20 0 701144 84404 19932 S 0.7 0.1 0:32.33 celery
39224 apache 20 0 701052 84500 19848 S 0.7 0.1 0:26.88 celery
39248 apache 20 0 701108 86724 20032 S 0.7 0.1 0:27.14 celery


  3. We observed a very large number of HTTP connections, and I guess these are coming from the smart proxies via RHSM GET requests. We would like to know if there is a way to restrict the number of HTTP connections, or whether this is the default behavior.

sh-4.2# ps -ef|grep -i http|wc -l

965


  4. Passenger status:

sh-4.2# passenger-status
Version : 4.0.53
Date : 2020-05-27 19:24:12 -0700
Instance: 39632
----------- General information -----------
Max pool size : 24
Processes : 6
Requests in top-level queue : 0

----------- Application groups -----------
/usr/share/foreman#default:
App root: /usr/share/foreman
Requests in queue: 866

  • PID: 52318 Sessions: 1 Processed: 9996 Uptime: 8m 41s
    CPU: 44% Memory : 424M Last used: 0s ago
  • PID: 53405 Sessions: 1 Processed: 9901 Uptime: 8m 30s
    CPU: 44% Memory : 425M Last used: 1s ago
  • PID: 90400 Sessions: 1 Processed: 2394 Uptime: 2m 1s
    CPU: 46% Memory : 381M Last used: 0s ago
  • PID: 92828 Sessions: 1 Processed: 1952 Uptime: 1m 33s
    CPU: 49% Memory : 369M Last used: 1s ago
  • PID: 95334 Sessions: 1 Processed: 1371 Uptime: 1m 5s
    CPU: 50% Memory : 368M Last used: 1s ago
  • PID: 96821 Sessions: 1 Processed: 1156 Uptime: 54s
    CPU: 47% Memory : 162M Last used: 0s ago

Thanks!

Expected outcome: Reduce the load; let us know if any tuning needs to be implemented to bring Foreman back to normal behavior.

Foreman and Proxy versions:
Foreman 1.23.2
Katello-3.13.4

Foreman and Proxy plugin versions:

Distribution and version:

Other relevant data:

I suggest starting by lowering the frequency of checks done by RHSM: https://linux.die.net/man/8/rhsmcertd
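For reference, rhsmcertd reads its check intervals (in minutes) from /etc/rhsm/rhsm.conf. A minimal sketch of raising them on a client, assuming the stock subscription-manager tooling is in place (the 1440-minute value is just an example):

# show the current rhsmcertd intervals (minutes)
sh-4.2# egrep -i 'certCheckInterval|autoAttachInterval' /etc/rhsm/rhsm.conf
# raise both intervals to once a day and restart the daemon
sh-4.2# subscription-manager config --rhsmcertd.certcheckinterval=1440
sh-4.2# subscription-manager config --rhsmcertd.autoattachinterval=1440
sh-4.2# systemctl restart rhsmcertd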

If you use Puppet, that can also put huge stress on the system; do the same for the Puppet agents and lower their run frequency.
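A minimal sketch of lowering the agent run frequency, assuming the agents run as the standard puppet service (3600 seconds is just an example value):

# set the agent run interval and restart the agent service
sh-4.2# puppet config set runinterval 3600 --section agent
sh-4.2# systemctl restart puppet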

If you’re using Puppet, also avoid ensure => latest on package resources. When using subscription-manager, it triggers a fact update on every run, which creates a huge load on your systems.

Recent Katello versions also have a tuning option: Foreman :: Plugin Manuals

This will also configure PostgreSQL.
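If your installer version supports it, the profile is selected with an installer flag; a sketch, assuming the --tuning option is available in your Katello release:

# apply a predefined tuning profile (also adjusts PostgreSQL settings)
sh-4.2# foreman-installer --tuning medium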

Out of curiosity, every run of what? yum ?

I don’t know the details. Maybe it’s every Puppet run or every package resource that has ensure => latest. I just know that the following is something you want to avoid:

package { 'mypackage':
  ensure => latest,
}

I’d always recommend installed, present, or an explicit version, and leave updating as a separate workflow. Not just because of the server load, but also because some update might sneak through. Personally I have bad memories of when EPEL updated from Munin 1.4 to Munin 2.0 and our Puppet manifests updated during the night. That was a bad morning to come into the office :wink:

We noticed we have a job on the clients that does a yum install/update of around 100 packages on an hourly basis, which seems to be initiating HTTP connections to the Smart Proxy, and from there to the main Foreman server via the Katello reverse proxy, for every yum transaction. I'm guessing this is causing the huge number of HTTP connections on Foreman, which keeps the server busy.

We are planning to run those yum commands with the --disableplugin="*" or --noplugins option, which I guess will stop the plugins below from loading for these yum transactions, so no extra HTTP connections are made to Foreman. Just wanted to run this by you and get your opinion.

Loaded plugins: enabled_repos_upload, langpacks, package_upload, product-id, search-disabled-repos, subscription-manager, tracer_upload

We also wanted to know what the impact will be if we disable all these plugins permanently on the clients.
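To illustrate both options, a sketch for a client (the plugin config file name here is an assumption based on the plugin names listed above):

# one-off: skip all yum plugins for a single transaction
sh-4.2# yum --noplugins update
# permanent: disable an individual plugin, e.g. package_upload
sh-4.2# sed -i 's/^enabled *= *1/enabled = 0/' /etc/yum/pluginconf.d/package_upload.conf

As far as I know, the *_upload plugins are what report the installed-package and enabled-repository profile back to Katello, so disabling them permanently means the package and errata information shown in the web UI will go stale until a profile upload happens again.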

Thanks!

Thank you @lzap and @ekohl for the pointers!!

We were able to fix the issue by bringing down the number of HTTP connections.