Foreman 3.0 and Katello 4.2 memory issue

Problem:
When I used Foreman 2.5 and Katello 4.1.3, I had no memory issues; it was very stable. After I updated to Katello 4.1.4, the memory problems began. Usage kept climbing and I was getting messages like: "Linux: OOM killed a process. Out of memory: Kill process 2979 (ruby)". So I decided to upgrade Foreman and Katello to the latest versions, hoping that would solve my issues. But unfortunately memory usage is still increasing and it is not stable.
Here is a memory usage graph after the upgrade:
[memory usage graph]

Top memory usage processes:

 PID  PPID %MEM %CPU CMD
 5187  4945 21.0  0.2 puma: cluster worker 0: 4945 [foreman]
 4909     1  8.4  0.6 /usr/lib/jvm/jre-11/bin/java -Xms1024m -Xmx4096m -Djava.security.auth.login.config=/usr/share/tomcat/conf/login.config -classpath /usr/share/tomcat/bin/bootstrap.jar:/usr/share/tomcat/bin/tomcat-juli.jar:/usr/share/java/commons-daemon.jar -Dcatalina.base=/usr/share/tomcat -Dcatalina.home=/usr/share/tomcat -Djava.endorsed.dirs= -Djava.io.tmpdir=/var/cache/tomcat/temp -Djava.util.logging.config.file=/usr/share/tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager org.apache.catalina.startup.Bootstrap start
 4997     1  6.3  0.2 /usr/bin/java -Xms2G -Xmx2G -Djruby.logger.class=com.puppetlabs.jruby_utils.jruby.Slf4jLogger -XX:ReservedCodeCacheSize=512m -XX:OnOutOfMemoryError=kill -9 %p -cp /opt/puppetlabs/server/apps/puppetserver/puppet-server-release.jar:/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/facter.jar:/opt/puppetlabs/server/data/puppetserver/jars/* clojure.main -m puppetlabs.trapperkeeper.main --config /etc/puppetlabs/puppetserver/conf.d --bootstrap-config /etc/puppetlabs/puppetserver/services.d/,/opt/puppetlabs/server/apps/puppetserver/config/services.d/ --restart-file /opt/puppetlabs/server/data/puppetserver/restartcounter
 5608     1  6.3  0.4 sidekiq 5.2.7  [0 of 5 busy]
 5190  4945  5.1  0.1 puma: cluster worker 1: 4945 [foreman]
 5202  4945  4.7  0.1 puma: cluster worker 4: 4945 [foreman]
 5194  4945  4.4  0.1 puma: cluster worker 3: 4945 [foreman]
 5203  4945  4.1  0.1 puma: cluster worker 5: 4945 [foreman]
 5192  4945  4.0  0.1 puma: cluster worker 2: 4945 [foreman]

My host has 20 GB memory and 4 CPU.
Any ideas or recommendations on what to do?

Distribution and version:
CentOS 7.9
Foreman 3.0
Katello 4.2

There are a few changes addressing the increased memory consumption [1]. These are planned for 3.0.1.

As per the release meeting discussions, we expect to release 3.0.1 within a few weeks. You can use the Release team meeting agenda page to track the discussions about it.


Thanks for the info @upadhyeammit :slight_smile:
I have tried to investigate my memory issue further and checked puma's state. It seems that puma consumes quite a lot of memory, if I'm correct. Or does it look normal?
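For reference, this is how I looked at the per-worker memory (a plain `ps` one-liner, nothing Foreman-specific; the puma workers show up as "puma: cluster worker N" lines):

```shell
# Show the processes with the highest resident memory, header included.
# -eo selects the output columns; --sort=-pmem sorts by %MEM descending
# (standard procps options).
ps -eo pid,ppid,pmem,rss,cmd --sort=-pmem | head -n 10
```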

1GB/worker isn’t too far off from what we see across the field.

The one with 5 GB is a bit much, though.

Is that with or without the patches Amit posted above?

And CC @jeremylenz and @Justin_Sherrill as Katello release managers, and @Jonathon_Turel as the one who had the most insight in the recent memory topics :wink:
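For context, a rough back-of-the-envelope sum of the big consumers visible in the ps output above (worker count and sizes taken from the thread; the JVM -Xmx values are upper bounds, so this is an estimate, not a measurement):

```shell
# Approximate baseline footprint on the 20 GB host, using figures from
# the thread: 6 puma workers at ~1 GB each, Tomcat/Candlepin capped at
# 4 GB (-Xmx4096m) and Puppetserver at 2 GB (-Xmx2G).
puma_gb=$((6 * 1))
tomcat_gb=4
puppetserver_gb=2
echo "estimated baseline: $((puma_gb + tomcat_gb + puppetserver_gb)) GB of 20 GB"
```

So even before any leak, roughly half the host is spoken for, which leaves little headroom if one worker balloons.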


I’ve yet to come across a scenario where a single puma worker has more than 5x the memory consumption of the other ones, so that is interesting.

@klillipuu can you share the output of hammer ping from your server? It contains useful information such as how many events the background workers have handled and I’d like to see if the volume is especially high. Those background workers run only in a single puma process at any given time.

@evgeni without patches.

@Jonathon_Turel Here is the requested output of hammer ping:

database:
    Status:          ok
    Server Response: Duration: 0ms
katello_agent:
    Status:          FAIL
    message:         Not running
    Server Response: Duration: 0ms
candlepin:
    Status:          ok
    Server Response: Duration: 21ms
candlepin_auth:
    Status:          ok
    Server Response: Duration: 17ms
candlepin_events:
    Status:          ok
    message:         59 Processed, 0 Failed
    Server Response: Duration: 0ms
katello_events:
    Status:          ok
    message:         2 Processed, 0 Failed
    Server Response: Duration: 1ms
pulp3:
    Status:          ok
    Server Response: Duration: 65ms
foreman_tasks:
    Status:          ok
    Server Response: Duration: 24ms

I upgraded to 3.0/4.2.0.1.rc3 just yesterday, and today I came back to find an unresponsive server.

# hammer ping katello
katello_agent:
    Status:          ok
    message:         0 Processed, 0 Failed
    Server Response: Duration: 0ms
candlepin:
    Status:          FAIL
    Server Response: Message: Failed to open TCP connection to localhost:23443 (Connection refused - connect(2) for "localhost" port 23443)
candlepin_auth:
    Status:          FAIL
    Server Response: Message: A backend service [ Candlepin ] is unreachable
candlepin_events:
    Status:          ok
    message:         153 Processed, 0 Failed
    Server Response: Duration: 0ms
katello_events:
    Status:          ok
    message:         5 Processed, 0 Failed
    Server Response: Duration: 0ms
pulp3:
    Status:          ok
    Server Response: Duration: 50ms
foreman_tasks:
    Status:          ok
    Server Response: Duration: 2ms

I did a foreman-maintain service restart and it's back up and running.
top view sorted on memory…

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1388 tomcat    20   0    9.9g   1.5g  30.4m S   0.7  7.4   2:05.05 java
 1559 puppet    20   0 7887.1m   1.1g  23.4m S   0.3  5.7   1:13.48 java
 2223 foreman   20   0 1591.0m 906.8m   6.1m S   0.0  4.5   0:17.69 ruby
 2189 foreman   20   0 1316.6m 674.2m   5.1m S   0.0  3.3   0:13.09 ruby
 2199 foreman   20   0 1308.8m 650.3m   5.1m S   0.0  3.2   0:10.11 ruby
 2178 foreman   20   0 1322.3m 640.2m   5.2m S   0.0  3.1   0:09.75 ruby
 2228 foreman   20   0 1323.4m 633.5m   5.1m S   0.0  3.1   0:07.77 ruby
 2196 foreman   20   0 1328.3m 627.3m   5.1m S   0.0  3.1   0:07.97 ruby
 2208 foreman   20   0 1325.6m 612.3m   5.1m S   0.0  3.0   0:06.69 ruby
 2217 foreman   20   0 1323.9m 601.7m   5.1m S   1.0  3.0   0:05.54 ruby
 2182 foreman   20   0 1319.8m 532.7m   5.2m S   0.0  2.6   0:03.66 ruby
 2211 foreman   20   0 1306.6m 531.5m   5.0m S   0.0  2.6   0:03.45 ruby
 2203 foreman   20   0 1263.1m 497.7m   6.1m S   0.0  2.4   0:03.68 ruby
 2195 foreman   20   0  814.1m 357.5m   5.1m S   1.3  1.8   0:00.71 ruby
 1447 foreman   20   0  747.4m 351.1m   7.1m S   0.3  1.7   0:25.50 sidekiq
 2475 foreman   20   0  754.3m 350.3m   7.1m S   0.0  1.7   0:15.74 sidekiq
 2472 foreman   20   0  755.2m 347.3m   7.1m S   0.3  1.7   0:15.75 sidekiq
 1449 foreman   20   0  725.9m 336.9m   7.1m S   0.0  1.7   0:24.83 ruby
 1387 pulp      20   0  664.6m 115.7m  11.2m S   2.0  0.6   0:07.18 gunicorn
 1405 pulp      20   0  664.6m 115.7m  11.2m S   1.7  0.6   0:09.61 gunicorn
 1420 pulp      20   0  664.3m 115.7m  11.2m S   0.0  0.6   0:05.97 gunicorn
 1400 pulp      20   0  664.3m 115.6m  11.2m S   0.3  0.6   0:05.56 gunicorn
 1412 pulp      20   0  664.3m 115.4m  11.2m S   0.0  0.6   0:05.05 gunicorn
 1346 pulp      20   0  664.0m 115.4m  11.2m S   0.0  0.6   0:04.98 gunicorn
 1359 pulp      20   0  664.0m 115.3m  11.2m S   0.0  0.6   0:04.83 gunicorn
 1407 pulp      20   0  664.0m 115.2m  11.2m S   0.0  0.6   0:04.75 gunicorn
 1348 pulp      20   0  663.7m 115.2m  11.2m S   0.0  0.6   0:04.74 gunicorn
 1343 pulp      20   0  505.5m 115.1m  11.3m S   0.0  0.6   0:04.53 gunicorn
 1427 pulp      20   0  663.7m 115.1m  11.2m S   0.0  0.6   0:04.70 gunicorn

@klillipuu that looks OK but I am curious about the katello_agent part. Are you using katello-agent to manage packages on your Content Hosts?

If not, I recommend running the installer with --foreman-proxy-content-enable-katello-agent=false to remove the supporting infrastructure from your server.

If you do use it, then something is not right as you can see (this is likely separate from the memory issues). Can you check the status of systemd services qpidd and qdrouterd on your server?

@Jonathon_Turel Thanks for the advice, I removed the support because I don't use katello-agent at the moment.

Here are the systemd outputs; hopefully they are suitable.
I noticed that the qpidd service output seems weird… it contains some errors.

● qpidd.service - An AMQP message broker daemon.
   Loaded: loaded (/usr/lib/systemd/system/qpidd.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/qpidd.service.d
           └─90-limits.conf, wait-for-port.conf
   Active: active (running) since Fri 2021-10-08 16:13:56 EEST; 3 days ago
     Docs: man:qpidd(1)
           http://qpid.apache.org/
 Main PID: 4480 (qpidd)
    Tasks: 6
   Memory: 20.4M
   CGroup: /system.slice/qpidd.service
           └─4480 /usr/sbin/qpidd --config /etc/qpid/qpidd.conf

Oct 11 22:41:37 foreman qpidd[4480]: 2021-10-11 22:41:37 [System] error Error reading socket: Encountered end of file [-5938]
Oct 11 22:41:37 foreman qpidd[4480]: 2021-10-11 22:41:37 [System] error Error reading socket: Encountered end of file [-5938]
Oct 11 22:41:53 foreman qpidd[4480]: 2021-10-11 22:41:53 [System] error Error reading socket: Encountered end of file [-5938]
Oct 11 22:41:53 foreman qpidd[4480]: 2021-10-11 22:41:53 [System] error Error reading socket: Encountered end of file [-5938]
Oct 11 22:42:08 foreman qpidd[4480]: 2021-10-11 22:42:08 [System] error Error reading socket: Encountered end of file [-5938]

● qdrouterd.service - Qpid Dispatch router daemon
   Loaded: loaded (/usr/lib/systemd/system/qdrouterd.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/qdrouterd.service.d
           └─90-limits.conf
   Active: active (running) since Thu 2021-10-07 15:20:54 EEST; 4 days ago
 Main PID: 4858 (qdrouterd)
    Tasks: 5
   Memory: 24.1M
   CGroup: /system.slice/qdrouterd.service
           └─4858 /usr/sbin/qdrouterd -c /etc/qpid-dispatch/qdrouterd.conf

Oct 11 14:00:51 foreman qdrouterd[4858]: SERVER (info) [C33] Accepted connection to :5647 from xx.xx.xx.xx:41328
Oct 11 14:00:51 foreman qdrouterd[4858]: ROUTER_CORE (info) [C33] Connection Opened: dir=in host=xx.xx.xx.xx:41328 vhost= encrypted=TLSv1...e props=
Oct 11 14:00:51 foreman qdrouterd[4858]: ROUTER_CORE (info) [C33][L67] Link attached: dir=out source={pulp.agent.330bbd2d-451e-49a0-ad0a-...re:sess}
Oct 11 17:52:59 foreman qdrouterd[4858]: SERVER (info) [C33] Connection from xx.xx.xx.xx:41328 (to :5647) failed: proton:io Connection re...n error)
Oct 11 17:52:59 foreman qdrouterd[4858]: ROUTER_CORE (info) [C33][L67] Link closed due to connection loss: del=0 presett=0 psdrop=0 acc=0...ocked=no
Oct 11 17:52:59 foreman qdrouterd[4858]: ROUTER_CORE (info) [C33] Connection Closed

Running the installer to disable the katello-agent support should result in those services being turned off which makes me think the command you ran did not perform as expected. Can you check /etc/foreman/plugins/katello.yaml to see what it says for:

  :agent:
    :enabled:
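A quick way to check that setting from the shell (the grep itself is generic; the sample file below just mirrors the fragment above for illustration — on a real server, grep /etc/foreman/plugins/katello.yaml directly):

```shell
# Write a sample of the relevant fragment for illustration only;
# the real file lives at /etc/foreman/plugins/katello.yaml.
cat > /tmp/katello-sample.yaml <<'EOF'
:agent:
  :enabled: false
EOF
# Print the :agent: key and the line after it.
grep -A1 ':agent:' /tmp/katello-sample.yaml
```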

@ehelms My fault. I copied the service outputs before I ran the command that disables the katello-agent support. Now both services cannot be found.


Small update: after I disabled the katello-agent support, memory usage has been more stable and lower.
So big thanks to the Foreman team! :slight_smile:
