Content proxy complete sync out-of-memory

Problem:

Because I got some 404s on the content proxy due to some deleted repositories, I started a complete sync for my content proxy. This, however, ended badly: the content proxy VM, which had 32 GB of memory assigned, ran out of memory. Even after a reboot, the task resumed and took the server down again. I have increased the memory to 64 GB and for now it seems to be running, but I can still see used memory going up to ~50 GB at times.

ps showed that I have a couple of pulpcore-worker processes which are very busy, each allocating 6-7 GB at peak times.

What I found confusing, though, is the number of pulpcore-worker processes: 17. This comes from the pulpcore-content.service unit file, which has:

# systemctl cat pulpcore-content.service
# /etc/systemd/system/pulpcore-content.service
[Unit]
Description=Pulp Content App
Requires=pulpcore-content.socket
After=network.target
Wants=postgresql.service
...
ExecStart=/usr/bin/pulpcore-content \
          --preload \
          --timeout 90 \
          --workers 17 \
          --access-logfile -
...

It’s the same on my other content proxy and my main server. Now, I somehow had it in my mind that there would usually be only 8 pulpcore-workers, not 17.

I have checked the answers file as well as the foreman-installer --full-help output:

  pulpcore_worker_count: 8

It’s set to 8 everywhere I look. So why does it start 17 workers if it’s only supposed to use 8? Or is that a different option? With 8 max I guess the 32 GB might have been enough…
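
In case anyone wants to double-check on their own box, something like this should show both places (the answers file lives under /etc/foreman-installer/scenarios.d/ here; the exact filename depends on the scenario):

# grep -ri pulpcore_worker /etc/foreman-installer/scenarios.d/
# foreman-installer --full-help | grep -i worker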

Seeing my thread from September, “Content Proxy out of memory”, where I mention it’s only 8 workers, I’m starting to think that something is going wrong.

I think it’s here in theforeman/puppet-pulpcore on GitHub: templates/pulpcore-content.service.erb at commit 7f81e6b9ae5cf033226d5b1dc4c0407b4fc566f2, where the worker count comes out as 17 if you have more than 8 CPUs on the server (which I do). Presumably that’s the common gunicorn heuristic of 2 * CPUs + 1, capped at the value for 8 CPUs, i.e. 17.
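
A quick shell sanity check of that assumed formula (I haven’t traced the exact ERB logic, so treat this as a guess):

# CPUS=$(nproc); echo $(( 2*CPUS + 1 > 17 ? 17 : 2*CPUS + 1 ))

On anything with more than 8 CPUs this prints 17, which matches the unit file.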

Expected outcome:
No OOM and I guess only 8 pulpcore-workers running.

Foreman and Proxy versions:
Running 3.13/4.15 with the current release.

foreman-installer-3.13.0-1.el9.noarch
foreman-installer-katello-3.13.0-1.el9.noarch
foreman-proxy-3.13.0-1.el9.noarch
foreman-proxy-content-4.15.0-1.el9.noarch
katello-certs-tools-2.10.0-1.el9.noarch
katello-client-bootstrap-1.7.9-2.el9.noarch
katello-common-4.15.0-1.el9.noarch
katello-host-tools-4.4.0-2.el9.noarch
katello-host-tools-tracer-4.4.0-2.el9.noarch
pulpcore-obsolete-packages-1.2.0-1.el9.noarch
pulpcore-selinux-2.0.1-1.el9.x86_64
python3.11-pulp-ansible-0.22.4-1.el9.noarch
python3.11-pulp-container-2.22.1-1.el9.noarch
python3.11-pulpcore-3.63.11-1.el9.noarch
python3.11-pulp-deb-3.5.1-1.el9.noarch
python3.11-pulp-glue-0.31.0-1.el9.noarch
python3.11-pulp-python-3.12.6-1.el9.noarch
python3.11-pulp-rpm-3.27.2-1.el9.noarch
rubygem-foreman_maintain-1.8.1-2.el9.noarch
rubygem-smart_proxy_pulp-3.4.0-1.fm3_13.el9.noarch

Distribution and version:
AlmaLinux 9

I can confirm that with 32 GB of RAM the sync (even an optimized sync) of the smart proxy sometimes ends with out-of-memory. I recently added swap to two smart proxies to mitigate the situation. I think 32 GB should be enough, too, but there is probably something going wrong.
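
(Adding swap is just the standard swap-file routine, something like the following; the 8 GB size here is an arbitrary example:)

# fallocate -l 8G /swapfile
# chmod 600 /swapfile
# mkswap /swapfile
# swapon /swapfile

plus a matching line in /etc/fstab to make it permanent.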

Which version on which platform? How many running pulpcore workers do you see?

The official system requirements are:

A minimum of 12 GB RAM is required for Smart Proxy server to function. In addition, a minimum of 4 GB RAM of swap space is also recommended. Smart Proxy running with less RAM than the minimum value might not operate correctly.

Last time it happened was on 3.12/4.14 on RHEL9. I see 4 pulpcore-workers.

O.K. I confused the content service workers with the pulpcore workers.

# systemctl status pulpcore-worker@*.service

shows me the 8 configured pulpcore workers.

# systemctl status pulpcore-content.service

shows me the pulpcore-content “app” workers. There are 17 of those, and that number doesn’t seem to be configurable with foreman-installer at the moment.
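
A quick way to see both counts side by side (the second command assumes the content app’s gunicorn workers show up with “gunicorn” in their process title, which they should when setproctitle is available):

# systemctl list-units 'pulpcore-worker@*' --no-legend | wc -l
# systemd-cgls --unit pulpcore-content.service | grep -c gunicorn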

But those are the processes which use up all the memory: each one can use 6 GB or more, which is far too much if you are running 17 worker processes.

So it’s the content worker processes, not the pulpcore workers, as I wrote initially…

I think when processes are dying like that, you should be able to see the large memory usage with top or ps aux.
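
For example, sorted by resident set size (GNU ps):

# ps aux --sort=-rss | head -n 20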

@katello Do any of the developers have an idea? I have no way to make a complete sync to the proxy, so I just have to hope that the standard sync fills in whatever gaps there may be…

The 17 is an intentional upper limit that the installer sets on the number of content workers. I’m not sure why the content workers are eating so much memory during capsule syncs.

@dralley Any pointers? Would reducing the number of workers help?
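
If it would, a possible stopgap (untested; I’m assuming the puppet-pulpcore parameter is named content_service_worker_count, so please verify against the installed module) would be to pin it via /etc/foreman-installer/custom-hiera.yaml and re-run the installer:

# cat >> /etc/foreman-installer/custom-hiera.yaml <<'EOF'
# assumed parameter name; check manifests/init.pp in puppet-pulpcore
pulpcore::content_service_worker_count: 8
EOF
# foreman-installer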