Problem:
I have set up a new 4.13 on-demand content proxy on AlmaLinux 9 and connected it to my current main server (still running on AlmaLinux 8). However, I can't even get it through a full initial sync. About 75% of the way through the sync task, the content proxy runs out of memory, comes almost to a full stop, and the OOM killer starts terminating pulpcore-worker processes.
The machine has 32 GB of RAM and 8 CPUs, the same spec as my current "old" content proxy, which has been running on EL8 for 2 years. It has 8 pulpcore-worker processes.
I can see with top (as long as the system is still responsive) that at some point the pulpcore-worker processes quickly grow their resident memory (RSS/RES) to over 4 or 5 GB each, which exhausts the 32 GB available.
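For tracking this outside of top, here is a minimal sketch that sums the resident memory of matching worker processes. It is Linux-only (it reads /proc directly); the process name "pulpcore-worker" comes from the symptoms above, and the helper names are illustrative:

```python
#!/usr/bin/env python3
"""Sum resident memory (VmRSS) of processes matching a name, via /proc."""
import os
import re


def rss_kib(pid: int) -> int:
    """Return VmRSS in KiB for a pid, or 0 if the process is unreadable."""
    try:
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])  # value is reported in kB
    except OSError:
        pass
    return 0


def total_rss_kib(name_pattern: str) -> int:
    """Sum VmRSS over all processes whose comm matches the pattern."""
    total = 0
    pat = re.compile(name_pattern)
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                comm = f.read().strip()
        except OSError:
            continue  # process exited while we were scanning
        if pat.search(comm):
            total += rss_kib(int(entry))
    return total


if __name__ == "__main__":
    mib = total_rss_kib("pulpcore-worker") / 1024
    print(f"pulpcore-worker total RSS: {mib:.1f} MiB")
```

Run in a watch loop (e.g. `watch -n 5 ./worker_rss.py`) this makes the growth curve during a sync easy to log.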
My old content proxy uses approx. 16 GB at any time. Of course, it doesn’t need to do a full sync.
As far as I can tell, the worker processes are syncing the Grafana repository when the issue occurs, originally from https://rpm.grafana.org/. It contains 3988 RPMs.
I currently only have repositories with rpms.
Foreman and Proxy versions:
main server
foreman-3.11.2-1.el8.noarch
katello-4.13.1-1.el8.noarch
on the new proxy
foreman-proxy-content-4.13.1-1.el9.noarch
foreman-proxy-3.11.2-1.el9.noarch
Distribution and version:
AlmaLinux 9.4 on the content proxy.
It's RPMs only. The repository where it definitely got stuck was Grafana from https://rpm.grafana.com/ (sorry, posted the wrong URL before). It has about 4,000 RPMs with a huge filelist which expands to a 1.5 GB XML referencing approx. 17 million files.
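To get a feel for why that metadata hurts: parsing a 1.5 GB filelists.xml as a full DOM means holding ~17 million file entries in memory at once, whereas a streaming parse stays flat. A minimal sketch (the sample document and function name are illustrative; the namespace URI follows standard createrepo output, so verify it against your repo's metadata):

```python
#!/usr/bin/env python3
"""Stream-count <file> entries in a repodata filelists XML."""
import io
import xml.etree.ElementTree as ET


def count_file_entries(source) -> int:
    """Count file entries without building the whole tree in memory."""
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Compare the local name so the filelists namespace doesn't matter.
        if elem.tag.rsplit("}", 1)[-1] == "file":
            count += 1
        elem.clear()  # drop the element's content as soon as it's counted
    return count


if __name__ == "__main__":
    sample = b"""<filelists xmlns="http://linux.duke.edu/metadata/filelists">
      <package pkgid="abc" name="grafana" arch="x86_64">
        <file>/usr/bin/grafana</file>
        <file>/usr/share/grafana/conf/defaults.ini</file>
      </package>
    </filelists>"""
    print(count_file_entries(io.BytesIO(sample)))  # prints 2
```

Pointed at the decompressed Grafana filelists.xml, this counts the ~17 million entries in roughly constant memory, which illustrates what a worker would need to do to keep its RSS flat.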
I have reduced the number of workers to 4 and the sync finally got through. I could see in top that workers would still go up to ~6 GB each.
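For reference, the worker count on a content proxy can be pinned through the installer so it survives upgrades. The flag below is my understanding of the current foreman-installer option for the Katello scenario, so verify it against your version first:

```shell
# List the available pulpcore worker options on this installation
foreman-installer --full-help | grep -i pulpcore-worker

# Cap the proxy at 4 pulpcore workers so peak memory
# (workers x per-worker RSS) stays well under the 32 GB available
foreman-installer --foreman-proxy-content-pulpcore-worker-count 4
```

With ~6 GB peak RSS per worker observed above, 4 workers keep the worst case around 24 GB instead of 48 GB with 8 workers.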
I think if you sync the grafana repository for the first time to a pulp server you should be able to see it.
@dralley are you surprised at all by the numbers here? 4,000 RPMs doesn’t sound large at all, but the mention of the large filelist might be unique. I’m not sure if that rivals some of the larger RHEL repos like AppStream / BaseOS.
It is surprising, yes, although someone reported that the Grafana repo was causing this about two weeks ago already; I haven't fully gotten to the bottom of it yet.
The large filelists are probably not the full story; there has to be something else going on that amplifies the issue.
The latest update on the issue suggests the memory consumption might be expected given the unusual size of the filelist, but it's still being investigated.
Thanks. So far it hasn't happened again. I guess it only gets really bad during the initial sync of a new content proxy: it then syncs the full repository for all content views and lifecycle environments (4 CVs × 2 LEs = 8) in one run, writing the metadata for the first time.
But a later optimized sync, or even a manual complete sync of the content proxy, doesn't do anything comparable to memory usage. It doesn't use more than 8 GB in total, with 8 active pulpcore workers.