Problem:
I have set up a new 4.13 on-demand content proxy on AlmaLinux 9 and connected it to my current main server (still running on AlmaLinux 8). However, I can't even get it through a full initial sync. About 75% of the way through the sync task, the content proxy runs out of memory, comes almost to a full stop, and the OOM killer starts terminating pulpcore-worker processes.
The machine has 32 GB of RAM and 8 CPUs, the same spec as my current "old" content proxy, which has been running on EL8 for 2 years. It has 8 pulpcore-worker processes.
I can see with top (as long as the system is still responsive) that at some point the pulpcore-worker processes quickly grow their resident memory (RSS/RES) to over 4 or 5 GB each, which exhausts the 32 GB available.
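For tracking this outside of top, here is a minimal sketch that sums the resident memory of matching worker processes. It is Linux-only (it reads /proc directly); the process name "pulpcore-worker" comes from the symptoms above, and the helper names are illustrative:

```python
#!/usr/bin/env python3
"""Sum resident memory (VmRSS) of processes matching a name, via /proc."""
import os
import re


def rss_kib(pid: int) -> int:
    """Return VmRSS in KiB for a pid, or 0 if the process is unreadable."""
    try:
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])  # value is reported in kB
    except OSError:
        pass
    return 0


def total_rss_kib(name_pattern: str) -> int:
    """Sum VmRSS over all processes whose comm matches the pattern."""
    total = 0
    pat = re.compile(name_pattern)
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                comm = f.read().strip()
        except OSError:
            continue  # process exited while we were scanning
        if pat.search(comm):
            total += rss_kib(int(entry))
    return total


if __name__ == "__main__":
    mib = total_rss_kib("pulpcore-worker") / 1024
    print(f"pulpcore-worker total RSS: {mib:.1f} MiB")
```

Run in a watch loop (e.g. `watch -n 5 ./worker_rss.py`) this makes the growth curve during a sync easy to log.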
My old content proxy uses approx. 16 GB at any time. Of course, it doesn’t need to do a full sync.
As far as I can tell, the worker processes are syncing the Grafana repository when the issue occurs, originally from https://rpm.grafana.org/. It contains 3988 RPMs.
I currently only have repositories with rpms.
Foreman and Proxy versions:
main server
foreman-3.11.2-1.el8.noarch
katello-4.13.1-1.el8.noarch
on the new proxy
foreman-proxy-content-4.13.1-1.el9.noarch
foreman-proxy-3.11.2-1.el9.noarch
Distribution and version:
AlmaLinux 9.4 on the content proxy.
It's RPMs only. The repository where it definitely got stuck was Grafana from https://rpm.grafana.com/ (sorry, posted the wrong URL before). It has about 4,000 RPMs with a huge filelist which expands to a 1.5 GB XML referencing approx. 17 million files.
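To get a feel for why that metadata hurts: parsing a 1.5 GB filelists.xml as a full DOM means holding ~17 million file entries in memory at once, whereas a streaming parse stays flat. A minimal sketch (the sample document and function name are illustrative; the namespace URI follows standard createrepo output, so verify it against your repo's metadata):

```python
#!/usr/bin/env python3
"""Stream-count <file> entries in a repodata filelists XML."""
import io
import xml.etree.ElementTree as ET


def count_file_entries(source) -> int:
    """Count file entries without building the whole tree in memory."""
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Compare the local name so the filelists namespace doesn't matter.
        if elem.tag.rsplit("}", 1)[-1] == "file":
            count += 1
        elem.clear()  # drop the element's content as soon as it's counted
    return count


if __name__ == "__main__":
    sample = b"""<filelists xmlns="http://linux.duke.edu/metadata/filelists">
      <package pkgid="abc" name="grafana" arch="x86_64">
        <file>/usr/bin/grafana</file>
        <file>/usr/share/grafana/conf/defaults.ini</file>
      </package>
    </filelists>"""
    print(count_file_entries(io.BytesIO(sample)))  # prints 2
```

Pointed at the decompressed Grafana filelists.xml, this counts the ~17 million entries in roughly constant memory, which illustrates what a worker would need to do to keep its RSS flat.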
I have reduced the number of workers to 4 and the sync finally got through. I could see in top that workers would still go up to ~6 GB each.
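For reference, the worker count on a content proxy can be pinned through the installer so it survives upgrades. The flag below is my understanding of the current foreman-installer option for the Katello scenario, so verify it against your version first:

```shell
# List the available pulpcore worker options on this installation
foreman-installer --full-help | grep -i pulpcore-worker

# Cap the proxy at 4 pulpcore workers so peak memory
# (workers x per-worker RSS) stays well under the 32 GB available
foreman-installer --foreman-proxy-content-pulpcore-worker-count 4
```

With ~6 GB peak RSS per worker observed above, 4 workers keep the worst case around 24 GB instead of 48 GB with 8 workers.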
I think if you sync the grafana repository for the first time to a pulp server you should be able to see it.
@dralley are you surprised at all by the numbers here? 4,000 RPMs doesn’t sound large at all, but the mention of the large filelist might be unique. I’m not sure if that rivals some of the larger RHEL repos like AppStream / BaseOS.
It is surprising, yes, although someone reported that the Grafana repo was causing this about two weeks ago already; I haven't fully gotten to the bottom of it yet.
The large filelists are probably not the full story; there has to be something else going on that amplifies the issue.
The latest update on the issue suggests the memory consumption might be expected given the unusual size of the filelist, but it's still being investigated.
Thanks. So far it hasn't happened again. I guess it only gets really bad during the initial sync of a new content proxy: it then syncs the full repository for all content views and lifecycle environments (4 CVs × 2 LEs = 8) in one run, writing the metadata for the first time.
But a later optimized sync, or even a manual complete sync of the content proxy, doesn't do anything comparable to memory usage. It doesn't use more than 8 GB in total, with 8 active pulpcore workers.