When Katello synchronises a large Yum repository (Oracle Linux 7 ‘latest’), the server runs out of memory and the ‘rq’ process performing the sync is killed by the OOM killer.
The sync completes without consuming an excessive amount of memory.
Foreman and Proxy versions:
Foreman and Proxy plugin versions:
Distribution and version:
Oracle Linux 8.3
Other relevant data:
The ‘rq’ process performing the sync reaches ~9.3GiB of RAM usage on the 20GiB VM, at which point the OOM killer kills it and the sync fails to complete.
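To confirm it really was the kernel OOM killer that terminated the worker (and to see how much memory it held at the time), the kernel log can be checked. A minimal sketch using standard log tools, nothing Katello-specific:

```shell
# Look for recent OOM-killer events in the kernel log; depending on the
# distro, either journalctl or dmesg will be available
(journalctl -k --no-pager 2>/dev/null || dmesg 2>/dev/null) \
  | grep -i -E 'out of memory|oom-kill' | tail -n 5
```

The matching lines name the killed process and its RSS at the moment it was killed, which is useful to include in a bug report.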
Can we see your Pulp versions?
pip3 list | grep pulp
pulp-certguard (1.1.0)
pulp-container (2.2.1)
pulp-deb (2.8.0)
pulp-file (1.5.0)
pulp-rpm (3.9.0)
pulpcore (3.9.1)
Can I see some information about your sync from the dynflow console? I’m wondering if we can see at what point specifically it OOM’d. Any information from the expanded action in dynflow would be great. It looks like it was a pulpcore worker (running as ‘rq’) that was taking up the memory.
Also, can I see your createrepo_c versions please?
rpm -qa | grep createrepo_c
So, it’s dying in action 4 below:
---
pulp_tasks:
- pulp_href: "/pulp/api/v3/tasks/b3d232dd-9617-4bbb-b0ca-a22d1e8ef695/"
  pulp_created: '2021-03-24T22:00:13.867+00:00'
  state: failed
  name: pulp_rpm.app.tasks.synchronizing.synchronize
  logging_cid: da6daf5ef4c04911bc137aa769846b77
  started_at: '2021-03-24T22:00:14.019+00:00'
  finished_at: '2021-03-24T22:01:40.910+00:00'
  error:
    traceback: ''
    description: None
  worker: "/pulp/api/v3/workers/db5703bf-2b17-4ac9-9c6b-33d2b2d28bc6/"
  child_tasks:
  progress_reports:
  - message: Downloading Metadata Files
    code: downloading.metadata
    state: running
    done: 5
  - message: Downloading Artifacts
    code: downloading.artifacts
    state: running
    done: 0
  - message: Associating Content
    code: associating.content
    state: running
    done: 3999
  - message: Un-Associating Content
    code: unassociating.content
    state: running
    done: 0
  - message: Parsed Comps
    code: parsing.comps
    state: completed
    total: 96
    done: 96
  - message: Parsed Advisories
    code: parsing.advisories
    state: completed
    total: 4914
    done: 4914
  - message: Parsed Packages
    code: parsing.packages
    state: running
    total: 22179
    done: 0
  created_resources:
  - ''
  reserved_resources_record:
  - "/pulp/api/v3/repositories/rpm/rpm/e3989c09-ead9-4b08-8173-356bacb6c306/"
  - "/pulp/api/v3/remotes/rpm/rpm/08aab7ad-6ca9-48a1-b6b0-8e72d0bdb98b/"
create_version: true
task_groups:
poll_attempts:
  total: 26
  failed: 1
$ rpm -qa | grep createrepo_c
python3-createrepo_c-0.17.1-1.el8.x86_64
createrepo_c-libs-0.17.1-1.el8.x86_64
Out of interest I threw extra RAM at the server, and the initial repo sync did complete, with only one Pulp worker active at the time. The worker topped out at around 9.8GiB of RAM usage.
I’m having the same issue, though I’m on Katello 3.18.2 and Foreman 2.3.3 and can’t throw more memory at the issue just yet. Trying to sync RHEL7 base repo and even with 8GB of free memory and no other sync running, it will run out of memory and kill the process as John stated, failing to sync.
20GiB is a bit low for that. I have OL7/OL8 and some custom repos around. My Katello/Foreman VM requires around 32GiB of RAM to survive for at least a few days; after that it goes OOM and services need to be restarted. Give it more memory and let Foreman restart every 3–4 days.
Synchronising a repository shouldn’t take almost 10GiB of RAM, and no service should need frequent restarts to keep it from running out of memory due to leaks…
Publishing the CV containing this large repository also runs into trouble: the publish stalls part-way through with a proxy error:
[Mon Mar 29 07:53:44.461278 2021] [proxy_http:error] [pid 547921] (20014)Internal error (specific information not available): [client 10.12.34.226:41282] AH01102: error reading status line from remote server httpd-UDS:0
[Mon Mar 29 07:53:44.461370 2021] [proxy:error] [pid 547921] [client 10.12.34.226:41282] AH00898: Error reading from remote server returned by /pulp/api/v3/content/rpm/packages/
[Mon Mar 29 08:18:22.757667 2021] [proxy_http:error] [pid 1301] (20014)Internal error (specific information not available): [client 10.12.34.226:60306] AH01102: error reading status line from remote server httpd-UDS:0
[Mon Mar 29 08:18:22.768418 2021] [proxy:error] [pid 1301] [client 10.12.34.226:60306] AH00898: Error reading from remote server returned by /pulp/api/v3/content/rpm/packages/
[Mon Mar 29 08:22:30.693893 2021] [proxy_http:error] [pid 1297] (20014)Internal error (specific information not available): [client 10.12.34.226:60392] AH01102: error reading status line from remote server httpd-UDS:0
[Mon Mar 29 08:22:30.693964 2021] [proxy:error] [pid 1297] [client 10.12.34.226:60392] AH00898: Error reading from remote server returned by /pulp/api/v3/content/rpm/packages/
10.12.34.226 - - [29/Mar/2021:07:52:51 +0000] "GET /pulp/api/v3/content/rpm/packages/?arch__ne=src&fields=pulp_href&limit=2000&offset=0&repository_version=%2Fpulp%2Fapi%2Fv3%2Frepositories%2Frpm%2Frpm%2Fe3989c09-ead9-4b08-8173-356bacb6c306%2Fversions%2F1%2F HTTP/1.1" 502 341 "-" "OpenAPI-Generator/3.8.0/ruby"
The /pulp/api/v3/content/rpm/packages/ call is timing out at the 600s Apache reverse-proxy timeout; I’m going to try increasing that limit.
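For reference, the directive involved would look something like the sketch below. The exact vhost file and value are assumptions based on a standard Katello httpd layout; check your own /etc/httpd configuration before changing anything, and note Katello’s installer may overwrite hand edits:

```apache
# In the Foreman/Katello httpd vhost: raise the mod_proxy read timeout
# above the 600 s default so slow Pulp API responses aren't cut off.
# (Value of 1800 is an example, not a recommendation.)
ProxyTimeout 1800
```

`ProxyTimeout` is valid at server and virtual-host scope; an equivalent per-backend setting is the `timeout=` parameter on the relevant `ProxyPass` line.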
@John_Beranek I tested it myself on a small test system (2 cores, ~15GB RAM, puppetserver disabled) and noticed rq maxing out at 8.2 GB. I’ve created a Pulp issue (Issue #8467: RQ worker takes up a lot of RAM when syncing OL7 repository - RPM Support - Pulp) so it can get some investigation.