OOM killer invoked by Rails job

Problem:
Search requests consume a large amount of memory; the behaviour is reproducible.

Expected outcome:
Search requests should use less memory.

Foreman and Proxy versions:
Foreman 3.4.1
Katello 4.6

Distribution and version:
RHEL 8

Other relevant data:
When using the search bar on the “All Hosts” page and starting a simple search request, for instance installed_package_name = htop, a rails process is invoked whose memory usage gradually increases up to ~9 GB at 100% CPU load. So if multiple search requests are started, the OOM killer is invoked. I have run into this several times already and was wondering why such a search request needs so much memory; it is observed after every search request. Has someone observed this in a prod environment? On my test instance the search request finishes quickly and works as expected, but only two test clients are served by the test instance, so it is not really comparable to prod…

The Foreman prod setup serves ~200 hosts.
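For reference, the search can also be triggered outside the browser, which makes it easier to watch the rails workers while it runs. This is just how I would approach it; the hammer options, API path and credentials below are placeholders, not verified on this setup:

hammer host list --search "installed_package_name = htop"

curl -s -u admin:changeme \
  "https://foreman.example.com/api/v2/hosts?search=installed_package_name%3Dhtop" | head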

root@foreman [foreman-prod] /var/log # strace -f -p 82211
strace: Process 82211 attached with 18 threads
[pid 82318] [ Process PID=82318 runs in x32 mode. ]
[pid 497921] futex(0x7ff2393d3fa4, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 367682] futex(0x55b81142f3c0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 82339] futex(0x55b811b436c4, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 82338] futex(0x55b811b43000, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 82337] futex(0x55b811b427a0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 82336] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
[pid 82335] futex(0x55b811b41c10, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 82329] futex(0x55b811ba5aa4, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 82306] futex(0x55b811ba0ae0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 82296] futex(0x55b8110c2724, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 82281] futex(0x55b811b91574, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 82280] futex(0x7ff23801ecc0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 82231] futex(0x55b80f78f324, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 82213] ppoll([{fd=12, events=POLLIN}], 1, NULL, NULL, 8 <unfinished ...>
[pid 82211] ppoll([{fd=4, events=POLLIN}], 1, NULL, NULL, 8 <unfinished ...>
[pid 82295] futex(0x55b80cc87f40, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 82279] ppoll([{fd=10, events=POLLIN}], 1, NULL, NULL, 8 <unfinished ...>
[pid 82336] <... restart_syscall resumed>) = -1 ETIMEDOUT (Connection timed out)
[pid 82336] futex(0x55b7fd65d330, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 82336] futex(0x55b811b420e0, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=99976333}) = -1 ETIMEDOUT (Connection timed out)
[pid 82336] futex(0x55b7fd65d330, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 82336] futex(0x55b811b420e0, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=99976314}) = -1 ETIMEDOUT (Connection timed out)
[pid 82336] futex(0x55b7fd65d330, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 82336] futex(0x55b811b420e0, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=99971750}) = -1 ETIMEDOUT (Connection timed out)
[pid 82336] futex(0x55b7fd65d330, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 82336] futex(0x55b811b420e0, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=99971298} <unfinished ...>
[pid 82318] [ Process PID=82318 runs in 64 bit mode. ]
[pid 82318] mprotect(0x7ff0db0cc000, 32768, PROT_READ|PROT_WRITE) = 0
[pid 82318] mprotect(0x7ff0db0d4000, 4096, PROT_READ|PROT_WRITE) = 0
[pid 82318] mprotect(0x7ff0db0d5000, 32768, PROT_READ|PROT_WRITE) = 0
[pid 82318] mprotect(0x7ff0db0dd000, 4096, PROT_READ|PROT_WRITE) = 0
[pid 82318] mprotect(0x7ff0db0de000, 32768, PROT_READ|PROT_WRITE) = 0
[pid 82318] mprotect(0x7ff0db0e6000, 4096, PROT_READ|PROT_WRITE) = 0
[pid 82318] mprotect(0x7ff0db0e7000, 32768, PROT_READ|PROT_WRITE) = 0
[pid 82318] mprotect(0x7ff0db0ef000, 4096, PROT_READ|PROT_WRITE) = 0
[pid 82318] mprotect(0x7ff0db0f0000, 32768, PROT_READ|PROT_WRITE) = 0
[pid 82318] mprotect(0x7ff0db0f8000, 4096, PROT_READ|PROT_WRITE) = 0
[pid 82318] mprotect(0x7ff0db0f9000, 32768, PROT_READ|PROT_WRITE) = 0
[pid 82318] mprotect(0x7ff0db101000, 4096, PROT_READ|PROT_WRITE) = 0
[pid 82318] mprotect(0x7ff0db102000, 32768, PROT_READ|PROT_WRITE) = 0
[pid 82318] mprotect(0x7ff0db10a000, 4096, PROT_READ|PROT_WRITE) = 0
[pid 82318] mprotect(0x7ff0db10b000, 32768, PROT_READ|PROT_WRITE) = 0
[pid 82318] mprotect(0x7ff0db113000, 4096, PROT_READ|PROT_WRITE) = 0
[pid 82318] mprotect(0x7ff0db114000, 32768, PROT_READ|PROT_WRITE) = 0
[pid 82318] mprotect(0x7ff0db11c000, 4096, PROT_READ|PROT_WRITE) = 0

Hi @leziri ,

Are there certain searches that bog down your system, or is it any search? Does even something simple like searching by name fail?

I just tested searching by installed package on an old pre-Pulp 3 Foreman with ~4K hosts and it took under 30 seconds, so this definitely sounds off.

Hi,
I tried 4 different search requests with a few minutes' break in between:
installed_package_name = htop
installed_package_name = php-fpm
installed_package_name = httpd

At the beginning of each request the CPU usage goes up, but it comes back down after 1-2 minutes, so that is OK. However, the RAM consumption is quite high and it lingers: 30 minutes after the request, the rails process that is apparently responsible for the search has still not been cleaned up.
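To check whether the memory is really held by a single worker, watching the resident set size of the rails processes over time is enough; a minimal sketch using plain ps/watch:

watch -n 30 'ps -C rails -o pid,rss,vsz,etime,args --sort=-rss | head -n 6'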

top CPU/MEM: screenshots 1-3 (not reproduced here)

/var/log/messages

Jan 17 13:19:08 foreman kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/foreman.service,task=rails,pid=3970519,uid=992
Jan 17 13:19:08 foreman kernel: Out of memory: Killed process 3970519 (rails) total-vm:10645200kB, anon-rss:10362444kB, file-rss:0kB, shmem-rss:0kB, UID:992 pgtables:20796kB oom_score_adj:0
Jan 17 13:19:09 foreman foreman[3969886]: [3969886] - Worker 1 (PID: 105022) booted in 0.57s, phase: 0
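For the record, the same OOM events can also be pulled straight from the kernel log, so Kibana is not strictly needed to spot them:

journalctl -k | grep -iE 'out of memory|oom-kill'
dmesg -T | grep -i 'out of memory'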

30 minutes later the process is still near the top of the memory list and keeps consuming memory:

top - 13:49:40 up 18 days, 22:11,  2 users,  load average: 0.06, 0.16, 0.41
Tasks: 410 total,   1 running, 409 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.3 us,  0.3 sy,  0.0 ni, 99.1 id,  0.0 wa,  0.2 hi,  0.1 si,  0.0 st
MiB Mem :  32146.7 total,   3783.0 free,  23996.3 used,   4367.3 buff/cache
MiB Swap:   1024.0 total,    220.0 free,    803.9 used.   5649.2 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
3970537 foreman   20   0 7940852   7.3g   4288 S   0.3  23.1   9:37.13 rails
3970518 foreman   20   0 4561224   4.1g   3668 S   0.0  13.0   5:54.31 rails
3969969 puppet    20   0 6204760   3.1g   8944 S   0.3   9.7  59:46.86 java
3970525 foreman   20   0 2023928   1.7g      0 S   0.0   5.3   4:35.98 rails
3969858 tomcat    20   0 8183644   1.4g  10636 S   0.3   4.6   6:45.81 java
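If it helps with the bug report, the per-process numbers can be cross-checked against /proc; the PID below is just the top consumer from my top output:

grep -E 'VmRSS|VmSwap' /proc/3970537/status
grep -E '^(Rss|Swap):' /proc/3970537/smaps_rollup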

Does the same thing happen if you do a simple search like name = server.example.com? Or is it just the installed package search that is being slow?

No, in that case no such CPU/memory saturation can be observed.

Good to know. That makes some sense to me, since the installed packages query is much more complicated under the hood than a simple name search.
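If you want to try to narrow it down outside the web workers, the same scoped search can in principle be run from a Foreman console. A rough sketch only; the model/scope names and the need to set User.current are assumptions on my side:

foreman-rake console
irb> User.current = User.find_by(login: 'admin')   # may be needed for permission/taxonomy scoping
irb> Host::Managed.search_for('installed_package_name = htop').count
irb> puts Host::Managed.search_for('installed_package_name = htop').to_sql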

With this info we should have enough to reproduce. I’ve created an issue: Bug #35974: Host search by installed_package_name invokes OOM killer - Katello - Foreman

Did this work better in a previous version of Foreman/Katello? I can see at least 3.16 doesn’t have this issue, but that is very far back.

Thank you for looking into that issue.

Unfortunately I cannot give you a definitive answer to that question, since the OOM-killer notification was only set up recently in Kibana and the logs only go back one month, which is when I upgraded to 3.4/4.6. At least I didn't notice any problems before 3.4/4.6, which is not very meaningful however…
