OOM killer kills MongoDB during Publish

Hi there!
Problem:
I’m publishing a content view containing 48 repositories (80,000 packages) every day without a problem.
However, after 6 or 7 days, I encounter a crash because the system seemingly runs out of memory.
Java invokes the oom-killer, which kills mongod while it’s publishing the CV.

Doing a “katello-service restart” fixes the issue for a few days (a few publishes), then it crashes again.

The system has 20 GB of RAM and 4 GB of swap. However, I see a continuous increase in RAM and swap usage until the swap is full, and then the crash happens the next time a publish is triggered with the swap full.
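For what it’s worth, I’ve just been watching the memory with plain free(1), nothing fancier:

# Show current RAM and swap usage in human-readable units
free -h

# Or keep refreshing it every 60 seconds
watch -n 60 free -h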

Transparent Huge Pages are disabled, which helped a bit.
Adding 4 GB of RAM helped a bit too, but the problem came back.
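In case it helps anyone else, this is roughly how I checked and disabled THP at runtime (it needs a tuned profile or a kernel boot parameter to survive a reboot):

# Check whether THP is enabled ([always] means on, [never] means off)
cat /sys/kernel/mm/transparent_hugepage/enabled

# Disable it at runtime only (does not persist across reboots)
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag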

Expected outcome:
A reliable Publish that doesn’t crash after 7 or 8 publishes

Foreman and Proxy versions:

foreman-tasks	    0.15.5
foreman_ansible		3.0.2
foreman_openscap    1.0.1
foreman_remote_execution		1.8.2
katello		3.12.0

Distribution and version:
CentOS Linux release 7.6.1810 (Core)

Other relevant data:
Do you have any “memory optimization tricks” for Foreman? I looked around, but besides the Transparent Huge Pages setting, I didn’t find much.
Would increasing the swap help?
I already increased the RAM as stated above, but it just pushed the problem further out.

/var/log/messages log:

Dec 17 04:13:48 master-repo kernel: [1269272.647638] java invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[...]
Dec 17 04:13:48 master-repo kernel: [1269272.647838] Swap cache stats: add 4168599, delete 4089346, find 17293084/17547978
Dec 17 04:13:48 master-repo kernel: [1269272.647839] Free swap  = 0kB
Dec 17 04:13:48 master-repo kernel: [1269272.647840] Total swap = 4194300kB
Dec 17 04:13:49 master-repo kernel: [1269272.648170] Out of memory: Kill process 6648 (mongod) score 312 or sacrifice child
[...]
Dec 17 04:13:49 master-repo kernel: [1269272.648372] Killed process 6648 (mongod) total-vm:10724776kB, anon-rss:6854016kB, file-rss:0kB, shmem-rss:0kB

Thank you

Hi @loitho

I have pulled this from our Red Hat website. I’m not sure if you can see the link, so I will copy the info just in case, but can you try this:

There are three options for relieving the memory pressure:

  1. Usually, there is some other process consuming more memory than expected, or a group of processes, e.g. when the Passenger max pool size is set too high, causing many Passenger RackApp: /usr/share/foreman processes to pop up. We recommend checking this first (see the quick check after these steps).

  2. More memory might be needed for the Satellite server. This depends on the currently allocated RAM and Satellite usage.

  3. One can limit the WiredTiger max cache. Be aware that this can negatively affect Satellite performance. Red Hat has not tested the impact of this option on performance. Please open a support case before setting this value. Two ways of setting this permanently are possible:

By updating sysconfig file:

Edit the OPTIONS line in /etc/opt/rh/rh-mongodb34/sysconfig/mongod by adding --wiredTigerCacheSizeGB 8. Note that this particular value has not been tested and should be derived from the typical usage of your Satellite rather than copied verbatim:

OPTIONS="-f /etc/opt/rh/rh-mongodb34/mongod.conf  --wiredTigerCacheSizeGB 8"

Reload the systemd configuration and restart the mongod service to apply the change:

systemctl daemon-reload
systemctl restart mongod.service
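For option 1, a quick way to see what is actually eating the memory (this is just standard procps/pgrep usage, not part of the KB):

# List the top memory consumers by resident set size
ps aux --sort=-rss | head -n 15

# Count Passenger RackApp workers if you suspect the pool size is set too high
pgrep -fc 'Passenger RackApp: /usr/share/foreman'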

Let me know if that fixes the issue or helps.


@loitho

Before doing the steps I gave: there is a memory leak in Dynflow which can eventually lead to some other process (usually mongod, whichever is largest) getting OOM-killed.

Next time you start running low on RAM, can you restart dynflowd and see if it helps? If not, then let’s do the steps I mentioned first.
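Something like this should do it (assuming dynflowd is running as a systemd unit, which it is on Katello 3.x):

# Check whether dynflowd is the process that is ballooning
systemctl status dynflowd

# Restart it to release the leaked memory
systemctl restart dynflowd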

Awesome, thank you for your reply.
I can indeed see the Red Hat KB, but this way other people will have the info too, thank you for pasting it.

Just out of personal curiosity, the KB is shown as “updated” on the 10th of December; is that also its creation date? Because I’m pretty sure I looked for something like this a week or two ago and didn’t find anything.

Thank you again, I’ll try this solution and report back. :slight_smile:

I’ll check the Dynflow issue too.


It looks like it was created on October 16, was hidden, then went live on the 23rd of October, and had some changes here and there, the last one on the 10th of December:

https://s.nimbusweb.me/share/3654634/tcoltbs5bio1c91ap8ix

I am not sure why it was not coming up for you.

If the Dynflow issue is the culprit, then you are 95% likely hitting this:

https://bugzilla.redhat.com/show_bug.cgi?id=1757317

Hi, that KB article was updated at my request on the 10th of December to fix option 3, as we also had OOM killer issues with Satellite.

Thanks for this! I previously tried to set the cache size in mongod.conf, but I think Puppet overwrote it. The default for mongod is to use something like half of system RAM, which is simply bonkers. I wonder if a saner default could be set for an all-in-one Foreman install?

I am also hitting the dynflowd memory leak… In my 16 GB system, I limited dynflowd to a max of 1 GB, and then it recycles itself. In /etc/sysconfig/dynflowd:
EXECUTOR_MEMORY_LIMIT=1gb
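followed by a restart so the new limit takes effect (that’s just how I applied it; once the limit is crossed, the executor recycles itself as described above):

# Apply the new memory limit
systemctl restart dynflowd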
