MongoDB crashing

stiflermt · July 24, 2019, 1:16pm

Problem:
Hi, I would like some help. We are encountering random crashes of the rh-mongodb34-mongod service, mostly during very heavy operations such as calculating errata. Unfortunately I am a bit lost on where to start diagnosing the issue, as systemd only indicates that the process received a kill signal 9 (SIGKILL).

Expected outcome:
MongoDB is no longer unstable

Foreman and Proxy versions:
1.22.0
Foreman and Proxy plugin versions:
Pulp 1.4.1
Pulp server version 2.19.1

Other relevant data:
katello-service status shows that the service below also failed when Mongo went down, most likely because the port was no longer listening, as indicated by the "Connection refused" error.

pulp_celerybeat.service - Pulp’s Celerybeat
Loaded: loaded (/usr/lib/systemd/system/pulp_celerybeat.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Wed 2019-07-24 12:50:37 UTC; 22min ago
Process: 10605 ExecStart=/usr/bin/celery beat --app=pulp.server.async.celery_instance.celery --scheduler=pulp.server.async.scheduler.Scheduler (code=exited, status=1/FAILURE)
Main PID: 10605 (code=exited, status=1/FAILURE)

Jul 24 12:50:36 SERVERNAME.REMOVED celery[10605]: File "/usr/lib64/python2.7/site-packages/pymongo/mongo_client.py", line 712, in _get_socket
Jul 24 12:50:36 SERVERNAME.REMOVED celery[10605]: server = self._get_topology().select_server(selector)
Jul 24 12:50:36 SERVERNAME.REMOVED celery[10605]: File "/usr/lib64/python2.7/site-packages/pymongo/topology.py", line 141, in select_server
Jul 24 12:50:36 SERVERNAME.REMOVED celery[10605]: address))
Jul 24 12:50:36 SERVERNAME.REMOVED celery[10605]: File "/usr/lib64/python2.7/site-packages/pymongo/topology.py", line 117, in select_servers
Jul 24 12:50:36 SERVERNAME.REMOVED celery[10605]: self._error_message(selector))
Jul 24 12:50:36 SERVERNAME.REMOVED celery[10605]: pymongo.errors.ServerSelectionTimeoutError: localhost:27017: [Errno 111] Connection refused
Jul 24 12:50:37 SERVERNAME.REMOVED systemd[1]: pulp_celerybeat.service: main process exited, code=exited, status=1/FAILURE
Jul 24 12:50:37 SERVERNAME.REMOVED systemd[1]: Unit pulp_celerybeat.service entered failed state.
Jul 24 12:50:37 SERVERNAME.REMOVED systemd[1]: pulp_celerybeat.service failed.

logs

Justin_Sherrill · July 24, 2019, 1:22pm

Usually when mongo goes down it’s memory related (at least in my experience). Do you see any OOM messages in the journal logs? Can you grab the log messages from around when mongo died, shortly before pulp started complaining?
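
One way to look for OOM-killer activity is to filter the kernel messages in the journal. This is only a rough sketch: the function name find_oom_events is made up for illustration, the exact wording of the kernel messages varies between kernel versions, and the time window is an assumption based on the 12:50 failure shown in the first post.

```shell
# Filter stdin for lines that look like kernel OOM-killer messages.
# The patterns are a best guess at common wording, not an exhaustive list.
find_oom_events() {
  grep -iE 'out of memory|oom-killer|oom_kill|killed process'
}

# Usage (assumes journald still holds kernel messages for that time):
#   journalctl -k --since "2019-07-24 12:30" --until "2019-07-24 12:55" | find_oom_events
```

If a line naming mongod shows up shortly before the pulp errors, the kill 9 most likely came from the kernel running out of memory rather than from systemd itself.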

stiflermt · July 24, 2019, 1:37pm

Hmm, you seem to be onto something here. I will add more RAM to this smart proxy.

Dirk · July 25, 2019, 6:36am

I experienced the same with a new demo setup: mongodb still crashes after adding 12 GB of RAM, while my older demo works fine with 6 GB. Perhaps something to look for in the changelog?

Justin_Sherrill · July 25, 2019, 1:17pm

Were they syncing the same amount of content? I’ve noticed that the
more content that is synced the more memory it needs.

Justin

Justin_Sherrill · July 25, 2019, 1:48pm

We’ve also now switched to a newer mongo version via the SCL, so it’s
possible that it requires more memory. Can’t wait for pulp3 and
moving to 100% postgresql.

Justin

indygwyn · July 31, 2019, 6:50pm

I’ve seen the exact same thing on my servers, mongod just dying. I finally gave up and dropped Restart=on-failure and RestartSec=60 into the systemd unit.
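
For anyone wanting to try the same workaround, a drop-in file is one way to add those two settings without editing the packaged unit. This is a sketch, not necessarily how it was done above: the helper name write_restart_dropin and the file name restart.conf are made up, and the unit name in the usage comment is taken from the first post, so adjust it if your mongod unit is named differently.

```shell
# Write a systemd drop-in that restarts the unit 60 seconds after a failure.
# $1: drop-in directory, e.g. /etc/systemd/system/<unit>.service.d
write_restart_dropin() {
  dropin_dir="$1"
  mkdir -p "$dropin_dir" || return 1
  # restart.conf is an arbitrary name; systemd reads any *.conf in this directory
  cat > "$dropin_dir/restart.conf" <<'EOF'
[Service]
Restart=on-failure
RestartSec=60
EOF
}

# Usage as root (unit name assumed from the first post):
#   write_restart_dropin /etc/systemd/system/rh-mongodb34-mongod.service.d
#   systemctl daemon-reload
```

Restart=on-failure also covers a process killed by SIGKILL, so this brings mongod back after an OOM kill, but it only hides the memory problem and does not fix it.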