Recent frequent SIGKILL on Katello pipelines

Katello has been seeing many SIGKILLs stopping the nightly and test pipelines. Example: https://ci.theforeman.org/blue/rest/organizations/jenkins/pipelines/katello-pr-test/runs/13503/log/?start=0

The failures are intermittent and don’t seem to follow any specific test. An OOM kill seems likely.

We on Katello can start by running the tests locally and monitoring the memory usage. I’d like to use this thread to collaborate on the issue.
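For the local runs, here is a minimal sketch of one way to capture the peak memory of a test command. It is not an existing Katello or Foreman tool, and the helper name `run_and_measure` is made up for illustration. It assumes a Unix machine and Python 3.9+, and relies on `os.wait4`, which returns the resource usage of the child it waited for.

```python
import os
import sys


def run_and_measure(cmd):
    """Run cmd (a list of strings) and return (exit_code, peak_rss_mib).

    Hypothetical helper for illustration only.
    """
    # Start the command without waiting so we can collect it with wait4.
    pid = os.spawnvp(os.P_NOWAIT, cmd[0], cmd)
    # wait4 returns the resource usage for this specific child.
    _, status, usage = os.wait4(pid, 0)
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return os.waitstatus_to_exitcode(status), usage.ru_maxrss / divisor
```

Pass it whatever command you normally use to run the suite, once with and once without a suspect test file, and compare the two peaks. One caveat: `ru_maxrss` is the peak of the single largest process, not a sum, so if the test runner forks workers this will understate the total.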

One question that would help: how much memory do the test machines have? We’ll measure memory consumption, but without that number we won’t know what a good target is.

According to @lfu, memory usage jumps when the test file test/services/katello/ui_notifications/hosts/lifecycle_expire_soon_test.rb is run. Without it, the tests use only 1.5 GB of memory.

That also lines up with when this started happening.

The failing nodes have a base of 16 GB of RAM, probably closer to 12-14 GB available during test runs, which should be plenty.
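If anyone wants to confirm the available figure on a node rather than estimate it, a rough sketch like the following reads the kernel's own numbers. It assumes a Linux node where `/proc/meminfo` exposes `MemTotal` and `MemAvailable`. The function names are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> number.

    Most fields are in kB. A few (such as HugePages_Total) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def total_and_available_gib(meminfo):
    """Return (total, available) memory in GiB from a parsed meminfo dict."""
    kib_per_gib = 1024 * 1024
    return (meminfo["MemTotal"] / kib_per_gib,
            meminfo["MemAvailable"] / kib_per_gib)


# Usage on a Linux node:
#   with open("/proc/meminfo") as f:
#       total, available = total_and_available_gib(parse_meminfo(f.read()))
```

Sampling this a few times while the tests run would show how close the node actually gets to zero available memory before a SIGKILL.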
