Hello,

I wrote a script that gracefully terminates Passenger processes
consuming more than 2.5 GB of private RSS memory. Typical consumption
of Foreman with Katello and other basic plugins is around 1.5 GB, and
since only Passenger Enterprise allows limiting the maximum amount of
RSS memory for processes, this simple script does exactly that:

https://gist.github.com/lzap/8dddbe66ec8d43cbd4277c1de7045c17

Put it into your /etc/cron.hourly/ and make it executable. Make sure
you read root's email or forward it properly, because the script reports
all terminations performed on STDOUT. I would like you to test this script
in production and get back to me with feedback about what you think.

The motivation is simple: we often introduce bugs in our Rails codebase
that perform some eager loading, or there are memory leaks in our
code or dependencies, and Passenger processes can grow to dozens of
gigabytes. Unfortunately, there is no way out other than restarting
httpd with Passenger. This script could help to avoid situations where
a production instance starts to swap hard thanks to some small
regression we introduced. The administrator will also be notified
early via email when this happens, so we can keep track of these
regressions in production.

Feature #19496: Passenger graceful killer cron job - Foreman Maintain - Foreman
http://projects.theforeman.org/issues/19496

Feedback appreciated.
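To make the idea concrete, here is a minimal Ruby sketch of the selection step only. It is not the script from the gist: the `Worker` struct, the helper names and the sample numbers are all made up for illustration, and a real script would read memory usage from Passenger's own tooling and then ask Passenger to shut the worker down gracefully.

```ruby
# Hypothetical sample data: a real script would obtain these numbers from
# Passenger's tooling instead of a hard-coded list.
Worker = Struct.new(:pid, :private_mb)

# Return the workers whose private memory exceeds the limit (in MB).
def over_limit(workers, limit_mb)
  workers.select { |w| w.private_mb > limit_mb }
end

# Build the lines a cron job would print. Cron mails whatever a job writes
# to STDOUT, which is how the administrator gets notified.
def report(workers, limit_mb)
  over_limit(workers, limit_mb).map do |w|
    "Terminating PID #{w.pid}: #{w.private_mb} MB private memory exceeds #{limit_mb} MB"
  end
end

workers = [Worker.new(101, 1500), Worker.new(102, 3100), Worker.new(103, 2400)]
puts report(workers, 2560) # 2.5 GB expressed in MB
```

With the sample data above, only PID 102 is reported, since it is the only worker above 2560 MB.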
I gave it a spin, two comments:
1. My default memory size was around 400m, so I had to change the script in
order to see it in action. Maybe we should take into account available memory
and how many passenger processes are allowed? The default of 2.5gb seems a
bit too high.
2. In order to use your script in an scl env, you would need to wrap it
with scl enable <collection> passenger-recycler
Thanks!
Ohad
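One way to read the first comment is to derive the limit from the machine instead of hard-coding it. The following Ruby sketch is only an illustration of that suggestion; the function name, the `reserve_mb` allowance for the rest of the system and the `floor_mb` lower bound are assumptions, not anything the script is known to implement.

```ruby
# Derive a per-process limit (in MB) from total memory and the size of the
# Passenger worker pool. reserve_mb and floor_mb are hypothetical knobs:
# memory kept aside for everything else, and a lower bound on the limit.
def per_process_limit_mb(total_mb, max_workers, reserve_mb: 2048, floor_mb: 512)
  raise ArgumentError, "max_workers must be positive" unless max_workers > 0

  share = (total_mb - reserve_mb) / max_workers # integer division
  [share, floor_mb].max
end

puts per_process_limit_mb(32_768, 12) # => 2560
puts per_process_limit_mb(8_192, 12)  # => 512
puts per_process_limit_mb(4_096, 12)  # => 512 (the floor applies)
```

On a 32 GB host with 12 workers this happens to land on the original 2.5 GB default, while smaller hosts get a proportionally lower limit.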
While this is an interesting idea, I have a few concerns about it:
1. This won't take care of requests that balloon memory very rapidly (such
as the issue we recently found in katello which would cause OOM in a matter
of minutes). Waiting up to an hour to reclaim memory might not be fast
enough in those cases.
2. If passenger fails to complete a response within the graceful shutdown
period, users will just have a request fail with no indication of what
happened. While most requests should finish within 2 minutes, a large csv
download, for example, can take much longer than that, and the user will not
know why it failed (and it could be completely unrelated, leading to red
herrings - the request that failed won't necessarily be the one that caused
the leak).
3. We might not know about places where we leak memory, since the process
will get killed without anyone noticing it.
Ohad 1) I decreased it to 2 GB by default, but I don't want to go
further down in case many plugins are installed.
Ohad 2) I tested it and it works just fine without scl; the non-scl
passenger client can communicate with scl passenger, as we install both
passengers (one for puppet, one for foreman). I wrapped it with SCL
though, no problem. Ah wait, an SCL shebang is not supported. I can create
a foreman-ruby wrapper for RHEL7 in the PR, it's Fedora only: https://bugzilla.redhat.com/show_bug.cgi?id=1058796
Tomer 1) It is possible to run this every minute, the script is simple
and fast, but my goal is not to replace process frameworks like GodRB.
I want something that notifies the administrator that something is wrong. I
could also send a foreman UI notification if we have an API for that.
If processes grow within minutes, they will likely be above the threshold
after an hour as well. And since it is very easy to signal termination
and Passenger handles that very well (it terminates the process and
immediately starts spawning a new one if there are not enough
workers), why not do it?
Tomer 2) Well, in the context of processes each consuming 2 GB of RSS
memory, which is essentially 24 GB of private RSS memory in total (we
default to 12 passenger processes), a user not seeing a response is not
that relevant, I think. If the server is powerful enough to terminate the
process gracefully (we have a 2 minute grace period), it will finish
processing eventually. But if this is your concern, I added a configuration
option to disable this feature; we can ship with killing disabled by
default if you want to.
Tomer 3) I added output of passenger-status, which gives a nice overview
including stacktraces of all workers, before termination is performed.
If I compare the current state (we don't care) with what I propose
(the administrator gets emailed and the excessive process gets terminated
nicely), it's like night and day. We know more than we did before.
Check out the new version I just pushed into the gist! I hope you will like it.
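The opt-in behaviour discussed above (always report, terminate only when enabled) could look roughly like the following Ruby sketch. The setting names `limit_mb` and `kill_enabled` and the `decide` helper are hypothetical and are not taken from the gist; the point is only that reporting and killing are decided separately.

```ruby
# Hypothetical settings: the key names are illustrative, not the script's own.
SETTINGS = { limit_mb: 2048, kill_enabled: false }.freeze

# Decide what to do with one worker. Returns [action, message]. The caller
# prints the message (so cron can mail it) and signals the process only
# when the action is :terminate.
def decide(pid, private_mb, settings)
  return [:ignore, nil] if private_mb <= settings[:limit_mb]

  if settings[:kill_enabled]
    [:terminate, "PID #{pid} uses #{private_mb} MB, terminating gracefully"]
  else
    [:report, "PID #{pid} uses #{private_mb} MB, over the limit (killing disabled)"]
  end
end

action, message = decide(4242, 3000, SETTINGS)
puts message if message
```

With killing disabled, the administrator still gets the email about the oversized worker, which addresses the concern about leaks going unnoticed without risking failed requests.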
It's just a script run as a cron job, if we agree that hourly is a viable
option (that's what I think).
I think I could also backport it into 1.15 as "opt-in" (i.e. disabled
by default) so people can easily test it.
On Wed, May 10, 2017 at 1:58 PM, Bryan Kearney wrote:
> On 05/10/2017 05:05 AM, Lukas Zapletal wrote:
>>
>> Check out the new version I just pushed into the gist! I hope you will
>> like it.
>
> What is the process to go from gist to in 1.16?
>
> -- bk
We keep getting these strange out of memory messages, and we suspect a memory leak somewhere. I just found your script. We have rubygem-foreman_maintain.noarch 1:0.7.1-1.el7 installed, but executing it fails with:
[EL7/INT] [root@xxxxxx cron.d]$] /usr/bin/scl enable tfm -- ruby /usr/bin/passenger-recycler
Traceback (most recent call last):
3: from /usr/bin/passenger-recycler:22:in `<main>'
2: from /opt/rh/rh-ruby25/root/usr/share/rubygems/rubygems/core_ext/kernel_gem.rb:65:in `gem'
1: from /opt/rh/rh-ruby25/root/usr/share/rubygems/rubygems/dependency.rb:322:in `to_spec'
/opt/rh/rh-ruby25/root/usr/share/rubygems/rubygems/dependency.rb:310:in `to_specs': Could not find 'foreman_maintain' (>= 0) among 165 total gem(s) (Gem::MissingSpecError)
Checked in 'GEM_PATH=/opt/theforeman/tfm/root/usr/share/gems:/root/.gem/ruby:/opt/rh/rh-ruby25/root/usr/share/gems:/opt/rh/rh-ruby25/root/usr/local/share/gems', execute `gem env` for more information
@Gerwin_Krist you should not enable the SCL Ruby. foreman-maintain uses the system Ruby. If it would be switched to the SCL Ruby, the shebang would be modified.
Sorry guys! Ruby is really not my thing and I really don't know why I couldn't figure that out.
But I am still getting this one:
/usr/share/gems/gems/foreman_maintain-0.7.1/bin/passenger-recycler:10: warning: already initialized constant CONFIG
/usr/share/gems/gems/foreman_maintain-0.7.1/bin/passenger-recycler:8: warning: previous definition of CONFIG was here
/usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require': cannot load such file -- phusion_passenger (LoadError)
from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
from /usr/share/gems/gems/foreman_maintain-0.7.1/bin/passenger-recycler:46:in `<top (required)>'
from /usr/bin/passenger-recycler:23:in `load'
from /usr/bin/passenger-recycler:23:in `<main>'