Foreman 2.0.0 - memory leak?

Hi everybody,
we’ve built Foreman 2.0.0 from source and run it with Puma behind an Apache reverse proxy on SLES12 SP5 (ruby2.5 and nodejs10/npm10).
All our servers have 4 CPUs and 10 GB of memory. Puma is configured to use 2 workers with 0,16 threads (min 0, max 16). “preload_app!” is activated.
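For reference, that boils down to roughly the following Puma configuration (a sketch; the file location depends on your setup, the values are the ones described above):

# Puma config sketch of the setup described above
workers 2          # 2 worker processes
threads 0, 16      # per-worker thread pool: min 0, max 16
preload_app!       # load the application once, then fork the workers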
A typical memory consumption graph looks like this: [graph: foreman_memory_consumption]

Is this normal behaviour?
If you need any further data, please let me know.

best regards

I started to look at the same thing but I haven’t got around to graphing it.

Can you take a look at which services are taking up the memory? You can use systemd for this by setting DefaultMemoryAccounting=yes in /etc/systemd/system.conf. Note that if it was previously set to false, a service needs to be restarted before its memory usage starts being measured.

Then you can get output like:

# systemctl show --property=Names,MemoryCurrent httpd postgresql foreman* dynflow*
MemoryCurrent=11235328
Names=httpd.service

MemoryCurrent=209670144
Names=postgresql.service

MemoryCurrent=253784064
Names=dynflow-sidekiq@orchestrator.service

MemoryCurrent=431247360
Names=foreman.service

MemoryCurrent=123834368
Names=foreman-proxy.service

MemoryCurrent=252919808
Names=dynflow-sidekiq@worker.service

OK, I had to set DefaultMemoryAccounting to yes and restart the services. Now we have to wait a little bit :/.

Since we built Foreman from source, we do not use dynflow-* yet. PostgreSQL and the foreman-proxy are installed on different servers. Foreman and Apache are the only services running on this server.

Memory consumption so far, still growing slowly:
[graph]

Systemd output:

systemctl show --property=Names,MemoryCurrent httpd foreman* dynflow*
MemoryCurrent=18722816
Names=httpd.service apache2.service

MemoryCurrent=5175435264
Names=foreman.service

Can you also share which plugins you have installed, if any?

@tbrisker @lzap any insights?

Currently we have not yet installed any plugins.

Hey,

can you pull the same graph but per service? I am particularly interested in foreman (puma) and dynflow (sidekiq).

Compared to what we had previously (Passenger), there is one big difference in your setup. We had a single thread per Passenger process; by default our installer configured 2-N concurrent Passenger workers depending on the number of cores, IIRC. That might have been as low as 4-8 in your case.

What I see now is that you have 2 processes with 32 threads in total. A thread is lightweight compared to a process; however, some memory resources are still significant - the stack, for example, and thread locals.

I would not expect it to be that big, though. Therefore, once you have those graphs, I’d be interested in correlating them with a setup of 2 workers and 8 threads per worker, just to compare against the current configuration.

Also, let’s look at dynflow too - it now runs as three separate processes and I think they can also contribute to the total memory consumption. Graph those as well.

He stated before that there is no dynflow:

You can also see that it’s just foreman.service taking up 5 GB of memory:

Then try configurations with 1 worker and 1, 8 and 16 threads to correlate how it’s growing. IMHO 16 is too many for Ruby with the GIL; I haven’t tested it, but I expect the sweet spot somewhere between 3 and 10 threads.

Just to be sure, I am curious and I’d like to rule out this new setup.

It might be a memory leak introduced in 2.0.0 too.

This is the default:

That in turn is copied from Puma’s defaults: https://github.com/puma/puma#thread-pool

I’d be curious anyway.

What I usually use is SystemTap or dtrace. We have some examples for the former: https://github.com/lzap/foreman-tracer

Or you can use any Ruby memory analyzer to see object allocations; the advantage of tracing is that it’s quite efficient and does not slow down your deployment too much.
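For example, a rough sketch with the memory_profiler gem (just one possible analyzer, not a specific recommendation; the exercised block is only a placeholder):

# sketch: report object allocations around a suspect code path
require 'memory_profiler'

report = MemoryProfiler.report do
  # replace this placeholder with the code you suspect, e.g. rendering one ENC
  10_000.times { 'foreman' * 10 }
end

# prints allocated/retained memory and objects grouped by gem, file and class
report.pretty_print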

OK, I started with the default configuration from the Foreman Puppet module (workers=0, threads_min=0, threads_max=16).

In parallel I am trying different Puma configurations (workers, threads_min, threads_max). It seems these have an effect on memory consumption.
My experience so far:
Node 1:
1.1 used the defaults, 1.2 workers: 2, threads_min: 8, threads_max: 16
[graph]

Node 2:
2.1 used the defaults, 2.2 workers: 4, threads_min: 4, threads_max: 8
[graph]

Node 3:
3.1 used the defaults, 3.2 workers: 4, threads_min: 8, threads_max: 16
[graph]

All setups have in common that threads_min is configured to != 0. I will keep watching the different setups.


The differences are insignificant.

Before we dig further, can you double check the list of plugins you have enabled?

Also can you run this simple tool to analyze your rails log and share which controllers and actions are the slowest and mostly used? https://github.com/pmoravec/rails-load-stats

No, we really have no plugins activated or installed.

Yes, I can do that. Our logging type was set to syslog + JSON, which caused many errors in the script. I changed the setting and now have to wait until there are some more log entries.

Uh, that’s surprisingly high memory usage then! Let’s wait and see what you hit the most.

Oh, there is SystemTap in SUSE; you could use our examples to track allocations.

So, here are some results so far from my “production-test-node”:
Current Puma configuration: workers: 4, threads: 8,8
[graph]

Results from analyze.sh:
there were 15107 requests taking 21401325 ms (i.e. 5.94 hours, i.e. 0.25 days) in summary

type                                            count   min     max     avg     mean    sum             percentage
--------------------------------------------------------------------------------------------------------------------
ConfigReportsController#create                  3213    58      1165    90      71      291307          1.36 %
HostsController#facts                           3309    23      17043   1811    1555    5995471         28.01 %
HostsController#index                           9       51194   105400  72737   64195   654640          3.06 %
PingController#ping                             1237    7       273     13      10      17166           0.08 %
PuppetclassesController#index                   450     173     432     210     188     94622           0.44 %
SmartClassParametersController#index            8       183     332     215     195     1727            0.01 %
SmartClassParametersController#show             9       188     641     298     211     2682            0.01 %
AuditsController#index                          1       2109    2109    2109    2109    2109            0.01 %
ConfigReportsController#index                   3       88      93      90      89      270             0.00 %
ConfigReportsController#show                    1       63      63      63      63      63              0.00 %
DashboardController#index                       6       40      126     60      46      362             0.00 %
DashboardController#show                        17      54      6088    504     88      8580            0.04 %
FactValuesController#index                      2       104     1425    764     104     1529            0.01 %
HostgroupsController#auto_complete_search       4       18      32      21      18      86              0.00 %
HostgroupsController#edit                       2       2923    3225    3074    2923    6148            0.03 %
HostgroupsController#index                      6       180     1040    517     193     3106            0.01 %
HostgroupsController#update                     3       307     406     350     339     1052            0.00 %
HostsController#auto_complete_search            13      10      28      12      11      166             0.00 %
HostsController#edit                            3       1996    2630    2258    2149    6775            0.03 %
HostsController#externalNodes                   6416    1172    28292   2227    1713    14292067        66.78 %
HostsController#index                           3       202     274     248     270     746             0.00 %
HostsController#multiple_puppetrun              1       94      94      94      94      94              0.00 %
HostsController#nics                            5       31      108     65      66      327             0.00 %
HostsController#overview                        2       54      96      75      54      150             0.00 %
HostsController#resources                       2       73      100     86      73      173             0.00 %
HostsController#runtime                         2       632     674     653     632     1306            0.01 %
HostsController#show                            2       307     533     420     307     840             0.00 %
HostsController#templates                       5       154     255     197     167     986             0.00 %
HostsController#update                          3       427     774     577     531     1732            0.01 %
NotificationRecipientsController#index          359     7       129     16      14      5969            0.03 %
PuppetclassesController#auto_complete_search    1       10      10      10      10      10              0.00 %
PuppetclassesController#index                   1       229     229     229     229     229             0.00 %
PuppetclassesController#update                  1       331     331     331     331     331             0.00 %
SubnetsController#index                         2       100     120     110     100     220             0.00 %
UsersController#login                           6       7       4140    1380    25      8284            0.04 %

concurrent requests:
- MAX: 14 when processing request with ID 3b7da576
- AVG: 2
- MEAN: 2
- 90%PERCENTILE: 4

First, performance. We have heard that 1.24/2.0 do not perform ENC well - that’s 2/3 of your time spent and it is relatively slow. I’ve made a patch for 2.1 which shows good improvements there on my instance; your mileage may vary depending on what you have in your database, but try to apply it and see if it helps:

Now, to the memory. Our experience is that a worker process with some plugins installed, including Katello, can max out at 2 GB on heavy deployments. This is where you should target an auto-restart. This brings up a question: @ekohl, do we have an automatic worker restart bound to some memory limit in the new Puma deployment? This was one of the Passenger limitations - the feature was only available in the paid version of the product. We should definitely deploy something like this now that we have a report that memory can grow this fast.
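If nothing like that is wired in yet, one option (just a sketch from me, not something the installer ships today, and all the values are illustrative) would be the puma_worker_killer gem, configured from the Puma config file:

# sketch only: recycle the largest worker once memory crosses a threshold
before_fork do
  require 'puma_worker_killer'

  PumaWorkerKiller.config do |config|
    config.ram           = 8192   # total RAM budget for Puma in MB (illustrative)
    config.frequency     = 60     # how often to check, in seconds
    config.percent_usage = 0.80   # act when workers exceed this share of the budget
  end
  PumaWorkerKiller.start
end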

The next step would be finding those leaks, but before you do that, do you have any reason not to use Ruby 2.7? That version has a new feature - GC compaction, or defragmentation. The Ruby VM is known for fragmenting memory, creating “gaps” on the heap which are never returned to the OS. The latest stable version should be better. Try it - the codebase should be 2.7 compatible if I am not mistaken. Compare the graph with what you have here (4 workers, 8-8 threads).
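To give an idea, compaction in 2.7 is triggered manually; a minimal sketch (where you call it - an initializer, a Puma hook, a periodic job - is up to you):

# Ruby 2.7+ only: GC.compact moves live objects together to reduce heap fragmentation
if GC.respond_to?(:compact)
  GC.compact
end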

OK, we can try this patch :slight_smile: but it may take some time to apply.

Next steps are finding what are those leaks, but before you do that, do you have any reason why not to use Ruby 2.7?

Because our company is using SLES :face_vomiting: …all jokes aside, I followed your “Install from Source” instructions and they say “Ruby 2.5 or newer”, so I started with ruby2.5 :slight_smile: I can try to build Foreman with ruby2.7, but this will take some time and it’s not possible to do that on our production system.