There seems to be something really weird in your configuration. We’ve connected around 2000 systems to our production system (4x 4 CPUs + 10 GB RAM), but no plugins have been added yet.
I think it’s apples and oranges here. @chr1s is running pure Foreman (from sources) without Katello (perhaps even Puppet on a different node?) while @sjansen appears to run Katello-MongoDB-Candlepin-Puppet full stack.
Anyway, that patch did help with CPU a little bit. I guess you don’t see any memory improvements.
Any news to share?
One side note: @ekohl has realized that Puppet is calling the ENC twice for some reason. If we find out why and fix it, half of your queries would disappear. This won’t help much with the memory hog, it possibly just makes it slower, but it is worth noting.
Yes, thank you for reminding me. It looks like I haven’t created a ticket for the double ENC call. That’s entirely done by Puppetserver because even with a trivial ENC script, it’s called twice.
Sorry for the late reply. I forgot we had a holiday on Monday.
Last week I tried different configurations of Puma (worker/threads/…). It seems that the following configuration works fine for me (after the fix; I don’t know what it would look like without it):
Worker 4, threads 4,8 @ruby2.5
Foreman with ruby2.7
Since the configuration now works fine and I’m still getting a lot of warnings with ruby2.7, I don’t know if I should bring these packages into our production. What is your experience with ruby2.7?
At this moment we don’t have any supported platform on Ruby 2.7. @mmoll is working on Ubuntu 20.04 support (which ships 2.7) but I don’t know if specific 2.7 fixes were needed.
To me it looks very similar to what you had with Ruby 2.5: 4 workers hitting almost 8 GB, which is 2 GB per worker. We have seen our processes peak at 2 GB, and that should probably be the threshold at which we set up an automatic worker restart with a tool like: https://github.com/schneems/puma_worker_killer
The only difference is that we have seen this with Foreman + many plugins installed. You say you have just Foreman core. That’s weird. Maybe a ton of ENC data? Dunno.
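The 2 GB per worker threshold mentioned above can also be checked by hand before setting up an automatic restart. The following is only a rough Python sketch for Linux, not how puma_worker_killer works internally; the function names are made up for illustration, and the worker PIDs are assumed to be collected some other way (for example from `ps`):

```python
from pathlib import Path

# 2 GB threshold taken from the discussion above, expressed in KiB
THRESHOLD_KIB = 2 * 1024 * 1024


def rss_kib(status_text):
    """Return the VmRSS value (in KiB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like: "VmRSS:     123456 kB"
            return int(line.split()[1])
    return None  # kernel threads have no VmRSS line


def workers_over_threshold(pids, threshold_kib=THRESHOLD_KIB):
    """Return (pid, rss_kib) pairs for processes above the threshold."""
    over = []
    for pid in pids:
        try:
            text = Path(f"/proc/{pid}/status").read_text()
        except OSError:
            continue  # process is gone or not readable
        rss = rss_kib(text)
        if rss is not None and rss > threshold_kib:
            over.append((pid, rss))
    return over
```

Calling `workers_over_threshold([...])` with the PIDs of the Puma workers lists those whose resident memory is above 2 GB.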
Remember the first post I created. The Puma configuration was workers: 2, threads: 0,16 (according to your defaults from the Puppet module) and the memory consumption:
Perhaps threads consume more memory, or is this a problem of ruby2.5 memory fragmentation?!
OK, thanks for the comparison, that’s like a 20% reduction. Not bad.
Yeah, it looks like the Ruby 2.7 memory de-fragmentation really helps here. But still, even if there was no Copy on Write, what we are seeing here is 4 GB per worker. That’s way too much. There must be a memory leak somewhere.
Does dtrace, systemtap or the modern bpftrace work on your distro?
Can you share how many CPUs/cores you have? It is worth looking into the MALLOC_ARENA_MAX setting. Currently you probably use the default value, which is often a bit too high: it gives the best multithreaded performance at the cost of some memory. Try to decrease this to just 2. For more info: https://developers.redhat.com/blog/2017/03/02/malloc-internals-and-you/
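To get a feel for why the core count matters here: when MALLOC_ARENA_MAX is unset, glibc on 64-bit systems caps the number of malloc arenas at 8 times the number of cores (2 times on 32-bit). A minimal Python sketch of that rule, with made-up function names and assuming glibc is the allocator in use:

```python
import os


def default_arena_limit(cores, pointer_bits=64):
    """Arena cap glibc applies when MALLOC_ARENA_MAX is not set."""
    per_core = 8 if pointer_bits == 64 else 2
    return cores * per_core


def effective_arena_limit(cores, env=None):
    """Arena cap for a process, honouring MALLOC_ARENA_MAX if present."""
    env = os.environ if env is None else env
    value = env.get("MALLOC_ARENA_MAX", "")
    if value.isdigit() and int(value) > 0:
        return int(value)
    return default_arena_limit(cores)


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    print(f"{cores} cores -> up to {effective_arena_limit(cores)} arenas")
```

On a machine with 4 cores this gives a cap of 32 arenas by default, compared to 2 with MALLOC_ARENA_MAX=2.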
Just so we don’t misunderstand each other: the differences only come from changing the Puma settings from workers: 2, threads: 0,16 to workers: 4, threads: 4,8 + your externalnode fix. We’re still using ruby2.5.
SystemTap is available on SLES12SP5, but I don’t know how to use it… I’ve never done anything like this before. If you tell me exactly what you want me to do, we can try this.
We’re using virtual machines (VMWare), each virtual machine has 4 virtual CPUs (system hardware based on Intel® Xeon® CPU E5-2699 v3 @ 2.30GHz).
You can try to play around with https://github.com/lzap/foreman-tracer - a set of SystemTap scripts that can help with tracking down issues. For example, foreman-tracer rails objects-total should be particularly interesting. You need to have Ruby compiled with STAP/DTRACE statements, which is the default for modern Ruby versions. But if it does not work for you, don’t lose much time on this.
I suggest trying setting this in
We actually do this for dynflow processes and I wonder why we don’t do the same since we default up to 16 threads now. @ekohl do you think this is worth adding into 2.1?
I think that makes sense. Should be as easy as changing https://github.com/theforeman/foreman/blob/develop/extras/systemd/foreman.service. That would have been more complicated with Passenger.
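Once the variable is set in the unit and the service has been restarted, one way to confirm that a running process actually picked it up is to read its initial environment from /proc. This is only a sketch for Linux with hypothetical function names; reading another user's process usually requires root:

```python
from pathlib import Path


def parse_environ(raw: bytes) -> dict:
    """Parse the NUL-separated KEY=VALUE content of /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def arena_max_of(pid):
    """Return MALLOC_ARENA_MAX as seen by the given process, or None."""
    raw = Path(f"/proc/{pid}/environ").read_bytes()
    return parse_environ(raw).get("MALLOC_ARENA_MAX")
```

For a Puma worker PID, `arena_max_of(pid)` should return `"2"` if the setting is active and `None` if the process was started without it.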
I tried to run foreman-tracer, but some binaries were missing. scl was the last one, before I decided to test MALLOC_ARENA_MAX=2 first.
“foreman.service” is corrected and now we need to wait.
Here you can see the graphs after almost one day. (Puma config: worker=4, threads=4,8)
Node1 with MALLOC_ARENA_MAX=2
Node2 without MALLOC_ARENA_MAX
Thanks! Looks like quite a noticeable reduction in memory usage. Have you seen any impact on CPU load?
I can’t see any impact on CPU or load.
Nice, that’s 1 GB saved on a standard deployment.
I do believe that with the default value (2 processes, 0-16 threads) this would be even more significant. Anyway, we are likely going to merge this change.