Problem:
Performance tuning options for Foreman in a home lab environment, which needs a different approach from production performance tuning
Expected outcome:
optimised throughput and resource utilisation for small lab deployments
Foreman and Proxy versions:
Foreman 3.7
Foreman-Proxy 3.7
Foreman and Proxy plugin versions:
Ansible 3.5.5
DHCP 3.7.0
DNS 3.7.0
Dynflow 0.9.0
Openscap 0.9.2 (not in use)
Script 0.10.1
TFTP 3.7.0
libvirt 3.7.0
puppet-6.0.0
Distribution and version:
CentOS 8-Stream x86_64
Puppet 7 (7.12)
Other relevant data:
I’m running Foreman 3.7 with Puppet 7 in a home lab environment to test some automation tasks. I’m using the libvirt plugin to provision virtual machines on a KVM/libvirt target. I’m running between 15 and 40 guests at any one time (never more) on the KVM host, with a puppet run on the guests every 5 minutes (for quick changes).
The Foreman server is an Intel NUC with an Intel i5 5200 with 4 cores, 16GB RAM and an NVMe disk, so low spec, but it has always been a solid home lab device for doing Foreman development work / learning.
The Foreman tuning guide (Tuning performance of Foreman) and the RHEL Satellite tuning guides are fantastic for production / enterprise deployments and provide some great improvements and optimisations. However, I think a home lab type setup calls for different tuning options rather than just scaling down the numbers meant for large deployments.
I’ve also noticed that when running 30-40 guests at once with a 5 minute check-in time, the guests occasionally go out of sync with the puppet master. I’ve seen this in the logs:
Aug 4 09:46:10 ezri puppet-agent[48056]: Connection to https://lab.no-dns.co.uk:8140/puppet/v3 failed, trying next route: Request to https://lab.no-dns.co.uk:8140/puppet/v3 failed after 0.004 seconds: Failed to open TCP connection to lab.no-dns.co.uk:8140 (Connection refused - connect(2) for “jarvis.no-dns.co.uk” port 8140)
Aug 4 09:46:10 ezri puppet-agent[48056]: Wrapped exception:
Aug 4 09:46:10 ezri puppet-agent[48056]: Failed to open TCP connection to lab.no-dns.co.uk:8140 (Connection refused - connect(2) for “lab.no-dns.co.uk” port 8140)
The check-ins are mostly fine: 90% of the time all hosts are in sync, and if I manually run puppet on a guest it checks in and completes fine. I believe the Foreman server can’t process the volume of simultaneous requests when 30-40 hosts overlap every 5 minutes.
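To put rough numbers on the overlap theory, here is a minimal Python sketch. It is not taken from any Foreman or Puppet tooling, and the 10 second server time per run is a made-up placeholder that would need to be measured from the puppetserver access log. It compares the average number of in-flight runs (Little's law) with the peak when all agents start together versus when they are spread evenly across the 5 minute window. It only sizes the overlap; it does not prove that overlap is what causes the refused connections.

```python
def average_concurrent_runs(agents, run_interval_s, server_time_per_run_s):
    """Little's law: mean in-flight runs = arrival rate * time each run occupies the server."""
    arrival_rate_per_s = agents / run_interval_s
    return arrival_rate_per_s * server_time_per_run_s


def peak_concurrent_runs(start_offsets_s, server_time_per_run_s):
    """Sweep over start/end events of one interval and return the largest overlap seen."""
    events = []
    for start in start_offsets_s:
        events.append((start, 1))                           # run begins
        events.append((start + server_time_per_run_s, -1))  # run ends
    # At equal timestamps (t, -1) sorts before (t, 1), so back-to-back runs do not count as overlapping.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


AGENTS = 40           # upper end of the guest count in this lab
INTERVAL_S = 300      # 5 minute run interval
SERVER_TIME_S = 10    # ASSUMPTION: placeholder, measure the real value

all_at_once = [0] * AGENTS
evenly_spread = [i * INTERVAL_S / AGENTS for i in range(AGENTS)]

print("average in flight:", average_concurrent_runs(AGENTS, INTERVAL_S, SERVER_TIME_S))
print("peak, all at once:", peak_concurrent_runs(all_at_once, SERVER_TIME_S))
print("peak, evenly spread:", peak_concurrent_runs(evenly_spread, SERVER_TIME_S))
```

If the measured server time per run is much smaller than the placeholder, the average load is tiny and only the clustering of start times matters.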
I’m trying to look at the best ways to optimise my home lab (beyond putting it onto bigger tin, which I don’t think is really needed to fix this).
I’d like optimisation opinions on:
a.) Snappy web GUI response. It’s pretty good as is, with a little lag caused by the libvirt plugin having to check in with the libvirt host to pull back data, but optimising it for human interaction would be good.
b.) Web services capacity / performance for the proxies, especially around the smart proxy interface to the puppet master. I believe this is what’s causing some puppet runs to fail.
c.) The actual puppet master. Foreman installed and configured a Puppet 7 puppet master as part of the install, using the default formulas in the installer to set it up. I think I can get more performance out of this to optimise the puppet runs.
Because the lab is so small, my monitoring suggests the database isn’t causing any issues: it’s on the same host, uses close to no resources, the database is tiny, and query profiling shows great response times.
I could probably benefit from some tweaks to Puma, but with such a small host and such a small environment I’m not sure how best to size it.
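One way to frame the Puma sizing question is a small Python sketch like the one below. The only fact it relies on is that Puma can serve at most workers × max threads requests at once. Everything else is a hypothetical input: the per-worker memory figure, the memory budget and the thread count are placeholders I would have to measure or choose, not Foreman defaults.

```python
from dataclasses import dataclass


@dataclass
class PumaPlan:
    workers: int
    threads_max: int

    @property
    def max_concurrent_requests(self):
        # Each worker process can serve up to threads_max requests at a time.
        return self.workers * self.threads_max

    def estimated_memory_mib(self, per_worker_mib):
        # Rough estimate only: ignores memory shared between forked workers.
        return self.workers * per_worker_mib


def plans_within_budget(cpu_cores, memory_budget_mib, per_worker_mib, threads_max=5):
    """List candidate plans (one per worker count up to the core count) that fit the memory budget."""
    plans = []
    for workers in range(1, cpu_cores + 1):
        plan = PumaPlan(workers, threads_max)
        if plan.estimated_memory_mib(per_worker_mib) <= memory_budget_mib:
            plans.append(plan)
    return plans


# ASSUMPTIONS: 2048 MiB set aside for Puma and 700 MiB per worker are invented example values.
for plan in plans_within_budget(cpu_cores=4, memory_budget_mib=2048, per_worker_mib=700):
    print(plan.workers, "workers x", plan.threads_max, "threads ->",
          plan.max_concurrent_requests, "concurrent requests")
```

With only a handful of human users on the GUI, even the smallest plan may be enough, which would leave more of the box for the puppet master.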
While the machine is small and the environment is small, resource utilisation is interesting: approx 4.5GB of RAM in use (excluding disk caching) and 8-15% CPU spikes for the puppet master Java process.
Disk IO is close to non-existent.
The puppet master startup options are pretty much default and small resource-wise:
/usr/bin/java -Xms2G -Xmx2G -Dcom.redhat.fips=false -Djruby.logger.class=com.puppetlabs.jruby_utils.jruby.Slf4jLogger -XX:ReservedCodeCacheSize=512m -XX:OnOutOfMemoryError=kill -9 %p -XX:ErrorFile=/var/log/puppetlabs/puppetserver/puppetserver_err_pid%p.log -cp /opt/puppetlabs/server/apps/puppetserver/puppet-server-release.jar:/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/facter.jar:/opt/puppetlabs/server/data/puppetserver/jars/* clojure.main -m puppetlabs.trapperkeeper.main --config /etc/puppetlabs/puppetserver/conf.d --bootstrap-config /etc/puppetlabs/puppetserver/services.d/,/opt/puppetlabs/server/apps/puppetserver/config/services.d/ --restart-file /opt/puppetlabs/server/data/puppetserver/restartcounter
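As a rough way to reason about that heap, the Python sketch below pulls the -Xms/-Xmx values out of a java command line like the one above and estimates how many JRuby instances would fit. The 512 MiB per instance and 512 MiB of base overhead are assumptions used as a starting estimate only, not measured values from this box, and the helper names are my own.

```python
import re

UNIT_BYTES = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def heap_flags(cmdline):
    """Return {'Xms': bytes, 'Xmx': bytes} for whichever heap flags appear in a java command line."""
    found = {}
    for name, number, unit in re.findall(r"-(Xms|Xmx)(\d+)([kKmMgG]?)", cmdline):
        # A flag with no unit suffix is a plain byte count.
        found[name] = int(number) * UNIT_BYTES.get(unit.lower(), 1)
    return found


def jruby_instances_that_fit(heap_bytes,
                             per_instance_bytes=512 * 1024 ** 2,  # ASSUMPTION: starting estimate
                             base_bytes=512 * 1024 ** 2):         # ASSUMPTION: non-JRuby overhead
    """Estimate how many JRuby instances fit in the heap after reserving the base overhead."""
    return max(0, (heap_bytes - base_bytes) // per_instance_bytes)


cmdline = "/usr/bin/java -Xms2G -Xmx2G -XX:ReservedCodeCacheSize=512m"
flags = heap_flags(cmdline)
print("max heap bytes:", flags["Xmx"])
print("JRuby instances that fit:", jruby_instances_that_fit(flags["Xmx"]))
```

Under those assumptions the current 2G heap leaves room for a few instances, so raising the instance count would also mean raising -Xmx to match.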
I’m really keen to see if I can push this hardware further with a more optimised config. My big nodes are great, but I’m struggling to get a good balance on a small home lab.