Allocating resources to Foreman/Katello services, without oversubscribing the system

Ohai,

in the past, we have had several reports that Foreman (especially in Katello-enabled setups) uses too much memory and OOM kills happen (see e.g. "Katello 4.5/Foreman 3.3 memory leak (in gunicorn)"). It is clear that we should aim to fix all possible memory leaks, but while investigating possible solutions (and workarounds) for them and monitoring existing setups, we also realized that our default deployments are not optimal: they effectively oversubscribe the available system resources.

A short example: We assume that the Puma/Rails/Foreman processes consume about 1GB of memory each, and deploy as many "workers" as would fit into system memory (subtracting a small "reservation"). [Yes, if you look closely at puppet-foreman you'll notice it also considers the CPU count, but let's ignore that for a moment.] If you turn this around, it means that whatever "reservation" we put into that formula (today it's 1.5GB) is all the memory that all the other processes are allowed to use, as there will be a Puma worker for every other available gigabyte of system memory.

In reality, each Puma worker uses less memory than that, and shares quite a bit of it with the main process (as we use app preloading, so the application code only exists once in memory), but it's still somewhere in the 400-600MB ballpark, which on my 16CPU/32GB machine with 24 Puma workers (the 24 is due to the CPU count influencing the formula) means something like 14GB of "real" usage (and 24GB "allowed").
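
For reference, here is roughly how today's default sizing works out on that box. This is a simplified sketch of the puppet-foreman logic, not the exact code:

$cpus   = $facts['processors']['count']                                     # 16
$ram_gb = $facts['memory']['system']['total_bytes'] / (1024 * 1024 * 1024)  # 32
$puma_workers = floor(min($cpus * 1.5, $ram_gb - 1.5))                      # floor(min(24, 30.5)) => 24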

If Puma were the only thing running on that machine, that would be perfectly fine.
But we also have to run PostgreSQL, Tomcat, Pulp and a few others here.

- Tomcat is happy with 1-2GB (napkin: 32GB system - 24GB Puma allowed - 2GB Tomcat = 6GB "free").
- PostgreSQL, if not tuned too much, is also happy with 2-4GB (napkin: 6GB "free" from above - 4GB PostgreSQL = 2GB "free").
- Pulp consists of three distinct parts: the content app (very light, it just serves bytes to users and can easily fit in 1GB), the API (that's the part leaking in the link above) and the workers. Both the API and the workers can consume a lot of memory, but we have (on paper) only 2GB left. And this is exactly where the Linux kernel starts not liking us. :crying_cat_face:

But Evgeni, why are you writing all that text for a problem we're already aware of?

Well, I want to fix it. Or at least tame it a bit!

On the one hand, we've been trying to make the memory leak in the Pulp API less severe (see the "Api server memory leak?" thread on the Pulp community forum for some details and progress).
But even if the memory leak didn't exist, we today deploy most of the services (especially Puma, the Pulp API and the Pulp workers, as those are the most resource-hungry ones) in a configuration that is suited for "one service per VM" deployments, but not for "multiple services per VM" ones.

The idea is to reduce the “visible system resources” when the individual parts of the stack perform their default sizing calculation. So, given the 16C/32G machine above, when configuring Puma, we’d pretend it’s supposed to configure for 8C/16G (or something similar) instead, thus leaving more room for the other services. Users will still be able to override those decisions (as they are today) with installer options, thus explicitly configuring the number of workers each individual service should run, but the defaults would become more sustainable.

Now, the idea might sound simple, but the implementation most probably will not be.
Today, most services are configured by individual Puppet modules that don't know anything about each other (as they could ultimately even run on different systems). Those modules are tied together by our installer, and that is also where I think I'd try to inject the logic for finding out which services are present (that's not a static list: some people run PostgreSQL on a dedicated host, some run Foreman without Katello/Pulp/Candlepin, etc.) and which slice of the system to present to each of them.

Inside the individual modules, it could then look something like this (based on puppet-foreman, with the *_allowed_* parameters being calculated by the installer):

$_puma_cpu = pick($foreman::foreman_service_allowed_cpus, $facts['processors']['count'])
$_puma_ram = pick($foreman::foreman_service_allowed_ram, $facts['memory']['system']['total_bytes'] / (1024 * 1024 * 1024))
$puma_workers = pick(
  $foreman::foreman_service_puma_workers,
  floor(
    min(
      $_puma_cpu * 1.5,
      $_puma_ram - 1.5
    )
  )
)
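
On the 16C/32G example, and assuming (purely for illustration) the installer decided to hand Foreman half of the machine, the same formula would then yield:

# Hypothetical values the installer could inject for the 16C/32G box:
$foreman_service_allowed_cpus = 8
$foreman_service_allowed_ram  = 16

# Plugged into the calculation above:
# floor(min(8 * 1.5, 16 - 1.5)) = floor(min(12, 14.5)) = 12 Puma workers
# instead of today's 24, leaving the other half of the machine for
# PostgreSQL, Tomcat and the Pulp services.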

But Evgeni, why are you writing that second wall of text? Just go implement this!

I hoped to gather some feedback on whether y'all think this is a valuable idea, or whether you have other ideas for how we could provide better dynamic defaults in our deployments.

One solution I have wondered about is "dynamic" Hiera to allow injecting a layer calculated by the installer, but overridden by command line parameters. I should think this can work if the Hiera file is written prior to Puppet being invoked, and it would thus avoid having to do the logic you proposed inside each and every module. Taking our current hierarchy and adjusting it with this idea:

hierarchy:
  - name: "Enforced scenario overrides"
    path: "scenario/%{facts.kafo.scenario.id}/enforced-scenario-overrides.yaml"
  - name: "Custom user answers"
    path: "custom.yaml"
  - name: "Scenario overrides"
    path: "scenario/%{facts.kafo.scenario.id}/scenario-overrides.yaml"
  - name: "Kafo Answers"
    path: "%{facts.kafo.scenario.answer_file}"
  - name: "Installer calculated answers"
    path: "installer_calculated.yaml"
  - name: "Built in"
    paths:
      - "scenario/%{facts.kafo.scenario.id}/family/%{facts.os.family}-%{facts.os.release.major}.yaml"
      - "scenario/%{facts.kafo.scenario.id}.yaml"
      - "family/%{facts.os.family}-%{facts.os.release.major}.yaml"
      - "family/%{facts.os.family}.yaml"
      - "security.yaml"
      - "tuning/sizes/%{facts.kafo.scenario.custom.tuning}.yaml"
      - "tuning/common.yaml"
      - "common.yaml"

I defer to @ekohl to tell me if I am barking up a crazy tree here.

In that case, we would do the whole "calculation" in the installer, so the one in the module would become dormant for all our prod cases and only be considered when people use the modules outside the installer context?

I tried to avoid that, as that would mean that we either

  1. have to duplicate the calculation “logic” (bah, code copies)
  2. drop it from the modules, making them less useful on their own

When I originally read this, I was not thinking that you were providing the resource allocation, but rather that you wanted to calculate the worker count and supply that. Do we have this sort of calculation for every service? Or will we need to do a combination of worker specification for some and resources for others?

Today we have (inside the Puppet modules) dynamic sizing for Puma, Pulpcore API, Pulpcore Content and Pulpcore Workers.
All of these can be manually overridden.

I’ve also been thinking about how to solve this and like @ehelms I was leaning to Hiera.

Something we don't use today, but totally could do, is to use Hiera in modules. In Hiera we could declare foreman_puma_worker_cpu_count and default it to facts.processors.count, but then in the installer we can override that with a different value for foreman_puma_worker_cpu_count (because we know the system is being shared with multiple other services). The actual calculation would still be in foreman/config.pp, just with a different input value.

So in practice I think it’d be something like:

---
foreman_puma_worker_cpu_count: "%{alias('facts.processors.count')}"

Then in foreman/config.pp:

# CPU based calculation is based on https://github.com/puma/puma/blob/master/docs/deployment.md#mri
# Memory based calculation is based on https://docs.gitlab.com/ee/install/requirements.html#puma-settings
$puma_workers = pick(
  $foreman::foreman_service_puma_workers,
  floor(
    min(
      lookup('foreman_puma_worker_cpu_count') * 1.5,
      ($facts['memory']['system']['total_bytes']/(1024 * 1024 * 1024)) - 1.5
    )
  )
)

In the installer we then write something to provide overridden values. I don’t think you can do arithmetic in Hiera itself, but you can write a custom Hiera backend in Ruby that does that.
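
For example, the file the installer writes could be as small as this (the value is purely illustrative):

---
foreman_puma_worker_cpu_count: 8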

I don't know if this is cleaner, but I did want to provide a middle ground.

Another approach may be that, for every Puppet module that needs some tuning, we write a custom Hiera backend that provides the tuning recommendations; it could use other Hiera keys to reduce the factors.

But doesn't that mean that the module is now dependent on a custom variable foreman_puma_worker_cpu_count that is generated by the installer, and thus doesn't work properly without the installer?

Would

$_puma_cpu = pick(lookup('foreman_puma_worker_cpu_count'), $facts['processors']['count'])

work?

You can provide that value in Hiera. We never fully went through with it, but you can use data in modules. theforeman/puppet-tftp is one module where we did that; note how it has a hiera.yaml and a data/ directory.
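
For illustration, a minimal data-in-modules layout for this case could look like the sketch below. Note that the module data layer only answers keys in the module's own namespace (so the key would have to be something like foreman::puma_worker_cpu_count rather than the global name used above), and that fact interpolation in YAML yields a string, so the manifest would need to convert it before doing arithmetic:

# hiera.yaml in the module root
---
version: 5
defaults:
  datadir: data
  data_hash: yaml_data
hierarchy:
  - name: "Common"
    path: "common.yaml"

# data/common.yaml
---
foreman::puma_worker_cpu_count: "%{facts.processors.count}"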

There are upsides to this. AFAIK you can always lookup('module::value') even if the module is not included in your catalog.

A downside is that ranges (e.g. Debian >= 11, but not 9 and 10) are harder to maintain, since you need multiple files to express that, whereas today's params.pp can do it effectively. I'd also argue that having the value in a single place is easier to follow than having to look it up (manually).

Overall I’m torn if I like or dislike it. It depends I guess.
