Foreman Dynflow Pool

State in Foreman 2.0

In Foreman 2.0 we refactored Dynflow to run on Sidekiq. This allowed us to split it into separate services (a config sketch follows the list):

  • dynflow-sidekiq@orchestrator - a special worker with a single thread
  • dynflow-sidekiq@worker - a worker that by default has 5 threads and monitors the default and remote_execution queues
  • dynflow-sidekiq@worker-hosts-queue - only on Katello; monitors the hosts_queue queue (yes, the naming looks redundant). It is configured by the installer and shares the threads variable with worker.
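
Each service instance reads a small YAML config named after it under /etc/foreman/dynflow/. A rough sketch of the default worker config (the exact keys here are from memory and may differ between versions):

# /etc/foreman/dynflow/worker.yml (illustrative)
:concurrency: 5
:queues:
  - default
  - remote_execution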

In practice this can be configured as follows (the values shown are the defaults):

foreman-installer \
  --foreman-dynflow-pool-size 5 \
  --foreman-jobs-manage-service true \
  --foreman-jobs-service-ensure running \
  --foreman-jobs-service-enable true \
  --foreman-jobs-sidekiq-redis-url undef
  • --foreman-dynflow-pool-size 5 configures the number of threads per worker. In Katello you have 2 worker processes, which means 10 threads in total.
  • --foreman-jobs-manage-service true tells the installer whether to manage the services. Turning this off is generally not advisable, but it can be used to work around installer limitations.
  • --foreman-jobs-service-ensure running ensures the state of the jobs services. It can be changed to stopped, which may be useful on setups where you run the job services on different physical servers.
  • --foreman-jobs-service-enable true is similar to the ensure parameter, but controls whether the services start at boot. It should generally match ensure.
  • --foreman-jobs-sidekiq-redis-url undef can be used to point to a non-local Redis instance. This is useful in load balanced or other split setups (see the example below).
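
For example, to point the job services at an external Redis (the URL here is illustrative):

foreman-installer --foreman-jobs-sidekiq-redis-url redis://redis.example.com:6379/0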

Note that there is no knob to change the number of threads for worker and worker-hosts-queue independently.

Adding a pool

Threading in Ruby has its limitations, and at some point you may need to use multiple processes. The systemd service template makes this easy; the question is how we configure it.

The idea is that you have dynflow-sidekiq@worker-n. A concrete example (a systemctl sketch follows the list):

  • dynflow-sidekiq@orchestrator has 1 thread
  • dynflow-sidekiq@worker-1 has 5 threads
  • dynflow-sidekiq@worker-2 has 5 threads
  • dynflow-sidekiq@worker-hosts-queue-1 has 5 threads
  • dynflow-sidekiq@worker-hosts-queue-2 has 5 threads
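
Since dynflow-sidekiq@.service is a systemd template unit, adding an instance is just enabling another name. A sketch, assuming the matching worker-2 config file exists:

systemctl enable --now dynflow-sidekiq@worker-2
systemctl status dynflow-sidekiq@worker-2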

I started a PR to implement this, just to test out the likely issues.

It turns out my idea generally works, but there are some issues we should discuss.

First of all, some bikeshedding around naming.

Worker-n

Do we always name them worker-n, or do we use worker plus worker-1 and worker-2?

My suggestion is to always use worker-n and start 1-based.

The implication is a migration from 2.0, where we had worker. This also needs changing in packaging, which ships a default worker.yml file. While the installer can and will remove this config file, any yum update will bring it back, since yum isn’t smart enough to track file deletions (apt is).

For reference, Pulp 3 also uses 1-based worker names (see Instructions — Pulp Project 3.36.0 documentation).

Parameters

The current PR only implements a --foreman-dynflow-pool-size parameter and keeps --foreman-dynflow-workers. This may be confusing, especially if Katello doesn’t have its own tuning available.

What would users expect here?

Timeline

While it’s very late, I would like to suggest this for Foreman 2.1. The actual changes are not large: it either deploys and runs the service, or it doesn’t. The Sidekiq workers themselves require no code changes. That should make it fairly safe.

This matches my thoughts from when I was looking at the same issue of tuning the number of workers for Katello. I think the consistency with Pulp 3 is good too.

If Foreman has a --foreman-dynflow-workers then I think it makes sense for Katello to have a similar tuning argument for its hosts_queue.

From the perspective of a user who doesn’t know that pool size relates to Dynflow worker thread count, I suppose it would be helpful to rename --foreman-dynflow-pool-size.

--foreman-dynflow-thread-count or --foreman-dynflow-concurrency maybe?

I like concurrency, as it also feels like a general term that most users can grok. Threads feels like too low a level.

Note that we need to ensure the DB pool size is adjusted based on the higher of the Puma thread count and the Dynflow worker concurrency, so that enough connections are available.
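
As a rough worked example (the numbers are illustrative; the +5 headroom mirrors what Dynflow reportedly does, see below):

  db_pool >= max(puma_threads, dynflow_worker_concurrency) + headroom

With Puma at 16 threads and worker concurrency at 5, that suggests a pool of at least 21 connections.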

I don’t think we should hide this from users. You already expose a parameter for instances and one for threads, which means you have two options that both affect the actual concurrency. You end up with something like:

foreman-installer \
  --foreman-dynflow-worker-instances n \
  --foreman-dynflow-worker-concurrency m

The first one is the number of workers to spawn, and the second one is the concurrency we configure for each worker. You could be verbose and name it --foreman-dynflow-per-worker-concurrency, but that hurts discoverability.
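
A hypothetical concrete invocation (values illustrative): two worker processes, each handling 10 concurrent jobs, for 20 in total:

foreman-installer \
  --foreman-dynflow-worker-instances 2 \
  --foreman-dynflow-worker-concurrency 10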

With this method we are also setting the same concurrency for every worker. This is a good naive first approach, but we will have to keep an eye on results from performance testing. As I understand it, we have a three-point configuration matrix with which to tune:

  • queues handled by a worker
  • concurrency of a worker
  • number of workers

Command line parameters feel like an ineffective way to manage and configure this. From a user perspective, being able to provide a YAML file as input, or using the interactive installer, feels better for creating and tuning the worker setup. I realize this is a bit more of an advanced use case.

This is not part of the current design.

What I now ended up with is:

--foreman-dynflow-manage-services true         # Whether to manage the dynflow services
--foreman-dynflow-orchestrator-ensure present  # The state of the dynflow orchestrator instance
--foreman-dynflow-redis-url undef              # If set, the redis server is not managed and we use the defined url to connect
--foreman-dynflow-worker-concurrency 5         # How many concurrent jobs to handle per worker instance
--foreman-dynflow-worker-instances 1           # The number of worker instances that should be running

(help texts have a lot of room for improvement)

In the case of Katello, it would likely have separate parameters for its workers.

I disagree, but let’s keep this out of the discussion and stay on topic.

Maybe we should drop this and normalize to these parameters for all workers? The Foreman workers that are deployed today are already configured to handle a plugin’s queues.

They’re not. dynflow-sidekiq@worker is configured for the default and remote_execution queues. Katello sends various jobs to the hosts_queue, and dynflow-sidekiq@worker-hosts-queue is configured to look at that queue. If that service is not running, Katello jobs will remain in their queue.
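
To illustrate the split, the two worker configs would differ only in their queue lists (a sketch from memory; exact keys may differ):

# worker.yml
:queues:
  - default
  - remote_execution

# worker-hosts-queue.yml
:queues:
  - hosts_queue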

It’s probably fine to let the Katello workers copy the values from the Foreman workers by default, but I think you want parameters to tune them separately.

There’s also a pool_size setting under dynflow in Foreman’s settings.yaml, which may add to the confusion.

@aruzicka is that the database pool size? Should this be >= concurrency?

Sorry for not being specific enough. This is what I meant by currently supporting a plugin (not all plugins): the workers we deploy today as part of a Foreman installation already handle a plugin’s queues, whether or not that plugin is present on a user’s system. Hence my wondering whether we should just do this for all the queues and have a single set of input parameters to configure the number of workers.

Dynflow should be able to set the DB pool size to a safe value (currently it is $concurrency + 5, iirc); however, we may need to look into it for Puma.

Originally (before @ekohl put this together) we (@ezr-ondrej and I) were thinking about making this achievable by providing a custom hiera. The original request for this was to have the three-point config matrix @ehelms described. It is currently doable by hand; it would be a pity if we exposed only a subset of it in the installer.
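
For illustration, such a hand-rolled custom hiera entry could look roughly like this; the key names below are guesses for the sake of the example, not the module’s actual API:

# /etc/foreman-installer/custom-hiera.yaml (illustrative only)
foreman::dynflow::workers:
  worker-1:
    concurrency: 5
    queues:
      - default
      - remote_execution
  worker-hosts-queue-1:
    concurrency: 5
    queues:
      - hosts_queue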

I’d be fine with this iff it were the default, but we still had an option to do the more advanced configurations. Having one worker template and only scaling threads and processes feels too simplified.

This was used to tune the number of threads the executor had in the old model. Looking at the code now, it doesn’t seem to be used for anything anymore.

The abstractions with worker and pool in the module are (IMHO) good. They have all the parameters and we can easily expose those as hashes. However, I don’t think it should be a “normal” workflow that many users follow.

Ok, then I’ll modify the PR to drop it.

Considering we are already building RC2 this week and this proposal is still WIP, I would strongly prefer not to pull it into 2.1.
Let it mature properly in nightlies and make sure we aren’t missing something in upgrades, tuning, or integration with Puma/systemd or elsewhere.