Dynflow and high availability

Problem:
In the environment Foreman is deployed on multiple instances for high availability and we experienced a problem with tasks not being picked up. When I stopped the dynflow-sidekiq services on the second node and restarted them on the first node, it picked up all the overdue tasks. This started me investigating further and I found Dynflow workers scaling · GitHub where it explicitly tells us to not change the orchestrator to have multiple threads. I also found the sidekiq dashboard at https://foreman.example.com/foreman_tasks/sidekiq/busy and it does nicely list the workers, but when I started all services on the second node again only worker and worker-host-queue appeared. So my question is: Should there be only one instance of the orchestrator in a high available environment? If this is the case, is a manual failover the way to go here? Are there any problems to expect if the other workers are running on multiple machines?

Expected outcome:
Some more knowledge about dynflow and high availability

Foreman and Proxy versions:
2.3.5

Foreman and Proxy plugin versions:
Only the possibility relevant ones
foreman-tasks 3.0.6
foreman_remote_execution 4.2.2

Distribution and version:
Red Hat Enterprise Linux Server release 7.9

Other relevant data:
I hope @aruzicka has some insights for me. Thanks in advance!
And CC @laugmanuel as we discussed this.

How did the deployment look like? Which services were running on the nodes? What is the desired state? Do you want to be able to do failover from one node to another or do you want an active-active kind of setup?

Before we moved to the sidekiq based implementation, we had our own homegrown executor. An executor contained a part which decided what and when should be executed, workers which executed the jobs and a mechanism using which the deciding part could talk to the workers.

The homegrown executor was a single process containing everything. After the move, we ended up with a bunch of smaller services, but the executor still exists as a concept. The part that was controlling what should happen when became the orchestrator, instead of in-process communication we have redis and the rest became the workers.

To rule out out-of-order execution there has to be a single orchestrator per executor. You can have multiple orchestrator processes running, but they’ll operate in passive-active mode. First orchestrator to start will acquire a lock on redis and all other orchestrators will wait on the lock. Sadly this works only at process level, so bumping concurrency for an orchestrator would break things as we have no way to enforce order of execution.

In large deployments I could see two kinds of setups.

Option 1 - multiple executors

Each node runs a full executor (orchestrator+redis+workers). In this setup, all the executors run in active-active mode so the workload is spread across them. Sadly the algorithm used is fairly basic and in the end does not really spread the load evenly.

Option 2 - One big executor

Each node runs an orchestrator and workers, but all of them use the same redis. In this setup, the orchestrators work in failover mode and all workers are always active. I don’t have any data to back this claim up, but I’d say this should lead to better utilization of available resources.

The downside of this model is that the currently active orchestrator becomes a bottleneck. Once you fully saturate the orchestrator and it starts to negatively impact job throughput, you need to add another executor.

Option 3 - Mix and match

This is just a combination of the previous two. You don’t have to limit yourself and can have multiple big executors.

Q&A

Did you have it configured to use multiple threads?

It sounds like you went with option 2. In this case, the orchestrator from the other node will not show up until it becomes active.

You can have as many of them as you want, but each of them has to have only one thread. All the processes will work in failover mode. There’s no need to control the failover by hand.

Just the usual ones when you talk to a service over the network. If the connection is solid, it should be fine.

If you can reproduce this, could you check a couple of things for me? Do the jobs get picked up by the orchestrator, put on redis and then get stuck there (ie the workers are not picking the jobs up) or do they “get stuck” even before reaching redis? The sidekiq console you discovered should help shed some light into this.

4 Likes

Thanks, this answers my question very well, so the setup should be fine.

Yes, we are having Option 2, so full executor on both nodes and one external redis. One thread for orchestrator, 5 for the worker, 5 for worker-hosts-queue, so these settings are default.

We will have a closer look at the setup during the next weeks and will update you if the problem reappears.

Thanks @aruzicka for the detailed explanation!

I have just one more thing to add which we needed to modify in our setup:
Currently, the sidekiq services are started using the default systemd unit provided by Foreman.
This unit uses dynflow-sidekiq.rb to launch the actual instances (workers, orchestrator).
Given the Redis locking mechanism for the orchestrator, we needed to patch the code to send SdNotify.ready before ::Rails.application.dynflow.initialize! so systemd sees the process as running. Otherwise, the code will block at initialize! and systemd will kill the service after the timeout.
As we also use the Puppet modules directly, this results in Puppet starting the service again during it’s next run which basically ends in a neverending loop.

1 Like

That’s an interesting point. Could you file an issue in our redmine for it? It would be ideal if we could mark the process as active either when it is really fully active or when it goes passive. Marking it as ready before ::Rails.application.dynflow.initialize! feels a bit premature. If it goes active, it does a few more things before it becomes really active and can start doing things.

1 Like

I’ve opened an issue here: Bug #33124: Sidekiq Orchestrator gets killed by systemd if Redis lock can't be aquired - Foreman
I hope the description makes sense.

1 Like