Thanks @aruzicka for the detailed explanation!
I have just one thing to add, which we had to change in our setup:
Currently, the sidekiq services are started using the default systemd unit provided by Foreman.
This unit uses dynflow-sidekiq.rb to launch the actual instances (workers, orchestrator).
Given the Redis locking mechanism for the orchestrator, we needed to patch the code to send SdNotify.ready
before ::Rails.application.dynflow.initialize!
so that systemd sees the process as running. Otherwise, the code blocks at initialize! while waiting for the lock, readiness is never reported,
and systemd kills the service once the start timeout expires.
As we also use the Puppet modules directly, Puppet then starts the service again during its next run, which basically ends in a never-ending loop.
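
For anyone who wants to see the ordering, here is a rough sketch of the idea rather than our actual patch. It assumes the unit is of Type=notify and uses only the Ruby standard library; notify_ready and start_worker are hypothetical names made up for this illustration. In the real launcher the two steps are simply SdNotify.ready followed by ::Rails.application.dynflow.initialize!.

```ruby
require 'socket'

# Hypothetical stand-in for SdNotify.ready: sends READY=1 to the datagram
# socket whose path systemd passes in NOTIFY_SOCKET.
# (Abstract sockets, i.e. paths starting with "@", are not handled here.)
def notify_ready(env = ENV)
  path = env['NOTIFY_SOCKET']
  return false if path.nil? || path.empty? # not started by a notify-type unit

  Addrinfo.unix(path, :DGRAM).connect do |sock|
    sock.write('READY=1')
  end
  true
end

# Hypothetical wrapper showing the order of operations: report readiness
# first, then run the step that may block while waiting for the lock.
def start_worker(env = ENV)
  notify_ready(env)
  yield # stands in for ::Rails.application.dynflow.initialize!
end

# Usage (illustrative only):
#   start_worker { ::Rails.application.dynflow.initialize! }
```

The trade-off is that systemd now reports the service as active even while an orchestrator is still waiting for the lock, so "active" no longer means "fully initialized".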