Some years ago, Ivan identified a major issue with our ActiveRecord-based orchestration: we cannot use explicit SQL transactions, which makes most of our actions fragile under concurrency.
For example, during automated host discovery Foreman must find a matching rule, count the hosts already provisioned by this rule, compare that count with the rule's limit, and only then initiate provisioning. When multiple hosts are provisioned at the same time, these steps race with each other, and as a result rules with a limit set get overprovisioned. This was articulated in:
I can find many other examples where a simple SQL transaction would solve the problem, but our orchestration framework prevents us from using one. So, a couple of topics for this thread.
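To make the discovery race concrete, here is a minimal plain-Ruby sketch. It is not Foreman code: the Rule class, its methods, and the five-thread scenario are hypothetical, and a Mutex stands in for an explicit SQL transaction with a row lock (in ActiveRecord, roughly a transaction plus pessimistic locking such as with_lock). The unsafe variant checks the count and then writes, which leaves a window in which other threads pass the same check.

```ruby
# Hypothetical discovery rule with a host limit; not Foreman's API.
class Rule
  attr_reader :limit, :hosts

  def initialize(limit)
    @limit = limit
    @hosts = []
    @lock = Mutex.new # stand-in for an explicit SQL transaction with a row lock
  end

  # Check-then-act without isolation: several threads can pass the
  # limit check before any of them records its host.
  def provision_unsafe(host)
    return false if hosts.size >= limit
    sleep 0.01 # simulated provisioning work between the check and the write
    hosts << host
    true
  end

  # The same steps executed atomically, as a single transaction would.
  def provision_safe(host)
    @lock.synchronize { provision_unsafe(host) }
  end
end

# Provisions five hosts concurrently and returns how many the rule accepted.
def run(rule, method)
  threads = 5.times.map { |i| Thread.new { rule.public_send(method, "host#{i}") } }
  threads.each(&:join)
  rule.hosts.size
end

unsafe = run(Rule.new(2), :provision_unsafe) # usually more than the limit of 2
safe = run(Rule.new(2), :provision_safe)     # always exactly 2
```

With the unsafe method the count typically ends up above the limit because every thread sees an empty host list during its check; with the lock held across check and write, only the first two callers succeed.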
First: shall we start discussing an approach to orchestration other than abusing ActiveRecord? This would probably be a huge effort, since most of our plugins use the orchestration framework. For the record, we have a simple framework defined in app/models/concerns/orchestration.rb, which provides two queues: queue and post_queue. These are just lists of methods called from the around_save and after_commit ActiveRecord hooks, respectively. This is easy to understand for Rails developers; however, it is not ideal.
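The following plain-Ruby sketch models only the ordering described above. FakeRecord, its transaction helper, and the setup_dhcp / setup_ssh_provision task names are hypothetical stand-ins, not Foreman's actual Orchestration API. It shows queue tasks running inside the save transaction and post_queue tasks running only after the commit.

```ruby
# Hypothetical stand-in for an ActiveRecord model using the two orchestration queues.
class FakeRecord
  attr_reader :log, :queue, :post_queue

  def initialize
    @queue = []      # task method names run during save (around_save)
    @post_queue = [] # task method names run after the commit (after_commit)
    @log = []
  end

  def save
    transaction do
      queue.each { |task| send(task) } # still inside the open transaction
      log << :row_written
    end
    post_queue.each { |task| send(task) } # transaction already committed
    true
  end

  private

  # Simulated transaction boundary; a real one would come from ActiveRecord.
  def transaction
    log << :begin
    yield
    log << :commit
  end

  # Hypothetical orchestration tasks that only record that they ran.
  def setup_dhcp
    log << :setup_dhcp
  end

  def setup_ssh_provision
    log << :setup_ssh_provision
  end
end

record = FakeRecord.new
record.queue << :setup_dhcp
record.post_queue << :setup_ssh_provision
record.save
p record.log
# => [:begin, :setup_dhcp, :row_written, :commit, :setup_ssh_provision]
```

In real ActiveRecord, after_commit callbacks fire only when the outermost transaction commits, so wrapping the save in a larger explicit transaction would also delay the post_queue tasks until that outer transaction ends; this is one plausible way post_queue gets in the way of explicit transactions.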
Second: for the particular issue I am trying to solve (discovery and post-commit), it looks like removing post_queue would help. It does not appear to be widely used: in core, basically only SSH provisioning and PuppetCA use it, and beyond core I found only the Discovery plugin. I wonder why this was implemented as a separate queue; Amos moved it 8 years ago without much comment, and that is all I can read from the git log. All those actions could be high-priority tasks in the normal queue.
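To illustrate the proposal, here is a plain-Ruby sketch of a single queue in which a former post_queue action becomes an ordinary task. SingleQueue, Task, the task names, and the priority numbers are hypothetical. The sketch does not establish whether Foreman's queue runs lower or higher numbers first; it simply assumes lower runs first and gives the former post_queue task the largest number so it keeps its place at the end.

```ruby
# Hypothetical single-queue variant: all tasks, including former post_queue
# ones, run inside the transaction in priority order.
# Assumption: lower numbers run first; ties keep insertion order.
Task = Struct.new(:name, :priority)

class SingleQueue
  def initialize
    @tasks = []
  end

  def add(name, priority)
    @tasks << Task.new(name, priority)
    self
  end

  # Returns the execution log for one simulated save.
  def run
    ordered = @tasks.each_with_index
                    .sort_by { |task, index| [task.priority, index] } # stable by index
                    .map(&:first)
    [:begin, *ordered.map(&:name), :commit]
  end
end

queue = SingleQueue.new
queue.add(:setup_ssh_provision, 1000) # formerly a post_queue task
queue.add(:setup_dhcp, 10)
queue.add(:setup_dns, 20)
p queue.run
# => [:begin, :setup_dhcp, :setup_dns, :setup_ssh_provision, :commit]
```

The trade-off is that setup_ssh_provision now runs before the commit, inside whatever explicit transaction surrounds the save. That is what would let the discovery steps share one transaction, but such a task can no longer assume the record is already committed.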
Any change in this area will likely affect many plugins, but I would love to at least solve the post_queue problem, which would let us use explicit SQL transactions more often. Over time we learned to avoid them because of this specific problem (I hope it is just this one and not more), and as a result many of our calls are not concurrency-friendly.