Smart_proxy_dynflow 0.8 breaks smart_proxy_ansible in nightly

Ohai!

Recently, sp-dynflow 0.8.0 (and later 0.8.1) landed in nightly and currently breaks Ansible :frowning:

Looking at the log, I see:

2022-03-31T12:04:57 64d60934 [I] Started POST /dynflow/tasks/launch 
2022-03-31T12:04:57 64d60934 [I] Finished POST /dynflow/tasks/launch with 200 (34.33 ms)
2022-03-31T12:04:57 64d60934 [E] <ArgumentError> unknown keyword: :id
        /usr/share/gems/gems/smart_proxy_ansible-3.3.1/lib/smart_proxy_ansible/runner/ansible_runner.rb:13:in `initialize'
        /usr/share/gems/gems/smart_proxy_dynflow-0.8.1/lib/smart_proxy_dynflow/action/batch_runner.rb:11:in `new'
        /usr/share/gems/gems/smart_proxy_dynflow-0.8.1/lib/smart_proxy_dynflow/action/batch_runner.rb:11:in `initiate_runner'
        /usr/share/gems/gems/smart_proxy_dynflow-0.8.1/lib/smart_proxy_dynflow/action/runner.rb:45:in `init_run'
        /usr/share/gems/gems/smart_proxy_dynflow-0.8.1/lib/smart_proxy_dynflow/action/runner.rb:12:in `run'
        /usr/share/gems/gems/dynflow-1.6.4/lib/dynflow/action.rb:582:in `block (3 levels) in execute_run'
        /usr/share/gems/gems/dynflow-1.6.4/lib/dynflow/middleware/stack.rb:27:in `pass'
        /usr/share/gems/gems/dynflow-1.6.4/lib/dynflow/middleware.rb:19:in `pass'
        /usr/share/gems/gems/dynflow-1.6.4/lib/dynflow/action/progress.rb:31:in `with_progress_calculation'
        /usr/share/gems/gems/dynflow-1.6.4/lib/dynflow/action/progress.rb:17:in `run'
        /usr/share/gems/gems/dynflow-1.6.4/lib/dynflow/middleware/stack.rb:23:in `call'
        /usr/share/gems/gems/dynflow-1.6.4/lib/dynflow/middleware/stack.rb:27:in `pass'
        /usr/share/gems/gems/dynflow-1.6.4/lib/dynflow/middleware.rb:19:in `pass'

Looking at recent sp-ansible commits, I would guess it’s fixed by “Fixes #34585 - Process artifact files on demand” (theforeman/smart_proxy_ansible@d47ed7a), but this hasn’t landed in nightly yet. Downgrading sp-dynflow to 0.7.0 fixes the issue.
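To illustrate the failure mode in the trace above, here is a minimal sketch. The classes are hypothetical stand-ins, not the real smart_proxy_dynflow or smart_proxy_ansible code; the only assumption taken from the log is that the caller started passing an `id:` keyword to a runner whose `initialize` does not accept it.

```ruby
# Hypothetical stand-ins for illustration only; these are NOT the real
# smart_proxy_dynflow / smart_proxy_ansible classes.
class BaseRunner
  attr_reader :id

  def initialize(_input, suspended_action: nil, id: nil)
    @suspended_action = suspended_action
    @id = id
  end
end

# Old-style subclass: its signature predates the id: keyword.
class OldStyleRunner < BaseRunner
  def initialize(input, suspended_action:)
    super(input, suspended_action: suspended_action)
  end
end

# New-style subclass: accepts id: and forwards it to the parent.
class NewStyleRunner < BaseRunner
  def initialize(input, suspended_action:, id: nil)
    super(input, suspended_action: suspended_action, id: id)
  end
end

# Mimics a caller that now always passes id: when building a runner.
def initiate_runner(klass)
  klass.new({}, suspended_action: nil, id: 'abc')
end

# Returns the error raised for the old-style runner, or nil if none.
def old_style_error
  initiate_runner(OldStyleRunner)
  nil
rescue ArgumentError => e
  e
end
```

The old-style subclass fails as soon as it is instantiated with the extra keyword, which is why the two gems have to be updated together.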

Breakage in nightly happens; that’s what nightly is for. Nevertheless, I found a few oddities while investigating this and would like to discuss them.

  1. This was known to break things – can we ensure that packages which need to be updated together are tracked and do not land uncoordinated? Maybe even add Conflicts to the packages, so they are not accidentally cherry-picked?
  2. Our CI (well, the plugins and luna pipelines) executes hammer job-invocation create for REX and Ansible, and it hung there for hours (until Jenkins aborted the job). Shouldn’t hammer error out if the job fails to launch or takes too long to execute?
  3. The jobs are marked as running/pending in hammer job-invocation list, which probably explains the above waiting – shouldn’t Foreman notice the error on the Proxy?

Thanks
Evgeni

This one is on me, I got distracted by other things while in the middle of releasing, sorry about that. I’ll try to be more diligent.

Any suggestions on how to do that without creating additional overhead?

Not by default, but maybe we could pass --execution-timeout-interval with a sane value to make it kill the job if it does not finish on its own within the given time interval.
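Separately from --execution-timeout-interval, the CI side could also put a hard wall-clock limit around the call so a stuck invocation cannot block a pipeline for hours. This is a different technique from the hammer option, sketched here with a hypothetical helper named `run_with_deadline`; nothing like it exists in hammer or in our pipelines as far as this thread shows.

```ruby
require 'timeout'

# Hypothetical CI-side helper (an assumption, not part of hammer):
# runs a command and kills it if it exceeds `limit` seconds.
# Returns true/false for the exit status, or :timeout if it was killed.
def run_with_deadline(*cmd, limit:)
  pid = Process.spawn(*cmd)
  begin
    _, status = Timeout.timeout(limit) { Process.wait2(pid) }
    status.success?
  rescue Timeout::Error
    # Ask the child to terminate, then reap it to avoid a zombie process.
    Process.kill('TERM', pid)
    Process.wait2(pid)
    :timeout
  end
end
```

A pipeline step would wrap its hammer invocation in such a helper and treat `:timeout` as a failure instead of waiting for Jenkins to abort the job.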

In an ideal world, yes, but concurrency is hard. In the past we made some tradeoffs to get a better-performing happy path at the cost of unexpected failures going mostly unnoticed.

In a gem spec, there is no way to declare a “breaks” relationship, right? That way the proxy wouldn’t have started and we’d have seen a clear error there.

I don’t think so. I could probably have recognized that as a breaking change and released it as 1.0.0, and then sp-ansible would not have been able to find a compatible version, as it depends on ~> 0.8.
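For reference, a quick way to check which versions a `~> 0.8` constraint admits is RubyGems' own `Gem::Requirement`. The helper name `compatible?` is just for this sketch.

```ruby
require 'rubygems'

# Hypothetical helper for this sketch: does `version` satisfy `constraint`?
def compatible?(version, constraint = '~> 0.8')
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

%w[0.7.0 0.8.1 0.9.0 1.0.0].each do |version|
  puts "#{version}: #{compatible?(version)}"
end
```

The pessimistic constraint `~> 0.8` is equivalent to `>= 0.8, < 1.0`, so a 1.0.0 release would be rejected at dependency resolution, while any 0.8.x or 0.9.x release is accepted.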