Job-invocation API problem?

mhjacks · April 12, 2024, 1:47pm

Problem:
When trying to manage recurring logics via Foreman Ansible modules, the second addition of the same job fails.

Expected outcome:
From an Ansible standpoint, an attempt to create the same job invocation with the same arguments should return “ok” and not fail.

Foreman and Proxy versions:
Foreman 3.10

Foreman and Proxy plugin versions:
Katello 4.12

Distribution and version:
AlmaLinux 8

Other relevant data:
I reported the issue against upstream FAM here: “job_invocation module seems not to be properly idempotent”, Issue #1723 in theforeman/foreman-ansible-modules on GitHub.

Example playbook:

---
- name: Katello job test
  hosts: localhost
  connection: local
  become: false
  gather_facts: false
  vars:
    organization: 'Default Organization'
    katello_username: admin
    katello_password: changeme
    server_url: https://foreman.example.com
    katello_validate_certs: false
  tasks:
    - name: "Set up programmatic job invocations (failures ok)"
      theforeman.foreman.job_invocation:
        username: '{{ katello_username }}'
        password: '{{ katello_password }}'
        server_url: '{{ server_url }}'
        validate_certs: '{{ katello_validate_certs }}'
        command: 'dnf upgrade -y'
        job_template: 'Run Command - Ansible Default'
        recurrence:
          cron_line: '05 12 * * *'
          purpose: 'System update'
        search_query: 'name ~ .'
        targeting_type: dynamic_query

Example output:

TASK [Set up programmatic job invocations (failures ok)] *******************************************
fatal: [localhost]: FAILED! => {"changed": false, "error": {"message": "Validation failed: Triggering: Purpose Active or disabled recurring logic with purpose System update already exists"}, "msg": "Error while performing create on job_invocations: 500 Server Error: Internal Server Error for url: https://srv-katello.imladris.lan/api/job_invocations"}

aruzicka · April 12, 2024, 2:23pm

This sort of makes sense. The API behaves more like running a command than like declaring a command to be run, and the fact that it can be used for both makes it troublesome. To me it would make the most sense to resolve this in the Ansible modules. I’ll keep an eye on the issue you opened there and we’ll see where it takes us.

mhjacks · April 23, 2024, 2:51pm

Well, according to the API documentation, there is no DELETE method supported on job_invocations, which seems pretty problematic for a pure Ansible solution to the problem.

aruzicka · April 23, 2024, 5:13pm

That is sort of by design. Like tasks, job invocations can’t be deleted easily, because they provide a paper trail of what went on and removing them at the wrong time could break things.

We could talk about the specifics of what the API does or does not have, but that would not address the underlying mismatch between how it was meant to be used and how the Ansible modules try to use it. The API “fires” a job: you call the API, it fires the job, and the job runs. The Ansible modules try to go the declarative way. This sort of works for non-recurring jobs. You declare the job, it runs, and that’s it. If you do it again, it runs again. If you add recurring jobs to the mix, it works the same way. You declare the job, it runs, and eventually it runs again. A key point here is that the second repetition is not the same job. It is a new one cloned from the original. So even if we did expose deletion, it wouldn’t really help. The purpose on a recurring job serves as a safeguard rather than as a unique identifier by which jobs could be managed.
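The “fire” semantics described above can be pictured with a small toy model. This is only a sketch: `ToyForeman`, `fire`, and `repeat` are hypothetical names invented for illustration and say nothing about how Foreman is actually implemented. It shows three things from the explanation: every call creates a new job, a repetition is a clone with its own id, and the purpose only blocks duplicates.

```python
import itertools


class ToyForeman:
    """In-memory stand-in for illustration; not Foreman's real implementation."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.invocations = []         # every fired job is kept (paper trail)
        self.active_purposes = set()  # purposes held by existing recurring logics

    def fire(self, command, purpose=None):
        """Fire a job. Each call creates a new invocation, never updates one."""
        if purpose is not None:
            if purpose in self.active_purposes:
                # The purpose only guards against duplicates; it is not a
                # handle for looking up or updating the existing job.
                raise ValueError(f"recurring logic with purpose {purpose!r} already exists")
            self.active_purposes.add(purpose)
        job = {"id": next(self._ids), "command": command, "purpose": purpose}
        self.invocations.append(job)
        return job

    def repeat(self, job):
        """Next repetition of a recurring job: a clone with its own id."""
        clone = dict(job, id=next(self._ids))
        self.invocations.append(clone)
        return clone


foreman = ToyForeman()
first = foreman.fire("dnf upgrade -y")
second = foreman.fire("dnf upgrade -y")   # same arguments, yet a second job
recurring = foreman.fire("dnf upgrade -y", purpose="System update")
repetition = foreman.repeat(recurring)    # a new job, not the original

try:
    foreman.fire("dnf upgrade -y", purpose="System update")
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

In this model a declarative caller has nothing stable to compare against: the ids keep changing, and the purpose can only tell it that something with that name already exists.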

mhjacks · April 23, 2024, 5:36pm

Recurring jobs, it seems, are not the same kind of thing as other jobs. Maybe they need a separate API to manage them? I don’t see this as a declarative/imperative thing. It seems to be a much more fundamental question of treating two very different things as one.

After all, it doesn’t obviously make sense to delete a non-recurring job_invocation, for example, but it definitely makes sense to delete a recurring one.

Maybe we’re saying the same thing here?

lumarel · April 23, 2024, 6:55pm

We also ran into that a while ago, and just opted to make the creation of the recurring logics an opt-in switch for Ansible. Of course this is not really a solution, but it was the fastest one to build back then.

I think the better option right now is to query the currently queued jobs and check whether the RL already exists, i.e. the API equivalent of:

hammer --csv --no-headers job-invocation list --organization-id <id> --search "status = queued"
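A rough sketch of what that check could look like from a script follows. The helper names are hypothetical, and two things are assumptions rather than facts from this thread: that the job invocation list accepts the same `search` string as the hammer command, and that each result carries a `description` field that can be matched. The sample records and their description text are made up.

```python
from urllib.parse import urlencode


def queued_jobs_url(server_url, search="status = queued", per_page=100):
    """Build the list URL for queued job invocations (assumed query parameters)."""
    query = urlencode({"search": search, "per_page": per_page})
    return f"{server_url.rstrip('/')}/api/job_invocations?{query}"


def find_matching_jobs(results, description):
    """Return the queued jobs whose description equals the expected one."""
    return [job for job in results if job.get("description") == description]


# Hypothetical sample of what the "results" list might contain.
sample_results = [
    {"id": 41, "description": "Run dnf upgrade -y", "status_label": "queued"},
    {"id": 42, "description": "Run uptime", "status_label": "queued"},
]

url = queued_jobs_url("https://foreman.example.com")
matches = find_matching_jobs(sample_results, "Run dnf upgrade -y")
already_scheduled = bool(matches)
```

A playbook could then skip the job_invocation task when `already_scheduled` is true. Matching on the description is fragile, so this is a workaround rather than real idempotency.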

mhjacks · April 29, 2024, 8:06pm

@aruzicka I think this is why it might make sense to split out the recurring jobs into another API. A recurring job is not at all the same kind of thing - it’s a pattern that does auto-scheduling. It doesn’t make sense to have a delete for a normal job, but it does for a recurring one, since that will (presumably) stop future ones from being scheduled.

I don’t know enough about the internals to make a judgment as to whether this is easy to do, or whether it would cause problems to create new routes for recurring jobs. But it’s really hard to see how we could solve this problem completely in foreman-ansible-modules because it’s not clear how we would either remove, or change/update existing recurring jobs without the routes to do so.

What do you think?

aruzicka · April 30, 2024, 9:16am

True, recognizing a “recurring job” as a first class citizen would definitely help over the current “regular, sort of ephemeral, job with recurring bolted on” model.

Not necessarily the routing parts, but as of now, once you fire a job (even a recurring one) it is mostly set in stone, so that would be the more complicated aspect of it.

If we were talking about recurring jobs with purpose, then you would have to try creating it and either it would succeed or fail. If it would fail, then you’d have to find the recurring logic which made the creation fail, cancel the recurring logic and create the job again.

If we broaden the scope to recurring jobs then I’m afraid this would become borderline impossible as we don’t have any means of tying the parameters passed in to objects that may already exist in foreman.
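The purpose-based flow described above (try to create, and on failure find the blocking recurring logic, cancel it, and create again) can be sketched as follows. The client interface is entirely hypothetical: `create_job`, `find_blocking_recurring_logic`, and `cancel_recurring_logic` are placeholder names, and `FakeClient` is an in-memory fake so the flow can be run without a server. The fake treats both active and disabled recurring logics as blocking, following the wording of the validation error in the first post.

```python
class PurposeConflict(Exception):
    """Raised by the hypothetical client when the purpose is already taken."""


def replace_recurring_job(client, spec):
    """Create the job; on a purpose conflict, cancel the blocker and retry."""
    try:
        return "created", client.create_job(spec)
    except PurposeConflict:
        logic = client.find_blocking_recurring_logic(spec["purpose"])
        client.cancel_recurring_logic(logic["id"])
        return "replaced", client.create_job(spec)


class FakeClient:
    """In-memory fake; stands in for whatever would talk to the real API."""

    BLOCKING_STATES = {"active", "disabled"}

    def __init__(self):
        self.logics = []  # dicts with id, purpose, state
        self.jobs = []

    def create_job(self, spec):
        if self.find_blocking_recurring_logic(spec["purpose"]):
            raise PurposeConflict(spec["purpose"])
        logic = {"id": len(self.logics) + 1, "purpose": spec["purpose"], "state": "active"}
        self.logics.append(logic)
        job = {"id": len(self.jobs) + 1, "recurring_logic_id": logic["id"], **spec}
        self.jobs.append(job)
        return job

    def find_blocking_recurring_logic(self, purpose):
        for logic in self.logics:
            if logic["purpose"] == purpose and logic["state"] in self.BLOCKING_STATES:
                return logic
        return None

    def cancel_recurring_logic(self, logic_id):
        for logic in self.logics:
            if logic["id"] == logic_id:
                logic["state"] = "cancelled"


client = FakeClient()
spec = {"command": "dnf upgrade -y", "cron_line": "05 12 * * *", "purpose": "System update"}
first_outcome, first_job = replace_recurring_job(client, spec)
second_outcome, second_job = replace_recurring_job(client, spec)
```

Note that the second run reports a replacement even though nothing about the job changed. Without a way to compare the submitted parameters against what already exists, this flow can converge on the desired state, but it cannot report “ok” on an unchanged rerun.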

mhjacks · April 30, 2024, 3:53pm

aruzicka:

mhjacks:

A recurring job is not at all the same kind of thing - it’s a pattern that does auto-scheduling.

True, recognizing a “recurring job” as a first class citizen would definitely help over the current “regular, sort of ephemeral, job with recurring bolted on” model.

Yes! This is the main thing I’m hoping for here. It seems like the semantics for GET and DELETE would be straightforward, the tricky bits would be in/on POST.

mhjacks:

I don’t know enough about the internals to make a judgment as to whether this is easy to do, or whether it would cause problems to create new routes for recurring jobs.

Not necessarily the routing parts, but as of now, once you fire a job (even a recurring one) it is mostly set in stone, so that would be the more complicated aspect of it.

Fair.

mhjacks:

But it’s really hard to see how we could solve this problem completely in foreman-ansible-modules because it’s not clear how we would either remove, or change/update existing recurring jobs without the routes to do so.

If we were talking about recurring jobs with purpose, then you would have to try creating it and either it would succeed or fail. If it would fail, then you’d have to find the recurring logic which made the creation fail, cancel the recurring logic and create the job again.

If we broaden the scope to recurring jobs then I’m afraid this would become borderline impossible as we don’t have any means of tying the parameters passed in to objects that may already exist in foreman.

Sure. It would be really messy to get into all the weird permutations possible (e.g. changing from or to cron scheduling with the same purpose). Would it be reasonable to have this proposed entity (internally) delete the item with the matching purpose and inject its own contents on a POST? Any existing jobs related to the original should be cancelled, I think.

And presumably we would deprecate the use of recurring logics with the job_invocation routes if this were to become available.

The question is: is this too much of a violation of user expectations and/or the REST model? I would argue that there is significant benefit in being able to manage these kinds of recurring logics in broad terms, even if there might be some complexities in managing state in the details. This would at least make it possible for FAM to build something on top of it (which is my primary goal here, at least).
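To make the proposal concrete, here is a toy model of a first-class recurring-job resource keyed by purpose, with GET, DELETE, and a replace-on-POST. Everything in it is hypothetical: no such resource exists in the API discussed in this thread, and the class and method names are made up. Treating a POST with an identical definition as a no-op is an extra assumption, added because that is what would let an Ansible module report “ok”.

```python
class RecurringJobStore:
    """Hypothetical first-class recurring-job resource, keyed by purpose."""

    def __init__(self):
        self._by_purpose = {}
        self.cancelled = []  # definitions that were replaced or deleted

    def get(self, purpose):
        """Return the current definition for a purpose, or None."""
        return self._by_purpose.get(purpose)

    def delete(self, purpose):
        """Stop future runs; returns True if something was removed."""
        old = self._by_purpose.pop(purpose, None)
        if old is not None:
            self.cancelled.append(old)
        return old is not None

    def post(self, purpose, cron_line, command):
        """Replace-on-POST; returns True if the stored state changed."""
        new = {"purpose": purpose, "cron_line": cron_line, "command": command}
        if self._by_purpose.get(purpose) == new:
            return False  # identical definition: nothing to do
        self.delete(purpose)  # cancel whatever held this purpose before
        self._by_purpose[purpose] = new
        return True


store = RecurringJobStore()
created = store.post("System update", "05 12 * * *", "dnf upgrade -y")
repeated = store.post("System update", "05 12 * * *", "dnf upgrade -y")
rescheduled = store.post("System update", "05 13 * * *", "dnf upgrade -y")
current = store.get("System update")
```

With semantics like these, the “changed” result of a module maps directly onto the return value of `post`, and the awkward permutations reduce to “the new definition wins”.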