Hello,
TL;DR:
This is a write-up for today’s demo on orchestration-NG (calls to Pulp/Candlepin/Smart proxies)
and bulk actions using a library called Dynflow. Below you can find a link to the
demo of the current status and to the code, as well as some explanations and further
steps for stabilizing the back-end code in Katello and Foreman.
Intro
We started this effort some time ago to stabilize the way
Katello calls the underlying backends (Pulp, Candlepin…).
The goal is to improve the back-end code so that we have more
control over the whole process of propagating changes between Katello
and these services, including:
- seeing the progress as it happens
- being able to resume the process when something goes wrong
- being able to extend the process from third-party libraries (a.k.a.
  engines)
- being able to run the steps concurrently if needed (only some of the
  steps really depend on each other, the rest can be performed
  concurrently: for tasks such as calling external services via HTTP
  or messaging, performance should improve a lot with this)
- leveraging this effort to handle bulk actions as well, as the
  requirements are really similar here
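
To make the point about concurrency concrete, here is a minimal sketch in plain Ruby. It is not Dynflow’s API: the Step struct, the run_flow helper and the step names are made up for illustration. Steps whose dependencies are already finished run together in threads, and the next group starts once the current one is done.

```ruby
# Hypothetical sketch, not Dynflow's API: a step names the steps it depends on.
Step = Struct.new(:name, :deps, :work)

# Runs the steps in "waves": every step whose dependencies are finished
# gets its own thread; the next wave starts once the whole wave is done.
def run_flow(steps)
  done = {}     # step name => result of the finished step
  waves = []    # step names per wave, kept to show what ran together
  pending = steps.dup
  until pending.empty?
    ready = pending.select { |step| step.deps.all? { |dep| done.key?(dep) } }
    raise ArgumentError, "circular or missing dependency" if ready.empty?
    threads = ready.map { |step| Thread.new { [step.name, step.work.call(done)] } }
    results = threads.map(&:value)   # wait for the whole wave before writing
    results.each { |name, result| done[name] = result }
    waves << ready.map(&:name)
    pending.reject! { |step| done.key?(step.name) }
  end
  [done, waves]
end

# Made-up steps: the two backend calls only depend on the organization.
steps = [
  Step.new(:create_org,      [],            ->(_results) { "org-1" }),
  Step.new(:candlepin_owner, [:create_org], ->(results) { "owner for #{results[:create_org]}" }),
  Step.new(:pulp_user,       [:create_org], ->(results) { "user for #{results[:create_org]}" }),
]
done, waves = run_flow(steps)
```

Here the two backend steps land in the same wave because neither needs the other; only the organization step has to come first.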
We try to do it in a way that keeps the “business” logic
independent of the implementation details of how the process is eventually
run, so that many things can be decided based on the use case, such as:
- whether the execution happens in the same process as the web server,
  whether there is a thread per workflow, or whether there is a thread pool of
  workers handling the separate steps of the workflow. The architecture is done in
  a way that makes it possible to distribute the flow over
  messaging etc. (although we don’t have that implemented yet)
- where the process is serialized (memory, file, database…)
- etc.
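
One way to picture the “where the process is serialized” choice is a pair of interchangeable storage adapters. The sketch below is an assumption for illustration only: the class names, the save/load interface and the JSON format are hypothetical and are not taken from the Dynflow code.

```ruby
require "json"
require "tmpdir"

# Hypothetical adapters: both expose save(id, state) and load(id).
class MemoryStore
  def initialize
    @data = {}
  end

  def save(id, state)
    @data[id] = JSON.generate(state)
  end

  def load(id)
    JSON.parse(@data.fetch(id))
  end
end

class FileStore
  def initialize(dir)
    @dir = dir
  end

  def save(id, state)
    File.write(path(id), JSON.generate(state))
  end

  def load(id)
    JSON.parse(File.read(path(id)))
  end

  private

  def path(id)
    File.join(@dir, "#{id}.json")
  end
end

# The "business" code only talks to the adapter interface.
def record_progress(store, id, finished_steps)
  store.save(id, "finished" => finished_steps)
  store.load(id)
end

memory_result = record_progress(MemoryStore.new, "flow-1", ["create_org"])
file_result = Dir.mktmpdir do |dir|
  record_progress(FileStore.new(dir), "flow-1", ["create_org"])
end
```

The calling code is the same in both cases; only the object passed in decides where the state ends up.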
So far, the work has been done mainly on the back-end side, which can
be used in the UI/API to provide a nicer interface. The user
interface that is provided right now is intended more for developers/
super-admins.
I’m sending this to the Foreman list as well, as the orchestration on
the Foreman side is done similarly to Katello’s and the demo
shows the usage of Dynflow in Foreman as well.
Current status
I’ve demoed the current status here:
http://www.youtube.com/watch?feature=player_detailpage&v=-hBef6LRBvY&t=104
In this hangout, I’m going through these use cases:
- end2end scenario from organization creation to content view
  definition publishing
- concurrent execution of independent steps of the process
- ability to resume the process on failure
- usage in Foreman to perform the smart-proxy calls to create DHCP and
  DNS records, being able to chain these actions (the DNS action uses the IP that
  was produced in the DHCP action), including the ability to resume
  after an error
- ability to use this approach to make the process extendable from
  third-party code
- ability to use it for bulk actions
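
The chaining and resuming shown in the demo can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the Foreman or Dynflow code: the Flow class, the step names, the host name and the IP address are made up, and no real smart-proxy call is made.

```ruby
# Hypothetical sketch: a flow that remembers the output of finished steps,
# so a later step can use it and a resumed run can skip what is already done.
class Flow
  attr_reader :outputs, :failed_step

  def initialize(steps)
    @steps = steps       # ordered list of [name, callable]
    @outputs = {}        # step name => output of the finished step
    @failed_step = nil
  end

  # Runs (or resumes) the flow; steps that already have an output are skipped.
  def run
    @failed_step = nil
    @steps.each do |name, action|
      next if @outputs.key?(name)
      begin
        @outputs[name] = action.call(@outputs)
      rescue StandardError
        @failed_step = name
        return false
      end
    end
    true
  end
end

dns_available = false    # simulates an unreachable DNS proxy
dhcp_calls = 0

flow = Flow.new([
  [:dhcp, ->(_outputs) { dhcp_calls += 1; { ip: "192.0.2.10" } }],
  [:dns, ->(outputs) {
    raise "DNS proxy unreachable" unless dns_available
    { record: "host.example.com -> #{outputs[:dhcp][:ip]}" }
  }],
])

first_try = flow.run            # fails in the :dns step
failed_at = flow.failed_step
dns_available = true            # the problem gets fixed...
second_try = flow.run           # ...and the flow resumes; :dhcp is not run again
```

The DNS step reads the IP from the DHCP step’s output, and the second run picks up at the failed step instead of repeating the DHCP call.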
As it’s a live demo, you will also have a chance to see /me debugging
provisioning issues, which was not planned as part of this recording :)
For those interested in the code, here it is:
- Katello with part of the orchestration rewritten using Dynflow:
  https://github.com/iNecas/katello/tree/orchestration-ng

  After bundle install and rake db:migrate (to prepare the database
  storage for process serialization), one can use end2end.sh new_org_name
  to perform the actions that are using the new orchestration tool.

  We have ignored the output errors for now, as we focused on proper
  calls to the underlying services. I’ve also turned off the automatic Elasticsearch
  indexing, as indexing should happen after the orchestration is
  done (and in the parts already rewritten, it really is done as part
  of the orchestration, as a finalization step).

  To control the orchestration, there is a web console available at
  /katello/dynflow: it shows the current status of the
  orchestration and allows resuming a process or skipping some
  steps when they fail.

- The Foreman code with some use of Dynflow is available here:
  https://github.com/iNecas/foreman/tree/orchestration-ng

  bundle install should be enough here, as we use file storage for
  process serialization (just to show we have it :)

  I’ve modified the host creation code to use Dynflow for DHCP/DNS
  related calls to foreman-proxy. I’ve ignored things like collision
  control for now.

  It also loads a sample extension app:
  https://github.com/iNecas/foreman-ext
  that uses Dynflow to hook into the main Foreman process to perform
  some additional tasks (logging messages about actions to
  log/actions.log). It hooks into the host and domain creation processes.

- The Dynflow code used for the demo is here:
  https://github.com/iNecas/dynflow/tree/kafo

  It’s based on this branch, with some tweaks that were not pushed
  upstream yet:
  https://github.com/iNecas/dynflow/tree/refactor

  The refactor branch contains the code cleanup that we did after
  having the first quick-and-dirty prototype some time ago, and it will be
  merged into master after some more polishing (see next steps).
Next Steps
We still have a bunch of things to do:
- better README and documentation for Dynflow
- meta information for the workflows, to be able to filter and sort by result,
  state, user, task type etc. (ideally extendable enough for developers
  to specify their own criteria)
- ability to suspend the execution of some steps while waiting for an external
  response (via a messaging response or polling on the status of an
  external task). For now, for example, when synchronizing a
  repository in Pulp, the polling on the sync status happens in the worker
  itself, which blocks a worker thread in the pool: we want to extract
  this into a different thread that would handle the polling in general and
  resume the step when there is something to do. Similarly for messaging.
- ability to distribute the execution through messaging (depends on the
  previous point)
- API for specifying rollbacks for some actions: if a workflow has all
  actions with rollbacks defined, it should be possible to roll back
  the whole flow
- REST API/CLI for Dynflow, to be able to control and automate some
  operations
and more based on the feedback.
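
To illustrate the planned suspending of steps, here is a rough sketch of the idea of moving the polling out of the workers. Nothing like this exists in the code yet: the Poller class and its method names are hypothetical, and the remote task is faked with a list of canned statuses. A real version would presumably call tick periodically from its own thread.

```ruby
# Hypothetical sketch: one poller owns all the "is it done yet?" checks,
# so a worker can hand a step over and move on instead of blocking.
class Poller
  def initialize
    @waiting = {}   # step id => [check, on_done]
  end

  # Called by a worker that wants to suspend a step.
  def suspend(step_id, check, on_done)
    @waiting[step_id] = [check, on_done]
  end

  # Ids of the steps that are still suspended.
  def waiting
    @waiting.keys
  end

  # One polling round: resume every step whose check returned a result.
  def tick
    finished = []
    @waiting.each do |step_id, (check, on_done)|
      result = check.call
      finished << [step_id, on_done, result] unless result.nil?
    end
    finished.each do |step_id, on_done, result|
      @waiting.delete(step_id)
      on_done.call(result)
    end
  end
end

# A fake remote sync task: nil means "still running".
statuses = [nil, nil, "finished"]
resumed = []

poller = Poller.new
poller.suspend(:sync_repo,
               -> { statuses.shift },
               ->(result) { resumed << [:sync_repo, result] })

2.times { poller.tick }
still_waiting = poller.waiting   # the step is still suspended here
poller.tick                      # third poll sees "finished" and resumes the step
```

The worker that called suspend is free right away; only the poller keeps asking for the status.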
On the Katello/Foreman side, a logic of “soft locks” should be
implemented: a resource can’t be fully used until the task that
performs its orchestration finishes, and user actions should be
prevented until everything is propagated. Today there is no demand
for this, as the resource doesn’t get into the database until everything is
done.
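
A “soft lock” could look roughly like the sketch below. This is only an assumption about one possible shape: the SoftLocks class, its in-memory registry and the resource and task ids are hypothetical, and a real implementation would most likely keep the locks in the database.

```ruby
# Hypothetical sketch of a "soft lock" registry kept in memory.
class SoftLocks
  class Locked < StandardError; end

  def initialize
    @locks = {}          # resource id => id of the task holding the lock
    @mutex = Mutex.new
  end

  # Takes the lock for a task; raises if another task already holds it.
  def lock(resource, task)
    @mutex.synchronize do
      holder = @locks[resource]
      raise Locked, "#{resource} is locked by #{holder}" if holder && holder != task
      @locks[resource] = task
    end
  end

  # Releases the lock, but only for the task that holds it.
  def unlock(resource, task)
    @mutex.synchronize { @locks.delete(resource) if @locks[resource] == task }
  end

  def locked?(resource)
    @mutex.synchronize { @locks.key?(resource) }
  end
end

locks = SoftLocks.new
locks.lock("host-42", "task-1")          # orchestration of the host starts

blocked = begin
  locks.lock("host-42", "task-2")        # a user action on the same host
  false
rescue SoftLocks::Locked
  true
end

locks.unlock("host-42", "task-1")        # orchestration finished
free_again = !locks.locked?("host-42")
```

The user action is rejected while the orchestration task holds the lock and would succeed once the lock is released.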
I’m sorry for the long mail, I’ve tried not to get into verbose mode. And
I’m pretty sure I forgot about something.
Any feedback is appreciated. I’m also more than happy to provide more explanation
and background that led to the current implementation.
Special thanks to pitr-ch, who helped a lot with refactoring and
the parallel executor, as well as jsherrill and ehelms, who provided
helpful feedback and patches.
– Ivan