Orchestration NG and bulk actions demo write-up

iNecas · August 5, 2013, 4:39pm

Hello,

TL;DR:
This is a write-up for today's demo on orchestration-NG (calls to Pulp/Candlepin/Smart proxies)
and bulk actions using a library called Dynflow. You can find a link to the
demo of the current status and the code, as well as some explanations and further
steps for stabilizing the back-end code in Katello and Foreman.

Intro


We started this effort some time ago to stabilize the way
Katello calls the underlying backends (Pulp, Candlepin…).
The goal is to improve the back-end code so that we have more
control over the whole process of propagating changes between Katello
and these services, including:

* seeing the progress as it happens
* being able to resume the process when something goes wrong
* being able to extend the process from third-party libraries (a.k.a.
  engines)
* being able to run the steps concurrently if needed (only some of the
  steps really depend on each other, the rest can be performed
  concurrently; tasks such as calling external services via HTTP
  or messaging should gain a lot of performance from this)
* leveraging this effort to handle bulk actions as well, as the
  requirements are really similar there
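
To make the concurrency point concrete, here is a minimal plain-Ruby sketch (it does not use Dynflow's API). The step names and the dependency table are hypothetical; the idea is only that steps whose dependencies are already finished can run side by side in one "wave".

```ruby
require "set"

# Hypothetical dependency table: step name => steps it depends on.
steps = {
  create_org:       [],
  create_cp_owner:  [:create_org],
  create_pulp_repo: [:create_org],
  index_in_search:  [:create_cp_owner, :create_pulp_repo]
}

# Group steps into waves: every step in a wave depends only on steps from
# earlier waves, so the steps inside one wave can run concurrently.
def waves(steps)
  done = Set.new
  result = []
  until done.size == steps.size
    ready = steps.keys.select do |name|
      !done.include?(name) && steps[name].all? { |dep| done.include?(dep) }
    end
    raise ArgumentError, "cyclic or unknown dependency" if ready.empty?
    result << ready
    done.merge(ready)
  end
  result
end

# Run each wave with one thread per step and wait for the whole wave
# before starting the next one. Returns the names in completion order.
def run_plan(steps, &work)
  finished = Queue.new
  waves(steps).each do |wave|
    threads = wave.map do |name|
      Thread.new do
        work.call(name)
        finished << name
      end
    end
    threads.each(&:join)
  end
  Array.new(finished.size) { finished.pop }
end

# The block stands in for a slow HTTP call to an external service.
order = run_plan(steps) { |_name| sleep 0.01 }
```

A real executor would more likely use a thread pool and start each step as soon as its own dependencies finish, rather than waiting for a whole wave.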

We try to do it in a way that keeps the “business” logic as
independent as possible from the implementation details of how the
process is eventually run. Many things can be decided based on the use
case, such as:

* whether the execution happens in the same process as the web server,
  whether there is a thread per workflow, or whether a thread pool of
  workers handles the separate steps of a workflow. The architecture
  makes it possible to distribute the flow over messaging etc.
  (although we don’t have that implemented yet)
* where the process is serialized (memory, file, database …)
* etc.
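
As an illustration of keeping the serialization choice out of the business logic, here is a small plain-Ruby sketch. The class names `MemoryStore` and `FileStore`, the `record_progress` helper and the JSON layout are all assumptions made for this example, not Dynflow's actual persistence API.

```ruby
require "json"
require "tmpdir"

# Two interchangeable stores with the same save/load interface.
class MemoryStore
  def initialize
    @plans = {}
  end

  def save(id, plan)
    @plans[id] = JSON.generate(plan)
  end

  def load(id)
    JSON.parse(@plans.fetch(id))
  end
end

class FileStore
  def initialize(dir)
    @dir = dir
  end

  def save(id, plan)
    File.write(path(id), JSON.generate(plan))
  end

  def load(id)
    JSON.parse(File.read(path(id)))
  end

  private

  def path(id)
    File.join(@dir, "#{id}.json")
  end
end

# The caller records progress without knowing which store is in use.
def record_progress(store, id, step, state)
  plan = begin
    store.load(id)
  rescue KeyError, Errno::ENOENT
    { "steps" => {} } # first write for this plan
  end
  plan["steps"][step] = state
  store.save(id, plan)
  plan
end

# Same call, different back end.
record_progress(MemoryStore.new, "plan-1", "create_org", "success")
Dir.mktmpdir do |dir|
  record_progress(FileStore.new(dir), "plan-1", "create_org", "success")
end
```

Because both stores keep the plan as serialized JSON, a plan written by one process can in principle be picked up by another, which is what makes resuming after a restart possible.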

So far the work has been done mostly on the back-end side, which can
be used in the UI/API to provide a nicer interface. The user
interface provided right now is intended more for developers/
super-admins.

I’m also sending this to the Foreman list, as the orchestration on
the Foreman side is done similarly to Katello’s and the demo
shows the usage of Dynflow in Foreman as well.

Current status

I’ve demoed the current status here:

http://www.youtube.com/watch?feature=player_detailpage&v=-hBef6LRBvY&t=104

In this hangout, I’m going through these use cases:

* end2end scenario from organization creation to content view
  definition publishing
* concurrent execution of independent steps of the process
* ability to resume the process on failure
* usage in Foreman to perform the smart-proxy calls to create DHCP and
  DNS records, being able to chain these actions (the DNS action uses the
  IP that was produced in the DHCP action), including the ability to
  resume after an error
* ability to use this approach to make the process extensible from
  third-party code
* ability to use it for bulk actions
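
The DHCP-to-DNS chaining from the list above can be pictured with the following plain-Ruby sketch. The two lambdas are stand-ins for the smart-proxy calls, and the IP address, host name and hash keys are made up for the example; the demo code itself does this through Dynflow.

```ruby
# Stand-in for the DHCP call: "reserves" an IP for the given MAC address.
reserve_dhcp = lambda do |input|
  { "ip" => "192.0.2.10", "mac" => input.fetch("mac") }
end

# Stand-in for the DNS call: needs the IP produced by the DHCP action.
create_dns = lambda do |input|
  { "record" => "#{input.fetch('name')} A #{input.fetch('ip')}" }
end

# Run actions in order; each one sees the original input merged with the
# output of every action that ran before it.
def run_chain(actions, input)
  actions.reduce(input) { |data, action| data.merge(action.call(data)) }
end

result = run_chain([reserve_dhcp, create_dns],
                   "name" => "host1.example.com",
                   "mac"  => "52:54:00:00:00:01")
puts result["record"]
```

If the order were reversed, `create_dns` would fail with a `KeyError` because no IP exists yet, which is exactly the kind of dependency the orchestration has to know about.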

As it’s a live demo, you will also have a chance to see /me debugging
provisioning issues, which was not planned as part of this recording :)

For those interested in the code, here it is:

* Katello with part of the orchestration rewritten using Dynflow:

https://github.com/iNecas/katello/tree/orchestration-ng

After `bundle install` and `rake db:migrate` (to prepare the database
storage for process serialization), one can use `end2end.sh new_org_name`
to perform the actions that use the new orchestration tool.

We have ignored the output errors for now, as we focused on making the
proper calls to the underlying services. I’ve also turned off the
automatic Elasticsearch indexing, as indexing should happen after the
orchestration is done (and in the parts already rewritten, it really is
done as part of the orchestration, as a finalization step).

To control the orchestration, there is a web console available at
`/katello/dynflow`: it shows the current status of the
orchestration and lets you resume the process or skip some
steps when they fail.

* the Foreman code with some use of Dynflow is available here:

https://github.com/iNecas/foreman/tree/orchestration-ng

`bundle install` should be enough here, as we use file storage for
process serialization (just to show we have it :)

I’ve modified the host creation code to use Dynflow for the
DHCP/DNS-related calls to foreman-proxy. I’ve ignored things like
collision control for now.

It also loads a sample extension app:

https://github.com/iNecas/foreman-ext

that uses Dynflow to hook into the main Foreman process to perform
some additional tasks (logging messages about actions to
`log/actions.log`). It hooks into the host and domain creation processes.

* the Dynflow code used for the demo is here:

https://github.com/iNecas/dynflow/tree/kafo

It’s based on this branch, with some tweaks that have not been pushed
upstream yet:

https://github.com/iNecas/dynflow/tree/refactor

The refactor branch contains the code cleanup that we did after
having the first quick-and-dirty prototype some time ago. It will be
merged into master after some more polishing (see next steps).

Next Steps

We still have a bunch of things to do:

* better README and documentation for Dynflow
* meta information for the workflows, to be able to filter and sort by
  result, state, user, task type etc. (ideally extensible enough for
  developers to specify their own criteria)
* ability to suspend execution of some steps while waiting for an
  external response (via a messaging response or polling on the status
  of an external task). For example, when synchronizing a repository in
  Pulp today, the polling on sync status happens in the worker itself,
  which blocks a worker thread in the pool: we want to extract this
  into a different thread that would handle the polling in general and
  resume the step when there is something to do. Similarly for messaging.
* ability to distribute the execution through messaging (depends on
  the previous point)
* API for specifying rollbacks for some actions: if all actions in a
  workflow have rollbacks defined, it should be possible to roll back
  the whole flow.
* REST API/CLI for Dynflow to be able to control and automate some
  operations

and more based on the feedback.
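
To illustrate the suspend/poll item from the list above, here is a rough plain-Ruby sketch of the idea, not the planned Dynflow design. The `Poller` class and its method names are hypothetical. A step registers a status check and a resume callback instead of sleeping inside a worker thread, and one poller walks over everything that is waiting.

```ruby
# Hypothetical poller holding the suspended steps.
class Poller
  def initialize
    @waiting = []
  end

  # check:   callable returning a result once the external task is done,
  #          or nil while it is still running.
  # on_done: callable invoked with that result to resume the step.
  def suspend(check, on_done)
    @waiting << [check, on_done]
  end

  # One polling pass over all suspended steps.
  # Returns the number of steps still waiting.
  def tick
    @waiting.reject! do |check, on_done|
      result = check.call
      next false if result.nil? # still running, keep waiting
      on_done.call(result)      # resume the step
      true                      # and stop polling for it
    end
    @waiting.size
  end
end

# Fake external sync task that reports completion on the third check.
checks = 0
sync_status = -> { (checks += 1) >= 3 ? "finished" : nil }
resumed = []

poller = Poller.new
poller.suspend(sync_status, ->(state) { resumed << state })
poller.tick until resumed.any?
```

In a real setup `tick` would be driven by a single dedicated thread or timer, so a long repository sync would occupy one entry in the poller's list rather than a whole worker thread.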

On the Katello/Foreman side, a logic of “soft locks” should be
implemented: a resource can’t be fully used until the task that
performs its orchestration finishes, and user actions should be
prevented until everything is propagated. Today there is no demand
for this, as the resource doesn’t get into the database until
everything is done.
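
One possible shape for such soft locks, as a plain-Ruby sketch only: the `SoftLocks` class, its in-memory hash and the resource keys are assumptions for the example, and a real implementation would presumably keep the locks in the database.

```ruby
# Hypothetical in-memory soft-lock registry keyed by resource.
class SoftLocks
  LockedError = Class.new(StandardError)

  def initialize
    @owners = {}
  end

  # Mark the resource as busy for the duration of an orchestration task.
  def lock(resource, task_id)
    owner = @owners[resource]
    raise LockedError, "#{resource} is locked by task #{owner}" if owner
    @owners[resource] = task_id
  end

  # Release the lock, but only if the given task actually holds it.
  def unlock(resource, task_id)
    @owners.delete(resource) if @owners[resource] == task_id
  end

  # User-facing actions can check this before touching the resource.
  def locked?(resource)
    @owners.key?(resource)
  end
end

locks = SoftLocks.new
locks.lock("host/42", "task-1")   # orchestration starts
locks.locked?("host/42")          # true: the UI would block edits here
locks.unlock("host/42", "task-1") # orchestration finished
```

The lock is "soft" in the sense that the resource is already visible to the user; only actions that would conflict with the running task need to check `locked?`.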

I’m sorry for the long mail, I’ve tried not to get into verbose mode :) And
I’m pretty sure I forgot about something.

Any feedback is appreciated. I’m also more than happy to provide more explanation
and background that led to the current implementation.

Special thanks to pitr-ch, who helped a lot with the refactoring and
the parallel executor, as well as jsherrill and ehelms, who provided
helpful feedback and patches.

– Ivan

ohadlevy · August 7, 2013, 9:43am

it all looks pretty promising… one question i am not sure about:

if an action fails today with foreman, all tasks are rolled back and the
user is back in the edit screen, where he can see the task status and decide
if he needs to change something and resubmit. sometimes it's an error (e.g.
the proxy was down, and the user can start it again and resubmit), or a
conflict (e.g. a dns ptr record left over, and the user can decide he wants
to clean up the old record), etc.

how would we handle that? that would mean a whole new way of mitigating
failures, right?

thanks!
Ohad

iNecas · August 19, 2013, 11:16am

There are more options for dealing with failures with this approach:

1. By default the orchestration process is paused, letting you fix the issue (assuming the problem was caused by some external service being unavailable or something similar) and resume the process (it also survives restarts etc.).
2. The second option is to define rollbacks for the orchestration actions; if all the steps of the orchestration have rollbacks defined, the rollback is doable. (It's still an open question whether it's better to go for 1. by default and let the user choose between rolling back and trying to fix the problem, or to have the admin decide.)
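
The two options can be sketched in plain Ruby as follows. `Step` and `Plan` are hypothetical names for this example, not Dynflow classes, and the sketch keeps everything in memory, so it does not show surviving a restart.

```ruby
# Hypothetical step: a name, a run callable and an optional rollback callable.
Step = Struct.new(:name, :run, :rollback)

class Plan
  attr_reader :done, :state

  def initialize(steps)
    @steps = steps
    @done = []
    @state = :pending
  end

  # Option 1: run the remaining steps; on the first error, pause and keep
  # the progress so that a later resume continues with the failed step.
  def resume
    @steps.drop(@done.size).each do |step|
      begin
        step.run.call
      rescue StandardError
        return @state = :paused
      end
      @done << step.name
    end
    @state = :finished
  end

  # Option 2 is only available when every step defines a rollback.
  def rollbackable?
    @steps.all?(&:rollback)
  end

  # Undo the finished steps in reverse order.
  def rollback
    raise "some steps have no rollback" unless rollbackable?
    @steps.first(@done.size).reverse_each { |step| step.rollback.call }
    @done.clear
    @state = :rolled_back
  end
end

proxy_up = false
log = []
plan = Plan.new([
  Step.new(:dhcp, -> { log << "dhcp created" }, -> { log << "dhcp removed" }),
  Step.new(:dns,  -> { raise "proxy down" unless proxy_up; log << "dns created" },
                  -> { log << "dns removed" })
])

plan.resume     # pauses at :dns, :dhcp stays done
proxy_up = true # the admin fixes the external service
plan.resume     # continues from :dns and finishes
```

Calling `plan.rollback` instead of the second `resume` would undo `:dhcp` only, since `:dns` never finished.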

– Ivan
