Orchestration NG and bulk actions demo write-up

Hello,

TL;DR:
This is a write-up for today's demo on orchestration-NG (calls to Pulp/Candlepin/Smart proxies)
and bulk actions using a library called Dynflow. You can find a link to the
demo of the current status and the code, as well as some explanations and further
steps for stabilizing the back-end code in Katello and Foreman.

Intro


We started this effort some time ago to stabilize the way
Katello calls the underlying backends (Pulp, Candlepin…).
The goal is to improve the back-end code so that we have more
control over the whole process of propagating changes between Katello
and these services, including:

  • seeing the progress as it happens
  • being able to resume the process when something goes wrong
  • being able to extend the process from third-party libraries (a.k.a.
    engines)
  • being able to run the steps concurrently if needed (only some of the
    steps really depend on each other, the rest can be performed
    concurrently): for tasks such as calling external services via HTTP
    or messaging, performance should improve a lot with this.
  • leveraging this effort to handle bulk actions as well, as the
    requirements are really similar here
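
A minimal sketch of the concurrency idea from the list above, in plain Ruby rather than Dynflow's actual API. The step names and the `run_in_batches` helper are made up for illustration: each step lists the steps it depends on, and all steps whose dependencies are finished run in parallel threads.

```ruby
# Hypothetical workflow: step name => names of the steps it depends on.
STEPS = {
  create_org:      [],
  candlepin_owner: [:create_org],
  pulp_repo:       [:create_org],
  index:           [:candlepin_owner, :pulp_repo]
}

# Repeatedly pick every step whose dependencies are done, run that batch
# with one thread per step, and wait for the whole batch before going on.
def run_in_batches(steps, &block)
  done    = []
  batches = []
  until done.size == steps.size
    ready = steps.keys.select do |name|
      !done.include?(name) && (steps[name] - done).empty?
    end
    raise ArgumentError, "dependency cycle" if ready.empty?
    ready.map { |name| Thread.new { block.call(name) } }.each(&:join)
    done.concat(ready)
    batches << ready
  end
  batches
end

executed = Queue.new # thread-safe record of what actually ran
batches  = run_in_batches(STEPS) { |name| executed << name }
p batches
```

Here the two middle steps end up in the same batch, so calls to two independent services would overlap instead of running one after the other.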

We try to do it in a way that keeps the “business” logic as
independent as possible from the implementation details of how the process
is run in the end. Many things can be decided based on the use case, such as:

  • whether the execution happens in the same process as the web server,
    whether there is a thread per workflow, or whether there is a thread pool
    of workers handling the separate steps of a workflow. The architecture is
    designed so that it is possible to distribute the flow over
    messaging etc. (although we don’t have that implemented yet)
  • where the process is serialized (memory, file, database …)
  • etc.

So far the work has been done mainly on the back-end side, which can
be used in the UI/API to provide a nicer interface. The user
interface provided right now is intended more for developers and
super-admins.

I’m also sending this to the Foreman list, as the orchestration on
the Foreman side is done similarly to Katello’s and the demo
shows the usage of Dynflow in Foreman as well.

Current status

I’ve demoed the current status here:

http://www.youtube.com/watch?feature=player_detailpage&v=-hBef6LRBvY&t=104

In this hangout, I’m going through these use cases:

  • end2end scenario from organization creation to content view
    definition publishing
  • concurrent execution of independent steps of process.
  • ability to resume the process on failure
  • usage in Foreman to perform the smart-proxy calls to create DHCP and
    DNS records, being able to chain these actions (the DNS action uses the IP
    that was produced in the DHCP action), including the ability to resume
    after an error
  • ability to use this approach to make the process extendable from
    third-party code.
  • ability to use it for bulk actions
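
To illustrate the chaining mentioned above (the DNS action using the IP produced by the DHCP action), here is a small sketch in plain Ruby. It does not use Dynflow's real API or talk to a smart proxy: the classes, the `run_chain` helper, the address and the host name are all hypothetical.

```ruby
# Hypothetical stand-in for the DHCP smart-proxy call; no network involved.
class ReserveDhcp
  def run(input)
    # Pretend the proxy picked a free address for this MAC.
    { ip: "192.168.100.10", mac: input.fetch(:mac) }
  end
end

# Hypothetical stand-in for the DNS smart-proxy call.
class CreateDnsRecord
  def run(input)
    # Needs :ip, which only exists once the DHCP action has run.
    { record: "#{input.fetch(:hostname)} A #{input.fetch(:ip)}" }
  end
end

# Run the actions in order, merging each action's output into the data
# handed to the next one.
def run_chain(actions, input)
  actions.reduce(input) { |data, action| data.merge(action.run(data)) }
end

result = run_chain(
  [ReserveDhcp.new, CreateDnsRecord.new],
  { hostname: "host1.example.com", mac: "52:54:00:aa:bb:cc" }
)
puts result[:record]
```

Swapping the two actions would raise a `KeyError` on `:ip`, which is the kind of dependency between steps that an orchestration layer has to know about.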

As it’s a live demo, you will also have a chance to see /me debugging
provisioning issues, which was not planned as part of this recording :)

For those interested in the code, here it is:

  • Katello with part of the orchestration rewritten using Dynflow:

    https://github.com/iNecas/katello/tree/orchestration-ng

    After `bundle install` and `rake db:migrate` (to prepare the database
    storage for process serialization), one can use `end2end.sh new_org_name`
    to perform the actions that use the new orchestration tool.

    We have ignored the output errors for now, as we focused on making the
    proper calls to the underlying services. I’ve also turned off the automatic
    Elasticsearch indexing, as indexing should happen after the orchestration is
    done (and in the parts already rewritten, it is indeed done as part
    of the orchestration, as a finalization step).

    To control the orchestration, there is a web console available at
    `/katello/dynflow`: it shows the current status of the
    orchestration and lets you resume the process or skip some
    steps when they fail.

  • the Foreman code with some use of Dynflow is available here:

    https://github.com/iNecas/foreman/tree/orchestration-ng

    `bundle install` should be enough here, as we use file storage for
    process serialization (just to show we have it :)

    I’ve modified the host creation code to use Dynflow for DHCP/DNS-related
    calls to foreman-proxy. I’ve ignored things like collision
    control for now.

    It also loads a sample extension app:

    https://github.com/iNecas/foreman-ext

    that uses Dynflow to hook into the main Foreman process to perform
    some additional tasks (logging messages about actions to
    `log/actions.log`). It hooks into the host and domain creation processes.

  • the Dynflow code used for the demo is here:

    https://github.com/iNecas/dynflow/tree/kafo

    It’s based on this branch, with some tweaks that have not been pushed
    upstream yet:

    https://github.com/iNecas/dynflow/tree/refactor

    The refactor branch contains the code cleanup that we did after
    having the first quick-and-dirty prototype some time ago; it will be
    merged into master after some more polishing (see next steps).

Next Steps

We still have a bunch of things to do:

  • better README and documentation for Dynflow

  • meta information for the workflows to be able to filter and sort by result,
    state, user, task type etc (ideally extendable enough for developers
    to specify their own criteria)

  • ability to suspend execution of some steps when waiting for an external
    response (via a messaging response or polling on the status of an
    external task). For now, when synchronizing a repository in Pulp for
    example, the polling on the sync status happens in the worker
    itself, which blocks a worker thread in the pool: we want to extract
    this into a different thread that would handle the polling in general and
    resume the step when there is something to do. Similarly for messaging.

  • ability to distribute the execution through messaging (depends on
    the previous point)

  • API for specifying rollbacks for some actions: if all actions of a
    workflow have rollbacks defined, it should be possible to roll back
    the whole flow.

  • REST API/CLI for Dynflow, to be able to control and automate some
    operations

and more based on the feedback.
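
As a rough illustration of the rollback item in the list above (a sketch of the idea only, since the rollback API is not designed yet): the `Action` struct and the `execute` helper below are hypothetical. Completed actions are undone in reverse order, but only when every one of them defines a rollback; otherwise the flow is left paused.

```ruby
# Hypothetical action: a name plus lambdas for the forward and the undo work.
# rollback may be nil when the action cannot be undone.
Action = Struct.new(:name, :run, :rollback)

# Run the actions in order. On failure, undo the completed ones in reverse
# order, but only if each of them knows how to roll back.
def execute(actions)
  completed = []
  actions.each do |action|
    action.run.call
    completed << action
  end
  :success
rescue StandardError
  return :paused unless completed.all?(&:rollback)
  completed.reverse_each { |action| action.rollback.call }
  :rolled_back
end

log = []
actions = [
  Action.new(:dhcp, -> { log << "dhcp created" }, -> { log << "dhcp removed" }),
  Action.new(:dns,  -> { log << "dns created" },  -> { log << "dns removed" }),
  Action.new(:boot, -> { raise "proxy unavailable" }, -> {})
]
status = execute(actions)
puts status
puts log
```

Undoing in reverse order matters when later actions build on earlier ones, as with the DNS record that depends on the DHCP reservation.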

On the Katello/Foreman side, a logic of “soft locks” should be
implemented: a resource can’t be fully used until the task that
performs its orchestration finishes, and user actions should be
prevented until everything is propagated. Today there is no need
for this, as the resource doesn’t get into the database until everything is
done.
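
One way to picture such soft locks, as a sketch under assumptions only (nothing like this exists in the branches above; the `SoftLocks` class and the identifiers are invented): a registry that remembers which task currently owns a resource and refuses a second owner until the first one releases it.

```ruby
# Hypothetical in-memory soft-lock registry: resource id => owning task id.
class SoftLocks
  def initialize
    @locks = {}
    @mutex = Mutex.new
  end

  # Returns true if the task got the lock, false if another task holds it.
  def lock(resource, task)
    @mutex.synchronize do
      return false if @locks.key?(resource)
      @locks[resource] = task
      true
    end
  end

  # Only the task that holds the lock can release it.
  def release(resource, task)
    @mutex.synchronize { @locks.delete(resource) if @locks[resource] == task }
  end

  def locked?(resource)
    @mutex.synchronize { @locks.key?(resource) }
  end
end

locks  = SoftLocks.new
first  = locks.lock("host-1", "task-42") # orchestration starts
second = locks.lock("host-1", "task-43") # user action arrives too early
locks.release("host-1", "task-42")       # orchestration finished
puts first, second, locks.locked?("host-1")
```

A real implementation would have to keep the locks in the database so they survive restarts, in the same way the orchestration process itself does.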

I’m sorry for the long mail, I’ve tried not to get into verbose mode :) And
I’m pretty sure I forgot about something.

Any feedback is appreciated. I’m also more than happy to provide more explanation
and background that led to the current implementation.

Special thanks to pitr-ch, who helped a lot with the refactoring and the
parallel executor, as well as jsherrill and ehelms, who provided
helpful feedback and patches.

– Ivan

if an action fails today with foreman, all tasks are rolled back and the
user is back in the edit screen, where he can see the task status and decide
if he needs to change something and resubmit. sometimes it’s an error (e.g.
the proxy was down, the user can start it again and submit), or a conflict (e.g.
a dns ptr record was left over, and the user can decide he wants to clean up the
old record) etc.

how would we handle that? that would mean a whole new way of mitigating
failures right?

thanks!
Ohad

On Mon, Aug 5, 2013 at 7:39 PM, Ivan Necas wrote:

> [...]


You received this message because you are subscribed to the Google Groups
"foreman-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to foreman-dev+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

it all looks pretty promising… one question i am not sure about

There are more options for dealing with failure with this approach:

  1. By default the orchestration process is paused, letting you fix the issue (expecting the problem was caused by some external service being unavailable or similar) and resume the process (it also survives restarts etc.).

  2. The second option is to define rollbacks for the orchestration actions: if all the steps of the orchestration have rollbacks defined, the rollback is doable. (It’s still questionable whether it isn’t better to go for 1. by default, letting the user choose whether to roll back or try to fix the problem, or having the admin decide.)
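
Option 1 can be pictured with a minimal sketch in plain Ruby (not Dynflow's real API; the `Plan` class and the step names are hypothetical): the plan remembers which steps already succeeded, pauses on the first failure, and a later resume continues from the failed step without redoing the finished ones.

```ruby
# Hypothetical plan that pauses on failure and can be resumed later.
class Plan
  attr_reader :state, :finished

  def initialize(steps)
    @steps    = steps    # array of [name, callable]
    @finished = []       # names of the steps that already succeeded
    @state    = :pending
  end

  # Run the remaining steps; stop at the first failure and keep the progress.
  def run
    @steps.each do |name, step|
      next if @finished.include?(name)
      begin
        step.call
      rescue StandardError
        return @state = :paused
      end
      @finished << name
    end
    @state = :success
  end
  alias resume run
end

proxy_up = false
calls    = []
plan = Plan.new([
  [:org,  -> { calls << :org }],
  [:dhcp, -> { raise "proxy down" unless proxy_up; calls << :dhcp }],
  [:dns,  -> { calls << :dns }]
])

first_state = plan.run    # the DHCP step fails, the plan pauses
proxy_up    = true        # somebody starts the proxy again
final_state = plan.resume # continues with :dhcp, does not repeat :org
puts first_state, final_state
```

Surviving restarts would additionally require serializing the list of finished steps, which is what the database or file storage mentioned in the original mail is for.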

– Ivan
