Foreman Tasks: Scheduling options

Hey guys,

Looking into the Foreman Tasks and wondering what the scheduling options
might be once this comes to fruition. Am keen to see an avoidance of the
restrictive scheduling options that hampered Spacewalk.

Is there anything planned yet or anything I can read up on?

Cheers,

Duncan

Hi,

I'm not aware of the priority for this feature, but yes, this is definitely in scope for foreman-tasks: most of the primitives and infrastructure
are already in place, and it should not be hard to finish the rest.

However, AFAIK there are no wireframes or whitepapers on how it should look, so we can start the discussion here:
what are the things that you missed or found hard to use in Spacewalk that we might address?

My original thinking was that, for a given resource (host, repository, etc.), I would have a list of operations I can
schedule, and I could create a schedule for the task: run every hour/week/month at some time, active from some start time (possibly using
the cron format if preferred).
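
To make the "every hour/week/month, active from some time" idea concrete, here is a minimal Python sketch. The `RecurringSchedule` class and its fields are hypothetical names made up for illustration, not anything that exists in foreman-tasks, and it only covers fixed intervals (a cron-style or calendar-month schedule would need more logic).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RecurringSchedule:
    """Hypothetical schedule: run every `interval`, starting at `active_from`."""
    active_from: datetime
    interval: timedelta

    def next_run(self, now: datetime) -> datetime:
        """Return the first run time that is not earlier than `now`."""
        if now <= self.active_from:
            return self.active_from
        elapsed = now - self.active_from
        # Ceiling division: whole intervals needed to reach or pass `now`.
        periods = -(-elapsed // self.interval)
        return self.active_from + periods * self.interval


# Example: a repository sync every night at 02:00, active from 20 June 2014.
nightly_sync = RecurringSchedule(
    active_from=datetime(2014, 6, 20, 2, 0),
    interval=timedelta(days=1),
)
print(nightly_sync.next_run(datetime(2014, 6, 23, 9, 30)))  # 2014-06-24 02:00:00
```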

– Ivan

You received this message because you are subscribed to the Google Groups
"foreman-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to foreman-dev+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ivan,

Some of the biggest issues I had were around using OSAD to enable
"push-to-client". Whilst this allowed an immediate push for urgent tasks,
it meant that every task was picked up as soon as possible after the
scheduled time. For large estates, this is both a bonus for running
important tasks, but also a problem if running remote commands that might
generate a lot of activity across a large virtualised estate. If the ratio
of virt guests to virt hosts is high, then a simultaneous task across the
whole network could generate an unwelcome spike in physical host load.
Will there be the ability to run arbitrary remote commands?

I had raised tickets with Spacewalk previously to look at improving the
flexibility to allow the following kinds of options:

  1. Task is scheduled to run after a certain date/time (defaulted to
    $localtime)
  2. Choose whether to run this task via OSAD (immediate) or RHNSD (picked up
    at next regular rhnsd check-in)
  3. Set a jitter value (in minutes?) to spread out the hit for large numbers
    of clients running simultaneously

An RHNSD option is probably no longer viable under the new infrastructure,
so perhaps more important to be able to set up a global default for the
jitter value. e.g. 30 mins to spread out the hit across the estate.
Foreman user could then override the default jitter setting to run
emergency jobs immediately.
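
A rough Python sketch of options 1 and 3 above, assuming the start times are worked out on the server side. `plan_start_times` and `DEFAULT_JITTER` are hypothetical names for illustration only; nothing like this is known to exist in Foreman or Spacewalk.

```python
import random
from datetime import datetime, timedelta

DEFAULT_JITTER = timedelta(minutes=30)  # assumed global default


def plan_start_times(hosts, not_before, jitter=DEFAULT_JITTER, rng=None):
    """Map each host to a start time in [not_before, not_before + jitter]."""
    rng = rng or random.Random()
    window = jitter.total_seconds()
    return {
        host: not_before + timedelta(seconds=rng.uniform(0, window))
        for host in hosts
    }


hosts = [f"guest{i:03d}.example.com" for i in range(500)]
start = datetime(2014, 6, 19, 17, 0)

# Regular job: spread across the default 30 minute window.
regular = plan_start_times(hosts, start)
# Emergency job: the user overrides the jitter so every host starts at once.
emergency = plan_start_times(hosts, start, jitter=timedelta(0))
```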

Real world examples:

Emergency job: scripts that had to be run to check exposure to the
heartbleed problem. Security need an immediate answer.
Regular job - scripts that check whether all my clients have a failover
option for their IPA/IdM configuration. I just want the answer at some
point over the next hour

Not sure how the jitter would be implemented, but initial thoughts were to
be on the Spacewalk/Foreman side. If clients picked up a task, started
sleeping for the 4 hours due to the jitter, but were rebooted/crashed
before execution of the task, not sure how reliable the restart of the task
would be.

I'd not considered repeatable tasks (cron like as you say) - I'll have to
have a think about this. Not immediately sure what I'd be scheduling from
Foreman that I'd be repeating. Although it certainly sounds interesting.

Moving on to the user viewing the overall task schedule, the status of
"succeeded" and "failed" became extremely useful in filtering servers based
on the results of the task. What was missing in Spacewalk was the ability
to take all the servers that succeeded or failed a task and use them in the
System Set Manager. Manually selectable, I know, but on large estates you
might be running a task across a number of servers that make it impractical
to select all the failures manually. Not even sure if there's a concept of
SSM in Foreman yet. I know there's hostgroups etc, but these are more
predefined than the arbitrary selection of systems allowed by SSM. This is
less to do with the scheduling of tasks and more to do with the results
though.
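
As a small illustration of what I mean by turning task results into a working set, here is a Python sketch. The `(host, status)` pairs and the `hosts_by_status` helper are made up for the example and do not reflect any real Foreman or Spacewalk data model.

```python
from collections import defaultdict


def hosts_by_status(results):
    """Group host names by task outcome, given (host, status) pairs."""
    groups = defaultdict(set)
    for host, status in results:
        groups[status].add(host)
    return dict(groups)


results = [
    ("web01", "succeeded"),
    ("web02", "failed"),
    ("db01", "failed"),
    ("db02", "succeeded"),
]
groups = hosts_by_status(results)

# The failed hosts become an ad hoc set to re-run the task against.
retry_set = sorted(groups.get("failed", set()))
print(retry_set)  # ['db01', 'web02']
```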

Thoughts? Have I lost the plot?

Cheers,

Duncan

> Ivan,
>
> Some of the biggest issues I had were around using OSAD to enable
> "push-to-client". Whilst this allowed an immediate push for urgent tasks,
> it meant that every task was picked up as soon as possible after the
> scheduled time. For large estates, this is both a bonus for running
> important tasks, but also a problem if running remote commands that might
> generate a lot of activity across a large virtualised estate. If the ratio
> of virt guests to virt hosts is high, then a simultaneous task across the
> whole network could generate an unwelcome spike in physical host load.
> Will there be the ability to run arbitrary remote commands?

Yes, it will. For now, Katello uses Pulp's remote execution mechanism to perform
external calls. In the future, mcollective should take over that role.
Either way, the plan is for both mechanisms to go through foreman-tasks (the first one
already does).

>
> I had raised tickets with Spacewalk previously to look at improving the
> flexibility to allow the following kinds of options:
>
> 1. Task is scheduled to run after a certain date/time (defaulted to
> $localtime)
> 2. Choose whether to run this task via OSAD (immediate) or RHNSD (picked up
> at next regular rhnsd check-in)
> 3. Set a jitter value (in minutes?) to spread out the hit for large numbers
> of clients running simultaneously
>
> An RHNSD option is probably no longer viable under the new infrastructure,
> so perhaps more important to be able to set up a global default for the
> jitter value. e.g. 30 mins to spread out the hit across the estate.
> Foreman user could then override the default jitter setting to run
> emergency jobs immediately.
>
> Real world examples:
>
> Emergency job: scripts that had to be run to check exposure to the
> heartbleed problem. Security need an immediate answer.
> Regular job - scripts that check whether all my clients have a failover
> option for their IPA/IdM configuration. I just want the answer at some
> point over the next hour
>
> Not sure how the jitter would be implemented, but initial thoughts were to
> be on the Spacewalk/Foreman side. If clients picked up a task, started
> sleeping for the 4 hours due to the jitter, but were rebooted/crashed
> before execution of the task, not sure how reliable the restart of the task
> would be.

There is one other option that might be quite easy to do as well: some variant
of a token bucket (http://en.wikipedia.org/wiki/Token_bucket), where the remote
action would need to get a token from the bucket before it starts.
The advantage over the jitter is that you will always start quite soon. Take
the corner case of 1 task with 30 mins jitter: it might start in 29 minutes
even when there is no other task running.

Anyway, there are more possible variants on distributing the tokens, and it seems
to me a quite elastic mechanism, where different approaches might be taken
depending on the given case: some situations might require a guarantee on the time by which
something is applied across the infrastructure, while at other times the timing
is not that important and one cares more about the number of concurrent tasks.
The random jitter just amounts to giving the task a token at a random point between now and
$max.
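
A minimal Python sketch of the token bucket idea, with time passed in explicitly (in seconds) so the behaviour is easy to follow. The `TokenBucket` class and its parameters are illustrative assumptions, not existing foreman-tasks code.

```python
class TokenBucket:
    """Bucket that holds up to `capacity` tokens and refills at a fixed rate."""

    def __init__(self, capacity, refill_per_sec, now=0.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)  # start full
        self.updated = now

    def _refill(self, now):
        # Add the tokens earned since the last update, capped at capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now

    def try_take(self, now):
        """Take one token if available; a remote action may start only on True."""
        self._refill(now)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(capacity=2, refill_per_sec=0.5)
print(bucket.try_take(now=0.0))  # True: a lone task starts immediately
print(bucket.try_take(now=0.0))  # True
print(bucket.try_take(now=0.0))  # False: bucket is empty, the task has to wait
print(bucket.try_take(now=2.0))  # True: one token refilled after 2 seconds
```

Unlike a fixed jitter window, a single task never waits here as long as the bucket is not empty; only a burst of tasks gets spread out.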

>
> I'd not considered repeatable tasks (cron like as you say) - I'll have to
> have a think about this. Not immediately sure what I'd be scheduling from
> Foreman that I'd be repeating. Although it certainly sounds interesting.

Scheduled tasks are more about things like syncing a repository every night/week
and so on. Also, the puppet-related rake tasks currently run via cron
might get better with this: right now, every rake command might take
2 minutes to load the Rails environment, which doesn't scale really well.
Also, having a common mechanism for this gives us one place to get an overview of
what's happening in the system, especially with the ability to filter the tasks
(as you write below).

>
> Moving on to the user viewing the overall task schedule, the status of
> "succeeded" and "failed" became extremely useful in filtering servers based
> on the results of the task. What was missing in Spacewalk was the ability
> to take all the servers that succeeded or failed a task and use them in the
> System Set Manager. Manually selectable, I know, but on large estates you
> might be running a task across a number of servers that make it impractical
> to select all the failures manually. Not even sure if there's a concept of
> SSM in Foreman yet. I know there's hostgroups etc, but these are more
> predefined than the arbitrary selection of systems allowed by SSM. This is
> less to do with the scheduling of tasks and more to do with the results
> though.

They are not in Foreman, but the concept is in Katello (content host collections);
it would probably make sense to get this feature into the Foreman core sooner or later.

>
> Thoughts? Have I lost the plot?

Thanks for the inputs (keep them coming).

> There is one other option that might be quite easy to do as well: some
> variant of a token bucket (http://en.wikipedia.org/wiki/Token_bucket), where
> the remote action would need to get a token from the bucket before it starts.
> The advantage over the jitter is that you will always start quite soon. Take
> the corner case of 1 task with 30 mins jitter: it might start in 29 minutes
> even when there is no other task running.
>

Fair point. I would exclude the jitter from being implemented for tasks
being scheduled on a single host - but then where do you draw the line? 2
hosts with 60 min jitter - both could wait 58 & 59 mins. Perhaps a simple
algorithm that includes the number of hosts in the collection to calculate
a guidance jitter for each task?
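
Something along these lines, as a Python sketch. The `guidance_jitter` helper and the 5 seconds per host spacing are arbitrary assumptions chosen for illustration, not a worked-out proposal.

```python
from datetime import timedelta


def guidance_jitter(host_count, max_jitter, per_host=timedelta(seconds=5)):
    """Scale the jitter window with the number of hosts, capped at max_jitter."""
    if host_count <= 1:
        return timedelta(0)  # a single host never needs to wait
    return min(max_jitter, per_host * (host_count - 1))


one_hour = timedelta(minutes=60)
print(guidance_jitter(1, one_hour))      # 0:00:00
print(guidance_jitter(2, one_hour))      # 0:00:05
print(guidance_jitter(10000, one_hour))  # 1:00:00
```

With this, 2 hosts and a 60 min maximum would wait at most a few seconds, while a very large collection still gets the full window.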

It's really only for heavy tasks running across large groups that spreading
the load is necessary. Most methods have edge cases at some point.

As long as there's some method for spreading out a load, it doesn't really
matter which method is used. As long as there's a choice between
"immediate" and "spread". I would figure that the majority of tasks will
be scheduled to run as soon as possible. It's probably only where estates
get into the tens of thousands where Admins need to start being careful
about avoiding load spikes.

> Anyway, there are more possible variants on distributing the tokens, and it
> seems to me a quite elastic mechanism, where different approaches might be
> taken depending on the given case: some situations might require a guarantee
> on the time by which something is applied across the infrastructure, while at
> other times the timing is not that important and one cares more about the
> number of concurrent tasks. The random jitter just amounts to giving the task
> a token at a random point between now and $max.
>

Yes - the actual method is less important than the fact that there is a
choice. If more than one method can be provided - better for those that
understand, but potentially confusing for those that don't.

(In reply to Ivan Necas, Thursday, 19 June 2014 16:58:55 UTC+1.)