Foreman scaling and performance

Hey guys, I've seen some cases where Foreman is incredibly slow; I just want
to run this past you to make sure I've not messed something up.

So, the setup: a fairly small deployment of 20 servers (for now), but that's
not the scaling I'm worried about. Commits to our Puppet repo get cloned
into their own environments for testing, then pushed up to GitHub for
off-site backup and peer review. Nothing ground-breaking so far. The final bit
is that when a review is completed, a script pulls the change into our production
environment and then calls /api/smart_proxy/:id/update_puppetclasses to refresh things.
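For context, the roll-out script itself is nothing clever; it's roughly this
shape (host, credentials, paths and the smart proxy ID are placeholders here,
and adjust the HTTP method to whatever your Foreman version's API expects):

    #!/bin/bash
    # Sketch of the post-review roll-out step described above.
    # foreman.example.com, admin:changeme, the proxy id and the environment
    # path are all placeholders; adjust -X to match your Foreman API.
    set -e

    # pull the reviewed change into the production environment
    cd /etc/puppet/environments/production
    git pull --ff-only origin master

    # ask Foreman to refresh puppet classes via the smart proxy
    curl -k -s -u admin:changeme -H 'Accept: application/json' \
         -X POST "https://foreman.example.com/api/smart_proxy/1/update_puppetclasses"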

Now, I simulated going from 100 to 2000 Puppet classes in a VM and saw
puppet class updates rise pretty linearly from a few seconds up to about
20 seconds, which isn't terrible. The real fun began when we started scaling
up the number of dynamic environments in existence, to 20-odd IIRC; at that
point the update process was taking a good 15 minutes. Kinda struck me that
there's a worse-than-O(N^2) problem going on here!

Anecdotally, on real hardware with all the joys of network latency and
ancient equipment, this was taking up to 30 minutes a pop. That took serious
time to clear out the production roll-out queue :) It's down to a couple of
minutes now, after deleting all the old feature branch environments (yay!)
and ensuring our scripts proactively clean up rather than relying on devs
to do it, but that's hardly my most ethical of solutions. As an aside, we were
also suffering rubbish interactive performance, getting loads of spam about
the ENC not getting a node definition and so on, with these requests taking
anything upward of a minute.

Just wondering if others have experienced the same behaviour with large
numbers of Puppet classes multiplied by large numbers of dynamic
environments. Also, if anything springs to mind regarding horrific
bottlenecks, by all means point me in the right direction and I can
donate some cycles to improving this area.

Cheers

The ENC problem is solvable by scaling horizontally and it's not such a big
deal (fortunately for us). Reports take a long time to process in large
installations too, if you haven't noticed :) Those two problems are easy to
solve by throwing some hardware at them.
However, the puppet classes update is indeed a painful, really slow process in
even low-to-mid size deployments. I'm not aware of any work being done
currently to solve that, but we should focus on solving it, if not for
1.5 then for 1.6.

I don't think we can offer a short-term solution for this problem, but if
your puppet classes don't change that often, we have a rake task, 'rake
import:puppet:environments_only', that lets you update your environments
list quickly instead of waiting for the whole puppet class update process.
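On a typical package-based install that's something like this (the
application directory and user may differ on your setup):

    cd /usr/share/foreman
    sudo -u foreman bundle exec rake import:puppet:environments_only RAILS_ENV=production

or, if your install ships the foreman-rake wrapper, simply:

    foreman-rake import:puppet:environments_only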

If you have dozens of classes and many environments, the puppet import task
specifically can take a LONG time, many minutes or more. And you'd better
hope none of your updated classes are pre-existing classes with overrides:
the import doesn't update a class if an override is set on it, and it
doesn't fail either. That's a long-outstanding bug.

Horizontal scaling won't help this issue. We just try to minimize the
amount of importing that has to be done (and disabled the check in the code
that stops the update from happening if overrides are set).

You can also create a filter to reduce the classes (and parameters) that
we import into Foreman; see "Ignored environments" at
http://theforeman.org/manuals/1.4/index.html#4.2.2Classes
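Roughly speaking, the filter is a YAML file in Foreman's config directory,
config/ignored_environments.yml (copied from the shipped .sample file), with
a list of Ruby regexps for names to skip. See the manual section above for
the exact keys and semantics, but it looks something like:

    :filters:
      - !ruby/regexp '/^some_noisy_module::.*/'
      - !ruby/regexp '/^params$/'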

Ohad

That is a very interesting idea to pursue. Until Puppet supports access
specifiers or linker-like visibility, I think I may try sprinkling the code
base with

#pragma extern

comments, then, as part of the synchronisation process, scan each class and
post the configuration to Foreman before importing classes. I still think
this is pretending the bottleneck doesn't exist, especially given the
relatively small amount of data that is actually being churned here. For now
I'll do a quick prototype of this idea and let you know what happens.
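Very roughly, the scanning half would be something like this (paths are
placeholders, and the marker is purely my own convention, not anything
Puppet itself understands):

    #!/bin/bash
    # list classes whose manifests carry the "#pragma extern" marker, so only
    # those get posted to Foreman ahead of the class import
    ENVDIR=/etc/puppet/environments/production/modules
    grep -rl --include='*.pp' '#pragma extern' "$ENVDIR" |
    while read -r manifest; do
      # crude class-name extraction; assumes one class definition per file
      awk '/^class[[:space:]]/ { gsub(/[({].*/, "", $2); print $2; exit }' "$manifest"
    done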
