Foreman scalability -> 100k+ nodes anyone?

ilya_m · November 1, 2017, 10:25pm

Dear Foreman users,

I've recently joined the Foreman users group.

Previously (4 years ago) I used to manage Spacewalk, which did its job
well for 2k+ nodes, but I experienced stability and scalability issues from
time to time. The PostgreSQL DB also left a lot to be desired and seemed a
bit messy. That was 4 years ago, and I'm guessing a lot has changed since
then, but it also seems like Spacewalk is in maintenance mode and Red Hat
has moved on to Foreman/Katello?

I'm now tasked with finding a new patching/management solution for Linux,
and I'm exploring Foreman as one of the alternatives.

I'm curious how far Foreman can scale and which services might be the
bottlenecks. Can I scale the bottleneck services?

My use case varies, but it will probably be 100k nodes in a year and up to
500k nodes in a few years.

It would be ideal if I could run Foreman on Kubernetes with persistent
storage. Is that a possibility?

With that said, what challenges have you experienced at large scale? Which
services are the usual suspects, and what can be done to mitigate them?

I'm guessing there aren't too many solutions that work well at that scale,
so I'm open to splitting it up into smaller environments based on business
groups. I'd then create an aggregator and routing engine if need be.

Thank you
-ilya

sean797 · November 2, 2017, 1:53pm

> Dear Foreman users,
>
> I've recently joined the Foreman users group.
>
Welcome!

>
> Previously (4 years ago) I used to manage Spacewalk, which did its
> job well for 2k+ nodes, but I experienced stability and scalability
> issues from time to time. The PostgreSQL DB also left a lot to be desired
> and seemed a bit messy. That was 4 years ago, and I'm guessing a lot has
> changed since then, but it also seems like Spacewalk is in maintenance
> mode and Red Hat has moved on to Foreman/Katello?
>
Correct

>
> I'm now tasked with finding a new patching/management solution for Linux,
> and I'm exploring Foreman as one of the alternatives.
>
> I'm curious how far Foreman can scale and which services might be the
> bottlenecks. Can I scale the bottleneck services?
>
Kind of… right now the only "supported" way is running all the services
(apart from PostgreSQL) on a single node; [1] aims to change that. Most of
the issues are around the installer and the Puppet modules, so your help
testing and fixing issues would be amazing!

>
> My use case varies, but it will probably be 100k nodes in a year and
> up to 500k nodes in a few years.
>
Exciting! I believe this will be one of the biggest deployments.

>
> It would be ideal if I could run Foreman on Kubernetes with persistent
> storage. Is that a possibility?
>
Not for production. See [2] & [3]

>
> With that said, what challenges have you experienced at large scale? Which
> services are the usual suspects, and what can be done to mitigate them?
>
Lots of dragons… off the top of my head:

- Don't install katello-agent; instead, try katello-host-tools and use
  Remote Execution/Ansible or something else to run 'yum update' on the
  clients.
- Use lots of Smart Proxies, and don't register any clients to Katello
  directly (apart from the Smart Proxies).
- Are you using Puppet or any other config management tool? That can have
  a fairly big impact on scaling.
- Use a separate DB server.
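Driving 'yum update' from a central tool usually means rolling it out in waves, so one bad package set doesn't hit every client at once. A minimal sketch of that batching logic, assuming fixed-size waves that stop at the first failed batch; `run_update` is a hypothetical callback standing in for whatever Remote Execution or Ansible job you actually trigger (Ansible's own `serial` play keyword does the same job natively):

```python
from typing import Callable, Iterable, Iterator, List


def batches(hosts: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` hosts."""
    if size < 1:
        raise ValueError("batch size must be >= 1")
    batch: List[str] = []
    for host in hosts:
        batch.append(host)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # leftover hosts form a final, smaller batch
        yield batch


def rolling_update(hosts: Iterable[str], size: int,
                   run_update: Callable[[List[str]], bool]) -> List[List[str]]:
    """Run `run_update` one batch at a time; stop after the first failing batch.

    Returns the batches that were attempted, in order.
    """
    attempted: List[List[str]] = []
    for batch in batches(hosts, size):
        attempted.append(batch)
        if not run_update(batch):
            break
    return attempted


def print_update(batch: List[str]) -> bool:
    """Hypothetical placeholder: a real version would trigger 'yum update' remotely."""
    print("updating", batch)
    return True


if __name__ == "__main__":
    hosts = [f"host{i:03d}.example.com" for i in range(7)]
    rolling_update(hosts, 3, print_update)
```

Stopping on the first failed wave is a design choice; a real job might instead tolerate a failure percentage per batch.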

> I'm guessing there aren't too many solutions that work well at that scale,
> so I'm open to splitting it up into smaller environments based on business
> groups. I'd then create an aggregator and routing engine if need be.
>
Although that is an option, I'd prefer that you have one deployment and fix
the issues as you come across them.

> Thank you
> -ilya
>

[1] Tracker #20850: Allow split deployments using the installer - Installer - Foreman
[2] https://www.youtube.com/watch?v=mPjUvNAYp1c
[3] https://groups.google.com/d/msg/foreman-dev/IVFkzDFAqSA/2TGa0E3sAQAJ


On Wed, Nov 1, 2017 at 10:25 PM, ilya m. wrote:


Andrew_Schofield · November 3, 2017, 2:32am

I know of an 85k node deployment. It really depends on what services you
expect to run.

With 10k hosts running Puppet every 30 minutes, you will flatline a 24-CPU
box. We have 11k so far, with Puppet checking in between 2 and 8 times a day
depending on the environment, and we run at about 20% CPU (on said 24-CPU
box, with 64 GB of RAM). Our target is ~25k servers, and we expect to run at
roughly 50% CPU with that. We don't do a lot of publishes (we're Satellite
6 users), but they add quite a lot to the CPU load.
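Those CPU figures can be sanity-checked with a check-in-rate calculation. A back-of-envelope sketch, assuming Puppet runs are spread evenly across the day (real fleets bunch up, so peaks will be higher):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def runs_per_second(hosts: int, runs_per_day: float) -> float:
    """Average Puppet check-ins per second, assuming runs are spread evenly."""
    return hosts * runs_per_day / SECONDS_PER_DAY


# (hosts, runs per day) taken from the figures in the post above
scenarios = {
    "10k hosts, every 30 min": (10_000, 48),
    "11k hosts, 2x/day": (11_000, 2),
    "11k hosts, 8x/day": (11_000, 8),
    "25k hosts, 8x/day": (25_000, 8),
}

if __name__ == "__main__":
    for name, (hosts, per_day) in scenarios.items():
        print(f"{name:26s} {runs_per_second(hosts, per_day):6.2f} runs/s")
```

This gives roughly 5.6 runs/s for 10k hosts every 30 minutes, about 0.25 to 1.0 runs/s for the 11k fleet, and about 2.3 runs/s for 25k hosts at 8 runs a day. If CPU scaled linearly with check-ins (a simplification, since catalog size and publishes matter too), that is roughly consistent with a flatlined box in the first case and ~50% CPU at the 25k target.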

There is quite a bit of tuning you need to do beyond the out-of-the-box
settings:

- Apache (keepalive, spare / max servers)
- Passenger (workers and passenger limit; there are a few bugs which can
  cause Passenger processes to explode)
- PostgreSQL (connections, work mem and cache mem)
- qpidd / qdrouterd (limits / aio limit)
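No numbers are given for these settings, so treat the following purely as a back-of-envelope sketch of how two of them interact. It assumes Passenger's `PassengerMaxPoolSize` is capped by the RAM the app processes may use, and that PostgreSQL's `max_connections` must cover the worst case of every app process opening its full ActiveRecord connection pool, plus other database clients. Every input in the example (reserved RAM, MB per process, pool size, extra clients) is an illustrative assumption, not a measured or recommended value:

```python
from dataclasses import dataclass


@dataclass
class Sizing:
    passenger_max_pool: int   # candidate value for PassengerMaxPoolSize
    pg_max_connections: int   # candidate value for PostgreSQL max_connections


def size_stack(total_ram_gb: float, reserved_ram_gb: float,
               mb_per_app_process: int, db_pool_per_process: int,
               extra_db_clients: int) -> Sizing:
    """Rough sizing under the assumptions described above (all inputs are guesses)."""
    # RAM left for Passenger app processes after the OS, PostgreSQL, qpidd, etc.
    usable_mb = max(0.0, (total_ram_gb - reserved_ram_gb) * 1024)
    pool = max(1, int(usable_mb // mb_per_app_process))
    # Worst case: every app process holds its whole DB pool at once.
    max_conn = pool * db_pool_per_process + extra_db_clients
    return Sizing(pool, max_conn)


if __name__ == "__main__":
    # 64 GB box as in the post; 24 GB reserved, 800 MB per process,
    # pool of 5 and 50 other DB clients are all assumptions.
    print(size_stack(64, 24, 800, 5, 50))
    # Sizing(passenger_max_pool=51, pg_max_connections=305)
```

Since open-source Passenger processes typically serve one request at a time, the real connection count per process is often lower than the pool size; the worst case is used here to stay on the safe side.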

In terms of migrations, the registration process is painful (more than
10-15 simultaneous registrations will cause you issues); Satellite has an
old(er) version of Candlepin which, as I understand it, has some serial
limitations.
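Given that ceiling, one option during a migration is to throttle registrations from the driving side. A minimal sketch using bounded concurrency, assuming you register hosts from a central script; `register` is a hypothetical callback (in practice it might run `subscription-manager register` on the host over SSH), and the default limit of 10 comes from the figure above, not from any documented Candlepin limit:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def register_all(hosts: List[str], register: Callable[[str], bool],
                 max_parallel: int = 10) -> Dict[str, bool]:
    """Call `register` for every host, never more than `max_parallel` at once.

    A host whose registration raises is recorded as False instead of
    aborting the whole run.
    """
    def safe(host: str) -> bool:
        try:
            return bool(register(host))
        except Exception:
            return False

    # max_workers bounds how many registrations are in flight at any moment.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return dict(zip(hosts, pool.map(safe, hosts)))


# Hypothetical stand-in for a real registration; records peak concurrency.
_lock = threading.Lock()
_in_flight = 0
peak_in_flight = 0


def fake_register(host: str) -> bool:
    global _in_flight, peak_in_flight
    with _lock:
        _in_flight += 1
        peak_in_flight = max(peak_in_flight, _in_flight)
    time.sleep(0.01)  # pretend the server is busy
    with _lock:
        _in_flight -= 1
    return True


if __name__ == "__main__":
    hosts = [f"host{i:03d}.example.com" for i in range(50)]
    results = register_all(hosts, fake_register, max_parallel=10)
    print(sum(results.values()), "registered; peak in flight:", peak_in_flight)
```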

Be careful with what you are expecting the capsules / smart proxies to
offload: the load taken off the master isn't as much as you might think.
There are a lot of things which simply use the smart proxies as, well, a
proxy, and hence just feed the request directly to the master
(subscriptions, Puppet fact / report / catalog processing, etc.).

Hope this helps


On Thursday, November 2, 2017 at 3:38:28 AM UTC-4, ilya m. wrote:

Gwmngilfen · November 3, 2017, 9:49am

> Dear Foreman users,
>
> I've recently joined the Foreman users group.

Welcome indeed! One extra fact that may be useful is that the majority
of Katello devs are on US East Coast time, so if you hop into chat [1]
you'll probably be able to get real-time help. Or keep asking here, of
course :)

> Be careful with what you are expecting the capsules / smart proxies to
> offload: the load taken off the master isn't as much as you might think.
> There are a lot of things which simply use the smart proxies as, well, a
> proxy, and hence just feed the request directly to the master
> (subscriptions, Puppet fact / report / catalog processing, etc.).

All the preceding is good advice, but related to this, it's worth
knowing that you can also cluster the Foreman core. A DB cluster and a
bunch of app boxes, with something like memcached to store Rails state,
is possible, or just separate app boxes for different purposes: I know
one large deployment which had a dedicated box for reports/facts/ENC and
another for the users to interact with.
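In practice that split lives in the load balancer in front of the app boxes, but the routing decision itself is simple. A minimal sketch, assuming path-prefix routing with round-robin inside each pool; the backend hostnames are hypothetical, and the Puppet-facing path prefixes (reports, fact uploads, the `/node/` ENC lookup used by node.rb) are assumptions you should verify against your Foreman version:

```python
from typing import Dict, List, Tuple

# Hypothetical backend pools; hostnames are placeholders.
POOLS: Dict[str, List[str]] = {
    "puppet": ["foreman-app-puppet1:443", "foreman-app-puppet2:443"],
    "ui": ["foreman-app-ui1:443"],
}

# Path prefixes assumed to carry Puppet reports, facts and ENC traffic.
PUPPET_PREFIXES: Tuple[str, ...] = (
    "/api/config_reports",
    "/api/reports",
    "/api/hosts/facts",
    "/node/",
)


def pool_for(path: str) -> str:
    """Send Puppet-facing requests to the 'puppet' pool, everything else to 'ui'."""
    return "puppet" if path.startswith(PUPPET_PREFIXES) else "ui"


class RoundRobin:
    """Pick backends in turn within whichever pool a request maps to."""

    def __init__(self, pools: Dict[str, List[str]]) -> None:
        self._pools = pools
        self._next = {name: 0 for name in pools}

    def pick(self, path: str) -> str:
        name = pool_for(path)
        backends = self._pools[name]
        i = self._next[name]
        self._next[name] = (i + 1) % len(backends)
        return backends[i]


if __name__ == "__main__":
    rr = RoundRobin(POOLS)
    for p in ["/api/hosts/facts", "/node/web01.example.com?format=yml", "/hosts"]:
        print(p, "->", rr.pick(p))
```

All app boxes still share the same database (and, per the above, a shared store such as memcached for Rails state), so the split only isolates CPU, not data.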

Let us know how you get on, Ilya! If it goes well, maybe we can do an
interview or blog post on it.

Greg
Community Lead


On 01/11/17 22:25, ilya m. wrote:
On 03/11/17 02:32, Andrew Schofield wrote:

Gwmngilfen · November 3, 2017, 10:00am

Sigh, I need more caffeine; it helps if you add the link:

[1] Foreman :: Support


On 03/11/17 09:49, Greg Sutcliffe wrote:

system · May 9, 2018, 12:34pm

(import note - user Atum not recreated, transferred to system user)

Hi there.

I’m fresh on the forum as well. I’ve been searching for a topic just like this to start building a scalable environment.

To start with, is this web page still relevant today? I intend to read a whole lot of the documentation as I test such an environment, and it looks like this “guide” is a good starting point.

Thanks for your input.

Matt_Cahill · May 13, 2018, 11:11pm

That architecture design is still relevant; also look at this:

https://theforeman.org/2015/12/journey_to_high_availability.html

We fairly recently built a cluster serving around 8000 nodes with this design.

kotyara85 · May 14, 2018, 4:48am

How do you handle reports going from Puppet to Foreman?

We have ~4000 nodes and the Puppet master is very slow to respond to node requests. We see ~200k open files associated with the foreman.rb report processor and node.rb.
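To see which processes actually hold those files, a minimal Linux-only sketch that sums open file descriptors per process by reading `/proc`. The patterns are assumptions: node.rb is executed as a separate process for each ENC lookup, whereas report processors such as foreman.rb normally run inside the Puppet master process itself, so their handles show up under that process (matched here by the guessed pattern `puppet`). Run it as root to see other users' processes; unreadable ones are skipped:

```python
import os
from typing import Dict, Iterable, List


def cmdline(pid: int) -> str:
    """Return the process command line, or '' if it can't be read."""
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as f:
            return f.read().replace(b"\0", b" ").decode(errors="replace")
    except OSError:
        return ""


def pids_matching(pattern: str) -> List[int]:
    """PIDs whose command line contains `pattern` (plain substring match)."""
    return [int(e) for e in os.listdir("/proc")
            if e.isdigit() and pattern in cmdline(int(e))]


def fd_count(pid: int) -> int:
    """Number of open file descriptors for `pid` (needs read access to /proc/<pid>/fd)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def open_fds_by_pattern(patterns: Iterable[str]) -> Dict[str, int]:
    """Total open descriptors across all processes matching each pattern."""
    totals: Dict[str, int] = {}
    for pattern in patterns:
        total = 0
        for pid in pids_matching(pattern):
            try:
                total += fd_count(pid)
            except OSError:  # process exited, or we lack permission
                pass
        totals[pattern] = total
    return totals


if __name__ == "__main__":
    for pattern, n in open_fds_by_pattern(["node.rb", "puppet"]).items():
        print(f"{pattern:10s} {n} open fds")
```

A large, growing count under the node.rb pattern would point at ENC lookups piling up (many concurrent node.rb processes), while a large count under the master process points at in-process work such as report submission.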

Thanks

system · May 14, 2018, 11:11am

(import note - user Atum not recreated, transferred to system user)

Thank you Matt.

Found that article already. Preparing lab environment.