My experience is as follows:
We did 3-4 "separate" foreman infrastructures with 1k hosts each (split by datacenter). These were "one host" foremans with onboard databases. Worked fine - performance was fine; IIRC they were 4 core / 16GB RAM boxes. Back then we had <100 puppet classes total and maybe 6 environments to load them into.
When we brought everything together, we split our database, puppetmasters, CA, UI, and proxies across datacenters, with load balancers (F5 LTM and GTM) in front of everything. That has worked better.
As we add more managed nodes/environments/code (currently 35 environments, 1088 puppet classes, 18k managed servers), we just add more puppet masters. More API calls from other automated systems? Beef up the front end or add more nodes. Database getting bogged down? Increase the specs of the database tier.
A lot more overhead/management for sure, compared to a single node, but also a lot more manageable. Not quite microservices, but having each role subdivided does help. Our database (MySQL) is up around ~80GB, and we clean reports after 7 days and audits after 30 days.
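For reference, the report/audit cleanup above is typically done with Foreman's built-in rake tasks on a cron schedule. A minimal sketch - the paths and run times are assumptions for an RPM-style install, adjust for yours:

```shell
# /etc/cron.d/foreman-expire -- example schedule (assumed paths/times)
# Expire reports older than 7 days, nightly at 00:30
30 0 * * * foreman /usr/sbin/foreman-rake reports:expire days=7 >/dev/null 2>&1
# Expire audit records older than 30 days, nightly at 01:00
0 1 * * * foreman /usr/sbin/foreman-rake audits:expire days=30 >/dev/null 2>&1
```

Running these off-peak matters at this scale - with 18k hosts checking in, the reports table gets large fast, and the delete pass is noticeable on the database tier.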
If you were to keep it on a single server, scaling to 6k nodes:
- Foreman under Passenger is memory intensive - lots of RAM.
- Keeping the database working set in memory is key.
- Puppet Server RAM usage can be huge.
- Puppet classes x environment count x JRuby max-active-instances means stupid levels of JVM heap.
- Most other Smart Proxy roles don't need much RAM or CPU.
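To put rough numbers on the heap math: the relevant knobs are `max-active-instances` in puppetserver.conf and the JVM heap in the service's JAVA_ARGS. A sketch below - the 512MB-per-JRuby floor is the usual rule of thumb from Puppet Server's tuning guidance, and the specific values and file paths are assumptions for a RHEL-style install, not a recommendation:

```
# /etc/puppetlabs/puppetserver/conf.d/puppetserver.conf (excerpt)
jruby-puppet: {
    # Each active JRuby instance loads code for the environments it
    # serves, so heap pressure grows roughly with classes x environments
    # x instances - the multiplication mentioned above.
    max-active-instances: 4

    # Cache environment class info so instances aren't re-parsing
    # every class on each catalog compile.
    environment-class-cache-enabled: true
}

# /etc/sysconfig/puppetserver (excerpt)
# Rule of thumb: at least 512MB of heap per JRuby instance, and
# substantially more with many environments/classes. Sized for 4
# instances here; tune against actual GC behavior.
JAVA_ARGS="-Xms6g -Xmx6g"
```

With 35 environments and 1000+ classes, you can see why cranking `max-active-instances` up for catalog throughput drives the heap requirement to "stupid levels" on a single box.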
I'd vote for splitting it up. Or, at minimum, stick everything behind load balancers, so that when you do need to split and scale horizontally you aren't headed back to the drawing board - it's just easy.