Enumerating Foreman Scaling Issues


#1

In the container RFC thread, the point was raised that parts of Foreman are currently not designed to scale, and this has been mentioned as an issue with the current Foreman architecture. This thread aims to capture the current issues within the Foreman ecosystem that prevent proper scaling of core, plugin and backend services.

Below I’ll keep a running list of each item that is believed to need changes for scaling. What I would ask of the community is to share any information you have on why any of the services listed have scaling issues, and what those issues are, in as much detail as possible. I would strongly prefer this thread be a fact-finding mission, focused on discovering the issues and not digressing into solutions; that will come in follow-up threads. If I missed any services, please call those out and I will keep the top thread up to date.

  • Foreman application
  • Dynflow
  • Databases
  • Smart proxy
  • Pulp API
    • Built to be deployed on multiple boxes, pointing to the same mongodb instance, possibly with a load balancer in front
  • Pulp content serving
    • Can easily be deployed on multiple boxes with a shared file system, possibly with a load balancer in front.
  • Pulp workers / resource manager
    • Built to scale by Pulp team, can have multiple resource managers for failover
    • 1-N workers that can be spread across multiple hosts
    • Requires shared filesystem for normal workers
  • Qpid
  • Candlepin
  • Proxy with content/reverse proxy
  • Certificates
  • Installation and Deployment
  • Report expiration cronjob
    • exclusive database locks when deleting heavy amounts of reports

#2

To clarify: is this meant to cover both our own installation/deployment shortcomings as well as actual software deficiencies?


#3

For a scalable product, I would highly suggest looking into Tanium.

Eric, how far do you want to go with this?
Should we post actual real-life issues with each piece of software? I have a response written, but decided not to post it until.


#4

I’d say cron jobs, specifically the cron job which expires reports. It’s a problem for deployments with heavy report updates, since deleting from tables requires exclusive locks. I am not sure containers can help here unless we change the way we store reports.
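To make the locking concern concrete, the sketch below shows only the shape of the delete: rather than one statement that removes every expired row in a single long transaction, rows are removed in small batches with a commit after each, so each transaction holds its locks only briefly. It uses SQLite from the Python standard library purely so that it runs anywhere. The `reports` table, the `reported_at` column and the `expire_reports` helper are hypothetical and are not Foreman's actual schema or expiry task.

```python
import sqlite3


def expire_reports(conn, cutoff, batch_size=1000):
    """Delete rows older than `cutoff` in batches; return the number deleted.

    Hypothetical schema: reports(id INTEGER PRIMARY KEY, reported_at INTEGER).
    """
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM reports WHERE id IN "
            "(SELECT id FROM reports WHERE reported_at < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        # Commit after every batch so no single transaction stays open for long.
        conn.commit()
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total
```

Whether batching actually relieves the contention seen in a real deployment depends on the database engine and on what else is writing to the table, so treat this as an illustration of the problem's shape rather than a fix.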


#5

Good question. Yes, this should cover current deployment tooling shortcomings as well as software deficiencies. I’ll add a generic section for the former as well.


#6

Sorry Daniel, I don’t see how Tanium plays into this conversation. Can you explain further? I’d like to capture as much data as possible about scaling issues that our software or the deployment of that software has.


#7

I added a bullet point for this item. I tried my best to capture the nuance, but please reply with more details if you feel it’s warranted. Thanks.


#8

Understood, Eric. I only mention Tanium because its methodology is P2P distributed. In terms of scalability, it becomes faster as more nodes are added to the environment. With Katello, the more nodes there are, the slower the platform becomes. I do realize this is more about design than about this topic, and I don’t want to derail the conversation.

Real-life scenarios: I will attempt to hit the bullet list. Forgive me.

** Candlepin:

In an organization with over 1000 ESX nodes to manage and 6000+ hosts changing daily, Candlepin is by far the worst part, because of Red Hat licensing. I should not have to care how many VDS subscriptions are tied to ESX platforms, nor how many permanent licenses are available to register a machine during deployment automation. (Because virt-who doesn’t report immediately for VDS, a permanent license pool is used for 100% arbitration.) In such an organization, activation keys cannot scale at all: 3 operating systems, 3 lifecycles, a 3-month release schedule, times 1000 ESX platforms = 27,000 entries in Candlepin. Resolution: just 3 activation keys, related to OS release only, dropping hosts into the Library content view upon registration and moving them into a lifecycle at a later date.
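The arithmetic behind the 27,000 figure can be written out as a small sketch. It assumes the "3 month release schedule" contributes a factor of 3 (that reading is an assumption, chosen because it reproduces the stated total), and the `activation_key_entries` helper is hypothetical.

```python
def activation_key_entries(operating_systems, lifecycles, releases, esx_platforms):
    """Entries needed when every combination gets its own activation key."""
    return operating_systems * lifecycles * releases * esx_platforms


# One key per OS / lifecycle / release / ESX platform combination.
per_platform_keys = activation_key_entries(3, 3, 3, 1000)

# The proposed resolution: keys tied to the OS release only.
os_only_keys = activation_key_entries(3, 1, 1, 1)

print(per_platform_keys, os_only_keys)  # prints "27000 3"
```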

** Dynflow
This can depend greatly on how many capsules are configured. When publishing a content view, 18 repos with 4 capsules can spawn 1000 tasks; the more capsules, the more tasks. Upon failure, you cannot kill this job in an easy manner. The options are to resume, or to skip each task.

** Databases
Postgres and Mongo ship with no sensible tuning. Whether the host has 8G or 64G of RAM installed, default values are used. This could be improved dramatically, instead of leaving users to resort to Google and find pgtune. The documentation would also greatly benefit from recommending that Postgres be placed on its own filesystem, /var/lib/pgsql, because this is going to grow over time. One bad query filling /var/lib/pgsql/data/base/pgsql_tmp, need I say more?
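As a rough illustration of what memory-aware defaults could look like, the sketch below derives two PostgreSQL settings from installed RAM. The 25% figure for `shared_buffers` and the 75% figure for `effective_cache_size` are commonly quoted starting points for a dedicated database host, used here as assumptions; they are not what the Foreman installer or pgtune necessarily produce, and the `suggest_pg_settings` helper is hypothetical.

```python
def suggest_pg_settings(total_ram_gb):
    """Return rule-of-thumb PostgreSQL memory settings for a dedicated host.

    Assumed ratios: shared_buffers = 25% of RAM, effective_cache_size = 75%.
    """
    total_mb = int(total_ram_gb * 1024)
    return {
        "shared_buffers": f"{total_mb // 4}MB",
        "effective_cache_size": f"{total_mb * 3 // 4}MB",
    }


for ram_gb in (8, 64):
    print(ram_gb, suggest_pg_settings(ram_gb))
```

On a host that also runs Foreman, Pulp, Candlepin and MongoDB, the database cannot claim this share of memory, so the ratios would need to be lowered accordingly.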

Mongo, out of the box:
WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'. ** We suggest setting it to 'never'
WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'. ** We suggest setting it to 'never'
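A deployment check for this condition is easy to sketch. The kernel exposes the setting as a line such as `[always] madvise never`, with the active mode in brackets. The helper names below are hypothetical, and the sketch only reads the value; it does not change it.

```python
def active_thp_mode(sysfs_text):
    """Return the bracketed (active) mode from a transparent_hugepage file."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_thp_mode(path="/sys/kernel/mm/transparent_hugepage/enabled"):
    """Read the active mode from sysfs; return None if the file is unavailable."""
    try:
        with open(path) as handle:
            return active_thp_mode(handle.read())
    except OSError:
        return None


mode = read_thp_mode()
if mode is not None and mode != "never":
    print(f"transparent_hugepage/enabled is '{mode}'; MongoDB suggests 'never'")
```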

Qpid: 2k per registered host; over time: du -sh /var/lib/qpidd = 11.8G
Mapping these entries back to what is actually still valid is quite the trip, e.g. pulp.agent.123456777 with its jrnl file.
Example: pretty sure this one is dead; it was never deleted and is chewing up space.

find /var/lib/qpidd -name "fdbaa789-e514-45cd-af51-a2523df1cf56.jrnl" -ls
qpidd 2101248 Jul 10 2017 /var/lib/qpidd/.qpidd/qls/p001/efp/2048k/in_use/fdbaa789-e514-45cd-af51-a2523df1cf56.jrnl

find /var/lib/qpidd -name "fdbaa789-e514-45cd-af51-a2523df1cf56.jrnl" -ls
qpidd 89 Jul 10 2017 /var/lib/qpidd/.qpidd/qls/jrnl2/pulp.agent.1c37ee76-db9a-4eb6-b7f4-bc192188a8a1/fdbaa789-e514-45cd-af51-a2523df1cf56.jrnl -> /var/lib/qpidd/.qpidd/qls/p001/efp/2048k/in_use/fdbaa789-e514-45cd-af51-a2523df1cf56.jrnl

  • qpid-stat -q --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt -b amqps://localhost:5671 | grep pulp.agent | awk '{print $1}' | wc -l
    6522
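The manual `find` mapping above can be sketched as a script. It assumes the layout visible in the listing, where each queue directory holds a symlink to a journal file stored elsewhere under the same root; both helper names are hypothetical, and nothing here decides whether a queue is actually dead.

```python
import os


def journal_usage(root):
    """Return (total_bytes, file_count) for regular *.jrnl files under root."""
    total = count = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Skip symlinks so each journal is counted once, at its real location.
            if name.endswith(".jrnl") and not os.path.islink(path):
                total += os.path.getsize(path)
                count += 1
    return total, count


def journal_owners(root):
    """Map each journal's real path to the directory (queue) name linking to it."""
    owners = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if name.endswith(".jrnl") and os.path.islink(path):
                owners[os.path.realpath(path)] = os.path.basename(dirpath)
    return owners
```

Running `journal_owners("/var/lib/qpidd")` as a user with read access would give the queue name for each journal, which could then be compared against the `pulp.agent` queues of hosts that are still registered.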

qdrouterd: always something (crashing, patching code due to EPEL releases).

Goferd: the most unreliable reporting agent. It is guilty of memory leaks, 2G at times. Constant communication issues, if not SSL related, back to qdrouter. It needs to be cycled at least once a week per client, just for reporting. The reliability of pushing patches from Katello to a client is hit or miss, based on this agent’s status. I REALLY miss ‘osad’ from SAT5 (one daemon for reporting status and remote execution, all in one).

Puppet: I don’t use modules in my environment. Chef is the company-chosen state management, so Puppet is just reporting. Some days there are numerous failures. I wanna rip this out and just use Chef, however:

Chef integration needs much more support. This plugin has not worked reliably in any release of Katello. It’s always something:
**Oops, we’re sorry but something went wrong undefined method `name' for nil:NilClass
**Oops, we’re sorry but something went wrong no implicit conversion of Symbol into Integer

RBAC:
Can I assign a developer rights to manage his hosts within Katello without giving him the farm? Not without a lot of effort, and even then he is going to run into a task the user will not be able to perform without elevated privileges.

Which leads me to:
Who is actually connected to Katello? There is nothing available showing logged in users.


#9

Regarding databases (that aren’t MongoDB): in terms of scaling horizontally, there are two basic paradigms for having multiple nodes serve your application: immediate consistency or eventual consistency. For immediate consistency, you need to be able to guarantee that replication within a database cluster has completed before more requests are served or writes are allowed. This can be done with Galera Cluster, which was integrated into MariaDB 10.1. Unfortunately, this scales poorly across datacenters. The latency between datacenters in, say, the US and the UK slows your queries down while the replication happens and consistency is guaranteed.

If all you need is speed, you can do an active/standby configuration in which you have one database node for writes and one or more slaves for reads. It is helpful to know what your read/write ratio is to see if this offers any real benefit, and you have to be careful in your application to send reads to the correct place (e.g., don’t bother the write master with your reads). This relies on the fact that the database will be eventually consistent.

If you need to have a mixture of both, you can use something like MaxScale which is capable of doing that read/write splitting for you and giving you the ability to automatically promote a database in the event of a failure.
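The read/write splitting described above can be sketched in a few lines. This is a deliberately naive illustration and not how MaxScale or Foreman routes queries: the `ReadWriteRouter` class is hypothetical, node names are plain strings, and a statement is classified only by its first keyword.

```python
import itertools


class ReadWriteRouter:
    """Send SELECT statements to replicas (round-robin), everything else to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql):
        words = sql.split(None, 1)
        keyword = words[0].upper() if words else ""
        # Naive classification: SELECT ... FOR UPDATE, or a read that must see
        # a write made a moment ago, would need to go to the primary instead.
        if keyword == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        return self.primary


router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
print(router.route("SELECT * FROM hosts"))      # prints "replica-1"
print(router.route("UPDATE hosts SET name=?"))  # prints "primary"
```

The comment inside `route` is the practical catch with eventual consistency: a read issued right after a write may land on a replica that has not yet caught up.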

For a true multi-master clustered configuration, though, you need proprietary software (e.g., ScaleArc or Oracle).

For large tables and just speeding up an individual database, partitioning can be implemented (I don’t know if Foreman is doing this already).

What is appropriate just depends on the use case, size and type (OLAP vs OLTP) of the dataset, so there isn’t really one true answer for this. But as long as MariaDB is supported by Foreman/Katello, all of the above can be achieved according to the use case at this point in time.


#10

Thanks for the detailed info. I think maybe I need to clarify a bit more. I want to gather data not on how to scale the various services, but issues the services have with scaling within the context of operating within a Foreman installation.


#11

One thing that comes to my mind is the way we currently handle VNC consoles.
Foreman spawns an extra tool to translate websockets to a VNC session that listens on a random port on the local machine. This makes scaling very hard as you cannot use a load balancer for HA in this scenario.

ManageIQ has a nice implementation for this, I believe with a custom ruby websocket proxy that works with tokens. That’s definitely worth checking out.
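To illustrate why tokens help here, the sketch below replaces "a random port on the local machine" with a lookup that any node behind a load balancer could perform. It is not ManageIQ's or Foreman's implementation: the `ConsoleTokenStore` class is hypothetical, and the in-memory dict stands in for whatever shared store (database, cache) a real deployment would need.

```python
import secrets


class ConsoleTokenStore:
    """Map one-time tokens to the VNC endpoint a websocket proxy should connect to."""

    def __init__(self):
        # Stand-in for a store shared by every application node.
        self._targets = {}

    def issue(self, host, port):
        """Create a token for a console session and remember its target."""
        token = secrets.token_urlsafe(16)
        self._targets[token] = (host, port)
        return token

    def resolve(self, token):
        """Return (host, port) for a token and invalidate it; None if unknown."""
        return self._targets.pop(token, None)


store = ConsoleTokenStore()
token = store.issue("hypervisor.example.com", 5901)
print(store.resolve(token))  # prints "('hypervisor.example.com', 5901)"
```

Because the browser only carries the token, the websocket request can land on any node that can reach the shared store, which is the property the random local port lacks.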


#12

Do you have any links? Interesting.


#13

Sure, but I think you have to turn on that sad music before clicking: https://github.com/theforeman/foreman/blob/ae96cb70130968bdcc8b7a9a9af7edc7c6e0d9a8/lib/ws_proxy.rb#L43-L50


#14

Oh no, I know this one very well. I mean link to the ManageIQ codebase :slight_smile:


#15

I don’t remember, but I think this is it: https://github.com/ManageIQ/manageiq/blob/33d994ae207105a508d9e214d39d9ec66ac066dd/lib/websocket_proxy.rb


#16

Thanks for the information thus far. Given @sean797 sparked this conversation, I’m going to ask him to weigh in.


#17

Thanks for the ping, @ehelms. I did intend to reply earlier.

I’ve added the comments in bold below, shout if there are any questions.

  • Foreman application

    • memcache advised
  • Dynflow

  • Databases

    • Pulp3 moving to pgsql
    • pgsql can scale across multiple nodes.
  • Smart proxy

    • Many different ways of scaling each feature. Requires thought, some of us did discuss it in Ghent
    • Might need to support actions for each feature on any single smart proxy in a group.
    • Puppet modules / Ansible roles etc. need to be on each smart proxy
  • Pulp API

    • Built to be deployed on multiple boxes, pointing to the same mongodb instance, possibly with a load balancer in front
  • Pulp content serving

    • Can easily be deployed on multiple boxes with a shared file system, possibly with a load balancer in front.
    • Different yum metadata paths if using a load balancer without shared storage (e.g with 2 smart proxies)
  • Pulp workers / resource manager

    • Built to scale by Pulp team, can have multiple resource managers for failover
    • 1-N workers that can be spread across multiple hosts
    • Requires shared filesystem for normal workers
  • Qpid

    • Supports replicating queues, our installer does not.
  • Candlepin

    • can be scaled fairly easily, just point to the same database
    • Manifest import can cause DB locks and is slow, imagine with multiple tenants
  • Proxy with content/reverse proxy

    • everything is covered in either “Pulp content serving” or “Smart proxy” sections above.
  • Certificates

    • If only we can generate them with the installer
  • Installation and Deployment

    • currently doesn’t support splitting out each role/feature.
    • need to support certificate generation for each role/feature.
    • qpid HA config
    • smart proxy feature config might be needed (e.g BIND)
    • users find it hard to run the installer on multiple nodes with the same options, and certain options pointing to another node, etc. (e.g. use a Puppet Server or Ansible)
  • Report expiration cronjob

    • exclusive database locks when deleting heavy amounts of reports