Understood, Eric. I only mention Tanium because its methodology is P2P-distributed: in terms of scalability, it becomes faster as more nodes are added to the environment. With Katello, the more nodes, the slower the platform becomes. I realize this is more about design than this topic, and I don't want to derail the conversation.
Real-life scenarios: I will attempt to hit the bullet list. Forgive me.
** Candlepin:
In an organization with over 1000 ESX nodes to manage and 6000+ hosts changing daily, Candlepin is by far the worst part of Red Hat licensing. I should not have to care how many VDS subscriptions are tied to ESX platforms, nor how many permanent licenses are available to register a machine during deployment automation. (Because virt-who does not report VDS usage immediately, we use a permanent license pool for 100% arbitration.) In such an organization, activation keys cannot scale at all: 3 operating systems x 3 lifecycles x a 3-month release schedule x 1000 ESX platforms = 27,000 entries in Candlepin. Resolution: just 3 activation keys, tied to OS release only, dropping hosts into the Library content view upon registration and moving them into a lifecycle at a later date.
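To put numbers on why per-platform keys blow up, here is the arithmetic from above spelled out (same figures, nothing new):

```shell
# Keys under the per-ESX-platform scheme: OSes x lifecycles x releases x platforms
per_platform=$((3 * 3 * 3 * 1000))
# Keys under the resolution: one key per OS release, promoted through lifecycles later
per_os_release=3
echo "$per_platform vs $per_os_release"   # 27000 vs 3
```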
** Dynflow
This greatly depends on how many capsules are configured: when publishing a content view, 18 repos with 4 capsules can spawn 1000 tasks. The more capsules, the more tasks. Upon failure, you cannot kill the job in any easy manner; your options are Resume, or Skip each failed task one by one.
** Databases
Postgres and Mongo ship with no tuning sense: whether the host has 8G or 64G of RAM installed, the default values are used. This could be improved dramatically instead of resorting to Google and finding pgtune. The documentation could also greatly benefit from recommending that Postgres live on its own filesystem (/var/lib/pgsql), because it is going to grow over time. One bad query filling /var/lib/pgsql/data/base/pgsql_tmp; need I say more?
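For reference, this is the sort of override pgtune lands on. The numbers below are my illustration for a 64G host, assumed values rather than tested recommendations; adjust for your workload:

```
# postgresql.conf: illustrative, pgtune-style values for a 64G host
# (assumed numbers; the shipped defaults assume far less RAM)
shared_buffers = 16GB
effective_cache_size = 48GB
work_mem = 64MB                 # per sort/hash node; bounds pgsql_tmp spills
maintenance_work_mem = 2GB
```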
Mongo, out of the box:

```
WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'. ** We suggest setting it to 'never'
WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'. ** We suggest setting it to 'never'
```
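The fix itself is trivial; a sketch of the runtime change (needs root, and will not persist across reboots without an rc.local, systemd unit, or tuned hook), which the installer could perfectly well do for you:

```shell
# Disable transparent hugepages at runtime, as the Mongo warnings suggest
# (root required; not persistent across reboots without an init/tuned hook).
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
```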
QPid: 2k of journal data per registered host; over time: du -sh /var/lib/qpidd = 11.8G.
Mapping these entries back to what is actually still valid is quite the trip: a pulp.agent.123456777 queue with its .jrnl file.
Example (pretty sure this one is dead, never deleted, chewing space):

```
find /var/lib/qpidd -name "fdbaa789-e514-45cd-af51-a2523df1cf56.jrnl" -ls
qpidd 2101248 Jul 10 2017 /var/lib/qpidd/.qpidd/qls/p001/efp/2048k/in_use/fdbaa789-e514-45cd-af51-a2523df1cf56.jrnl
qpidd      89 Jul 10 2017 /var/lib/qpidd/.qpidd/qls/jrnl2/pulp.agent.1c37ee76-db9a-4eb6-b7f4-bc192188a8a1/fdbaa789-e514-45cd-af51-a2523df1cf56.jrnl -> /var/lib/qpidd/.qpidd/qls/p001/efp/2048k/in_use/fdbaa789-e514-45cd-af51-a2523df1cf56.jrnl
```

And counting the pulp.agent queues themselves:

```
qpid-stat -q --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt -b amqps://localhost:5671 | grep pulp.agent | awk '{print $1}' | wc -l
6522
```
QRouterd: always something (crashing, patching code due to EPEL releases).
Goferd: the most unreliable reporting agent. Guilty of memory leaks, 2G at times. Constant communication issues back to qdrouterd, when they are not SSL-related. It needs to be cycled at least once a week per client just for reporting, and the reliability of pushing patches from Katello to a client is hit or miss depending on this agent's status. I REALLY miss 'osad' from SAT5 (one daemon handling status reporting and remote execution, all in one).
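The weekly cycling can at least be automated; a minimal sketch, assuming goferd runs under systemd on the clients (the path and script name are hypothetical, and this is a workaround, not a fix):

```shell
#!/bin/sh
# Hypothetical /etc/cron.weekly/restart-goferd: cycle goferd once a week
# as a workaround for the memory leak and stale reporting described above.
systemctl restart goferd
```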
Puppet: I don't use modules in my environment. Chef is the company-chosen state management, so Puppet is just reporting, and some days there are numerous failures. I want to rip it out and just use Chef; however...
Chef integration badly needs more support. This plugin has not worked reliably in any release of Katello; it's always something:
**Oops, we're sorry but something went wrong: undefined method `name' for nil:NilClass**
**Oops, we're sorry but something went wrong: no implicit conversion of Symbol into Integer**
RBAC:
Can I assign a developer rights to manage his own hosts within Katello without giving him the whole farm? Not without a lot of effort, and even then he is going to run into a task he cannot perform without elevated privileges.
Which leads me to: who is actually connected to Katello? There is nothing available showing logged-in users.