Split External DBs

With the movement to Pulp3 and the removal of MongoDB, a managed database Katello installation now only uses a single database service, PostgreSQL, which contains Foreman, Candlepin, and Pulp databases.

With that said, we also have the capability to deploy with unmanaged databases, and I’d like to start a discussion about the capabilities and installation experience of this feature, specifically regarding the below use cases:

  1. For users who wish to deploy with a single external database server, could we have a single set of parameters for this DB server which simplifies the installation experience for external DBs? I.e. the user would only need to specify once the address and port of the external database server and by default all components would use these.

  2. For very large deployments where Foreman, Candlepin, and Pulp are all competing for PostgreSQL connections, I think we should also make it possible to configure these external DBs separately per service, so that for example repository sync or CV publish/promote performance would not be throttled by the DB during times of high registration load.

Regarding the 2nd use case, it should already be possible with the way Pulp3 is installed to specify a separate external DB, but for Candlepin it’s not the case currently and would require some additional work.

I think a way to achieve both requirements would be implementing parameters such as --candlepin-use-foreman-db and --pulpcore-use-foreman-db which default to true. Then a user who wishes to configure a single external database server can simply pass the --foreman-db* parameters to the installer and candlepin and pulpcore would use the same configuration. However for the user who wants to configure separate DBs per service, those parameters could be changed to false and additional configuration for pulpcore and/or candlepin DBs could be supplied.
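To make the proposal concrete, here is a minimal dry-run sketch of what the two invocations might look like. It only builds and prints the command lines. `--candlepin-use-foreman-db` and `--pulpcore-use-foreman-db` are the proposed flags and do not exist yet, and `--candlepin-db-host` and `--pulpcore-db-host` are hypothetical placeholders for whatever per-service options would end up being exposed.

```shell
# Dry-run sketch: builds and prints command lines, never runs the installer.
# --candlepin-use-foreman-db and --pulpcore-use-foreman-db are the PROPOSED
# flags from this post and do not exist yet; --candlepin-db-host and
# --pulpcore-db-host are hypothetical placeholders for per-service settings.

# Use case 1: one external server, given once through the Foreman DB options.
single_db_cmd="foreman-installer --scenario katello \
--foreman-db-manage false \
--foreman-db-host postgres.example.com"

# Use case 2: opt out of sharing and point each service at its own server.
split_db_cmd="$single_db_cmd \
--candlepin-use-foreman-db false \
--candlepin-db-host postgres-candlepin.example.com \
--pulpcore-use-foreman-db false \
--pulpcore-db-host postgres-pulpcore.example.com"

echo "$single_db_cmd"
echo "$split_db_cmd"
```

With the defaults left at true, the first form is all a single-server user would need to type.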

Please weigh in if you have any thoughts on this proposed design! Thanks,

So, I looked further into this and I’m already able to perform this type of installation in nightly – I didn’t run into the problems that I anticipated.

The one issue I did run into was that the server hosting the Foreman database was also required to have https://fedorapeople.org/groups/katello/releases/yum/nightly/pulpcore/el7/x86_64/rh-postgresql12-postgresql-evr-0.0.2-1.el7.x86_64.rpm installed, which was not documented anywhere.

Thank you for getting back to us. Can you share your full notes, if you have any? This deployment is indeed very interesting. Do you even plan to split databases into separate hosts or VMs?

I am interested mainly from the documentation perspective; our @installer team can definitely share their opinion on the technical side.

@wbclark is on the Red Hat platform team and has a strong interest in the installer.

I have thought about this as well. It’s probably safe to assume that there are three groups to consider:

  • Default all in one. Everything is on the same host. This will likely be the majority of users
  • Application server + DB server. Once you start to scale out, this is a logical first step. You may have a DBA team. Most likely you want every DB on this DB server
  • Full scale out - split everything. Ideally you’d also split the applications themselves

That last part is my long term desire. Currently Katello has bad entry points. I can’t make anything else out of it - it’s bad. You have --katello-candlepin-* options. Then on a Katello server you have --katello-pulp-* but you also see --foreman-proxy-content-pulp-* options that are actually ignored. I can go on a long rant with more examples, but that’s bad UI/UX.

With the move to Pulp 3 we can actually improve a lot - it has an architecture that makes it easier to deploy. I also started a PR to expose --candlepin-* options. It also allows --no-enable-candlepin to not deploy candlepin at all. That allows for a composable setup. It also makes it more transparent which systems are being deployed and how to configure them.

Ideally we’ll also have top level pulpcore parameters (--pulpcore-* instead of --foreman-proxy-content-pulpcore-*). That would allow users to deploy Pulpcore on a different server.

I’m waiting for the Pulp 2 removal before picking this up again. Not having to think about that old deployment makes refactoring easier.

Hi, thanks for your reply

Here are the installer options I used with Katello 4.0.0-0.2

# foreman-installer --scenario katello --verbose \
--foreman-db-manage false \
--foreman-db-host postgres-foreman.example.com \
--foreman-db-database foreman \
--foreman-db-username foreman \
--foreman-db-password fakepass \
--foreman-proxy-content-pulpcore-manage-postgresql false \
--foreman-proxy-content-pulpcore-postgresql-host postgres-pulpcore.example.com \
--foreman-proxy-content-pulpcore-postgresql-user pulp \
--foreman-proxy-content-pulpcore-postgresql-password fakepass \
--katello-candlepin-manage-db false \
--katello-candlepin-db-host postgres-candlepin.example.com \
--katello-candlepin-db-name candlepin \
--katello-candlepin-db-user candlepin \
--katello-candlepin-db-password fakepass 
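
Before running the installer with options like these, it may help to confirm from the Katello host that each external database accepts connections. The sketch below only prints one libpq connection URI per service, using the hosts and users from the command above. Port 5432 is the PostgreSQL default, and the `pulpcore` database name is an assumption because the command does not set one.

```shell
# Print one libpq connection URI per external database so each can be
# checked from the Katello host before foreman-installer is run.
# Port 5432 is the PostgreSQL default; the "pulpcore" database name is an
# assumption, since the installer command does not set one.
db_uri() {
  # usage: db_uri USER HOST DBNAME
  printf 'postgresql://%s@%s:5432/%s\n' "$1" "$2" "$3"
}

db_uri foreman   postgres-foreman.example.com   foreman
db_uri pulp      postgres-pulpcore.example.com  pulpcore
db_uri candlepin postgres-candlepin.example.com candlepin

# Example check (prompts for the password):
#   psql "$(db_uri foreman postgres-foreman.example.com foreman)" -c 'SELECT 1'
```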

For the infrastructure, I deployed 4 separate VMs running the latest RHEL7
The Candlepin DB server used postgresql-server-9.2.24-4.el7_8.x86_64
The Pulpcore DB server used rh-postgresql96-postgresql-server-syspaths-9.6.10-1.el7.x86_64
The Foreman DB server used rh-postgresql12-postgresql-server-syspaths-12.1-2.el7.x86_64 and was also required to have rh-postgresql12-postgresql-evr-0.0.2-1.el7.x86_64

The only real reason for the different versions is that I started with the versions shipped in RHEL repos and had to replace them with more recent versions to get foreman-rake db:migrate and sudo -u pulp DJANGO_SETTINGS_MODULE=pulpcore.app.settings PULP_SETTINGS=/etc/pulp/settings.py python3-django-admin migrate --noinput to complete successfully.

If I had to do it again I would just start with postgres 12 on each DB server.
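
A quick way to catch an outdated server before the migrations fail is to compare its reported version against a threshold. This is a rough sketch: the function names are made up for illustration, and the threshold of 12 simply mirrors the remark above rather than any documented minimum.

```shell
# Extract the leading number from a PostgreSQL version string such as
# "9.2.24" or "12.1". For releases before 10 this is only the first half of
# the major version (9.2, 9.6), which is still enough for a ">= 12" test.
pg_leading_version() {
  printf '%s\n' "${1%%.*}"
}

# The threshold of 12 reflects the remark above, not a documented minimum.
pg_new_enough() {
  [ "$(pg_leading_version "$1")" -ge 12 ]
}

for v in 9.2.24 9.6.10 12.1; do
  if pg_new_enough "$v"; then
    echo "$v: ok"
  else
    echo "$v: older than 12"
  fi
done

# On a live server the version string could come from:
#   psql -At -c 'SHOW server_version'
```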

The basic steps for setting up each DB were roughly as follows (shown here for the Foreman DB):

# /opt/rh/rh-postgresql12/root/usr/bin/postgresql-setup --initdb
# vim /var/opt/rh/rh-postgresql12/lib/pgsql/data/pg_hba.conf # added the following line
host    all             all             katello.example.com            md5
# vim /var/opt/rh/rh-postgresql12/lib/pgsql/data/postgresql.conf # added the following line
listen_addresses = '*'
# systemctl enable --now postgresql
# su - postgres -c psql
postgres=# CREATE USER foreman WITH PASSWORD 'fakepass';
postgres=# CREATE DATABASE foreman OWNER foreman;
postgres=# \q
# firewall-cmd --add-service=postgresql
# firewall-cmd --runtime-to-permanent
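
The interactive psql step above could also be scripted for all three services. The helper below is a sketch, not part of any Foreman tooling: `create_db_sql` is a made-up name, the user and database names for Foreman and Candlepin come from the installer options used earlier, and the `pulpcore` database name is an assumption.

```shell
# Emit the role and database creation statements for one service.
# Single quotes in the password are doubled so the SQL literal stays valid.
# usage: create_db_sql USER DBNAME PASSWORD
create_db_sql() {
  escaped=$(printf '%s' "$3" | sed "s/'/''/g")
  printf "CREATE USER %s WITH PASSWORD '%s';\n" "$1" "$escaped"
  printf 'CREATE DATABASE %s OWNER %s;\n' "$2" "$1"
}

create_db_sql foreman   foreman   fakepass   # run on the Foreman DB server
create_db_sql candlepin candlepin fakepass   # run on the Candlepin DB server
create_db_sql pulp      pulpcore  fakepass   # "pulpcore" DB name is assumed

# To apply on a database server instead of printing:
#   create_db_sql foreman foreman fakepass | su - postgres -c psql
```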

Hi wbclark, Ewoud.

wbclark, I was wondering if you got this working as desired; and if so, have you automated it?

I have been working on an automated remote database capability for Katello with Puppet Bolt, and received some good feedback from Ewoud (thanks, Ewoud!); however, my solution takes a different tack than what’s described in the documentation available for Katello remote databases… I set Katello’s *db_manage arguments to true. It works quite well and the postgresql/mongodb servers have a much more minimal install than what’s described here.

I’m providing the solution, stepwise, below:

  1. Install the remote database packages (including evr!) with Bolt *[1] (or manually)
  2. Puppet apply() the puppet/mongodb and puppetlabs/postgresql modules on the remote database server/s to create the candlepin, foreman, pulp databases *[2], *[3]
  3. Via configuration data sourced from Hiera, set db_manage, candlepin_manage_db, pulp_manage_db, and pulpcore_manage_postgresql all to true on the Katello server; as well as configuring database credentials and service endpoints (on the Katello server) *[4], *[5]
  4. The Katello grouping of modules then does the rest: remotely manage the preconfigured postgresql and mongodb (remote) databases using the Hiera data available in *[5], *[6], with *[5] taking precedence over *[6]

I have a Vagrantfile in the repo that builds the complete solution – in VirtualBox (sadface).
I’d be grateful to get feedback on this and ensure everything is working As Intended®. (smileyface)

[1] https://github.com/superfantasticawesome/boltello/blob/database/plans/database.pp
[2] https://github.com/superfantasticawesome/boltello/blob/database/data/roles/database.yaml
[3] https://github.com/superfantasticawesome/boltello/blob/database/modules/boltello_builder/manifests/database.pp
[4] https://github.com/superfantasticawesome/boltello/blob/bc8d43d613559ab6d1edb6cee8a039ae0db00973/hiera.yaml#L14
[5] https://github.com/superfantasticawesome/boltello/blob/database/data/roles/database/remote.yaml
[6] https://github.com/superfantasticawesome/boltello/blob/database/data/roles/katello.yaml

This will lead to also running a database on the katello host. While it won’t break anything, you are wasting resources. If you do it right, after a fresh install neither PostgreSQL nor MongoDB should be running on the katello host, only on the remote DB server.

Note that it really is just “manage”: if a database was installed once and you later stop managing it, the installer won’t uninstall it.

I do not implicitly install postgresql or mongodb on the Katello server and the packages are not listed in search returns from the package manager. It looks like only the db clients are installed. When affecting database services with foreman-maintain, the databases are listed as “(remote)”. I would say this is ideal because the load reduction is massive while the management solution is completely centralized and local to Katello.

As a point of disambiguation: My solution leverages the installer modules, not foreman-installer itself. This means I can model all the installation/configuration logic outside of the logic that exists in the installer.

PostgreSQL and MongoDB databases are only installed remotely and they never exist on the Katello server.

While it is true that the modules will install postgresql/mongodb, I elect to perform this step with Puppet Bolt.

Also, I believe that “manage” in the module context means a lot more than installing packages:

  • creating the databases in the db container (which is pre-installed on the remote database server/s; e.g., postgresql or mongodb)
  • managing users, roles, permissions/ownership and access rules for the databases created on the preconfigured database servers

I hope this is useful towards designing a solution with foreman-installer. Essentially, it could be performed similarly to standing up a capsule server with the foreman-installer; yet, rather than generating certificates, a separate scenario (read: Hiera dataset) would be run on the remote database servers before Katello installation.

This scenario would only install packages on the remote database servers (no configuration)… Next, the installer should be invoked on the Katello server with directives for managing credentials/access/etc on the remote database server/s.

By performing the install in the sequence described above – and by perhaps adding an additional flag in the installer to prevent installation of PostgreSQL and MongoDB – the remote database servers and Katello only get the packages needed: Katello/foreman/etc packages are not installed on the database server/s and Katello has no local databases installed.

This works… and I think it would be immensely helpful for users to have the capability in the installer – or update the documentation to reflect the steps I outlined above.