RFC: Host registration and Load balancers

Global registration is missing an option to use a load balancer when registering and configuring new hosts. For some users [0] this is considered a blocker for moving to global registration from other client tools that are going to be deprecated in upcoming releases.

Implementation

Foreman
In the Smart Proxy field, list the load balancers associated with each Smart Proxy. The selected LB URL will then be used in the registration command and in the templates for subscription-manager configuration. The list of load balancers for Smart Proxies could be loaded the same way we load features right now.

For users who haven’t updated their Smart Proxy configuration, or don’t have a version that supports it yet, we can display a simple text field for the load balancer URL where they can enter the URL they want.

Smart Proxy
Store the list of load balancers in a YAML config file. It can live either in settings.yml or in a new file.

Foreman installer
New option for adding / removing load balancers

User scenario

  • Add load balancer(s) to the Smart Proxy
  • Refresh Smart Proxy in Foreman
  • Generate registration command with LB
  • Register host with it
  • ???
  • Profit

Covering this scenario would help us improve the user experience and move us closer to deprecating other registration clients. Information about load balancers in Foreman could also be used in other components, for example in provisioning.

[0] 2105995 – Need Proper Registration method in Load-balancing capsule setup for a clients
[1] Satellite docs - Configuring load balancers

I’m not sure I’m following. Could you provide some diagram which would show how things are deployed in this scenario?

There is a diagram from the Satellite documentation:

The deployment that is described in the docs feels more like a workaround than anything else. There have been discussions in the past about how to properly support HA proxies [1,2]. Honestly, I’m not fond of adding more functionality to the workaround.

Does the proxy need to know which load balancer it is running behind for any reason other than reporting it to Foreman?

[1] - Supporting HA Foreman Proxies
[2] - Highly Available Smart Proxies (part 2)

For content at least, the documented load-balanced proxy setup is a supported and working setup: it has existed for many releases and has been implemented by a number of users across Katello and Satellite footprints. The goal of the RFC, as I understand it, is to bring global registration to parity with our other forms of registration and to make the user experience of this setup less painful.

As of today and this RFC? No. However, when using global registration this information is important for users. When planning and performing upgrades of the proxy this information is also useful for the user.

At least with the Smart Proxy Templates module there is the option template_url, which is the external URL which clients should use. In a load balanced setup, the user is expected to use --foreman-proxy-template-url $URL. That can point to a load balancer. The registration module reuses this setting. Similarly, the Pulp plugin has a setting for the RHSM URL, but that’s automatically set to the common name on the certificate (I think --certs-node-fqdn $LB_HOSTNAME).

It is at best a workaround because in the UI you still select just a single Smart Proxy, but it should allow you to move forward even today without any code changes.

Many years ago we already talked about adding an entity in Foreman which has 1 or more Smart Proxy instances so Foreman would understand the topology. It depends on the Smart Proxy module whether this is needed.

With that knowledge, I’m :-1: on your proposal.

This is basically rewriting the entire design. Today we have the Smart Proxy which knows its external URLs. Everything is auto discovered and you can just register a new Smart Proxy. Your proposal rewrites that. Suddenly you have to modify everything where the Smart Proxy settings are used and apply logic to change those settings.

Can you elaborate how your solution improves the situation over what we have today?

Can you elaborate what use this is?

Are any of these reported to Foreman and stored for reference?

Yes. The template_url predates the capabilities framework, so it’s exposed as a REST endpoint and usable in Foreman here:

Then it’s used here:

Nowadays it can be patched to use it via the settings without a live round trip.

proxy.setting('Templates', 'template_url')

Then for the RHSM URL we have to look in Katello where it is defined here:

And just below it is the Pulp content URL to be used by hosts. This is an extension of the Smart Proxy model, so proxy.rhsm_url and smart_proxy.pulp_content_url should work.

Thanks for the information.

Here is my attempt to summarize some takeaways. The way content proxies are configured today, there is no indicator of a load balancer. The rhsm_url is configured based on the FQDN of the host running the smart-proxy and, today, cannot be relied upon to provide the correct information. The load-balancer documentation does not mention configuring template-url either.

First, a reminder that the current configuration string documented for installing a proxy with content behind a load-balancer is (taken from Satellite docs):

# satellite-installer --scenario capsule \
--foreman-proxy-register-in-foreman "true" \
--foreman-proxy-foreman-base-url "https://satellite.example.com" \
--foreman-proxy-trusted-hosts "satellite.example.com" \
--foreman-proxy-trusted-hosts "capsule.example.com" \
--foreman-proxy-oauth-consumer-key "oauth key" \
--foreman-proxy-oauth-consumer-secret "oauth secret" \
--certs-tar-file "capsule.example.com-certs.tar" \
--certs-cname "loadbalancer.example.com"

  1. The load-balancer documentation should be updated to set --foreman-proxy-template-url to the load balancer when one is used, i.e. pass --foreman-proxy-template-url https://loadbalancer.example.com when configuring (see Configuring Capsules with a Load Balancer, Red Hat Satellite 6.11 | Red Hat Customer Portal).

  2. The rhsm_url within puppet-foreman_proxy_content should be calculated from certs::apache::cname if that value exists, falling back to certs::apache::hostname as it does today.
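The fallback in item 2 can be sketched in a few lines. This is only an illustration of the logic in Ruby, not the Puppet code in puppet-foreman_proxy_content: the helper names, the choice of the first CNAME entry, and the /rhsm path are all assumptions made for the example.

```ruby
# Hypothetical helpers illustrating the proposed fallback; not real installer code.
# `cnames` stands in for certs::apache::cname (a list),
# `hostname` stands in for certs::apache::hostname.
def rhsm_hostname(cnames, hostname)
  # Assumption: use the first non-empty CNAME entry when one is configured.
  first = Array(cnames).map { |c| c.to_s.strip }.reject(&:empty?).first
  first || hostname
end

# Assumption: the RHSM endpoint lives under /rhsm on the chosen host.
def rhsm_url(cnames, hostname, path: '/rhsm')
  "https://#{rhsm_hostname(cnames, hostname)}#{path}"
end
```

With a CNAME of loadbalancer.example.com the URL points at the load balancer; with no CNAME it points at the proxy hostname, which is today's behaviour.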

The idea being: if these values are configured correctly, the registration page should continue to work as is, without modification, while calculating all the right values?

By loading the proxy’s load balancers, we can use them in the registration form.

The information about load balancers would be stored in the Smart Proxy, like this:

# settings/load_balancers.yml
load_balancers:
  - balancer1.example.com
  - balancer2.example.com
  - balancer3.example.com
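As a rough sketch of how such a file could be read, assuming the hypothetical load_balancers key shown above (neither the file nor the helper exists in smart-proxy today):

```ruby
require 'yaml'

# Hypothetical loader for the proposed settings/load_balancers.yml.
# Returns a de-duplicated list of hostnames, or [] when nothing usable is configured.
def load_balancer_hosts(yaml_text)
  data = YAML.safe_load(yaml_text)
  return [] unless data.is_a?(Hash)

  Array(data['load_balancers']).map { |h| h.to_s.strip }.reject(&:empty?).uniq
end
```

An empty or missing list yields an empty array, which is the case where the form would fall back to the plain text field.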

I’m thinking about a different solution, with a much simpler change that does not affect many components:

We can just add one new field custom_url to the form, with basic validation, where users can put whatever they want and use that URL in the registration and configuration of the host.

No changes in smart-proxy or foreman-installer required. The downside of that is that users have to enter the URL manually every single time they create the command.
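A minimal sketch of what the “basic validation” for custom_url might look like, assuming it only needs to accept absolute http(s) URLs with a host. The method name is hypothetical and this is not existing Foreman code:

```ruby
require 'uri'

# Hypothetical validation for the proposed custom_url form field.
# Accepts only absolute http:// or https:// URLs that include a host.
def valid_custom_url?(value)
  uri = URI.parse(value.to_s.strip)
  # URI::HTTPS is a subclass of URI::HTTP, so this covers both schemes.
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  false
end
```

A bare hostname such as loadbalancer.example.com is rejected here because it carries no scheme; whether the form should accept that and add a default scheme is a separate design choice.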

There are 2 ways to go about this. We’ve been having this discussion with various people for years, so I’ll give a short introduction.

The first is to architect everything so the whole load balancing is transparent to Foreman. This means Foreman also talks to the Load Balancer to talk to the Smart Proxy. There is only a single Smart Proxy entity in Foreman. All URLs/certificate names on the Smart Proxy are configured to use the load balanced hostname.

In practice this means you have something like smartproxy.lab.example.com as the service hostname. This is backed by smartproxy01.lab.example.com and smartproxy02.lab.example.com.

From an architectural perspective this is very clean: no modifications need to be made in Foreman and end-user applications. The only downside is that you need to make sure both hosts are configured exactly the same. Also, there are stateful services.

For example, Pulp has state. This needs to be addressed somehow. One solution is to use shared storage (loading /var/lib/pulp from NFS, replicating it using Ceph, etc.) and a common database. Then you have failover capabilities.

Additionally, some services (like ISC DHCP) need much more complex handling. In that case it may not be feasible at all to run the Smart Proxy in a load balanced setup.

The other approach is to essentially build a Smart Proxy cluster mode. It means you make Foreman aware of the topology (essentially what you’re proposing now). There are many issues to consider here.

For example, does Foreman keep both Smart Proxies in sync? If so, how? When we discussed this in the past we came to the conclusion that it depends on each Smart Proxy module and often even provider how that should be done.

Concrete example: with the DNS module there are multiple providers. Some just talk to some network API and that’s trivial to load balance. nsupdate is a special case. By default we install a local ISC BIND server and then connect to localhost. That’s not going to work as expected if you deploy it twice.

Similar concerns are there for Pulp. If you effectively have multiple Pulp instances they can go out of sync with each other. If there’s a load balancer in front of it, results can be unexpected.

So to make it concrete. For a load balanced Smart Proxy setup you must very carefully consider which Smart Proxy modules you enable. Depending on which modules, you can choose a solution.

Your proposal barely touches the surface.


It refers to the Smart Proxy HTTP interface, so I think it should include port 8000: --foreman-proxy-template-url https://loadbalancer.example.com:8000.

CNAME is an array. Do you take the first value?

I think technically the Apache certificate only needs the load balanced hostname and doesn’t need the exact Smart Proxy hostname on it so that’s also something to consider. Though I may be missing something.

Yes, that’s how it was designed.

Note that this is exactly the same problem you would run into when you would dual home the Smart Proxy. For example, you could have a VLAN for Foreman <-> Foreman Proxy communication and a VLAN for Foreman Proxy <-> Clients. Then Foreman would also use a different hostname than what clients use.

Bother, that is right. That makes it a bit tougher to configure the right value and would have me lean towards a dedicated parameter to configure and thus indicate the correct endpoint if it’s set. Something akin to --foreman-proxy-content-rhsm-loadbalancer-host or --foreman-proxy-content-rhsm-host.

That is my biggest concern – the user experience is rather ugly and as @ekohl has pointed out there are mechanisms in place to handle this if we configure things correctly.

I also considered an explicit parameter. Perhaps it should drive certs::apache::hostname given that’s also the hostname that ends up on Pulp:

Note that for Pulp we don’t have any CNAME support. Perhaps that’s also a bug, but likely one you’ll run into if you go the CNAME route.

@ehelms @ekohl

I see that there are a lot of ideas how to approach this feature, but I’m not sure if I follow and understand all of them.

Could you please summarize for me what the correct way to implement this feature would be?

As I gather, and will try to summarize, the general consensus here is that we should be able to use the already built-in methods with little to no changes to the registration API as long as we configure the smart-proxy correctly. There are three pieces to configure on a smart-proxy:

  • Template URL
  • RHSM URL
  • Pulp content URL

The last two are currently derived from the servername property of Apache on the smart-proxy. That leads to action #1:

  • Add parameter(s) to puppet-foreman_proxy_content to configure the RHSM and Pulp content hostname

That would lead to users configuring the proxy somewhat like:

foreman-installer --foreman-proxy-template-url https://loadbalancer.example.com:8000 --foreman-proxy-content-rhsm-hostname loadbalancer.example.com --foreman-proxy-content-pulp-content-hostname loadbalancer.example.com

When testing this out a bit in my setup, I found that the changes to the RHSM and Pulp content URLs are properly respected and get set to the load balancer’s hostname. However, the template-url was not being respected. That is, when I generate the registration command I still get the hostname of the smart-proxy and not the load balancer, e.g.:

curl -sS  'https://pipe-katello-proxy-nightly-centos8-stream.wareagle.example.com:9090/register?

Where I would expect that to actually be:

curl -sS  'https://loadbalancer.example.com:9090/register?

This should be HTTP, not HTTPS: --foreman-proxy-template-url http://loadbalancer.example.com:8000

Registration is a bit of an odd feature in that it partially relies on templates, but not 100%.

For example, templates are retrieved over HTTP because tooling (such as Anaconda) can’t deal with HTTPS (either not at all or no way to provide HTTPS certificates). Even if modern versions can, we often need to support old versions. I have not checked if EL 9 can provision over HTTPS, but it’d surprise me.

So let’s figure out how the registration command determines the URL:

So that doesn’t have any way to configure it.

It can respect a different URL if it’s sent beyond the first command:

This is generated here:

I recall back when we designed the module I wanted an explicit parameter to configure the endpoint, but IIRC @Marek_Hulan preferred a dynamic endpoint since it saved configuration options. In RFC: Simple & automatic host registration WF - #52 by Marek_Hulan and the following posts we did talk about the case.

So in short: from Foreman there is no reliable way to know the external endpoint for registration in case of a Smart Proxy in a dual homed / load balanced setup.
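To make the “explicit parameter” idea concrete, here is a small sketch of how an endpoint could be chosen if such a setting existed. The method and its explicit_url argument are hypothetical; nothing like this is exposed by the registration module today.

```ruby
require 'uri'

# Hypothetical: build the registration endpoint, preferring an explicitly
# configured external URL over the URL Foreman uses to reach the Smart Proxy.
def registration_endpoint(proxy_url, explicit_url = nil)
  base = explicit_url.to_s.strip.empty? ? proxy_url : explicit_url.strip
  # Ensure a trailing slash so URI.join appends instead of replacing the last path segment.
  base = "#{base}/" unless base.end_with?('/')
  URI.join(base, 'register').to_s
end
```

Without the explicit value the result is derived from the Smart Proxy’s own URL, which is exactly the case that breaks in a dual-homed or load-balanced setup.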

But I’m going to make it even more complicated. We also have the HTTPBoot module. That exposes the ports it uses, which is then exposed here:

And made available in Foreman here:

And then it’s used in cases like these:

Note that also uses the Smart Proxy hostname.

I’ll still stand by my point that the Smart Proxy can’t be properly load balanced now and anyone pretending we can support it now is fooling themselves. Doing it properly is a massive effort.

What can be done today, with a select number of modules, is to hide the Smart Proxy 100% behind the load balancer and pretend all hostnames are the load-balanced hostname. That means Foreman only talks to the load-balanced Smart Proxy. However, it implies Pulp is also load balanced (shared storage, shared database).

Looks like this is the perfect thread to place my question and describe a similar issue we have with the currently available implementation of smart-proxy / katello connection.

Scenario:
Katello <— (10.0.0.1) — (10.0.0.2) —> Smart Proxy <— (192.168.1.1) — (192.168.1.0/24) —> Managed hosts

The smart proxy has 2 interfaces:

  • Interface 1 with 192.168.1.1 which connects the managed hosts
  • Interface 2 with 10.0.0.2 which has the connection to Katello

Currently, this scenario does not work out of the box because Katello tries to reach the Smart Proxy using the IP address 192.168.1.1 (= first interface IP). If a user presses “sync” on the smart-proxy content sync page, it uses the wrong FQDN/IP to reach the smart-proxy.

A workaround exists: set the 10.0.0.2 IP address for the smart proxy FQDN in Katello’s /etc/hosts file.

Previously (with pulp2?), the url to reach the smart proxy was generated by the Smart-Proxy URL.
I would expect to have 2 pulp3 URLs: one which should be used by the managed hosts, and one by Katello itself.
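Roughly, the idea of two URLs could be modelled like this. It is a sketch only: the struct, its field names, and the example URLs are assumptions, not an existing Katello or Smart Proxy model.

```ruby
# Hypothetical model of a dual-homed Smart Proxy with separate endpoints:
# one used by Foreman/Katello and one used by managed hosts.
ProxyEndpoints = Struct.new(:foreman_url, :client_url) do
  def url_for(consumer)
    case consumer
    when :foreman then foreman_url
    # Fall back to the Foreman-facing URL when no client URL is configured.
    when :client then client_url || foreman_url
    else raise ArgumentError, "unknown consumer: #{consumer.inspect}"
    end
  end
end

# Example values taken from the scenario above; ports and schemes are assumptions.
endpoints = ProxyEndpoints.new('https://10.0.0.2:9090', 'https://192.168.1.1')
```

Here endpoints.url_for(:foreman) gives the 10.0.0.2 address for sync, while endpoints.url_for(:client) gives the 192.168.1.1 address for managed hosts.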