Smart Proxy: Future Design, Scaling and Use Cases

ehelms · January 22, 2021, 2:30am

Thank you all for the great discussions thus far. I am going to attempt to recap the highlights and proposals, and at the end I will try to set the stage for further discussion.

First off, from its original intent and from this thread, we can take away that the current smart-proxy should be thought of as a control plane intended to provide APIs and discovery of services. That role imposes a heightened security need on the smart-proxy, and traffic from client end points puts it at risk. Smart-proxy traffic should therefore be limited largely to Foreman <-> smart-proxy or, in some cases, smart-proxy <-> service, and a user ought to be able to run a fairly strict firewall setup for the smart-proxy to reduce its attack surface.
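To make that traffic rule concrete, here is a minimal sketch of it as an allow-list. The role names and the `is_allowed` helper are hypothetical, invented for illustration; nothing like this exists in the smart-proxy code base, and a real deployment would express the same idea in firewall rules rather than Python.

```python
from enum import Enum


class Role(Enum):
    """Hypothetical roles for the parties discussed in this thread."""
    FOREMAN = "foreman"
    SMART_PROXY = "smart-proxy"
    SERVICE = "service"  # a backend service the smart-proxy fronts
    CLIENT = "client"    # a managed host / client end point


# Unordered pairs that may talk to each other under the proposal:
# Foreman <-> smart-proxy and, in some cases, smart-proxy <-> service.
ALLOWED_PAIRS = {
    frozenset({Role.FOREMAN, Role.SMART_PROXY}),
    frozenset({Role.SMART_PROXY, Role.SERVICE}),
}


def is_allowed(src: Role, dst: Role) -> bool:
    """Return True if traffic between src and dst fits the control-plane model."""
    return frozenset({src, dst}) in ALLOWED_PAIRS


if __name__ == "__main__":
    for src, dst in [
        (Role.FOREMAN, Role.SMART_PROXY),
        (Role.SMART_PROXY, Role.SERVICE),
        (Role.CLIENT, Role.SMART_PROXY),
    ]:
        verdict = "allow" if is_allowed(src, dst) else "deny"
        print(f"{src.value} -> {dst.value}: {verdict}")
```

Under this model, client traffic never appears in the allow-list at all, which is exactly why it needs somewhere else to go.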

The general proposal is that there should exist at least one additional service dedicated to client traffic, and possibly a further breakout of services into either groups of related services or dedicated services that map 1:1 with the functionality supported. A quick recap of client services that exist today or have been proposed (I may miss one):

templates
global registration
container gateway
subscription-manager proxying (today handled by Apache reverse proxy)
facts
openscap
REX
SSH keys

I think it is important, as we consider this, that we look at the software versus the concepts and ensure we draw the lines correctly. We have the smart-proxy software, a Sinatra-based web application that serves multiple end points and provides a base set of functionality such as handling SSL, certificates, configuration, and plugins. Concept-wise we have the Foreman Proxy, the process that runs on a system and represents an instance of the smart-proxy. And in the UI/API we have Smart Proxies that are registered and managed. Stretch this out to the Katello use case and we end up with what we often call a Foreman Content Proxy, which both adds a defined set of services and functionality (Pulp, reverse proxy, Qpid) and is treated conceptually as a single entity. That is, Katello tends to think of managing the entire host as the Content Proxy, not just the Smart Proxy software, even though that is how it’s surfaced in Foreman as an object.

Additionally, we have an RFC aimed at enabling Remote Execution against the underlying host, the one we think of as the conceptual Foreman (Content) Proxy, so that management actions can be performed on it from Foreman itself.

Let’s take the easy split as a way to further discuss the various layers of software and concept. Let’s assume we strictly split functionality into what we traditionally think of as a Foreman Proxy (service API and discovery) and a new concept, a Client Proxy (for lack of a better term). What would those look like at:

The software level: is this a new project? A creative configuration of the smart-proxy software? How do we ensure, at both the user and developer level, that it is clear what does what and what got deployed, and how do we prevent misconfigurations that can lead to some of the security and conceptual concerns?
Conceptually, how does this surface inside Foreman? How do I view and manage Foreman Proxies vs Client Proxies?
Should the two be allowed to be co-located? Does this put an additional burden on the user infrastructure-wise, or does it make things easier for them? Does it give them more choice?
Does this increase or decrease deployment burden?
How would dual-purpose features be handled? For example, Katello uses the service discovery nature of the smart-proxy to expose Pulp 3 attributes, but Pulp 3 is client facing.
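As a way to frame the dual-purpose question, here is a rough sketch of how a strict split might be modelled. Everything in it is hypothetical: the `Feature` type, the two flags, and the `placement` helper are invented for illustration. The flag values for templates and the container gateway are assumptions rather than statements about how those features behave today; only the Pulp 3 entry reflects the situation described above.

```python
from dataclasses import dataclass

FOREMAN_PROXY = "foreman-proxy"  # service API and discovery (control plane)
CLIENT_PROXY = "client-proxy"    # hypothetical new home for client traffic


@dataclass(frozen=True)
class Feature:
    name: str
    discovered_via_api: bool  # Foreman learns about it through the proxy API
    serves_clients: bool      # managed hosts connect to it directly


def placement(feature: Feature) -> set:
    """Return the proxy types a feature would need to live on under a strict split."""
    targets = set()
    if feature.discovered_via_api:
        targets.add(FOREMAN_PROXY)
    if feature.serves_clients:
        targets.add(CLIENT_PROXY)
    return targets


def dual_purpose(features) -> list:
    """Names of features that straddle both proxy types."""
    return [f.name for f in features if len(placement(f)) == 2]


# Illustrative flags only; see the note above the code.
FEATURES = [
    Feature("templates", discovered_via_api=False, serves_clients=True),
    Feature("container gateway", discovered_via_api=False, serves_clients=True),
    Feature("Pulp 3", discovered_via_api=True, serves_clients=True),
]

if __name__ == "__main__":
    for f in FEATURES:
        print(f"{f.name}: {sorted(placement(f))}")
    print("dual purpose:", dual_purpose(FEATURES))
```

Any feature that lands in both sets is one where the split forces a decision: either the two proxy types share state about it, or one of them has to proxy for the other.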