Smart Proxy: Future Design, Scaling and Use Cases

The smart-proxy has been designed to provide a RESTful API to subsystems and to serve as a form of service discovery for those subsystems. As a project, we are seeing the smart-proxy being looked at for providing client-facing services. This has the potential to place additional performance pressure on the smart-proxy. This RFC aims to identify the efforts that will result in more traffic through the smart-proxy, to start a discussion on the overall design direction, and to determine what work we need to undertake to pro-actively ensure the smart-proxy can handle these changes.

I intend to take several tactics to help with these discussions and the identification of work. The first is this post, asking the community of developers and users to discuss changes needed on the smart-proxy, performance concerns, re-factorings, simplifications and concerns with this trend. The fundamental question is whether this trend towards using the smart-proxy more is the right direction. Second is what core changes we need to undertake to ensure this increase in usage of the smart-proxy is successful. Additionally, I will schedule some face-to-face SIG-style discussions to make decisions and resolve issues synchronously where needed.

From a non-technical standpoint, pointing more features, more concerns and more developers at the smart-proxy is a good thing. The smart-proxy has historically had a single maintainer at any time, and a tiny subset of active developers who understand the code and the design of its subsystems. The more use cases we have for the smart-proxy, and the more developers we have building and maintaining it, the better for such a critical piece of the Foreman infrastructure. Further, this can help to bridge some long-standing gaps we have between Foreman and Katello installations.

Current features targeting the smart-proxy

If you are currently working on a feature that intends to place additional traffic and pressure on the smart-proxy, please respond with what that feature is and any links to design information to aid in these discussions. This list will be updated with ongoing features:

Identified re-factoring and core design work

Key Open Questions

  • Is the smart-proxy the right place for client-facing actions (e.g. global registration, remote execution job fetching and status, container auth, subscription-manager proxying)?
  • What architectural changes are needed to the smart-proxy to support increased traffic?
  • What should be our scaling guidelines for smart-proxy deployments based on deployed features?

Thinking more about this: no. This is bad design from a security perspective. I’d say we should remove everything where a client connects to the Smart Proxy and separate this out to a service without privileges.

The Smart Proxy has privileges to read and modify very important files. Ideally you only want to allow Foreman to talk to it. Then you can firewall it properly and limit the attack surface. Smart Proxy is your control plane and clients don’t belong there. In hindsight, adding templates and registration was a bad idea.

For those less familiar with the scope of the smart-proxy, would you expand on this notion, perhaps with an example case?

All the basic functionality, such as changing DHCP and DNS records, managing Puppet certificates, and access to the host’s ENC, which may contain database passwords. These are highly critical things, and a Smart Proxy can be in full control of your network. Any mistake in authentication and you could expose those. While a firewall alone is not enough, it’s about defense in depth: multiple layers. Introducing clients into this domain was a mistake because it means you can no longer firewall it. That should be rectified rather than expanded.

Perhaps there should be a Dumb Proxy. One that only has the goal of being the thing that clients talk to. It has no privileges and the only goal is to sit between Foreman and clients.

This is indeed a good point, something I had never considered myself. By the way, to this day, we have never finished the SELinux policy for the proxy. I started the effort, and it does exist, but it’s not installed by default, as too much work remained to finish it, with some technical issues I had along the way.

Okay, I am gonna be harsh. I’d say it should use a language and stack that is performance friendly, not Ruby. Let me explain. Today, I spent a good two hours on IRC chatting with lero about his performance issues with facts. It turned out that they had some custom facts, way too many of them, which were all being uploaded into Foreman, and the fact endpoint was choking. The reason for that was Ruby dealing with large hash tables, trying to process the data, as we do a ton of stuff before it even goes into the database.

Even if we rewrite our fact parsing code to be more efficient, it will not help, because the moment a large JSON hits a Ruby on Rails endpoint, it parses it, and that operation alone is slow. I wish Foreman had a component running on the network edge (where the smart proxy runs today), written in, let’s say, an approachable language with a high-concurrency/high-performance HTTP stack, where all of this processing could be offloaded. (*)
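To make that cost concrete, here is a minimal stdlib-only sketch (the fact names and count are synthetic) that times just the JSON parse of a large facts payload, before any Foreman-side processing would even begin:

```ruby
require 'json'
require 'benchmark'

# Build a synthetic facts payload: 50,000 custom facts, similar in shape
# to what a fact upload looks like (key/value strings in one large hash).
facts = (1..50_000).each_with_object({}) do |i, h|
  h["custom_fact_#{i}"] = "value-#{i}"
end
payload = JSON.generate(facts)

# Parsing alone allocates one Ruby object per key and per value, so the
# cost grows linearly with the number of facts -- before normalization,
# diffing or database writes start.
elapsed = Benchmark.realtime { JSON.parse(payload) }
puts format('parsed %d facts (%.1f MB) in %.3f s',
            facts.size, payload.bytesize / 1_048_576.0, elapsed)
```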

It’s not just facts; you can put a new bullet on your list: I am currently working on the Optimized Reports Storage RFC - a prototype and a new plugin with a brand new, more efficient way of storing reports. And this week @Marek_Hulan had a great idea - what if we offload the processing of incoming reports to the smart proxy (or the new “dumb” smart proxy @ekohl mentioned above)? Reports coming in from various sources need to be transformed into some reasonable common format; the number of warning, info and error messages must be counted and a summary built before a report can be stored in the database. All of this, again, can be done on the proxy side, granted that for Puppet I am going to change the JSON format to be more efficient, without those log/resource/message hashes.
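The proxy-side summarization step could be quite small. A sketch under an assumed report shape - the `logs`/`level` field names below are illustrative, not the RFC’s actual format:

```ruby
require 'json'

# Hypothetical common format: a report is a list of log lines, each with a
# severity level. Count the levels and build the summary before storage.
def summarize_report(report)
  counts = Hash.new(0)
  report.fetch('logs', []).each { |log| counts[log['level']] += 1 }
  {
    'host'     => report['host'],
    'errors'   => counts['error'],
    'warnings' => counts['warning'],
    'infos'    => counts['info'],
    'status'   => counts['error'].positive? ? 'failed' : 'ok'
  }
end

raw = JSON.parse(<<~JSON)
  { "host": "client01.example.com",
    "logs": [
      { "level": "info",    "message": "applied catalog" },
      { "level": "warning", "message": "deprecated parameter" },
      { "level": "error",   "message": "service restart failed" } ] }
JSON
puts summarize_report(raw)
```

Foreman would then only receive (or read from the proxy) the compact summary plus the raw body, instead of re-walking every log line itself.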

My point is, Ewoud mentioned an interesting aspect, and if we want to increase security, a good solution would be to break the smart proxy into two separate processes and wrap everything with SELinux, specifically the client-facing process. And if we started a new process, we could intentionally build it with Performance First in mind.

(*) Thus I am thinking Golang :slight_smile:

I was hoping to start working on a generic POST /facts endpoint that I could use to forward facts to Foreman. This would be helpful for automated updating of the inventory without puppet/chef/salt. I’d like to have a small client wrapper that could call facter || ohai || ufacter || sub-man --facts || whatever on the machine itself and send the results to Foreman through this proxy. This would also be handy in global registration, since we’d get much more information about the registered host. Also, I think the Foreman Discovery Image would benefit from it. But there is no concrete time plan.
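A rough sketch of what such a client wrapper could look like in Ruby. The /facts path, the payload envelope and the sample fact are all hypothetical; the command list mirrors the “facter || ohai || …” idea above:

```ruby
require 'json'
require 'socket'

# Candidate fact sources, tried in order. Which ones exist depends on the
# machine; none of these are required for building the payload itself.
FACT_COMMANDS = [
  ['facter', '--json'],
  ['ohai'],
  ['subscription-manager', 'facts', '--list']
].freeze

def first_available_command
  FACT_COMMANDS.find { |cmd, *| system("command -v #{cmd} > /dev/null 2>&1") }
end

# Wrap collected facts in a minimal envelope a proxy could route on.
def build_payload(hostname, facts)
  JSON.generate('host' => hostname, 'facts' => facts)
end

source = first_available_command || ['none']
# 'kernel' => 'Linux' is a stand-in for real collector output.
payload = build_payload(Socket.gethostname,
                        'collector' => source.first, 'kernel' => 'Linux')
# A real client would now POST this to the proxy, e.g. with Net::HTTP:
#   Net::HTTP.post(URI('https://proxy.example.com:8443/facts'), payload,
#                  'Content-Type' => 'application/json')
puts payload
```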

Also, I think foreman_scap_client belongs on this list, as well as redhat_access/rh_cloud (the Insights agent). And perhaps the templates proxy plugin.

Wouldn’t it be sufficient to run two proxies with different sets of features? And can’t users do that already today? I know it’s convenient to have all features in a single proxy, but what technically prevents this today?

I think the smart proxy can be very dumb today already :slight_smile:

You can add both Facts and Reports as new bullets, even if we won’t be building a new process. I mean, even if we add the new report parsing functionality into the current smart-proxy codebase, it will still be useful, because smart proxies can be scaled out.

Very true, two processes, two ports, two SELinux policies, the same codebase.

Take my Golang post with some distance; I just sometimes need to throw a rant at Ruby and ventilate how hard life as a Ruby dev can be. It makes more sense to stay on the same stack; we could reuse some code and move it from Foreman to the smart-proxy, etc.

One more thing that comes to my mind is one major difference: Foreman is Rails on Puma, while the Smart Proxy runs on Webrick. The Smart Proxy has always been a single-process, multi-threaded application; Webrick by default spawns a new thread per request, and we haven’t heard about any issues in that regard - well, there were some concurrency bugs, but those were ironed out. This is a major advantage; we might never be able to run the Foreman RoR app with multiple threads due to its many dependencies.
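The thread-per-request model is easy to picture with a stdlib-only sketch - plain TCP rather than Webrick itself, purely to illustrate the concurrency model: the accept loop spawns one handler thread per connection, so two overlapping requests are served by two different threads.

```ruby
require 'socket'

server = TCPServer.new('127.0.0.1', 0)   # ephemeral port
port = server.addr[1]

accept_loop = Thread.new do
  loop do
    client = server.accept
    Thread.new(client) do |sock|         # one thread per request
      sock.gets                          # read the (one-line) request
      sleep 0.2                          # hold the connection so requests overlap
      sock.puts Thread.current.object_id
      sock.close
    end
  end
end

# Two concurrent clients are served by two distinct handler threads.
thread_ids = Array.new(2) do
  Thread.new do
    TCPSocket.open('127.0.0.1', port) do |s|
      s.puts 'GET /tid'
      s.gets.to_i
    end
  end
end.map(&:value)

puts thread_ids.inspect
accept_loop.kill
server.close
```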

That does not mean the Smart Proxy is without scaling issues - one of the major ones was that it was choking on the many concurrent requests generated by security scanners. We have upgraded Ruby (and thus Webrick) to a version that (hopefully) resolves the problem. But there is a lot of potential here: either our tests reveal that Webrick is good enough, or we migrate to Puma, and with all the scaling options (multiple threads, multiple processes) the Smart Proxy could be really good.

Also, with Ruby 3.0 out, I don’t think we will ever be able to use its concurrency features for the Foreman RoR app; however, the Smart Proxy is a different story. It’s small enough and has few dependencies, so we could actually take advantage of them in the near future.

As always, we need to test and see for ourselves what it can achieve and how hard we can push it. It is definitely worth a try.

Not a whole lot. You can pass a different settings file and run it as a different user on the same system with different ports. That could be a good middle ground. The challenge is that you need to register two proxies on the same host in Foreman, both using different certificates for authentication. It’s a bit more work.
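A sketch of what that split could look like with today’s codebase, assuming the standard settings.yml layout - the file names, ports and paths below are examples, not an existing deployment: one privileged instance carrying the infrastructure modules, and a second unprivileged instance, run as a different user with its own certificate, carrying only the client-facing modules.

```yaml
# /etc/smart_proxy_control/settings.yml -- privileged, Foreman-only traffic
:https_port: 8443
:foreman_url: https://foreman.example.com
:ssl_certificate: /etc/smart_proxy_control/ssl/proxy-control.crt
:ssl_private_key: /etc/smart_proxy_control/ssl/proxy-control.key
:settings_directory: /etc/smart_proxy_control/settings.d   # dhcp, dns, puppetca enabled here

# /etc/smart_proxy_edge/settings.yml -- unprivileged, client-facing traffic
:https_port: 8444
:foreman_url: https://foreman.example.com
:ssl_certificate: /etc/smart_proxy_edge/ssl/proxy-edge.crt
:ssl_private_key: /etc/smart_proxy_edge/ssl/proxy-edge.key
:settings_directory: /etc/smart_proxy_edge/settings.d      # templates, registration enabled here
```

Both would then be registered in Foreman as separate proxies, and only the edge instance’s port would be reachable from client networks.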

Do they have to run on the same host? If I’m considering multiple levels of security, I wouldn’t put a potentially insecure process (the dumb proxy) on the same host where my critical data is stored and managed (the smart proxy).

Perhaps another direction we need to consider is to stop looking at a proxy as a mini-monolith providing many features in one service, and instead look at a proxy as a microservice that provides just one service.

That way we could scale out just the services that are needed by spinning up additional proxies providing them. It would also resolve or at least simplify the security concern of, e.g., a template proxy gaining access to network config - as each proxy instance would have its own permissions (and maybe a container encapsulating it?). We could even optimize, if needed, by switching a specific service to a different language, as long as some basic API structure is maintained (IIRC @ekohl even did a PoC once of creating a proxy in Python). Perhaps in the long run this could also be a path to simplifying Foreman itself, by offloading some of the logic into these new services (e.g. template rendering, initial fact parsing, etc.)?
The downside of a microservices approach is that it would require a bit more work to make sure all services are running and set up properly, and in some cases it would probably also need changes in the way some of the services work internally (e.g. to enable scale-out to additional nodes).
This path could be taken gradually, though - for example, have most services run in one proxy, but run each of the few that require scale, optimization or lower privileges as a separate instance. I believe in some cases users are already running some services in a separate proxy rather than going the all-in-one route, so if we agree on it, we just need to double down on this approach as the recommended path.


This would help a lot with running proxy in a container.

We could also get rid of smart_proxy_dynflow_core and instead deploy a rex-only smart proxy.

To aid me in thinking about this: do you envision this as a 1:1 mapping between service and port? For example, a REX smart-proxy on 9000, a template smart-proxy on 9001, a registration smart-proxy on 9002. Or is this microservices behind the scenes, with a single web interface and a single port?

Making a note that another RFC just opened which plays into some of these design considerations and the discussions happening there: Infrastructure roles

I think that could be an implementation detail, depending on how we want to proceed - it could be multiple containers running on one host exposing different ports, it could be multiple webservers on the same port behind a reverse proxy with vhosts, or even answering on different paths, or it could be completely different machines. It might also depend on how it makes sense to scale the different services.

Another thought that came from a discussion with @Marek_Hulan just now is that maybe we could have some predefined proxy “bundles”, e.g. a “provisioning proxy” that contains DNS, DHCP and TFTP, a “content proxy” that only contains a content proxy with Pulp preconfigured to work with it, a “REX proxy” that contains Dynflow (and maybe an MQTT broker in the future?), and so on.
Whether we go with a proxy per feature or per set of features, we would need to maintain some sort of stable API with Foreman indicating what the proxy can do (and I think the capabilities API really enhances our ability to do so).
Different features have different requirements (e.g. content requires a lot of disk space, some DHCP providers require file access for managing leases, OpenSCAP requires heavy report processing, and so on) and can thus be scaled, optimized and secured differently, without trying to find a “one size fits all” configuration.
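As an illustration, here are hypothetical bundle definitions and the capabilities such a bundled proxy might advertise to Foreman - the names and feature lists follow the examples above, not any existing smart-proxy code:

```ruby
require 'json'

# Hypothetical bundles: each maps a deployment role to the concrete
# proxy features it would enable.
BUNDLES = {
  'provisioning' => %w[dns dhcp tftp],
  'content'      => %w[pulp],
  'rex'          => %w[dynflow mqtt]
}.freeze

# What a bundled proxy could report through a capabilities-style API:
# the bundle it runs plus the features Foreman can route requests to.
def capabilities_for(bundle)
  { 'bundle' => bundle, 'features' => BUNDLES.fetch(bundle) }
end

puts JSON.generate(capabilities_for('provisioning'))
```

Foreman would only need to understand the stable capabilities payload, not how the features happen to be packaged behind it.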

I indeed wrote a PoC in Python

The commits show the steps you generally take. Then an additional blog post helps you understand the registration.

My initial goal for that was to implement the Smart Proxy registration directly in Pulp 3 as a plugin, so that you can deploy content without a Ruby Smart Proxy. (This is why I think RFC: Container Gateway Smart Proxy Plugin (Container registry access for Pulp 3 Smart Proxies) is moving in the wrong direction.)

In Smart Proxy Feature classes I tried to start a similar discussion.

As much as I like the idea of moving toward containers in a non-intrusive way - not trying to break up the RoR app, but cherry-picking features that need to scale - remember that containers do not contain. But you are right: as long as we stick with SELinux turned on, it will be an improvement.

However, one big advantage of the smart-proxy is easy deployment. Linux, Windows, BSD. A small VM. Easy installation. No dependencies. This would only work if we also supported non-cluster installations. Just podman pull or docker pull and run.

Note that it doesn’t necessarily have to run inside a container - that is just one option. It could also be that for Linux-based OSes it runs in a container and for Windows/BSD it runs as a regular process. (TBH, I wonder how many users actually use the proxy on a non-Linux system - we might be spending a lot of effort supporting something that isn’t even used.)


I’m not sure if we can integrate with Active Directory (DNS, DHCP) if we don’t run on Windows. In the past we also had users on BSD.

However, we don’t test it, so it’s not guaranteed to work. It would be nice to utilize GitHub Actions to test on Windows if we want to support that.

If I have not missed an option, DHCP is only possible with native_ms, which requires the CLI of a Windows server. DNS is no problem with nsupdate and GSSAPI. I have used the latter quite often; DHCP on Active Directory never in production, and I am only aware of one environment where a colleague is using it.

The manual installation is off-putting on Windows, so really supporting it would mean not only testing but also packaging it. Perhaps this would also be easier with something other than Ruby, as that option was mentioned.
