Flatpaks, smart proxies, and lifecycle management

Hello community,

The MVP for OCI Flatpak support in Katello releases with 4.16: Foreman 3.14 and Katello 4.16 Release Notes

For this MVP, Katello has to proxy the Flatpak index that Pulp provides to ensure that registered hosts only see the Flatpak content in the content view environments that they are registered to. For context, the Flatpak index only serves the latest container content available in the registry. Since there is no integration between Flatpak and RHSM, Katello solves this by matching request IPs to existing hosts to determine which content view environments to serve back. In the case that we cannot find a host from the request IP, we simply serve the full Flatpak index.
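To make the IP-matching concrete, here is a rough sketch of the lookup; the data structures and names are hypothetical stand-ins for Katello's actual host records and index data, not real Katello code:

```python
from ipaddress import ip_address

# Hypothetical stand-in for Katello's host records; the real lookup
# goes through Foreman's database, keyed on the request IP.
HOSTS_BY_IP = {
    "192.0.2.10": {"name": "host1.example.com",
                   "content_view_environments": {"Library/cv1"}},
}

# Hypothetical flattened view of the Flatpak index entries.
FULL_INDEX = [
    {"name": "examples/flatpak-runtime", "environment": "Library/cv1"},
    {"name": "other-org/flatpak-app", "environment": "OtherOrg/cv2"},
]

def index_for_request(remote_ip):
    """Serve only the entries visible to the matched host; fall back to
    the full index when no host matches the request IP (the current
    behavior described above)."""
    host = HOSTS_BY_IP.get(str(ip_address(remote_ip)))
    if host is None:
        return FULL_INDEX
    allowed = host["content_view_environments"]
    return [e for e in FULL_INDEX if e["environment"] in allowed]
```

The fallback branch is the interesting part: an unmatched request IP gets the full index, which is where the multi-org exposure discussed later in this thread comes from.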

While this works for the main Katello server, smart proxies are a different story entirely. For Katello 4.16, smart proxies have no lifecycle management support for Flatpaks. Only the global Pulp Flatpak index will be served.

To implement the same IP-matching feature on smart proxies, the Container Gateway (which will likely be responsible for serving the proxied Flatpak index) would also need to know which hosts have access to which repositories. It currently has a PostgreSQL DB and mirrors Foreman user data for limiting access to normal container repositories. However, host access is completely different from user access to repositories: a user with access to a limited set of Flatpak content may still be unable to access it if the container images are older than the ones listed in the Flatpak index.

<tangent>
In an ideal world, I think the Flatpak client would have some way to request a limited set of content so that the content view environments (and organizations) could be respected. If Katello & the Container Gateway could receive even a simple tag from the Flatpak client, we could filter the index based on that tag.
</tangent>

I haven't looked too much into it yet, but modern Pulp has "domain" capabilities, which can separate served repository paths by domain name. Theoretically, that could be used to limit a host's access, and Pulp could serve a different Flatpak index per domain. However, this could complicate smart proxy syncing (since domains would need to be synced separately as well), and I'm not sure how easily domains can be created/destroyed on the fly as users associate new content view environments with smart proxies.
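For reference, pulpcore exposes domains through its REST API (a POST to /pulp/api/v3/domains/). A sketch of building the request body, assuming filesystem-backed storage; the field names come from my reading of pulpcore's Domain serializer and should be verified against the installed version:

```python
import json

def domain_create_payload(name):
    """Build a request body for POST /pulp/api/v3/domains/.
    The storage_class/storage_settings names are assumptions based on
    pulpcore's Domain serializer; verify before relying on them."""
    return {
        "name": name,
        "storage_class": "pulpcore.app.models.storage.FileSystem",
        "storage_settings": {"location": f"/var/lib/pulp/media/{name}/"},
    }

# With domains enabled, content URLs gain a domain segment, e.g.
#   /pulp/content/<domain_name>/<distribution_base_path>/...
# which is what would let Pulp serve a separate Flatpak index per domain.
print(json.dumps(domain_create_payload("library-cv1"), indent=2))
```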


We may have an opportunity here to improve on the IP-matching strategy in general, since it isn't 100% guaranteed to match a host (and can be spoofed). Plus, if one organization in Katello doesn't want another org to ever access its Flatpak index, that is impossible with today's implementation: a user would simply need to not register with subscription-manager, and they would see the latest Flatpak index hosted by Pulp. The only safeguard here would be Foreman's container user access control.

If we are able to find a different solution from IP-matching for smart proxies, perhaps it could be implemented for the main Katello server as well.

From my perspective, the first lead to follow would be Pulp's domain support, since it's a Pulp-native way to separate access to content.

In the meantime, I'm curious to hear what the community thinks.

Thanks!

–Ian

3 Likes

For the yum content type, we have this solved already. It's a beautiful convergence, the one point where Katello, Candlepin and Pulp all work together. My (limited) understanding is something like this:

  1. A host has a certificate, provided by subscription-manager, which allows access to a list of allowed access paths. The allowed access paths correspond to the content view environment(s) (aka Candlepin environments) a host is assigned to. These paths also end up as "Repo URLs" in redhat.repo, thanks to sub-man.
  2. When a host requests content, Pulp asks Candlepin for that certificate and checks its allowed access paths against the path of the requested content.
  3. Pulp (is it "certguard"? or "pulp content guard"?) makes a decision on whether to serve the content based on the results of that check.
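The check in steps 2 and 3 boils down to matching the requested path against the allowed access paths embedded in the entitlement certificate. A simplified sketch of just that path logic, ignoring certificate parsing and signature validation entirely:

```python
from pathlib import PurePosixPath

def path_allowed(requested_path, entitled_paths):
    """Return True if the requested content path is at or under one of
    the allowed access paths from the client's entitlement cert."""
    req = PurePosixPath(requested_path)
    for entitled in entitled_paths:
        ent = PurePosixPath(entitled)
        # Allow an exact match or any path nested under an entitled one.
        if req == ent or ent in req.parents:
            return True
    return False
```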

This is a quite elegant solution to content restriction, but it relies on everything working together:

  1. The client tools (dnf and sub-man)
  2. Pulp
  3. Candlepin
  4. Katello

IMO the problem we're running into here is made worse by the fact that we're trying to solve everything entirely within Katello. I want to at least throw out the thought of maybe getting other projects & teams involved? This way we may be able to come up with a more elegant solution rather than working around limitations.

cc @ggainey @nikos_moum

2 Likes

[Pedantic specifics around How It All Works in Pulp…]

"certguard" is a kind-of contentguard. A content-guard is a Thing that is used/recognized by Pulp content-app, and checks if the incoming request is Allowed, for reasons specified by the specific kind-of contentguard. We support a variety of kinds-of contentguards to let an admin control access to Distributed content.

Pulp has two kinds-of certificate-contentguards, "X509" and "RHSM". The X509 guard looks for an incoming cert and verifies that it was signed-by the key associated with the guard. If so, the request is Allowed.

The RHSM guard is similar, but once the incoming cert is deemed valid, it also looks at the path being requested, and determines if that path is contained inside the cert - that's how RHSM certs work. If the path is in/under the one in the cert, then you get the content. Otherwise, 403.

At no point does Pulp "know about" candlepin-the-service - it just knows-about the form of certs generated by candlepin. In the katello/foreman/pulp/candlepin dance, katello does all the talking to candlepin to generate the certs and hand them to clients. Pulp isn't involved until the client "calls home" with such a cert.

OK, so - to the specific question - utilizing Pulp's domains to enforce data-separation feels like a fruitful path to at least consider. It does mean that one loses the artifact-storage-deduplication that Pulp does - but that may be a small price to pay, esp when storage is backed by, say, S3, which does its own deduplication at the block-storage layer.

OR, one could expand on the RHSM-cert-approach, and provide clients with an RHSM cert that limits access to specific incoming paths? I am past the edge of what I know concerning the shape of incoming requests from clients in this context, tho, so I may be completely off base here.

3 Likes

The entity that makes this design particularly difficult is the Flatpak Index, which wants to serve the latest and greatest Flatpaks in the registry. Pulp domains (I think) could be particularly helpful by having a different Flatpak Index entity per domain. It could be a major change though, since we haven't used domains before.

The certificate idea is great for access to the container registry without username/password (required today); however, it is still very new territory for container content & Flatpaks in general. And, as @ggainey mentioned, Pulp 'only' protects content access with a certificate; it doesn't do any routing with it. Something still has to tell the Flatpak index to change, which isn't possible today natively with Flatpak.

If Candlepin were to get involved, we'd still need to get the Flatpak project on board with giving clients more tailored access to the Index. Anything else that we do would be a workaround from a Flatpak standpoint.

We're going down the road of implementing multi-tenancy / lifecycle management into Flatpak ourselves… I do agree with @jeremylenz that it's a bit hacky to do it all from a Katello/Pulp/Candlepin standpoint.

The current options that I see are:

  1. Continue with IP-matching and expand what the Container Gateway tracks
  2. Explore domain support
  3. Work with the Flatpak project to get more control over client access to the index
    • An Index filter tag on content view environments could potentially work
    • Even if the Index has filtering, note that the underlying container repositories may still be accessible, but with difficulty. It depends on which user is logging in on the client.
2 Likes

One more interesting thing to keep in mind is that, while container registries have users, the Flatpak index (as far as I know) has no idea about the incoming user. It's really meant to be a static, public entity that anyone can access. There is a dynamic endpoint for the Index, but I'm not sure if it really helps here.
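For anyone unfamiliar with the format: the static index is a single JSON document listing repositories. The shape below follows my reading of the flatpak-oci-specs index format, so treat the field names as an approximation. Filtering it per host would look something like:

```python
import json

# Minimal example document; "Registry"/"Results"/"Name" follow my
# reading of the flatpak-oci-specs index format (an assumption here).
SAMPLE_INDEX = json.dumps({
    "Registry": "https://registry.example.com",
    "Results": [
        {"Name": "myorg/flatpak-app"},
        {"Name": "other-org/flatpak-app"},
    ],
})

def filter_index(index_json, allowed_repos):
    """Drop index entries for repositories a host may not see."""
    index = json.loads(index_json)
    index["Results"] = [r for r in index.get("Results", [])
                        if r.get("Name") in allowed_repos]
    return index
```

The filtering itself is trivial; the hard part, as noted above, is that nothing in the request identifies the host or user, so there is no input to derive allowed_repos from.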

1 Like

I think this is what I was trying to say in my previous post, and just didnā€™t have the right words :slight_smile:

1 Like

One note about domain support: AFAIU, enabling domains will be Pulp-wide and not limited to container content. It would also mean all our distribution URLs will include <domain_name> for all content. I am not convinced that restricting Flatpak content access is a good enough justification to enable domains. Unless we have other use cases for enabling domains?

2 Likes

Indeed, this would mean that all content on smart proxies would need to be redistributed. However, that URL should be transparent to users and should get updated at the next check-in with RHSM, at least for yum / debian content. For other content types it will cause some noisy changes that could break automation that relies on the URL layout. This would be worth checking if domain support is investigated.

Is a Flatpak index required for the Flatpak workflow to work?
You mentioned a dynamic index; could serving the index dynamically solve the question of what a host has access to?