Flatpaks, smart proxies, and lifecycle management

Hello community,

The MVP for OCI Flatpak support in Katello releases with 4.16: Foreman 3.14 and Katello 4.16 Release Notes

For this MVP, Katello has to proxy the Flatpak index that Pulp provides to ensure that registered hosts only see the Flatpak content in the content view environments that they are registered to. For context, the Flatpak index only serves the latest container content available in the registry. Since there is no integration between Flatpak and RHSM, Katello solves this by matching request IPs to existing hosts to determine which content view environments to serve back. In the case that we cannot find a host from the request IP, we simply serve the full Flatpak index.
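To make the IP-matching concrete, here is a rough sketch of the lookup; the data structures and names are hypothetical stand-ins for Katello's actual host records and index data, not real Katello code:

```python
from ipaddress import ip_address

# Hypothetical stand-in for Katello's host records; the real lookup
# goes through Foreman's database, keyed on the request IP.
HOSTS_BY_IP = {
    "192.0.2.10": {"name": "host1.example.com",
                   "content_view_environments": {"Library/cv1"}},
}

# Hypothetical flattened view of the Flatpak index entries.
FULL_INDEX = [
    {"name": "examples/flatpak-runtime", "environment": "Library/cv1"},
    {"name": "other-org/flatpak-app", "environment": "OtherOrg/cv2"},
]

def index_for_request(remote_ip):
    """Serve only the entries visible to the matched host; fall back to
    the full index when no host matches the request IP (the current
    behavior described above)."""
    host = HOSTS_BY_IP.get(str(ip_address(remote_ip)))
    if host is None:
        return FULL_INDEX
    allowed = host["content_view_environments"]
    return [e for e in FULL_INDEX if e["environment"] in allowed]
```

The fallback branch is the interesting part: an unmatched request IP gets the full index, which is where the multi-org exposure discussed later in this thread comes from.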

While this works for the main Katello server, smart proxies are a different story entirely. For Katello 4.16, smart proxies have no lifecycle management support for Flatpaks. Only the global Pulp Flatpak index will be served.

To implement the same IP-matching feature on smart proxies, the Container Gateway (which will likely be responsible for serving the proxied Flatpak index) would also need to know which hosts have access to which repositories. It currently has a PostgreSQL DB and mirrors Foreman user data for limiting access to normal container repositories. However, host access is completely different from user access to repositories: a user with access to a limited set of Flatpak content may still be unable to access it if the container images are older than the ones listed in the Flatpak index.

<tangent>
In an ideal world, I think the Flatpak client would have some way to request a limited set of content so that the content view environments (and organizations) could be respected. If Katello & the Container Gateway could receive even a simple tag from the Flatpak client, we could filter the index based on that tag.
</tangent>

I haven't looked too much into it yet, but modern Pulp has "domain" capabilities, which can separate served repository paths by domain name. Theoretically, that could be used to limit a host's access, and Pulp could serve a different Flatpak index per domain. However, this could complicate smart proxy syncing (since domains would need to be synced separately as well), and I'm not sure how easily domains can be created/destroyed on the fly as users associate new content view environments with smart proxies.
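For reference, pulpcore exposes domains through its REST API (a POST to /pulp/api/v3/domains/). A sketch of building the request body, assuming filesystem-backed storage; the field names come from my reading of pulpcore's Domain serializer and should be verified against the installed version:

```python
import json

def domain_create_payload(name):
    """Build a request body for POST /pulp/api/v3/domains/.
    The storage_class/storage_settings names are assumptions based on
    pulpcore's Domain serializer; verify before relying on them."""
    return {
        "name": name,
        "storage_class": "pulpcore.app.models.storage.FileSystem",
        "storage_settings": {"location": f"/var/lib/pulp/media/{name}/"},
    }

# With domains enabled, content URLs gain a domain segment, e.g.
#   /pulp/content/<domain_name>/<distribution_base_path>/...
# which is what would let Pulp serve a separate Flatpak index per domain.
print(json.dumps(domain_create_payload("library-cv1"), indent=2))
```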


We may have an opportunity here to improve on the IP-matching strategy in general, since it isn't 100% guaranteed to match a host (and can be spoofed). Plus, if one organization in Katello doesn't want another org to ever access its Flatpak index, that is impossible with today's implementation: a user would simply need to not register with subscription-manager, and they would see the latest Flatpak index hosted by Pulp. The only safeguard here would be Foreman's container user access control.

If we are able to find a different solution from IP-matching for smart proxies, perhaps it could be implemented for the main Katello server as well.

From my perspective, the first lead to follow would be Pulp's domain support, since it's a Pulp-native way to separate access to content.

In the meantime, I'm curious to hear what the community thinks.

Thanks!

–Ian

3 Likes

For the yum content type, we have this solved already. It's a beautiful convergence, the one point where Katello, Candlepin and Pulp all work together. My (limited) understanding is something like this:

  1. A host has a certificate, provided by subscription-manager, which allows access to a list of allowed access paths. The allowed access paths correspond to the content view environment(s) (aka Candlepin environments) a host is assigned to. These paths also end up as "Repo URLs" in redhat.repo, thanks to sub-man.
  2. When a host requests content, Pulp asks Candlepin for that certificate and checks its allowed access paths against the path of the requested content.
  3. Pulp (is it "certguard"? or "pulp content guard"?) makes a decision on whether to serve the content based on the results of that check.
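The check in steps 2 and 3 boils down to matching the requested path against the allowed access paths embedded in the entitlement certificate. A simplified sketch of just that path logic, ignoring certificate parsing and signature validation entirely:

```python
from pathlib import PurePosixPath

def path_allowed(requested_path, entitled_paths):
    """Return True if the requested content path is at or under one of
    the allowed access paths from the client's entitlement cert."""
    req = PurePosixPath(requested_path)
    for entitled in entitled_paths:
        ent = PurePosixPath(entitled)
        # Allow an exact match or any path nested under an entitled one.
        if req == ent or ent in req.parents:
            return True
    return False
```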

This is a quite elegant solution to content restriction, but it relies on everything working together:

  1. The client tools (dnf and sub-man)
  2. Pulp
  3. Candlepin
  4. Katello

IMO the problem we're running into here is made worse by the fact that we're trying to solve everything entirely within Katello. I want to at least throw out the thought of maybe getting other projects & teams involved? This way we may be able to come up with a more elegant solution rather than working around limitations.

cc @ggainey @nikos_moum

2 Likes

[Pedantic specifics around How It All Works in Pulp…]

"certguard" is a kind-of contentguard. A content-guard is a Thing that is used/recognized by Pulp content-app, and checks if the incoming request is Allowed, for reasons specified by the specific kind-of contentguard. We support a variety of kinds-of contentguards to let an admin control access to Distributed content.

Pulp has two kinds-of certificate-contentguards, "X509" and "RHSM". The X509 guard looks for an incoming cert and verifies that it was signed-by the key associated with the guard. If so, the request is Allowed.

The RHSM guard is similar, but once the incoming cert is deemed valid, it also looks at the path being requested, and determines if that path is contained inside the cert - that's how RHSM certs work. If the path is in/under the one in the cert, then you get the content. Otherwise, 403.

At no point does Pulp "know about" candlepin-the-service - it just knows-about the form of certs generated by candlepin. In the katello/foreman/pulp/candlepin dance, katello does all the talking to candlepin to generate the certs and hand them to clients. Pulp isn't involved until the client "calls home" with such a cert.

OK, so - to the specific question - utilizing Pulp's domains to enforce data-separation feels like a fruitful path to at least consider. It does mean that one loses the artifact-storage-deduplication that Pulp does - but that may be a small price to pay, esp when storage is backed by, say, S3, which does its own deduplication at the block-storage layer.

OR, one could expand on the RHSM-cert-approach, and provide clients with an RHSM cert that limits access to specific incoming paths? I am past the edge of what I know concerning the shape of incoming requests from clients in this context, tho, so I may be completely off base here.

3 Likes

The entity that makes this design particularly difficult is the Flatpak Index, which wants to serve the latest and greatest Flatpaks in the registry. Pulp domains (I think) could be particularly helpful by having a different Flatpak Index entity per domain. It could be a major change though, since we haven't used domains before.

The certificate idea is great for access to the container registry without username/password (required today); however, it is still very new territory for container content & Flatpaks in general. And, as @ggainey mentioned, Pulp 'only' protects content access with a certificate; it doesn't do any routing with it. Something still has to tell the Flatpak index to change, which isn't possible today natively with Flatpak.

If Candlepin were to get involved, we'd still need to get the Flatpak project on board with giving clients more tailored access to the Index. Anything else that we do would be a workaround from a Flatpak standpoint.

We're going down the road of implementing multi-tenancy / lifecycle management into Flatpak ourselves… I do agree with @jeremylenz that it's a bit hacky to do it all from a Katello/Pulp/Candlepin standpoint.

The current options that I see are:

  1. Continue with IP-matching and expand what the Container Gateway tracks
  2. Explore domain support
  3. Work with the Flatpak project to get more control over client access to the index
    • An Index filter tag on content view environments could potentially work
    • Even if the Index has filtering, note that the underlying container repositories may still be accessible, but with difficulty. It depends on which user is logging in on the client.
2 Likes

One more interesting thing to keep in mind is that, while container registries have users, the Flatpak index (as far as I know) has no idea about the incoming user. It's really meant to be a static, public entity that anyone can access. There is a dynamic endpoint for the Index, but I'm not sure if it really helps here.
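For anyone unfamiliar with the format: the static index is a single JSON document listing repositories. The shape below follows my reading of the flatpak-oci-specs index format, so treat the field names as an approximation. Filtering it per host would look something like:

```python
import json

# Minimal example document; "Registry"/"Results"/"Name" follow my
# reading of the flatpak-oci-specs index format (an assumption here).
SAMPLE_INDEX = json.dumps({
    "Registry": "https://registry.example.com",
    "Results": [
        {"Name": "myorg/flatpak-app"},
        {"Name": "other-org/flatpak-app"},
    ],
})

def filter_index(index_json, allowed_repos):
    """Drop index entries for repositories a host may not see."""
    index = json.loads(index_json)
    index["Results"] = [r for r in index.get("Results", [])
                        if r.get("Name") in allowed_repos]
    return index
```

The filtering itself is trivial; the hard part, as noted above, is that nothing in the request identifies the host or user, so there is no input to derive allowed_repos from.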

1 Like

I think this is what I was trying to say in my previous post, and just didnā€™t have the right words :slight_smile:

1 Like

One note about domain support: AFAIU, enabling domains will be Pulp-wide and not limited to container content. It would also mean all our distribution URLs will include <domain_name> for all content. I am not convinced that restricting Flatpak content access is a good enough justification to enable domains. Unless we have other use cases for enabling domains?

2 Likes

Indeed, this would mean that all content on smart proxies would need to be redistributed. However, that URL should be transparent to users and should get updated at the next check-in with RHSM, at least for yum / debian content. For other content types it will cause some noisy changes that could break automation that relies on the URL layout. This would be worth checking if domain support is investigated.

Is a Flatpak index required for the Flatpak workflow to work?
You mentioned a dynamic index; could serving the index dynamically solve the question of what a host has access to?