RFC: Container Gateway Smart Proxy Plugin (Container registry access for Pulp 3 Smart Proxies)

Smart Proxy Container Gateway for Katello

The Container Gateway Smart Proxy plugin is being developed to enable container registry functionality on Pulp 3-enabled smart proxies. Currently, the Pulp 3 container registry on Smart Proxies is not properly exposed, and any container image can be pulled without authentication. We’re looking for feedback on our initial designs.

Current Pulp 2 model

Two registries on Katello:
:5000

  • Unauthenticated pull

:443

  • Support for authenticated repos, featuring full Foreman authorization
  • Support for ‘docker search’

One registry on the Smart Proxy:

  • Unauthenticated pull

WIP Pulp 3 model

Single registry with:

  • Support for authenticated repos, featuring full Foreman authorization
  • Support for ‘docker search’

Longer-term refactoring:

Basic idea: the Container Gateway exposes the Pulp 3 container registry in a usable fashion and caches auth information from Katello.

Implementation Details

Repo: GitHub - Katello/smart_proxy_container_gateway: a smart proxy plugin providing an authenticated registry backed by Katello, Foreman, and Pulp
→ Current state: unauthenticated repositories can be browsed and pulled. Next step: handling authentication

API Handling

  • Manifest and blob GET requests redirect to Pulp 3 (see the sketch after this list)
  • The _catalog endpoint GETs the container images that a user is allowed to see
  • The unauthenticated repository list is sent via PUT from Katello at Smart Proxy sync time
  • podman login requests ensure a user token is available (check the cache and reach out to Katello if necessary) and present it to the container client (todo)
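To make the request flow concrete, here is a minimal sketch in the style of a Sinatra-based Smart Proxy module. It is illustrative only: the route paths, the Pulp URL, and the in-memory repository set are assumptions, not the plugin’s actual code (the real plugin backs the cache with a database, as described under Caching below).

    require 'json'
    require 'set'
    require 'sinatra/base'

    class ContainerGatewayApi < Sinatra::Base
      PULP_REGISTRY = 'https://pulp.example.com'.freeze # assumed Pulp 3 content address
      @@unauthenticated_repos = Set.new # stand-in for the database-backed cache

      # GET /v2/<repository>/(manifests|blobs)/<tag or digest> redirects to Pulp 3.
      get %r{\A/v2/(.+)/(manifests|blobs)/(.+)\z} do |repository, resource, reference|
        halt 401 unless @@unauthenticated_repos.include?(repository) # token checks slot in here later
        redirect "#{PULP_REGISTRY}/v2/#{repository}/#{resource}/#{reference}"
      end

      # GET /v2/_catalog lists only the repositories the requester is allowed to see.
      get '/v2/_catalog' do
        content_type :json
        { repositories: @@unauthenticated_repos.to_a.sort }.to_json
      end

      # Katello PUTs the unauthenticated repository list at Smart Proxy sync time.
      put '/unauthenticated_repository_list' do
        @@unauthenticated_repos = Set.new(JSON.parse(request.body.read)['repositories'])
        status 204
      end
    end

Because the container client follows the redirect and pulls layers straight from Pulp, the gateway itself never proxies image content.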

Caching

  • Cached data lives in a PostgreSQL database (see the sketch after this list)
    • PostgreSQL availability is guaranteed since Pulp 3 requires it
  • ORM library: Sequel
  • Database migrations are automatic: the schema is checked each time the DB connection object is instantiated
  • Unauthenticated repo cache: list of repositories that don’t require auth
    • Updated at sync time
  • Authentication cache: mapping of token checksums to users, with expiration times (todo)
    • Updated at login time
  • Authorization cache: mapping of users to accessible repositories (todo)
    • Updated at login and sync time
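A minimal sketch of how the Sequel-backed cache could be wired up. Table and column names, and the use of create_table? in place of real migration files, are assumptions for illustration; the plugin’s actual schema may differ.

    require 'sequel'

    class ContainerGatewayDatabase
      attr_reader :connection

      def initialize(db_url) # e.g. 'postgres://user:pass@localhost/container_gateway'
        @connection = Sequel.connect(db_url)
        ensure_schema # checked on every instantiation; a no-op once the tables exist
      end

      def ensure_schema
        @connection.create_table?(:unauthenticated_repositories) do
          primary_key :id
          String :name, unique: true, null: false
        end
        @connection.create_table?(:authentication_tokens) do # authentication cache (todo)
          primary_key :id
          String :token_checksum, unique: true, null: false
          String :username, null: false
          DateTime :expires_at, null: false
        end
      end

      # Replace the unauthenticated repo cache with the list Katello sends at sync time.
      def refresh_unauthenticated_repos(names)
        repos = @connection[:unauthenticated_repositories]
        @connection.transaction do
          repos.delete
          names.each { |name| repos.insert(name: name) }
        end
      end

      def unauthenticated?(name)
        !@connection[:unauthenticated_repositories].where(name: name).empty?
      end
    end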

New Apache configuration

  • Check the README in the GitHub repo

TODOs after initial release: packaging and installer support.

Thanks for reading, let us know if you have any questions or feedback.

I wonder why this needs to be implemented as a Smart Proxy plugin. I’m very concerned about making it into a full-blown service. We have no idea how it will scale. Today it’s barely used by clients, but this would change that.

I also object to Smart Proxy connecting to PostgreSQL. It’s not guaranteed to be on the same server since it’s allowed to use a remote PostgreSQL.

I also think it goes against my long-term plans to scale out Pulp standalone. There I would want to run an fproxy.example.com server and then a cluster of pulp-NNN.example.com nodes with shared storage. In your suggestion we would need to scale out fproxy.example.com as well. If we think about future deployments where Pulp itself may live in a Kubernetes / OpenShift install, then it’s also more complicated.

Overall I don’t like this entire architecture and design. Have you considered writing a Pulp 3 plugin to add authentication natively? This would also get rid of Katello in the chain. We would have a much better way of scaling out Pulp itself (which we need to do anyway). It also avoids the extra complexity of a service that needs to reach out to PostgreSQL.


We are actively working on adding authentication and RBAC to pulp_container. It will be available in the 2.3.0 release [0].

[0] https://pulp.plan.io/versions/168

ekohl:

I wonder why this needs to be implemented as a Smart Proxy plugin. I’m very concerned about making it into a full-blown service. We have no idea how it will scale. Today it’s barely used by clients, but this would change that.

I know there is a long-term desire to move the smart proxy to run with Puma; this was designed with that in mind. The fetching of actual content is handled via a client redirect to Pulp, so the work done on the smart proxy is purely for authentication. While I agree we’re not sure how it will scale, we aren’t sure how any solution will really scale until it’s tested in the wild. For an initial implementation, the single-threaded WEBrick deployment is fine, as this is somewhat of a new feature. I do see a Puma deployment becoming necessary for scaling in the future.

ekohl:

I also object to Smart Proxy connecting to PostgreSQL. It’s not guaranteed to be on the same server since it’s allowed to use a remote PostgreSQL.

It doesn’t have to be on the same server. Migrations and everything else are handled at runtime, so the management overhead is lower than, say, a Rails server. It’s also somewhat ‘stateless’: a re-sync will provide everything needed.

ekohl:

I also think it goes against my long-term plans to scale out Pulp standalone. There I would want to run an fproxy.example.com server and then a cluster of pulp-NNN.example.com nodes with shared storage. In your suggestion we would need to scale out fproxy.example.com as well. If we think about future deployments where Pulp itself may live in a Kubernetes / OpenShift install, then it’s also more complicated.

To be clear, this is only on ‘remote smart proxies’. It’s not needed on the main server today, as Foreman/Katello itself can fulfill this role with no issues. In your scenario this plugin would not be needed at all. (We had discussed unifying these approaches and using this for the ‘main smart proxy’ as well, but that’s not currently planned, and it wouldn’t have to be done.)

This solution would only be used in a remote environment where you need the authenticated registry but do not want, or cannot have, clients connecting back directly to the Foreman server.

ekohl:

Overall I don’t like this entire architecture and design. Have you considered writing a Pulp 3 plugin to add authentication natively? This would also get rid of Katello in the chain. We would have a much better way of scaling out Pulp itself (which we need to do anyway). It also avoids the extra complexity of a service that needs to reach out to PostgreSQL.

Pulp 3 has its own auth mechanism, which is quite different from Foreman’s. To be frank, the way that auth is search-based in Foreman makes it difficult for anything to integrate without either a) being completely tailored to Foreman or b) requiring a request back to Foreman for any particular action. One of the goals here was that the Foreman server only has to be up for the initial ‘podman login’; after that it can be down, and only the remote smart proxy and Pulp server are needed.

The current Pulp 3 auth system requires a permission to be created for each object and user. For this to work in the way you suggest, Katello would need to pre-create the permissions for every user for every repository ahead of time. On a system with many users and many repositories this could be massive. This solution instead fetches the authorization information at run time, when the user logs in (see the sketch below).
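A hypothetical sketch of what fetching authorization at login time could look like: the gateway asks Katello which repositories a user may access and caches the mapping. The endpoint path and response shape here are invented for illustration and are not a real Katello API.

    require 'json'
    require 'net/http'
    require 'uri'

    # Returns the repositories the given user may access, fetched once at login time
    # and cached until the next login or sync. Endpoint and payload are assumptions.
    def fetch_accessible_repos(katello_url, username, token)
      uri = URI("#{katello_url}/container_gateway/user_repositories?username=#{username}")
      request = Net::HTTP::Get.new(uri)
      request['Authorization'] = "Bearer #{token}"
      response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
      JSON.parse(response.body)['repositories']
    end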


Do I understand correctly that this is designed to have clients, for auth, connect to the smart proxy and let the smart proxy do the communication back to Foreman to determine if the client has access? If yes (because I think I am missing this nuance), what is the use case being solved that is not solved by proxying calls to Foreman, such as what is done with subscription-manager or GPG key fetching today?

Can you expand on what use case the caching is intended to target? Is this just an optimization to reduce traffic back to Foreman?

When clients first connect to the smart proxy, the smart proxy will connect to Foreman to get the auth information. The caching will help reduce the number of times that the smart proxy needs to connect to Foreman. For container repos that do not need authentication, the smart proxy will not need to talk to Foreman at all after syncing, since that information is cached as well.

@Justin_Sherrill correct me if I’m missing another use case, but yes the main use case for the caching is to reduce traffic back to Foreman.

Is it fair to say then that the caching is an optimization, but not a requirement for an initial implementation?

Does the client connect and the smart proxy perform auth:

  1. every login
  2. every ‘pull’
  3. every other

I am hoping to get a feel for how much traffic back to Foreman we are talking about for container client use cases.

I’d say so, yeah. As an aside, the framework for caching with Sequel + PostgreSQL is already done.

  1. Every login: yes
  2. Every pull: this seems to be the case. I thought Docker/Podman would cache the token, but it seems like it wants to get a token from the server at every pull
  3. Others:
  • search (/v1/search or /v2/_catalog): no (search will just show the repos you have access to)

I think one nice benefit of caching besides reduced traffic is the fact that repos with “unauthenticated pull” could be pulled from the smart proxy without ever talking to Foreman. This is due to the fact that we store the “unauthenticated repo list”.

I realized it might not have been clear that my reply above was about what would happen if we had no caching at all with the container gateway. With caching, the smart proxy will only need to reach out to Foreman for auth once per user login. The cached token can be used for subsequent communications.

As such, I think the reduction in traffic back to Foreman with caching would be substantial.
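A minimal sketch of the token cache just described, using the same assumed authentication_tokens table as the caching sketch above (the checksum, username, and expiry columns are assumptions, not the real schema):

    require 'digest'
    require 'sequel'

    class AuthenticationCache
      def initialize(connection)
        @tokens = connection[:authentication_tokens] # assumed columns: token_checksum, username, expires_at
      end

      # At 'podman login' time, after Katello has validated the user and issued a token.
      def store(token, username, lifetime_seconds = 3600)
        @tokens.insert(token_checksum: Digest::SHA256.hexdigest(token),
                       username: username,
                       expires_at: Time.now + lifetime_seconds)
      end

      # At pull time: a valid cached token means no call back to Foreman is needed.
      def valid?(token)
        checksum = Digest::SHA256.hexdigest(token)
        !@tokens.where(token_checksum: checksum).where { expires_at > Time.now }.empty?
      end
    end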

iballou:

@Justin_Sherrill correct me if I’m missing another use case, but yes the main use case for the caching is to reduce traffic back to Foreman.

And to allow content to be fetched when the main Foreman server is down (or under maintenance) if either 1) the content is unprotected, or 2) the user already has a valid token (has already run podman login).

To put that on par with other content (double-check me please):

  1. RPM content can be accessed if Foreman is down (as long as the client already has the GPG key?)
  2. File content can be accessed if Foreman is down
  3. Deb content can be accessed if Foreman is down

ehelms:

  1. RPM content can be accessed if Foreman is down (as long as the client already has the GPG key?)
  2. File content can be accessed if Foreman is down
  3. Deb content can be accessed if Foreman is down

Correct

I still don’t understand why this can’t be a real Pulp plugin. As part of the Pulp plugin you have a full database at your disposal. It would be generic and scale as part of Pulp. Katello can make sure Pulp can always serve the authenticated request. This would also remove the differentiation between a “local proxy” and a “remote proxy”. That’s a Katello thing and that concept must go away IMHO.

While it’s true PostgreSQL doesn’t have to be on the same server, it does add a lot of complexity to the instructions. We already have 3 places where we need to specify PostgreSQL details; this would make it 4.

Overall I’m strongly leaning to a NACK on the general design. Of course anyone is free to write any plugin they want, but I don’t want to carry this in the installer.

It would be possible to make it into a Pulp plugin; however, it would:

  1. most likely need to be Katello-specific to cover the specific caching requirements we care about
  2. likely not be supported by the Pulp team, so it would land on our team to support a Django plugin within a Python application; the current solution would be easier to maintain across Pulp releases, IMO

Our community maintaining a Pulp plugin is about the least desirable thing, IMO.

I’m sorry, but this concept isn’t going away. The ‘local proxy’ has content directly managed by Katello, while a remote proxy syncs content from the local proxy. That is by design how it works; if you’d like to propose something that could feasibly change that, I am more than open to a dialogue around the topic. We’ve worked over the years to reduce the differences, and our longer-term goal is to reduce them further, but I’m not sure how that concept ever goes away while keeping the desired feature set (a mirror of content from the main Katello server). You could argue that some environments don’t need a mirror, but there’s nothing that prevents you from deploying a smart proxy without a content mirror.

I somewhat understand the hesitance about adding another database connection. For now we could use a simple file-based cache approach, since the smart proxy doesn’t support threading/multi-processing, and work towards a more agreeable solution in the meantime. If we just focus on unauthenticated repos, it would even work with a multi-process/multi-threaded deployment. We can work with the Pulp team to figure out whether some set of functionality they maintain would be possible. Thoughts on that?

I moved this to the RFC section in Discourse.


@ekohl, @ehelms, @Justin_Sherrill, and I had a meeting today to discuss how to move forward with the Container Gateway. Here are the notes from the meeting:

Container Gateway meeting notes

Main concerns discussed:

  • It is not fully agreed-upon that the Smart Proxy should have clients interacting with it directly.

  • The Smart Proxy Container Gateway deviates from the idea that the Smart Proxy is a control surface for making changes, not for interacting with services. If the Smart Proxy goes down, container content cannot be consumed.

  • The PostgreSQL database adds extra complexity that is not necessary while the smart proxy is still single-threaded.

Main counter-arguments:

  • Without the Container Gateway, users would have to forgo upgrading their Smart Proxies when they upgrade to Katello 4.0. Delaying the release of the Container Gateway would therefore cause continuity issues for the Katello 4.0 upgrade.

  • The database can be easily changed to SQLite to alleviate PostgreSQL concerns.

Resolution:

  • The Container Gateway Smart Proxy plugin will be used with the changes below for Katello 4.0. It will support pulling unauthenticated repositories only for now. This is okay for Katello 4.0 since we are on a tight timeline, but the concerns should be resolved by Katello 4.1. Addressing these concerns will involve investigating whether or not pulp-container can handle the needs of the Container Gateway itself.

Immediately-required changes:

  • Change the database to SQLite (see the sketch below)
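For illustration, the switch is essentially a one-line change in how the Sequel connection is created (the file path below is an assumption, not the real configuration):

    require 'sequel'

    # Before: DB = Sequel.connect('postgres://user:pass@localhost/container_gateway')
    DB = Sequel.sqlite('/var/lib/foreman-proxy/smart_proxy_container_gateway.db') # on-disk SQLite file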

Will it be on-disk or in-memory SQLite? I don’t know anything about how the plugin will use the DB, but we had a lot of trouble with on-disk SQLite with Dynflow + REX (remote execution) on the proxy. Just a thought so you won’t shoot yourselves in the foot.

It’s on-disk. What sort of issues did you have?

Dynflow runs a couple of threads which access the DB simultaneously, and every now and then the DB would lock up; all attempts to interact with it would raise an exception until the process died.

Gotcha. Our requirement for using SQLite was that there would be a single thread, so we should be good.