How We Think About and Use Smart Proxies Architecturally

I agree on a different machine, but I don't see abstracting Pulp out of Katello. I think it will know it is talking to a Pulp backend. It can go async, REST, anything else.

stbenjam (https://community.theforeman.org/u/stbenjam)
July 2

Bryan_Kearney:

There is a distinction here though. DNS/DHCP is an interface with
different implementations behind it. Pulp is not an interface, it
is a specific service.

An abstraction isn’t necessary to get the benefits of a single auth
scheme, single port, and single API endpoint.

Is the suggestion here that we use the smart proxy service to proxy
calls? I would think the throughput we put through to a pulp server is
too much for a single process/single threaded service. If that is the
goal, we may need to expand out the infrastructure related to the smart
proxy server.

There are some aspects of this in the code we have now. In the current proposed PR we would be starting to use the Proxy for service discovery: the proxy knows the URL to Pulp, and Katello learns the Pulp URL from the Proxy when it registers / updates its features.
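To make the service-discovery idea concrete, here is a minimal sketch of what "Katello learns the Pulp URL from the Proxy" could look like. The JSON shape and the `pulp_url` key are assumptions for illustration, not the actual smart proxy features payload:

```ruby
require 'json'

# Illustrative features reply; the real smart proxy endpoint and its
# JSON shape may differ -- these keys are assumptions.
RAW = <<~JSON
  {
    "pulp": { "settings": { "pulp_url": "https://pulp.example.com/pulp/api/v3/" } },
    "dns":  { "settings": {} }
  }
JSON

# Pull the backing service URL for a feature out of the proxy's reply,
# returning nil when the feature doesn't advertise one.
def service_url(features_json, feature, key)
  JSON.parse(features_json).dig(feature, 'settings', key)
end

puts service_url(RAW, 'pulp', 'pulp_url')
```

The point is that Katello never hard-codes where Pulp lives; it asks the proxy it already knows about.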

I suppose it’s a good thing that the Proxy isn’t single threaded then :slight_smile:

Ah good, my memory must have been out of date! :slight_smile:

If we were to apply this to the Pulp thought model, would a user need a smart proxy on the LDAP system, given the Foreman server today does not need one to be there? You’d just need a smart proxy somewhere (could even be the default one) that is linked and talking to the LDAP system?

This is less true with remote database support, right? And Pulp falls into a similar model in some respects: one master with many replicas. I think if we look at Pulp child instances as the scale-out mechanism, a smart proxy makes sense.

If the different metadata issue were solved, would it change the approach?

Yes, these are other facets of what the smart proxy is today. I was not intending to discount them. I should have clarified that I was somewhat trying to target 2 and 4 along with stack requirements. And I am trying to get at whether we should be treating server deployments differently from child deployments with Pulp or other services. Let me try to summarize the core questions we’ve got to:

  1. Should a server deployment (Foreman core or with plugins) require a smart proxy to essentially function?
  2. Should things like Pulp communication be unified through a Smart Proxy and thus require its existence?


One difference is who orchestrates the replicas. With remote databases there is one database URL. As I understand it, with Pulp we orchestrate the replicas.

Clients in foreign networks talk to Pulps that are local to them, just like the local TFTP, DHCP, and DNS servers. That’s been the Foreman model since very early on. But we have a central Pulp, and it is special. It is more like the database in that regard, but it doesn’t really have to be architected that way.

I could also imagine a different design where all of the content actions happen through the smart proxy, with Katello issuing much simpler orchestration calls like create content view, publish, sync repo, etc. and letting the proxies handle it all. I think in that model you could end up not being as monolithic, by potentially having multiple central Pulps with different sets of content, so it acts more like all of the other distributed services. Some of the intensive tasks like errata calculation could be more distributed as well. I’m sure there are all kinds of pitfalls in such a design, and I don’t even know if it’s a good idea.
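The design floated above could be sketched as a toy model, where the server only issues coarse verbs and each content proxy translates them for its local Pulp. All class and method names here are invented for illustration, not Katello's real API:

```ruby
# Invented sketch: each proxy owns the "how", the orchestrator owns the "what".
class ContentProxy
  attr_reader :name, :log

  def initialize(name)
    @name = name
    @log  = []   # stand-in for real local Pulp API calls
  end

  def sync_repo(repo)
    @log << [:sync, repo]
  end

  def publish(view)
    @log << [:publish, view]
  end
end

class Orchestrator
  def initialize(proxies)
    @proxies = proxies
  end

  # The server only says *what* should happen; each proxy decides *how*.
  def publish_everywhere(view)
    @proxies.each { |p| p.publish(view) }
  end
end

proxies = [ContentProxy.new('central'), ContentProxy.new('dc2')]
Orchestrator.new(proxies).publish_everywhere('cv-1')
proxies.each { |p| puts "#{p.name}: #{p.log.inspect}" }
```

In this shape, "multiple central Pulps with different content sets" is just more entries in the proxy list.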

Anyway, the proposed PR where we get the Pulp URL from the smart proxy seems fine to me, and I’m happy to admit Katello is special and live with how it is.

I don’t think it’d change how I think about smart proxies, but it’d solve the problem of scaling proxies with content.

The question is how the user/admin figures out that the service is missing in action. There can be several reasons why the service is not up: perhaps it has not been installed yet, or not been upgraded to the required version, or it’s simply down. For this, it’s good to have a service catalogue, and the foreman-proxy registry is actually used as one.
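The service-catalogue idea can be sketched as a small classifier: given what the registry reports about a service, tell the admin *why* it is unusable rather than failing opaquely. The entry fields (`:version`, `:reachable`) are invented for illustration:

```ruby
# Hypothetical registry entry: nil when never registered, otherwise a hash
# like { version: '3.1', reachable: true }. Field names are assumptions.
def service_status(entry, required_version)
  return :not_installed   if entry.nil?
  return :version_too_old if Gem::Version.new(entry[:version]) < Gem::Version.new(required_version)
  entry[:reachable] ? :ok : :down
end

puts service_status(nil, '3.0')
puts service_status({ version: '2.9', reachable: true }, '3.0')
puts service_status({ version: '3.1', reachable: false }, '3.0')
```

Each distinct status maps to a distinct, actionable message in the UI instead of a generic error.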

There are also situations where we need the external service not just for actions but also for presentation, where ‘fire and forget’ is not enough.

Therefore, explicitly assigning resources to expected services helps with the UX when the services we rely on are not there.

Well: you can’t run the remote commands themselves. On the other hand, you can have just an SSH proxy and do SSH but not Ansible, and vice versa. But most importantly, you can install the remote execution plugin (just on the server side) and get all the UI working without ugly 500s, and once you set everything up and actually try to execute the job, you will be informed that there is no usable proxy to perform it.
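The remote-execution behaviour described above boils down to a feature check at dispatch time; a rough sketch, where `Proxy`, the feature names, and the messages are all invented:

```ruby
# Toy model of "no usable proxy": pick the first proxy advertising the
# required feature, or return an informative message instead of a 500.
Proxy = Struct.new(:name, :features)

def run_job(proxies, feature)
  proxy = proxies.find { |p| p.features.include?(feature) }
  return "no usable proxy offers '#{feature}'" unless proxy
  "dispatched via #{proxy.name}"
end

proxies = [Proxy.new('proxy1', ['ssh'])]
puts run_job(proxies, 'ssh')      # dispatched via proxy1
puts run_job(proxies, 'ansible')  # no usable proxy offers 'ansible'
```

The UI stays fully functional either way; the missing capability only surfaces when the user actually asks for it.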

I think that’s a good model for all plugins that need some external service for their full functionality. I can imagine Katello working in a similar way, where you could enable just Katello and do the basic operations, while anything else is disallowed until you plug in an additional proxy (such as a Pulp one).

An important property of this model is also that it counts on the external services not being present from day one and not having to be present 24/7, while the main server should still work, in the sense of not looking broken.

Let’s think about Organizations: today, you can’t create one and sync it to Candlepin later, once you install it. I can imagine that instead, once you install Katello, the Organization would indicate that it has not been fully synced, suggesting to sync it to the corresponding Candlepin service (represented by a proxy), perhaps letting the user know that they can’t use the subscriptions functionality until Candlepin is around. On the other hand, they should still be able to install just Pulp and use its functionality to work with content: the clients would still be able to consume the content.