RFC: Transitioning Katello to "structured APT (deb) content"

What is “structured APT content”?

Unlike RPM repos, APT repos can be subdivided into multiple distributions (a.k.a. “releases”), each of which can be further subdivided into multiple components (and may also contain .deb packages for multiple architectures). For example, the Debian mirror at http://ftp.de.debian.org/debian/ has the distributions bookworm, bullseye, buster, bullseye-updates, and many more. These distributions generally contain the components main, contrib, and non-free.

When you sync APT content (a deb type repository) into Katello, you must provide not only the “Upstream URL” (e.g. http://ftp.de.debian.org/debian/) but also at least one “Release/distribution” (e.g. buster), and you may restrict the “Components” you want to sync as well. (Leaving the “Components” field blank implies syncing all available components; the same applies to “Architectures”.)
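To make the distribution/component/architecture subdivision concrete, here is a minimal Python sketch that lists the package index locations implied by one set of sync options. It relies on the standard APT repository layout (dists/<distribution>/<component>/binary-<architecture>/Packages). The function name is made up for illustration, and real mirrors usually serve these indices in compressed form (e.g. Packages.gz or Packages.xz).

```python
from itertools import product


def package_index_urls(upstream_url, distributions, components, architectures):
    """Return one Packages index URL per distribution/component/architecture."""
    base = upstream_url.rstrip("/")
    return [
        f"{base}/dists/{dist}/{comp}/binary-{arch}/Packages"
        for dist, comp, arch in product(distributions, components, architectures)
    ]


# Hypothetical sync options: one distribution, two components, one architecture.
urls = package_index_urls(
    "http://ftp.de.debian.org/debian/",
    distributions=["buster"],
    components=["main", "contrib"],
    architectures=["amd64"],
)
for url in urls:
    print(url)
```

Each combination is a separate index, which is why a single upstream URL is not enough to describe what should be synced.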

When you then look at the APT repository as published by Katello, you will again find all the distributions and components you have synced. However, you will also find one additional distribution named “default”, containing a single component named “all”. As the naming suggests, this extra distribution contains all the packages from all the other “structured” distribution-component combinations in the repo within a single “catch all” component.
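As a rough illustration of what the extra “default/all” publication amounts to, the following Python sketch collapses a set of structured distribution-component combinations into one catch-all component. The data structure, the function name, and the package file names are all invented for the example; this is not how Katello or Pulp implement publishing.

```python
def simple_publication(structured):
    """Merge {(distribution, component): {package, ...}} into a single default/all set."""
    everything = set()
    for packages in structured.values():
        everything |= packages
    return {("default", "all"): everything}


# Hypothetical synced content; file names are placeholders.
structured = {
    ("buster", "main"): {"foo_1.0_amd64.deb"},
    ("buster", "contrib"): {"bar_2.3_all.deb"},
    ("bullseye", "main"): {"foo_1.1_amd64.deb"},
}

# The published repo holds the structured combinations plus default/all.
published = {**structured, **simple_publication(structured)}
for (distribution, component), packages in published.items():
    print(distribution, component, sorted(packages))
```

Note that in this toy data both versions of foo end up side by side in “default/all”, with nothing left to say which distribution each came from.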

If you use subscription-manager for Debian or Ubuntu in order to register your APT content hosts with Katello (e.g. by using one of the clients from http://oss.atix.de/), then subscription-manager will always consume the “default/all” distribution-component combination, and will simply ignore the structured APT components.

The reason is that Katello only provides subscription-manager with the Katello repository publication URL for each repo, so subscription-manager has no way of knowing what structured distributions and components there might be.
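The difference can be sketched in Python by generating one-line-style APT source entries (deb <uri> <distribution> <component>...). With only a URL to go on, the client can do no better than the well-known “default/all” combination; with distribution and component information it could emit structured entries. The helper names and the example.com URL are placeholders, and this is not the actual subscription-manager code.

```python
def sources_line(url, distribution, components):
    """Format a one-line-style APT source entry."""
    return f"deb {url} {distribution} {' '.join(components)}"


def entries_from_url_only(url):
    # Only the publication URL is known, so fall back to default/all.
    return [sources_line(url, "default", ["all"])]


def entries_from_structure(url, structure):
    # structure maps each distribution to its list of components.
    return [sources_line(url, dist, comps) for dist, comps in structure.items()]


# Placeholder publication URL, not a real Katello path.
publication_url = "https://katello.example.com/REPO_PATH/"

simple = entries_from_url_only(publication_url)
structured = entries_from_structure(
    publication_url, {"buster": ["main", "contrib"]}
)
print(simple[0])
print(structured[0])
```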

Why change this now?

The above approach has various disadvantages and limitations:

  • It is hard for users to understand what the “default/all” simple publication is all about, since they are used to the APT repo structure from the upstream repos they sync.
  • It is also hard for users to understand that the structured components they are used to are present within Katello, but are ultimately unused by subscription-manager and the clients.
  • Publishing both “default/all” and structured components adds significant overhead, increasing APT repo sync times by roughly a third.
  • Many client-side features, such as “APT pinning” or the “apt list --upgradable” command, expect structured APT components and will not work as designed with “default/all”.

Go ahead and change it then?

There are several parts to the puzzle of using exclusively structured APT components within Katello, and all of them need to be in place for the story to work. As a result, we plan to roll out this change over several Katello releases. The first release will include an option to enable structured APT for testing. Once we are confident everything is in place, using structured APT will become the default, and eventually the old “simple publish” will be dropped altogether. The feature will require all APT hosts to use the latest version of subscription-manager from http://oss.atix.de/, so the stepwise transition described above will provide plenty of time to ensure this is the case.

This RFC is your chance to learn of these plans early, and weigh in with any concerns.

What is the current state?

Remaining challenges and limitations

  • There are still situations in the current implementation that will fall back to the old “default/all” behaviour. Until this is fixed, we cannot disable simple publishing to reap the performance rewards of an all-structured APT world.
  • Changing the list of components being synced on the repository page is a rare but not unheard-of use case. Currently, doing so immediately updates the list of components on all registered hosts, regardless of whether the change has actually propagated to the LCENV in use. Fixing this requires candlepin to associate different “cp2_content” with different LCENVs. However, Katello is currently not set up to do this, since it associates a single content path with the root repository rather than one with each repository as used in a single LCENV.
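The second limitation can be shown with a toy Python model. All class and function names here are hypothetical stand-ins, not Katello or candlepin code. The assumption is simply that hosts derive their components from one record tied to the root repository, whereas a per-LCENV record would keep whatever was last published to that environment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RootRepository:
    # Components currently configured on the repository page.
    components: List[str]


@dataclass
class EnvironmentRepository:
    root: RootRepository
    # Snapshot of the components when this LCENV was last published to.
    published_components: List[str]


def components_via_root(env_repo):
    # One content record per root repository: hosts see the live setting.
    return env_repo.root.components


def components_via_environment(env_repo):
    # One content record per LCENV: hosts see what was actually published.
    return env_repo.published_components


root = RootRepository(components=["main", "contrib"])
production = EnvironmentRepository(root, list(root.components))

# The user edits the component list; nothing has been republished yet.
root.components = ["main"]

print(components_via_root(production))
print(components_via_environment(production))
```

In this sketch the root-based lookup already reflects the edit, while the environment-based lookup still reports the components that were published.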

I think the plan overall is sounding good to me.

As for the last challenge you mention about having different cp2_content paths for the different components, I agree that that would cause some stir in Katello. Plus, I don’t think it’s unreasonable to expect users to want to change components of their repositories after creation and after being published in a content view.

I’d be curious to hear from @jeremylenz or @cintrix84 about how it would sound to have content be a per-repository rather than per-root-repository construct.

Is there some way the components could be communicated to sub-man besides via the cp2_content?

Or, if this really does make things complicated… how bad for users would it be for releases/distributions/components to be read-only after repository creation for deb?

I guess this could be difficult without significant changes to candlepin, but if there is a way that would be very interesting.

Our current structure in Rails is

  • Product has_many RootRepositories
  • Product has_many Contents (through the ProductContent join table)
  • Product has_many Repositories through RootRepositories

So Katello::Product is the common parent.

We should probably keep our relationship between Product and Content the same, since it mirrors the structure of Candlepin. As for repositories, we have a bit more flexibility but I’m not sure I’m getting my head around what the suggested change would be to our DB structure?