What is “structured APT content”?
Unlike RPM repos, APT repos can be subdivided into multiple distributions (a.k.a. “releases”), each of which can be further subdivided into multiple components (and may also contain .deb packages for multiple architectures). For example, the Debian mirror at http://ftp.de.debian.org/debian/ has the distributions bookworm, bullseye, buster, bullseye-updates, and many more. These distributions generally contain the components main, contrib, and non-free.
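To make this layout concrete, here is a small Python sketch that lists the Packages index locations for a few distribution/component/architecture combinations of the mirror above. It assumes the standard Debian archive layout (dists/<distribution>/<component>/binary-<arch>/Packages). Real mirrors typically serve these indices compressed (e.g. Packages.xz) and list them in each distribution's Release file. The amd64-only architecture list and the function name packages_index_urls are illustrative choices, not part of Katello or pulp_deb.

```python
from itertools import product

# Mirror, distributions and components come from the text above;
# restricting the example to amd64 is an illustrative assumption.
MIRROR = "http://ftp.de.debian.org/debian/"
DISTRIBUTIONS = ["bookworm", "bullseye"]
COMPONENTS = ["main", "contrib", "non-free"]
ARCHITECTURES = ["amd64"]


def packages_index_urls(mirror, distributions, components, architectures):
    """Return one Packages index URL per distribution/component/architecture.

    Layout: <mirror>/dists/<distribution>/<component>/binary-<arch>/Packages
    (mirrors usually serve this file compressed, e.g. as Packages.xz).
    """
    base = mirror.rstrip("/")
    return [
        f"{base}/dists/{dist}/{comp}/binary-{arch}/Packages"
        for dist, comp, arch in product(distributions, components, architectures)
    ]


if __name__ == "__main__":
    for url in packages_index_urls(MIRROR, DISTRIBUTIONS, COMPONENTS, ARCHITECTURES):
        print(url)
```

Each distribution-component pair has its own index per architecture, which is exactly the structure that a single catch-all component flattens away.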
When you sync APT content (a deb type repository) into Katello, you must provide not only the “Upstream URL” (e.g. http://ftp.de.debian.org/debian/) but also at least one “Release/distribution” (e.g. buster). You may also restrict the “Components” you want to sync. Leaving the “Components” field blank syncs all available components; the same applies to “Architectures”.
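The rule that a blank field means “everything the upstream offers” can be sketched as follows. DebSyncSettings, resolve and plan_sync are hypothetical names used only for illustration. In practice the available components and architectures come from each distribution's Release file; this sketch simply takes them as input.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DebSyncSettings:
    """Hypothetical model of the sync fields of a deb repository in Katello."""
    upstream_url: str
    distributions: List[str]  # "Release/distribution": at least one required
    components: List[str] = field(default_factory=list)  # blank: all
    architectures: List[str] = field(default_factory=list)  # blank: all


def resolve(requested: List[str], available: List[str]) -> List[str]:
    """A blank selection means 'everything the upstream offers'."""
    return list(requested) if requested else list(available)


def plan_sync(settings: DebSyncSettings,
              available_components: List[str],
              available_architectures: List[str]) -> Dict[str, List[str]]:
    """Work out what would actually be synced for these settings."""
    if not settings.distributions:
        raise ValueError("at least one Release/distribution is required")
    return {
        "distributions": list(settings.distributions),
        "components": resolve(settings.components, available_components),
        "architectures": resolve(settings.architectures, available_architectures),
    }


if __name__ == "__main__":
    settings = DebSyncSettings("http://ftp.de.debian.org/debian/", ["buster"])
    print(plan_sync(settings, ["main", "contrib", "non-free"], ["amd64", "arm64"]))
```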
When you then look at the APT repository as published by Katello, you will again find all the distributions and components you have synced. However, you will also find one additional distribution named “default”, containing a single component named “all”. As the name suggests, this extra distribution contains all the packages from all the other “structured” distribution-component combinations in the repo, within a single “catch-all” component.
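A minimal sketch of what the catch-all publication amounts to, assuming the repo is modeled as a mapping from (distribution, component) to package filenames. The filenames and the function name publish_with_default_all are made up for illustration. In this sketch each package file appears only once in default/all, no matter how many structured components contain it.

```python
from typing import Dict, List, Tuple

Structured = Dict[Tuple[str, str], List[str]]


def publish_with_default_all(structured: Structured) -> Structured:
    """Return the structured publication plus the catch-all default/all.

    `structured` maps (distribution, component) to the package files in it.
    The extra ("default", "all") entry holds every package from every
    combination, each file listed only once.
    """
    merged = set()
    for packages in structured.values():
        merged.update(packages)
    publication = {key: list(pkgs) for key, pkgs in structured.items()}
    publication[("default", "all")] = sorted(merged)
    return publication


if __name__ == "__main__":
    repo = {
        ("bookworm", "main"): ["hello_2.10-3_amd64.deb"],
        ("bookworm", "contrib"): ["foo_1.0_amd64.deb"],
        ("bullseye", "main"): ["hello_2.10-2_amd64.deb", "foo_1.0_amd64.deb"],
    }
    print(publish_with_default_all(repo)[("default", "all")])
    # ['foo_1.0_amd64.deb', 'hello_2.10-2_amd64.deb', 'hello_2.10-3_amd64.deb']
```

Both hello versions end up side by side in the single “all” component, so a client consuming only default/all cannot tell which release each of them came from.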
If you use subscription-manager for Debian or Ubuntu to register your APT content hosts with Katello (e.g. using one of the clients from http://oss.atix.de/), subscription-manager will always consume the “default/all” distribution-component combination and simply ignore the structured APT components.
The reason is that Katello only provides subscription-manager with the Katello repository publication URL for each repo, so subscription-manager has no way of knowing which structured distributions and components exist.
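The difference shows up in the deb822 format used by *.sources files in /etc/apt/sources.list.d/. The sketch below renders one simplified stanza: knowing only the publication URL, the only entry that can be written is the catch-all one, whereas knowing the distributions and components allows a structured entry. The URL and the function name sources_stanza are hypothetical, and the stanza written by the actual subscription-manager may contain additional fields.

```python
from typing import List, Optional, Tuple


def sources_stanza(repo_url: str,
                   structure: Optional[Tuple[List[str], List[str]]] = None) -> str:
    """Render a simplified deb822 stanza for one repository.

    Without structure information only the catch-all entry can be written;
    with (distributions, components) a structured entry is possible.
    """
    if structure is None:
        suites, components = ["default"], ["all"]
    else:
        suites, components = structure
    return "\n".join([
        "Types: deb",
        f"URIs: {repo_url}",
        f"Suites: {' '.join(suites)}",
        f"Components: {' '.join(components)}",
    ])


if __name__ == "__main__":
    url = "https://katello.example.com/path/to/repo/"  # hypothetical URL
    print(sources_stanza(url))
    print()
    print(sources_stanza(url, (["bookworm"], ["main", "contrib"])))
```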
Why change this now?
The above approach has various disadvantages and limitations:
- It is hard for users to understand what the “default/all” simple publication is all about, since they are used to the APT repo structure from the upstream repos they sync.
- It is also hard for users to understand that the structured components they are used to are present within Katello, but are ultimately unused by subscription-manager and the clients.
- Publishing both “default/all” and the structured components is a significant overhead that increases APT repo sync times by roughly a third.
- Many client-side features, like “APT pinning” or the “apt list --upgradable” command, expect structured APT components and will not work as designed with “default/all”.
Go ahead and change it then?
There are several pieces to the puzzle of using structured APT components exclusively within Katello, and all of them need to be in place for the story to work. As a result, we plan to roll out this change over several Katello releases. The first release will include an option to enable structured APT for testing. Once we are confident everything is in place, structured APT will become the default, and eventually the old “simple publish” will be dropped altogether. The feature will require all APT hosts to use the latest version of subscription-manager from http://oss.atix.de/, so this step-wise transition will provide plenty of time to ensure this is the case.
This RFC is your chance to learn of these plans early, and weigh in with any concerns.
What is the current state?
- The main tracker for the story is currently: Feature #35959: Replace simple publisher with structured publisher for Debian Repositories - Katello - Foreman
- Using structured APT in production will require pulp_deb >= 3.0.0, which has not yet been released, but is planned for inclusion in Katello 3.10.
- The feature can work with current versions of pulp_deb, but can be affected by a data inconsistency in pulp_deb that is only fixed with 3.0.
- Using structured APT requires a change to subscription-manager:
  - https://github.com/candlepin/subscription-manager/pull/3223
  - This patch has already been added to the client repos at http://oss.atix.de/
- The initial implementation of the feature can be found here: Fixes #35959 - prepare to use structured publisher for deb content by sbernhard · Pull Request #10420 · Katello/katello · GitHub
  - It includes a rake task to enable/disable the feature:
    foreman-rake katello:enable_structured_content_for_deb[true]
    - For repositories with an empty “Components” field, the rake task will fill in the field with the right values.
    - Repositories with an empty “Components” field will fall back to using “default/all”, even with structured APT enabled.
    - Once structured APT is enabled, running subscription-manager repos on a registered host will update the repo configuration in /etc/apt/sources.list.d/rhsm.sources to use structured APT.
- We will eventually also need: Add structure upload for deb repositories by Manisha15 · Pull Request #10639 · Katello/katello · GitHub
  - This is needed so that individual package uploads will eventually work with structured APT as well.
  - Currently, and even with this change, package uploads still depend on falling back to the “default/all” behaviour.
  - This change can be reviewed and added to Katello independently of the rest of the story (requires pulp_deb >= 2.21.1).
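The client-side rules in the list above can be summarized in a short sketch. It assumes, for illustration, that the “right values” filled in by the rake task are the components actually present in the synced content. fill_empty_components and client_structure are hypothetical names and do not mirror the actual Katello code.

```python
from typing import Iterable, List, Tuple


def fill_empty_components(components: List[str],
                          synced_components: Iterable[str]) -> List[str]:
    """Assumed rake-task fix-up: a blank Components field is replaced by the
    components present in the synced content; a set field is kept as-is."""
    return list(components) if components else sorted(set(synced_components))


def client_structure(structured_enabled: bool,
                     distributions: List[str],
                     components: List[str]) -> Tuple[List[str], List[str]]:
    """What a registered host consumes for one repository.

    Structured APT is used only when the feature is enabled AND the
    repository has an explicit Components list; otherwise default/all.
    """
    if structured_enabled and components:
        return list(distributions), list(components)
    return ["default"], ["all"]


if __name__ == "__main__":
    comps = fill_empty_components([], ["main", "contrib", "main"])
    print(client_structure(True, ["bookworm"], comps))
    # (['bookworm'], ['contrib', 'main'])
```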
Remaining challenges and limitations
- There are still situations in the current implementation that fall back to the old “default/all” behaviour. Until these are fixed, we cannot disable simple publishing to reap the performance benefits of an all-structured APT world.
- Changing the list of components being synced on the repository page is a rare, but not unheard-of, use case. Currently, doing so immediately updates the list of components on all registered hosts, regardless of whether the change has actually propagated to the hosts' LCENV (lifecycle environment) yet. Fixing this requires Candlepin to associate different “cp2_content” with different LCENVs. However, Katello is currently not set up to do this, since it associates a single content path with the root repository, rather than one for each repository as used in an individual LCENV.
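To illustrate the difference, here is a hypothetical model of the two designs: one content entry per root repository (the current behaviour described above) versus one per lifecycle environment. The classes and the visible_components function are illustrative only and do not correspond to Katello or Candlepin code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RootRepository:
    """Hypothetical: holds the editable sync settings, e.g. Components."""
    components: List[str]


@dataclass
class LifecycleEnvironment:
    """Hypothetical: holds the components last published/promoted to it."""
    name: str
    published_components: List[str] = field(default_factory=list)


def visible_components(root: RootRepository,
                       env: LifecycleEnvironment,
                       per_environment: bool) -> List[str]:
    """Components that hosts in `env` would be told about."""
    if per_environment:
        # One content entry per LCENV: only what has reached this environment.
        return list(env.published_components)
    # One content path per root repository (current behaviour): every host
    # sees the root's settings immediately, published or not.
    return list(root.components)


if __name__ == "__main__":
    root = RootRepository(["main"])
    production = LifecycleEnvironment("Production", ["main"])
    root.components.append("contrib")  # edited on the repository page
    print(visible_components(root, production, per_environment=False))
    print(visible_components(root, production, per_environment=True))
```

With a per-root association, the edit leaks to Production at once; with a per-environment association, hosts would only see contrib after the change is published and promoted there.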