Katello deb repo out of sync with upstream

Problem:

A recent build failed during FreeIPA enrollment. Digging into why, I found that the apt package list was offering a package that had been removed from the upstream Ubuntu repo, so the freeipa-client install failed.

The local Ubuntu product repo is configured as ‘On demand’. The logs showed the host trying to download the removed package. It must have been updated upstream recently, so I figured a sync was in order. However, nothing I’ve tried has worked so far:

  • Advanced sync >> complete sync
  • Verify Content Checksum
  • Republish Repo Metadata (still running)

I was hoping to avoid a full sync of the Ubuntu repos (to save some disk space), but I’m starting to wonder: is ‘On demand’ a bad idea for deb repos?

Expected outcome:

‘On demand’ repo works, or will work again after a sync.

Foreman and Proxy versions:

v3.13.0

Foreman and Proxy plugin versions:

Foreman: 3.13.0
Katello: 4.15.0

Distribution and version:

Foreman: CentOS Stream 9
Client/host: Ubuntu 24.04

Other relevant data:
Pulpcore-content logs:

Feb 20 16:39:34 fm01.some.domain.com pulpcore-content[3050261]: pulp [None]: backoff:ERROR: Giving up download_wrapper(...) after 1 tries (aiohttp.client_exceptions.ClientResponseError: 404, message='Not Found', url='http://archive.ubuntu.com/ubuntu/pool/main/v/vim/xxd_9.1.0016-1ubuntu7.5_amd64.deb')
Feb 20 16:39:34 fm01.some.domain.com pulpcore-content[3050261]: pulp [None]: pulpcore.content.handler:WARNING: Could not download remote artifact at 'http://archive.ubuntu.com/ubuntu/pool/main/v/vim/xxd_9.1.0016-1ubuntu7.5_amd64.deb': 404, message='Not Found', url='http://archive.ubuntu.com/ubuntu/pool/main/v/vim/xxd_9.1.0016-1ubuntu7.5_amd64.deb'
Feb 20 16:39:34 fm01.some.domain.com pulpcore-content[3050261]: Giving up download_wrapper(...) after 1 tries (aiohttp.client_exceptions.ClientResponseError: 404, message='Not Found', url='http://archive.ubuntu.com/ubuntu/pool/main/v/vim/xxd_9.1.0016-1ubuntu7.5_amd64.deb')
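To see which packages are affected, one option is to pull the failing URLs out of the pulpcore-content log. Below is a minimal sketch, assuming the log lines look like the excerpt above (other Pulp versions may word them differently); the function names `missing_upstream_urls` and `package_filename` are hypothetical helpers, not part of Pulp or Katello.

```python
import re

# Matches the URL in "... 404, message='Not Found', url='http://...'".
# Assumption: the log wording matches the pulpcore-content excerpt above.
_404_URL = re.compile(r"404, message='Not Found', url='([^']+)'")


def missing_upstream_urls(log_lines):
    """Return the unique upstream URLs that answered 404, in first-seen order."""
    seen = []
    for line in log_lines:
        for url in _404_URL.findall(line):
            # The backoff and content handler loggers report the same URL; keep one copy.
            if url not in seen:
                seen.append(url)
    return seen


def package_filename(url):
    """Return the .deb filename at the end of a pool URL."""
    return url.rsplit("/", 1)[-1]
```

Fed with the output of `journalctl -t pulpcore-content` (for example via `sys.stdin`), the three lines above collapse to a single entry, `xxd_9.1.0016-1ubuntu7.5_amd64.deb`.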

Hi @dmgeurts,

It sounds like Pulp might be hanging on to remote artifacts that no longer exist. @quba42, who works on the pulp-deb plugin, might know whether there is a bug in this area.

In any case, I think opening an issue on GitHub (pulp/pulp_deb) might be helpful.

I recently talked to a user who was experiencing the same issue, but only with very large amounts of Debian content.

Thank you. Raised: “Pulp offers outdated content”, Issue #1240 in pulp/pulp_deb on GitHub.

On demand works by downloading the package whenever it is first requested by a host. If that package no longer exists on the remote repo at that time, it cannot be downloaded. For this reason, on demand is not a good fit for upstream repositories that are in the habit of dropping old packages. This is true of the official Ubuntu and Debian repositories.
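The behaviour described above can be illustrated with a toy model in which plain dicts stand in for the synced package list and the upstream mirror. This is a hypothetical simulation for illustration only, not Pulp’s actual implementation; the class `OnDemandRepo` and its methods are made up.

```python
class OnDemandRepo:
    """Toy model of an 'on demand' repo: metadata is synced, package files are fetched lazily."""

    def __init__(self, upstream):
        self.upstream = upstream  # filename -> bytes; stands in for archive.ubuntu.com
        self.index = []           # filenames advertised to hosts (the synced package list)
        self.cache = {}           # artifacts already downloaded on first request

    def sync(self):
        # A sync refreshes only the advertised package list; no package files are downloaded.
        self.index = sorted(self.upstream)

    def fetch(self, filename):
        if filename not in self.index:
            raise KeyError(f"{filename} is not in the repo index")
        if filename not in self.cache:
            if filename not in self.upstream:
                # The index still lists the package, but upstream has dropped it:
                # the equivalent of the 404 in the pulpcore-content log.
                raise FileNotFoundError(f"404 upstream for {filename}")
            self.cache[filename] = self.upstream[filename]
        return self.cache[filename]
```

In this model, a package that was never requested before upstream replaced it (say with a hypothetical newer build) can no longer be served until `sync()` refreshes the index, while packages already in `cache` keep working. That is the trade-off on demand makes.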

Once Pulp references packages that no longer exist in the upstream repo, all you can do is re-sync (using Mirroring Policy: Content Only). A regular re-sync should suffice; no complete sync or additional “Republish Repository Metadata” action is required. Of course, the re-synced content must then be promoted into any content views and lifecycle environments your hosts actually use. Any old content view versions that still contain the old state remain broken.

Again: using on demand for repos that do not retain old packages does not make much sense. None of this is a bug; it is simply an expected design limitation of the on demand feature.

Now, if re-syncing the repo (using Mirroring Policy: Content Only) does not produce a repository version that works, at least immediately after the sync, then there may be a bug involved. Is that the case here?


This is how I read the issue: it sounded like remote artifacts that should have been removed, because upstream had deleted them, were sticking around, and a re-sync was not fixing it.

A re-sync definitely did not fix the issue. Pulp did not update the package list, so it kept offering an outdated package to the host.

If a host doesn’t need access to superseded packages, shouldn’t on demand work fine for deb repos? The only problem case I can see is when the upstream repo has been updated and Pulp hasn’t synced yet, but a manual sync should resolve that. Hence my question here.

I mean yes, if you are willing to work around the known limitations of on demand (mainly by re-syncing whenever needed), then you can use on demand. The reason I would not recommend it to most users is that it clashes with many core Katello features that most users expect to work. For example, a content view version is meant to freeze a particular working package state so you can always roll your hosts back to that state if needed. That becomes pretty meaningless, or outright unusable, if the packages in that state are no longer available. If you don’t care about that, and on demand works for your use case, feel free to use it.