Specific Repos "Vanish"/Become Unavailable with 404 Error after a Few Days

Problem:
I frequently encounter an issue whereby seemingly random repos “vanish” (lose their metadata/content) from their Content Views a few days after I promote them, with Apache returning a 404 specifically for the affected repo instead of the normal directory structure.

[root@dcbutldevado02 ~]# dnf list updates
Updating Subscription Management repositories.
Zabbix 6.0 RH                                                                                                                                              29 kB/s | 1.5 kB     00:00
AppStream x86_64 os                                                                                                                                        1.1 kB/s |  94  B     00:00
Errors during downloading metadata for repository 'REDACTED_Alma_9_AppStream_x86_64_os':
  - Status code: 404 for https://dca-foreman.REDACTED/pulp/content/REDACTED/DCA-Pre-Prod/Alma9_CV/custom/Alma_9/AppStream_x86_64_os/repodata/repomd.xml (IP: 172.31.200.6)
Error: Failed to download metadata for repo 'REDACTED_Alma_9_AppStream_x86_64_os': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
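
For reference, the 404 is reproducible outside of dnf by requesting repomd.xml directly from a client. A hedged sketch (the entitlement certificate names under /etc/pki/entitlement/ are unique per client, so <id> is a placeholder; the URL is the one from the error above):

curl -s -o /dev/null -w '%{http_code}\n' \
  --cert /etc/pki/entitlement/<id>.pem \
  --key /etc/pki/entitlement/<id>-key.pem \
  --cacert /etc/rhsm/ca/katello-server-ca.pem \
  'https://dca-foreman.REDACTED/pulp/content/REDACTED/DCA-Pre-Prod/Alma9_CV/custom/Alma_9/AppStream_x86_64_os/repodata/repomd.xml'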

I can resolve the issue each time by running a force republish in hammer; however, this appears on one or more repos in any Content View left sitting for longer than about a week (across multiple Products, and on both my PreProd and Prod Foreman boxes), so it’s annoying to have to keep doing this on a regular basis.

hammer content-view version republish-repositories --force yes --id 38
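
For anyone else hitting this, the version id passed to --id can be looked up first. A hedged example (the Content View name is just mine, and you may also need --organization):

hammer content-view version list --content-view "Alma9_CV"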

However, I haven’t been able to find anything referencing similar issues on the forum or on the Foreman bug tracker, and the last couple of monthly updates haven’t had an effect on the problem. Is anybody else aware of this issue?

Expected outcome:
Repos remain available on server.

Foreman and Proxy versions:
foreman-3.6.2-1 / katello-common-4.8.4-1

Distribution and version:
Alma 8/9.

This is ringing all kinds of bells, but I couldn’t actually find the PR where @iballou and I had a related discussion.

Since the republish metadata task fixes the issue, the problem here is the publication/distribution of the CV version repository getting deleted when it shouldn’t be.

When you hit the issue on a specific repo in a CV version, could you run some diagnostic steps for us:

  1. Grab the publication_href for the repo in question in Katello. It would look something like this in foreman-rake console. I am inferring content_view_version_id and relative_path from the logs and details in your post.
Katello::Repository.where(content_view_version_id: 38, relative_path: "REDACTED/DCA-Pre-Prod/Alma9_CV/custom/Alma_9/AppStream_x86_64_os").first.publication_href
  2. We’d also want the distribution_href stored in Katello for that repo.
Katello::Pulp3::DistributionReference.where(path: "REDACTED/DCA-Pre-Prod/Alma9_CV/custom/Alma_9/AppStream_x86_64_os")
  3. Once you have the two hrefs, we can grep /var/log/messages for them to see if or when a DELETE call happened on those hrefs (see the sketch after this list).
  4. We’d need the Foreman logs from the same time window to figure out what in Katello actually caused the deletion calls to go out.
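
As a rough sketch of steps 3 and 4 (hedged: the REPLACE-ME hrefs are placeholders for the values from the console queries above, and the certificate paths are the ones Katello uses on the server itself for Pulp API access):

# Fill these in from steps 1 and 2.
PUB_HREF='/pulp/api/v3/publications/rpm/rpm/REPLACE-ME/'
DIST_HREF='/pulp/api/v3/distributions/rpm/rpm/REPLACE-ME/'

# A 404 here confirms the publication is gone from Pulp's side.
curl --cert /etc/foreman/client_cert.pem --key /etc/foreman/client_key.pem \
  "https://$(hostname -f)${PUB_HREF}"

# Find if/when a DELETE was issued against either href.
grep "DELETE ${PUB_HREF}" /var/log/messages
grep "DELETE ${DIST_HREF}" /var/log/messages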

Hi @sajha. Thanks very much for the reply. Sure, happy to help with gathering some debugging info; I’ll update again once I have another instance of the issue.

Checking through /var/log/messages as it stands, I can already see some instances of DELETE calls. These look to tie into a weekly scheduled task called “Remove Orphans”:

Oct 15 22:00:29 dcb-REDACTED pulpcore-api[1883]: pulp [945dcae4e14141ce8604d3c1b486af74]: - - [15/Oct/2023:21:00:29 +0000] "DELETE /pulp/api/v3/repositories/rpm/rpm/a9d94f33-4f58-433f-8c57-7009bf8d29fb/versions/9/ HTTP/1.1" 202 67 "-" "OpenAPI-Generator/3.19.0/ruby"
Oct 15 22:00:29 dcb-REDACTED pulpcore-api[1883]: pulp [9c58069a244a4702a86930a3c7f0c5d7]: - - [15/Oct/2023:21:00:29 +0000] "DELETE /pulp/api/v3/repositories/rpm/rpm/af05909d-8a9e-4d2b-b296-972bac5003ba/versions/11/ HTTP/1.1" 202 67 "-" "OpenAPI-Generator/3.19.0/ruby"
Oct 15 22:00:29 dcb-REDACTED pulpcore-api[1893]: pulp [8e13920e5f434fcaa88a10a2fa64440f]: - - [15/Oct/2023:21:00:29 +0000] "DELETE /pulp/api/v3/repositories/rpm/rpm/8dac4740-c1e0-470e-82fe-60b3775cfbba/versions/20/ HTTP/1.1" 202 67 "-" "OpenAPI-Generator/3.19.0/ruby"

I’m unsure offhand how directly related this is, but it feasibly fits the timeframe of the issue occurring.
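
To help correlate, I can also check when the task itself ran. A hedged query (I’m assuming the label below is the right one for Katello’s orphan cleanup action):

hammer task list --search 'label = Actions::Katello::OrphanCleanup::RemoveOrphans'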


The orphan deletion task deleting versions makes sense. There could be a problem with the way we select orphan versions. Are you also using filters on the CV?
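
If you want to double-check which filters exist where, hammer can enumerate them per Content View. A hedged example (<cv-id> is the Content View id, not the version id):

hammer content-view filter list --content-view-id <cv-id>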

Hi @sajha

I am using a set of Include Filters for one repo (EPEL) within my Content Views, although, as per the example above, the problem isn’t tied to that particular repo; it also appears against repos with no filters set.

(Otherwise, I’m still waiting for another instance of the issue so I can gather the requested info.)

Thanks.

Alex.