DEB repository sync fails due to uniqueness

Problem:
I’m trying to use Foreman+Katello for our infrastructure rework but I’m running into issues with integrating the XWiki repository (Index of /stable) in there.
Katello errors with “Cannot create repository version since there are newly added packages with the same name, version, and architecture, but a different checksum.”
This led me to check the https://maven.xwiki.org/stable/Packages file for duplicates on (Package,Version,Architecture) and indeed there are some:

      2 xwiki-tomcat8-mysql;11.10.6;all
      2 xwiki-tomcat8-mysql;11.10.7;all
      2 xwiki-tomcat8-mysql;11.10.8;all
      2 xwiki-tomcat8-mysql;12.6;all
      2 xwiki-tomcat9-mysql;11.10.6;all
      2 xwiki-tomcat9-mysql;11.10.7;all
      2 xwiki-tomcat9-mysql;11.10.8;all
      2 xwiki-tomcat9-mysql;12.6;all
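
A check of this kind can be sketched in shell roughly as follows. This is only a minimal sketch: `find_duplicate_triples` is a hypothetical helper name, and it assumes the index is a plain-text Packages file with blank lines between stanzas.

```shell
# Hypothetical helper: read a Packages index on stdin and print every
# (Package;Version;Architecture) triple that occurs more than once,
# prefixed with its count.
find_duplicate_triples() {
  awk '
    # Remember the three identifying fields of the current stanza.
    /^Package:/      { pkg = $2 }
    /^Version:/      { ver = $2 }
    /^Architecture:/ { arch = $2 }
    # A blank line ends a stanza: emit its triple and reset the fields.
    /^[ \t]*$/ { if (pkg != "") print pkg ";" ver ";" arch; pkg = ver = arch = "" }
    # The last stanza may not be followed by a blank line.
    END { if (pkg != "") print pkg ";" ver ";" arch }
  ' | sort | uniq -c | awk '$1 > 1 { print $1, $2 }'
}
```

It could be fed with something like `curl -fsSL https://maven.xwiki.org/stable/Packages | find_duplicate_triples`.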

Checking them in the Packages file, I can see that they are indeed the same package but with different hashes and filenames (apparently related to how some distributions handle mysql vs. mariadb).

Package: xwiki-tomcat9-mysql
Version: 11.10.6
Architecture: all
...
Filename: releases/org/xwiki/platform/xwiki-platform-distribution-debian-tomcat9-mysql/11.10.6/xwiki-platform-distribution-debian-tomcat9-mysql-11.10.6.deb
Size: 1468
MD5sum: a4aec74b9878dada2a644a7ebe78545e
SHA1: 4b1389b63c9cd5ebbd0e00382291d42982c88029
SHA256: e6097b4f276e04b04d2d436351c9523416d5eaeadb18913f92cd9685414824c9
...
Package: xwiki-tomcat9-mysql
Version: 11.10.6
Architecture: all
...
Filename: releases/org/xwiki/platform/xwiki-platform-distribution-debian-tomcat9-mariadb/11.10.6/xwiki-platform-distribution-debian-tomcat9-mariadb-11.10.6.deb
Size: 1480
MD5sum: 69c654d555e788ad4672b337308146b1
SHA1: b568c2b5dfbe4204bc905f5efb5fd49fb3f17ed8
SHA256: 200afb0133c30f4ca6f62f8e6be20211cf50aad3c10fab6529d289d12b30463b

Checking DebianRepository/Format - Debian Wiki, I see:

A Packages index may contain multiple versions of one binary package, for the same architecture and/or multiple architectures (that is, all and the native architecture).
In the official Debian archive, this is used to keep around old versions of an Architecture: all package that is still needed by the other packages.

but I don’t know whether “multiple versions of one binary package” also covers entries with identical “Version:” values.

I tried adding the same XWiki repository straight to a Debian 12 installation and it works without issue.

So I am unsure how to interpret this:
a.) The fact that it works directly on Debian is a fluke (not intended by the Debian folks) and would need to be clarified there and with the XWiki folks
or
b.) Katello/Foreman should be adapted to consider additional fields for uniqueness, e.g. (Package:Architecture:Version:Filename)

Many thanks in advance for your guidance!

Expected outcome:
The goal is to have the XWiki repository integrated into our Foreman+Katello setup.

Foreman and Proxy versions:
Fresh install of foreman 3.10.0 with katello 4.12.0

Foreman and Proxy plugin versions:
foreman-tasks 9.1.1
foreman_fog_proxmox 0.15.0
foreman_remote_execution 12.0.5

Distribution and version:
AlmaLinux 9.3

Other relevant data:
The repo was set up as follows:

wget -q "https://maven.xwiki.org/public.gpg" -O xwiki_debian_gpg_keyring.txt
gpg --no-default-keyring --keyring gnupg-ring:./xwiki_debian_gpg_keyring.txt --export --armor --output xwiki_debian_gpg.txt
hammer content-credentials create --organization 'ORG' --path './xwiki_debian_gpg.txt' --name 'DEB-GPG-KEY-XWiki' --content-type gpg_key
hammer product create --name 'XWiki' --organization 'ORG' --description 'XWiki' --sync-plan 'Monthly_Sync'
hammer content-credentials list --organization ORG
hammer repository create --organization 'ORG' --product 'XWiki' --name 'XWiki stable' --label 'XWiki_stable' --content-type 'deb' --download-policy 'on_demand' --gpg-key-id <<ID of XWiki GPG KEY>> --url 'http://maven.xwiki.org' --deb-releases stable/ --mirroring-policy mirror_content_only

Action:

Actions::Pulp3::Repository::Sync

Input:

{"repo_id"=>16,
 "smart_proxy_id"=>1,
 "options"=>{},
 "remote_user"=>"admin",
 "remote_cp_user"=>"admin",
 "current_request_id"=>"5b766956-8081-4ed3-b233-0fa4bb72f06d",
 "current_timezone"=>"Europe/Brussels",
 "current_organization_id"=>1,
 "current_location_id"=>2,
 "current_user_id"=>4}

Output:

{"pulp_tasks"=>
  [{"pulp_href"=>"/pulp/api/v3/tasks/018ecc26-2a6f-7f2f-94be-796000bdc4ac/",
    "pulp_created"=>"2024-04-11T07:53:57.103+00:00",
    "state"=>"failed",
    "name"=>"pulp_deb.app.tasks.synchronizing.synchronize",
    "logging_cid"=>"5b766956-8081-4ed3-b233-0fa4bb72f06d",
    "created_by"=>"/pulp/api/v3/users/1/",
    "started_at"=>"2024-04-11T07:53:57.164+00:00",
    "finished_at"=>"2024-04-11T07:54:30.840+00:00",
    "error"=>
     {"traceback"=>
       "  File \"/usr/lib/python3.11/site-packages/pulpcore/tasking/tasks.py\", line 61, in _execute_task\n" +
       "    result = func(*args, **kwargs)\n" +
       "             ^^^^^^^^^^^^^^^^^^^^^\n" +
       "  File \"/usr/lib/python3.11/site-packages/pulp_deb/app/tasks/synchronizing.py\", line 183, in synchronize\n" +
       "    DebDeclarativeVersion(first_stage, repository, mirror=mirror).create()\n" +
       "  File \"/usr/lib/python3.11/site-packages/pulpcore/plugin/stages/declarative_version.py\", line 155, in create\n" +
       "    with self.repository.new_version() as new_version:\n" +
       "  File \"/usr/lib/python3.11/site-packages/pulpcore/app/models/repository.py\", line 1105, in __exit__\n" +
       "    repository.finalize_new_version(self)\n" +
       "  File \"/usr/lib/python3.11/site-packages/pulp_deb/app/models/repository.py\", line 99, in finalize_new_version\n" +
       "    handle_duplicate_packages(new_version)\n" +
       "  File \"/usr/lib/python3.11/site-packages/pulp_deb/app/models/repository.py\", line 165, in handle_duplicate_packages\n" +
       "    raise ValueError(message)\n",
      "description"=>
       "Cannot create repository version since there are newly added packages with the same name, version, and architecture, but a different checksum. If the log level is DEBUG, you can find a list of affected packages in the Pulp log."},
    "worker"=>"/pulp/api/v3/workers/018ebe12-bd74-748d-afa3-46d9cf4ba089/",
    "child_tasks"=>[],
    "progress_reports"=>
     [{"message"=>"Update PackageIndex units",
       "code"=>"update.packageindex",
       "state"=>"completed",
       "done"=>1},
      {"message"=>"Un-Associating Content",
       "code"=>"unassociating.content",
       "state"=>"completed",
       "done"=>0},
      {"message"=>"Associating Content",
       "code"=>"associating.content",
       "state"=>"completed",
       "done"=>9397},
      {"message"=>"Downloading Artifacts",
       "code"=>"sync.downloading.artifacts",
       "state"=>"completed",
       "done"=>5},
      {"message"=>"Update ReleaseFile units",
       "code"=>"update.release_file",
       "state"=>"completed",
       "done"=>1}],
    "created_resources"=>[],
    "reserved_resources_record"=>
     ["/pulp/api/v3/repositories/deb/apt/018ecc25-dd5e-7ca9-a31e-54c8270261aa/",
      "shared:/pulp/api/v3/remotes/deb/apt/018ecc25-dad6-7f68-9909-557d6c129a1b/",
      "shared:/pulp/api/v3/domains/018ebe04-4e98-70f7-bf57-1f965f392b4b/"]}],
 "create_version"=>true,
 "task_groups"=>[],
 "poll_attempts"=>{"total"=>19, "failed"=>1}}

Exception:

Katello::Errors::Pulp3Error: Cannot create repository version since there are newly added packages with the same name, version, and architecture, but a different checksum. If the log level is DEBUG, you can find a list of affected packages in the Pulp log.

Backtrace:

/usr/share/gems/gems/katello-4.12.0/app/lib/actions/pulp3/abstract_async_task.rb:108:in `block in check_for_errors'
/usr/share/gems/gems/katello-4.12.0/app/lib/actions/pulp3/abstract_async_task.rb:106:in `each'
/usr/share/gems/gems/katello-4.12.0/app/lib/actions/pulp3/abstract_async_task.rb:106:in `check_for_errors'
/usr/share/gems/gems/katello-4.12.0/app/lib/actions/pulp3/abstract_async_task.rb:162:in `poll_external_task'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/action/polling.rb:100:in `poll_external_task_with_rescue'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/action/polling.rb:22:in `run'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/action/cancellable.rb:14:in `run'
/usr/share/gems/gems/katello-4.12.0/app/lib/actions/pulp3/abstract_async_task.rb:10:in `run'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/action.rb:589:in `block (3 levels) in execute_run'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/middleware/stack.rb:27:in `pass'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/middleware.rb:19:in `pass'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/middleware.rb:32:in `run'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/middleware/stack.rb:23:in `call'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/middleware/stack.rb:27:in `pass'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/middleware.rb:19:in `pass'
/usr/share/gems/gems/katello-4.12.0/app/lib/actions/middleware/remote_action.rb:16:in `block in run'
/usr/share/gems/gems/katello-4.12.0/app/lib/actions/middleware/remote_action.rb:40:in `block in as_remote_user'
/usr/share/gems/gems/katello-4.12.0/app/models/katello/concerns/user_extensions.rb:21:in `cp_config'
/usr/share/gems/gems/katello-4.12.0/app/lib/actions/middleware/remote_action.rb:27:in `as_cp_user'
/usr/share/gems/gems/katello-4.12.0/app/lib/actions/middleware/remote_action.rb:39:in `as_remote_user'
/usr/share/gems/gems/katello-4.12.0/app/lib/actions/middleware/remote_action.rb:16:in `run'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/middleware/stack.rb:23:in `call'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/middleware/stack.rb:27:in `pass'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/middleware.rb:19:in `pass'
/usr/share/gems/gems/foreman-tasks-9.1.1/app/lib/actions/middleware/rails_executor_wrap.rb:14:in `block in run'
/usr/share/gems/gems/activesupport-6.1.7.7/lib/active_support/execution_wrapper.rb:91:in `wrap'
/usr/share/gems/gems/foreman-tasks-9.1.1/app/lib/actions/middleware/rails_executor_wrap.rb:13:in `run'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/middleware/stack.rb:23:in `call'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/middleware/stack.rb:27:in `pass'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/middleware.rb:19:in `pass'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/action/progress.rb:31:in `with_progress_calculation'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/action/progress.rb:17:in `run'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/middleware/stack.rb:23:in `call'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/middleware/stack.rb:27:in `pass'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/middleware.rb:19:in `pass'
/usr/share/gems/gems/foreman-tasks-9.1.1/app/lib/actions/middleware/load_setting_values.rb:20:in `run'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/middleware/stack.rb:23:in `call'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/middleware/stack.rb:27:in `pass'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/middleware.rb:19:in `pass'
/usr/share/gems/gems/foreman-tasks-9.1.1/app/lib/actions/middleware/keep_current_request_id.rb:15:in `block in run'
/usr/share/gems/gems/foreman-tasks-9.1.1/app/lib/actions/middleware/keep_current_request_id.rb:52:in `restore_current_request_id'
/usr/share/gems/gems/foreman-tasks-9.1.1/app/lib/actions/middleware/keep_current_request_id.rb:15:in `run'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/middleware/stack.rb:23:in `call'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/middleware/stack.rb:27:in `pass'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/middleware.rb:19:in `pass'
/usr/share/gems/gems/foreman-tasks-9.1.1/app/lib/actions/middleware/keep_current_timezone.rb:15:in `block in run'
/usr/share/gems/gems/foreman-tasks-9.1.1/app/lib/actions/middleware/keep_current_timezone.rb:44:in `restore_curent_timezone'
/usr/share/gems/gems/foreman-tasks-9.1.1/app/lib/actions/middleware/keep_current_timezone.rb:15:in `run'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/middleware/stack.rb:23:in `call'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/middleware/stack.rb:27:in `pass'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/middleware.rb:19:in `pass'
/usr/share/gems/gems/foreman-tasks-9.1.1/app/lib/actions/middleware/keep_current_taxonomies.rb:15:in `block in run'
/usr/share/gems/gems/foreman-tasks-9.1.1/app/lib/actions/middleware/keep_current_taxonomies.rb:45:in `restore_current_taxonomies'
/usr/share/gems/gems/foreman-tasks-9.1.1/app/lib/actions/middleware/keep_current_taxonomies.rb:15:in `run'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/middleware/stack.rb:23:in `call'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/middleware/stack.rb:27:in `pass'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/middleware.rb:19:in `pass'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/middleware.rb:32:in `run'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/middleware/stack.rb:23:in `call'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/middleware/stack.rb:27:in `pass'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/middleware.rb:19:in `pass'
/usr/share/gems/gems/foreman-tasks-9.1.1/app/lib/actions/middleware/keep_current_user.rb:15:in `block in run'
/usr/share/gems/gems/foreman-tasks-9.1.1/app/lib/actions/middleware/keep_current_user.rb:54:in `restore_curent_user'
/usr/share/gems/gems/foreman-tasks-9.1.1/app/lib/actions/middleware/keep_current_user.rb:15:in `run'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/middleware/stack.rb:23:in `call'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/middleware/world.rb:31:in `execute'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/action.rb:588:in `block (2 levels) in execute_run'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/action.rb:587:in `catch'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/action.rb:587:in `block in execute_run'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/action.rb:490:in `block in with_error_handling'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/action.rb:490:in `catch'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/action.rb:490:in `with_error_handling'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/action.rb:582:in `execute_run'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/action.rb:303:in `execute'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/execution_plan/steps/abstract_flow_step.rb:18:in `block (2 levels) in execute'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/execution_plan/steps/abstract.rb:167:in `with_meta_calculation'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/execution_plan/steps/abstract_flow_step.rb:17:in `block in execute'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/execution_plan/steps/abstract_flow_step.rb:32:in `open_action'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/execution_plan/steps/abstract_flow_step.rb:16:in `execute'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/director.rb:94:in `execute'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/executors/sidekiq/worker_jobs.rb:11:in `block (2 levels) in perform'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/executors.rb:18:in `run_user_code'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/executors/sidekiq/worker_jobs.rb:9:in `block in perform'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/executors/sidekiq/worker_jobs.rb:25:in `with_telemetry'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/executors/sidekiq/worker_jobs.rb:8:in `perform'
/usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/executors/sidekiq/serialization.rb:27:in `perform'
/usr/share/gems/gems/sidekiq-6.5.12/lib/sidekiq/processor.rb:202:in `execute_job'
/usr/share/gems/gems/sidekiq-6.5.12/lib/sidekiq/processor.rb:170:in `block (2 levels) in process'
/usr/share/gems/gems/sidekiq-6.5.12/lib/sidekiq/middleware/chain.rb:172:in `invoke'
/usr/share/gems/gems/sidekiq-6.5.12/lib/sidekiq/processor.rb:169:in `block in process'
/usr/share/gems/gems/sidekiq-6.5.12/lib/sidekiq/processor.rb:136:in `block (6 levels) in dispatch'
/usr/share/gems/gems/sidekiq-6.5.12/lib/sidekiq/job_retry.rb:113:in `local'
/usr/share/gems/gems/sidekiq-6.5.12/lib/sidekiq/processor.rb:135:in `block (5 levels) in dispatch'
/usr/share/gems/gems/sidekiq-6.5.12/lib/sidekiq.rb:44:in `block in <module:Sidekiq>'
/usr/share/gems/gems/sidekiq-6.5.12/lib/sidekiq/processor.rb:131:in `block (4 levels) in dispatch'
/usr/share/gems/gems/sidekiq-6.5.12/lib/sidekiq/processor.rb:263:in `stats'
/usr/share/gems/gems/sidekiq-6.5.12/lib/sidekiq/processor.rb:126:in `block (3 levels) in dispatch'
/usr/share/gems/gems/sidekiq-6.5.12/lib/sidekiq/job_logger.rb:13:in `call'
/usr/share/gems/gems/sidekiq-6.5.12/lib/sidekiq/processor.rb:125:in `block (2 levels) in dispatch'
/usr/share/gems/gems/sidekiq-6.5.12/lib/sidekiq/job_retry.rb:80:in `global'
/usr/share/gems/gems/sidekiq-6.5.12/lib/sidekiq/processor.rb:124:in `block in dispatch'
/usr/share/gems/gems/sidekiq-6.5.12/lib/sidekiq/job_logger.rb:39:in `prepare'
/usr/share/gems/gems/sidekiq-6.5.12/lib/sidekiq/processor.rb:123:in `dispatch'
/usr/share/gems/gems/sidekiq-6.5.12/lib/sidekiq/processor.rb:168:in `process'
/usr/share/gems/gems/sidekiq-6.5.12/lib/sidekiq/processor.rb:78:in `process_one'
/usr/share/gems/gems/sidekiq-6.5.12/lib/sidekiq/processor.rb:68:in `run'
/usr/share/gems/gems/sidekiq-6.5.12/lib/sidekiq/component.rb:8:in `watchdog'
/usr/share/gems/gems/sidekiq-6.5.12/lib/sidekiq/component.rb:17:in `block in safe_thread'
/usr/share/gems/gems/logging-2.3.1/lib/logging/diagnostic_context.rb:474:in `block in create_with_logging_context'

Hi @be-mot ,

This looks to be more of a pulp_deb issue. It would be worth checking whether your problem is already mentioned here: Issues · pulp/pulp_deb · GitHub

@quba42 have you had any users encounter this before?

Let me start by saying that this is a pulp_deb issue, and it is very much by design.

The key sentence from the “Debian Repository Format” page you linked to is:

A repository must not include different packages (different content) with the same package name, version, and architecture.

pulp_deb here understands “different content” as different checksums.

So the current design works as follows:

If there are multiple packages in a repo with the same name, version, and architecture BUT NOT the same checksum, we raise the error you encountered. (If that is not the situation in the upstream repo, and you are still getting this error, then that would be a bug in pulp_deb.)

The problem is that there is just no way for any client consuming an APT repo to decide which of such duplicate packages should be installed when requested. I believe the actual APT tooling will just install the first of such duplicates that it encounters, so you won’t get any errors consuming such a repository on a host. This puts pulp_deb in an awkward place since it is both client and server.

The best we could do is create a sync option to just keep one of the duplicates at random and discard the others, but then there is no way of knowing if a host attached to the upstream repo and one attached to the same repo synced to pulp_deb would install the same thing. We really want users to report duplicate packages to the upstream repo maintainers and have them fix their repos.

I don’t have time to do this right now, but I would still like to check what packages exactly in the https://maven.xwiki.org/stable repo are the “illegal duplicates”.


Thanks for the detailed write-up @quba42 . Considering what you just said, is there a workaround/hack that you know of that might help fix the situation?

I went and analyzed a bit further and have the following key takeaways:

  1. The only immediately available workaround would be to create an empty repository and manually upload the needed packages from the upstream repository one by one. Given the sheer number of packages in this upstream repository, this hardly seems feasible. (I suspect only a small number of the upstream packages are actually needed, since there are so many duplicate package versions in the repo, but finding the right ones would still be difficult and laborious.)
  2. The issues in this repository are numerous. Some are merely bad practice, but there are plenty of downright falsehoods in the repo metadata.
  3. In spite of (2.), the falsehoods are such that apt can process the repo and install from it without errors. pulp_deb could in principle be made to correct for the issues as well, but this would take significant work.
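
For the manual-upload workaround in item 1, a rough sketch using hammer might look like the following. It assumes an empty deb repository (one created without an upstream URL) already exists in Katello and that the wanted .deb files have been downloaded into a local directory. `upload_debs` and the `HAMMER` override variable are hypothetical names introduced only for this illustration.

```shell
# Hypothetical helper: upload every .deb file in a directory to an existing
# Katello repository. Arguments: directory, organization, product, repository.
# HAMMER can be set to another command (e.g. echo) for a dry run.
upload_debs() {
  dir=$1 org=$2 product=$3 repo=$4
  for deb in "$dir"/*.deb; do
    [ -e "$deb" ] || continue   # skip when the glob matched nothing
    "${HAMMER:-hammer}" repository upload-content \
      --organization "$org" --product "$product" --name "$repo" \
      --path "$deb" || return 1   # stop at the first failed upload
  done
}
```

Running `HAMMER=echo upload_debs ./debs 'ORG' 'XWiki' 'XWiki manual'` first prints the commands that would be executed without touching the server; the directory and repository name here are placeholders.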

Some (non-fatal) issues I have identified with the repo:

  • The repo uses the deprecated flat repository format (pulp_deb can handle this).
  • The repo has many different historic versions of each package, which creates a lot of noise and potentially non-installable versions that are not really needed, because only the latest version is expected to be installed.
  • I had to enable several different Debian releases ranging from oldstable to unstable in order to provide sufficient dependencies to install from the repo in my test setup.
  • The Release file references itself with a (necessarily) incorrect checksum. (A file cannot contain its own checksum!)

The issue that pulp_deb does choke on:

The Packages index file contains multiple (duplicate) entries for each of the following packages:

pulp_deb.app.models.repository:DEBUG: New repository version is trying to add different versions, of package "{'package': 'xwiki-tomcat9-mysql', 'version': '12.6', 'architecture': 'all'}", to each of the following distribution-component combinations "{'/ flat-repo-component'}"!
pulp_deb.app.models.repository:DEBUG: New repository version is trying to add different versions, of package "{'package': 'xwiki-tomcat9-mysql', 'version': '11.10.7', 'architecture': 'all'}", to each of the following distribution-component combinations "{'/ flat-repo-component'}"!
pulp_deb.app.models.repository:DEBUG: New repository version is trying to add different versions, of package "{'package': 'xwiki-tomcat9-mysql', 'version': '11.10.6', 'architecture': 'all'}", to each of the following distribution-component combinations "{'/ flat-repo-component'}"!
pulp_deb.app.models.repository:DEBUG: New repository version is trying to add different versions, of package "{'package': 'xwiki-tomcat8-mysql', 'version': '11.10.8', 'architecture': 'all'}", to each of the following distribution-component combinations "{'/ flat-repo-component'}"!
pulp_deb.app.models.repository:DEBUG: New repository version is trying to add different versions, of package "{'package': 'xwiki-tomcat9-mysql', 'version': '11.10.8', 'architecture': 'all'}", to each of the following distribution-component combinations "{'/ flat-repo-component'}"!
pulp_deb.app.models.repository:DEBUG: New repository version is trying to add different versions, of package "{'package': 'xwiki-tomcat8-mysql', 'version': '12.6', 'architecture': 'all'}", to each of the following distribution-component combinations "{'/ flat-repo-component'}"!
pulp_deb.app.models.repository:DEBUG: New repository version is trying to add different versions, of package "{'package': 'xwiki-tomcat8-mysql', 'version': '11.10.6', 'architecture': 'all'}", to each of the following distribution-component combinations "{'/ flat-repo-component'}"!
pulp_deb.app.models.repository:DEBUG: New repository version is trying to add different versions, of package "{'package': 'xwiki-tomcat8-mysql', 'version': '11.10.7', 'architecture': 'all'}", to each of the following distribution-component combinations "{'/ flat-repo-component'}"!

In each case, it looks like the duplicate entries reference the same .deb package file, but with different checksums. Clearly, the file at that location can only have one checksum, so all but one of each set of duplicate entries must be lying! This is actually a special case that I did not consider when I designed the duplicate package handling in pulp_deb. (The case we designed for is duplicate packages with the same name, version, and architecture but different checksums, referencing different .deb package files in different repo components.)
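
To inspect what each set of duplicate entries actually references, the Filename and SHA256 fields can be listed side by side. The following is a minimal sketch, not part of pulp_deb; `show_duplicate_entries` is a hypothetical helper name, and it assumes every stanza carries `Filename:` and `SHA256:` fields. (The excerpt in the original post shows two distinct Filename values for one such pair, so it may be worth checking each set individually.)

```shell
# Hypothetical helper: read a Packages index on stdin and, for every
# (Package;Version;Architecture) triple that occurs more than once, print the
# triple followed by the Filename and SHA256 of each of its entries.
show_duplicate_entries() {
  awk '
    # Record the finished stanza under its triple, then reset the fields.
    function end_stanza() {
      if (pkg == "") return
      key = pkg ";" ver ";" arch
      count[key]++
      entries[key] = entries[key] "  " file " " sha "\n"
      pkg = ver = arch = file = sha = ""
    }
    /^Package:/      { pkg = $2 }
    /^Version:/      { ver = $2 }
    /^Architecture:/ { arch = $2 }
    /^Filename:/     { file = $2 }
    /^SHA256:/       { sha = $2 }
    /^[ \t]*$/       { end_stanza() }
    END {
      end_stanza()
      # Note: awk does not guarantee the order in which triples are printed.
      for (key in count)
        if (count[key] > 1) printf "%s\n%s", key, entries[key]
    }
  '
}
```

Entries within one triple are printed in the order they appear in the index, which also shows which entry comes first.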

It looks like the correct entry for each set of duplicate entries is always the first of that set of duplicates in the Packages file. I suspect apt just looks for the first entry with that Name, Version, and Architecture and uses it. That way apt is not encountering the duplicate entries (with the incorrect checksum) and therefore does NOT choke on the repo.
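
The suspected "first entry wins" behavior can be illustrated on a Packages file with a short awk filter. This is only a sketch of the idea under that assumption; it is neither how apt is confirmed to work nor how pulp_deb processes the index, and `keep_first_entries` is a hypothetical helper name.

```shell
# Hypothetical helper: read a Packages index on stdin and print it back with
# only the first stanza for each (Package, Version, Architecture) triple.
keep_first_entries() {
  awk '
    # Paragraph mode: each blank-line-separated stanza is one record,
    # and each line of the stanza is one field.
    BEGIN { RS = ""; FS = "\n"; ORS = "\n\n" }
    {
      pkg = ver = arch = ""
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^Package: /)           pkg  = substr($i, 10)
        else if ($i ~ /^Version: /)      ver  = substr($i, 10)
        else if ($i ~ /^Architecture: /) arch = substr($i, 15)
      }
      key = pkg ";" ver ";" arch
      # Print the stanza only the first time its triple is seen.
      if (!(key in seen)) { seen[key] = 1; print }
    }
  '
}
```

Because the filter keeps the original order, whichever duplicate comes topmost in the upstream index is the one that survives.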

Trying to replicate this behavior in pulp_deb would be difficult (but not impossible). The problem is that pulp_deb is designed to process every entry in the Packages file. Once pulp_deb sees there are duplicates, it no longer remembers which of these duplicates came topmost in the upstream Packages index.

Next steps @be-mot:

  1. If possible I still recommend opening an issue with the maintainers of the upstream repo asking them to clean up their repo metadata, in particular to remove the incorrect duplicate entries for the packages I have listed above.
  2. Since pulp_deb could be made to correct for this particular upstream repo metadata inconsistency, feel free to open an issue here: Issues · pulp/pulp_deb · GitHub. Be sure to link to this thread. Since this is quite a special edge case that would require quite a bit of work to make pulp_deb robust against, I cannot make any promises as to how quickly we will work on a fix.

Note to self: My verified reproducer for this issue (using Pulp CLI) is:

NAME='xwiki-duplicates-test'
REMOTE_OPTIONS=(
  --url=https://maven.xwiki.org/
  --distribution=stable/
  --policy=on_demand
)

pulp deb remote create --name="${NAME}" "${REMOTE_OPTIONS[@]}"
pulp deb repository create --name="${NAME}" --remote="${NAME}"
pulp deb repository sync --name="${NAME}"