Katello Debian 11 repo sync issues (pulp3 seems stuck)

Problem:
Im trying to create a content repository for Debian 11 (also tried 10) and it seems to gets stuck on Actions::Pulp3::Repository::CreatePublication.

Expected outcome:
I have added RHEL and puppet repositories and those work fine, i would expect Debian repos to also be added and synced.
Foreman and Proxy versions:
3.10 w/Katello
Foreman and Proxy plugin versions:
N/A
Distribution and version:
Rhel 8.9
Other relevant data:
I first tried creating a Debian 10 repo and i experienced the same issue. I then cancelled those tasks, and ended up force cancelling a metadata task afterwards. I rebooted the entire server to make sure nothing was stuck.
Then i created the Debian 11 repo and started syncing that.
Both times it gets stuck at 46% and i see that its waiting for the Pulp3 CreatePublication step in dynflow console.
I am stuck troubleshooting this, there doesnt seem to be anything relevant in /var/log/ and i cant tell what specifically its stuck at. I read some of the pulp3 docs and i see that i can enable debug logging directly in pulp, however ive not done that yet, i want to see if someone here has an idea for further troubleshooting.

From GUI “running steps”:

Action:

Actions::Pulp3::Repository::CreatePublication

State:suspended
Input:

{"repository_id"=>33,
 "smart_proxy_id"=>1,
 "contents_changed"=>true,
 "skip_publication_creation"=>false,
 "remote_user"=>"admin",
 "remote_cp_user"=>"admin",
 "current_request_id"=>"3e95f52a-0f31-4ec7-9564-8f65bc17475e",
 "current_timezone"=>"Europe/Berlin",
 "current_organization_id"=>1,
 "current_location_id"=>2,
 "current_user_id"=>4}

Output:

{"response"=>
  {"task"=>"/pulp/api/v3/tasks/018fce49-f171-73ab-81a0-ef353fdfae82/"},
 "pulp_tasks"=>
  [{"pulp_href"=>"/pulp/api/v3/tasks/018fce49-f171-73ab-81a0-ef353fdfae82/",
    "pulp_created"=>"2024-05-31T10:55:03.538+00:00",
    "state"=>"running",
    "name"=>"pulp_deb.app.tasks.publishing.publish",
    "logging_cid"=>"3e95f52a-0f31-4ec7-9564-8f65bc17475e",
    "created_by"=>"/pulp/api/v3/users/1/",
    "started_at"=>"2024-05-31T10:55:03.598+00:00",
    "worker"=>"/pulp/api/v3/workers/018fce30-84e5-713b-8747-e3263d6d551f/",
    "child_tasks"=>[],
    "progress_reports"=>[],
    "created_resources"=>
     ["/pulp/api/v3/publications/deb/apt/018fce49-f1bb-7326-89f8-abbd3f902b67/"],
    "reserved_resources_record"=>
     ["shared:/pulp/api/v3/repositories/deb/apt/018fce36-e8be-7cb7-9ad7-1dc932ac9f6e/",
      "shared:/pulp/api/v3/domains/018e332f-f0d9-798f-8cf8-c0ec3cd38c7b/"]}],
 "task_groups"=>[],
 "poll_attempts"=>{"total"=>447, "failed"=>0}}

Note that the poll attempts total is ever increasing, and failed is 0…

From the dynflow console:

 12: Actions::Pulp3::Repository::CreatePublication (waiting for Pulp to finish the task publish (ID: ef353fdfae82)) [ 6960.25s / 48.49s ] Cancel

Queue: default

Started at: 2024-05-31 10:55:03 UTC

Ended at: 2024-05-31 12:51:03 UTC

Real time: 6960.25s

Execution time (excluding suspended state): 48.49s

Input:

---
repository_id: 33
smart_proxy_id: 1
contents_changed: true
skip_publication_creation: false
remote_user: admin
remote_cp_user: admin
current_request_id: 3e95f52a-0f31-4ec7-9564-8f65bc17475e
current_timezone: Europe/Berlin
current_organization_id: 1
current_location_id: 2
current_user_id: 4

Output:

---
response:
  task: "/pulp/api/v3/tasks/018fce49-f171-73ab-81a0-ef353fdfae82/"
pulp_tasks:
- !ruby/object:Katello::Pulp3::Task
  smart_proxy: !ruby/object:SmartProxy
    concise_attributes:
    - !ruby/object:ActiveModel::Attribute::FromDatabase
      name: id
      value_before_type_cast: 1
    - !ruby/object:ActiveModel::Attribute::FromDatabase
      name: name
      value_before_type_cast: redacted.net
    - !ruby/object:ActiveModel::Attribute::FromDatabase
      name: url
      value_before_type_cast: https://redacted.net:9090
    - !ruby/object:ActiveModel::Attribute::FromDatabase
      name: created_at
      value_before_type_cast: 2024-03-12 15:05:56.488546000 Z
    - !ruby/object:ActiveModel::Attribute::FromDatabase
      name: updated_at
      value_before_type_cast: 2024-03-20 13:58:11.631617000 Z
    - !ruby/object:ActiveModel::Attribute::FromDatabase
      name: pubkey
      value_before_type_cast: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABg*snip*=
        foreman-proxy@redacted.net
    - !ruby/object:ActiveModel::Attribute::FromDatabase
      name: expired_logs
      value_before_type_cast: '0'
    - !ruby/object:ActiveModel::Attribute::FromDatabase
      name: puppet_path
    - !ruby/object:ActiveModel::Attribute::FromDatabase
      name: download_policy
      value_before_type_cast: on_demand
    - !ruby/object:ActiveModel::Attribute::FromDatabase
      name: http_proxy_id
    - !ruby/object:ActiveModel::Attribute::FromDatabase
      name: content_counts
    new_record: false
    active_record_yaml_version: 2
  pulp_data: !ruby/hash:ActiveSupport::HashWithIndifferentAccess
    pulp_href: "/pulp/api/v3/tasks/018fce49-f171-73ab-81a0-ef353fdfae82/"
    pulp_created: '2024-05-31T10:55:03.538+00:00'
    state: running
    name: pulp_deb.app.tasks.publishing.publish
    logging_cid: 3e95f52a-0f31-4ec7-9564-8f65bc17475e
    created_by: "/pulp/api/v3/users/1/"
    started_at: '2024-05-31T10:55:03.598+00:00'
    worker: "/pulp/api/v3/workers/018fce30-84e5-713b-8747-e3263d6d551f/"
    child_tasks: []
    progress_reports: []
    created_resources:
    - "/pulp/api/v3/publications/deb/apt/018fce49-f1bb-7326-89f8-abbd3f902b67/"
    reserved_resources_record:
    - shared:/pulp/api/v3/repositories/deb/apt/018fce36-e8be-7cb7-9ad7-1dc932ac9f6e/
    - shared:/pulp/api/v3/domains/018e332f-f0d9-798f-8cf8-c0ec3cd38c7b/
  href: "/pulp/api/v3/tasks/018fce49-f171-73ab-81a0-ef353fdfae82/"
task_groups: []
poll_attempts:
  total: 452
  failed: 0

Chunked output:

--- []

Hi @nem ,

The Pulp task seems to be taking a long time? I can be queried via:

sudo pulp show --href /pulp/api/v3/tasks/018fce49-f171-73ab-81a0-ef353fdfae82/

How long was it taking? Sometimes creating the publication can be lengthy depending on the repository you’re syncing.

If the task say’s it’s done however but Foreman isn’t picking up on it, that’s another issue.

cc @quba42

Hello,

I didnt get a chance to continue working on the server yet, but since you replied to my issue i logged in to check on it now.

It did complete the sync, it did complete after about 5 hour so i was just impatient i guess. It took far longer than the RHEL repos i synced previously.
Ive noted that command to check on tasks, it seems like almost the same output as what we see in “Running steps” but it includes finished time as well i see.

In your experience, how often should i sync the OS repos like Debian releases and RHEL? I initially thought i would run them every hour, but considering the time this takes to sync im not sure that would be very wise… :slight_smile:

What distributions, components, architectures are you syncing for the repo?

Debain main components tend to contain more than 50k packages, and writing new metadata for those (pulp publish) can take a long time. Re-syncs will be fast if the upstream repo did not change, but as soon as you are getting any new packages, the metadata needs to be re-generated for all packages and the publish will take long again. Of course a re-sync with new packages might still be somewhat faster than the initial sync since less packages need to be downloaded at the start of the sync process.

I would think that syncing at most once a day (preferably during some low time during the night) would be enough. It can also make sense not to queue all of your repo syncs in parallel at the same time, but distributing them around a bit.

One more thing to consider: Sometimes adding more Pulp workers can be counter productive, because it causes many jobs to run in parallel which can create bottle necks that slow each individual job down to a crawl. So experimenting with the number of Pulp workers can also makes sense.

That is more or less all the suggestions/hints I have.

Im syncing Bullseye main amd64 for the repo and it ended up at 58931 packages, so quite a lot.
Plan is to extend it to have contrib and non-free as well.

Ive set the download policy to “On Demand” for now. Not sure we need to download everything but i will see after more testing.

I appreciate your advice regarding sync. Once a day makes sense to me as well, i need to experiment a bit to find a good spread for them.

My first thought on friday was to increase the workers, but as i tried to dig into pulp to see what was going on it didnt really look like it was using all the workers so i didnt touch it. Im running the default 5(?) at the moment. I will try to see if it makes sense to increase this after ive added all the repos and setup the syncing.

A warning: The upstream Debian repos are known to remove old package versions, which means it is possible with “On Demand” that your hosts try to access a package which should be downloaded on demand, but is no longer available in the remote repo. Syncing immediate should not take much longer than “on demand” (at least after the initial sync), so it is probably best to sync immediate unless you don’t have enough disk storage space.

~60k packages and 5 hours sounds consistent with our experience. We do have some long term plans to improve pulp_deb sync performance, but it will take at least several months to make progress with that.

By chance I happened upon this issue: Add missing database indices to RemoteArtifact by hstct · Pull Request #4836 · pulp/pulpcore · GitHub

Looks like it has to do with slow publications of on demand repositories. I don’t think the fix made it into Pulpcore 3.39, which I believe you have. Foreman 3.11 / Katello 4.13 will have the fix. The fix just adds missing indexes, which you might be able to backport into your system manually.

That fix was released with pulpcore 3.43.0. Since it contains a DB migration it is not trivial to backport without confusing Django.

1 Like