Slow content proxy sync

Problem:
Since the latest update of our main server and content proxy, which included:

    Upgrade  python39-pulpcore-3.22.14-1.el8.noarch                           @ORG_katello_pulpcore_3_22_el8_x86_64
    Upgraded python39-pulpcore-3.22.12-1.el8.noarch                           @@System
    Upgrade  python39-pulp-deb-2.20.4-1.el8.noarch                            @ORG_katello_pulpcore_3_22_el8_x86_64
    Upgraded python39-pulp-deb-2.20.2-1.el8.noarch                            @@System

I occasionally have content proxy syncs after a new CV publish taking 2 hours or longer, instead of the usual 10 minutes or less.

I can see that a postmaster process is running at 100% CPU, and pg_stat_activity shows a specific type of query running for several minutes at a time, like the following one. This was captured on the PostgreSQL database on the content proxy.

pulpcore=# select state, query_start, current_timestamp, query from pg_stat_activity where state <> 'idle';
 state  |          query_start          |       current_timestamp       | query
--------+-------------------------------+-------------------------------+-------
 active | 2023-09-19 06:42:21.677564+02 | 2023-09-19 06:42:21.677564+02 | select state, query_start, current_timestamp, query from pg_stat_activity where state <> 'idle';
 active | 2023-09-19 06:36:26.901377+02 | 2023-09-19 06:42:21.677564+02 | SELECT "core_content"."pulp_id" FROM "core_content" WHERE ("core_content"."pulp_id" IN ('496440d4-1388-4ba5-a1fc-16cbdc29da89'::uuid, 'f28544f0-b8f4-42b4-806b-b09b2f0eaa7a'::uuid, 'f5c40092-0c30-47f1-a213-acd7b25df32a'::u
uid, 'a327d603-7c42-4243-96c8-6d0a0f422c78'::uuid) AND NOT ("core_content"."pulp_id" IN (SELECT V0."pulp_id" FROM "core_content" V0 WHERE V0."pulp_id" IN (SELECT U0."content_id" FROM "core_repositorycontent" U0 INNER JOIN "core_repositoryversion" U2 ON (U0."version_added_id" = U2."pulp_id") LEF
T OUTER JOIN "core_repositoryversion" U3 ON (U0."version_removed_id" = U3."pulp_id") WHERE (U0."repository_id" = '43e7b651-8399-4bda-b54e-900dea17af7e'::uuid AND U2."number" <= 235 AND NOT (U3."number" <= 235 AND U3."number" IS NOT NULL))))))
(2 rows)

pulpcore=# EXPLAIN SELECT "core_content"."pulp_id" FROM "core_content" WHERE ("core_content"."pulp_id" IN ('496440d4-1388-4ba5-a1fc-16cbdc29da89'::uuid, 'f28544f0-b8f4-42b4-806b-b09b2f0eaa7a'::uuid, 'f5c40092-0c30-47f1-a213-acd7b25df32a'::uuid, 'a327d603-7c42-4243-96c8-6d0a0f422c78'::uuid) AND NOT ("core_content"."pulp_id" IN (SELECT V0."pulp_id" FROM "core_content" V0 WHERE V0."pulp_id" IN (SELECT U0."content_id" FROM "core_repositorycontent" U0 INNER JOIN "core_repositoryversion" U2 ON (U0."version_added_id" = U2."pulp_id") LEFT OUTER JOIN "core_repositoryversion" U3 ON (U0."version_removed_id" = U3."pulp_id") WHERE (U0."repository_id" = '43e7b651-8399-4bda-b54e-900dea17af7e'::uuid AND U2."number" <= 235 AND NOT (U3."number" <= 235 AND U3."number" IS NOT NULL))))));
                                                                                           QUERY PLAN                                                                                            
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using core_content_pkey on core_content  (cost=48176.71..48202.06 rows=2 width=16)
   Index Cond: (pulp_id = ANY ('{496440d4-1388-4ba5-a1fc-16cbdc29da89,f28544f0-b8f4-42b4-806b-b09b2f0eaa7a,f5c40092-0c30-47f1-a213-acd7b25df32a,a327d603-7c42-4243-96c8-6d0a0f422c78}'::uuid[]))
   Filter: (NOT (hashed SubPlan 1))
   SubPlan 1
     ->  Nested Loop Semi Join  (cost=4009.00..48176.29 rows=1 width=16)
           Join Filter: (v0.pulp_id = u0.content_id)
           ->  Seq Scan on core_content v0  (cost=0.00..7129.81 rows=278881 width=16)
           ->  Materialize  (cost=4009.00..36863.27 rows=1 width=16)
                 ->  Nested Loop  (cost=4009.00..36863.26 rows=1 width=16)
                       Join Filter: (u0.version_added_id = u2.pulp_id)
                       ->  Hash Left Join  (cost=4009.00..36793.66 rows=1 width=32)
                             Hash Cond: (u0.version_removed_id = u3.pulp_id)
                             Filter: ((u3.number > 235) OR (u3.number IS NULL))
                             ->  Bitmap Heap Scan on core_repositorycontent u0  (cost=3943.86..36617.66 rows=42233 width=48)
                                   Recheck Cond: (repository_id = '43e7b651-8399-4bda-b54e-900dea17af7e'::uuid)
                                   ->  Bitmap Index Scan on core_repositorycontent_repository_id_1946e61b  (cost=0.00..3933.30 rows=42233 width=0)
                                         Index Cond: (repository_id = '43e7b651-8399-4bda-b54e-900dea17af7e'::uuid)
                             ->  Hash  (cost=42.84..42.84 rows=1784 width=20)
                                   ->  Seq Scan on core_repositoryversion u3  (cost=0.00..42.84 rows=1784 width=20)
                       ->  Seq Scan on core_repositoryversion u2  (cost=0.00..47.30 rows=1784 width=16)
                             Filter: (number <= 235)
(21 rows)

As you can see from query_start vs. current_timestamp, that query has already been running for almost 6 minutes.
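(The runtime can also be computed directly in the query; a small variation of the pg_stat_activity select above, just for convenience:)

pulpcore=# select state, current_timestamp - query_start as runtime, query from pg_stat_activity where state <> 'idle';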

The task running is this:

Actions::Pulp3::CapsuleContent::Sync

State: suspended
Input:

{"repository_id"=>2181,
 "smart_proxy_id"=>5,
 "options"=>{"skip_metadata_check"=>false},
 "remote_user"=>"admin",
 "remote_cp_user"=>"admin",
 "current_request_id"=>"4b49a750-94cf-4c62-8f87-e610ce5472d9",
 "current_timezone"=>"UTC",
 "current_organization_id"=>nil,
 "current_location_id"=>nil,
 "current_user_id"=>4}

Output:

{"smart_proxy_history_id"=>26672,
 "pulp_tasks"=>
  [{"pulp_href"=>"/pulp/api/v3/tasks/b6356c1e-4054-4c41-ba79-889c0e204173/",
    "pulp_created"=>"2023-09-19T02:35:51.384+00:00",
    "state"=>"running",
    "name"=>"pulp_rpm.app.tasks.synchronizing.synchronize",
    "logging_cid"=>"4b49a750-94cf-4c62-8f87-e610ce5472d9",
    "started_at"=>"2023-09-19T02:36:00.450+00:00",
    "worker"=>"/pulp/api/v3/workers/71e5e226-b611-4070-a047-4ed932c493eb/",
    "child_tasks"=>[],
    "progress_reports"=>
     [{"message"=>"Parsed Comps",
       "code"=>"sync.parsing.comps",
       "state"=>"completed",
       "total"=>22,
       "done"=>22},
      {"message"=>"Downloading Metadata Files",
       "code"=>"sync.downloading.metadata",
       "state"=>"completed",
       "done"=>6},
      {"message"=>"Downloading Artifacts",
       "code"=>"sync.downloading.artifacts",
       "state"=>"running",
       "done"=>0},
      {"message"=>"Associating Content",
       "code"=>"associating.content",
       "state"=>"running",
       "done"=>51},
      {"message"=>"Skipping Packages",
       "code"=>"sync.skipped.packages",
       "state"=>"completed",
       "total"=>0,
       "done"=>0},
      {"message"=>"Parsed Packages",
       "code"=>"sync.parsing.packages",
       "state"=>"completed",
       "total"=>37217,
       "done"=>37217},
      {"message"=>"Parsed Advisories",
       "code"=>"sync.parsing.advisories",
       "state"=>"completed",
       "total"=>6356,
       "done"=>6356}],
    "created_resources"=>[""],
    "reserved_resources_record"=>
     ["/pulp/api/v3/repositories/rpm/rpm/43e7b651-8399-4bda-b54e-900dea17af7e/",
      "shared:/pulp/api/v3/remotes/rpm/rpm/6732d168-717d-47d1-9406-5fbb1fb52efd/"]}],
 "task_groups"=>[],
 "poll_attempts"=>{"total"=>479, "failed"=>0}}

Expected outcome:
Faster content proxy sync.

Foreman and Proxy versions:
Main server and content proxy are running

katello-4.8.4-1.el8.noarch
foreman-3.6.2-1.el8.noarch

with the latest updates.

These are the installed rpms on the content proxy:

# rpm -qa *foreman* *katello* *pulp* | sort -d
foreman-debug-3.6.2-1.el8.noarch
foreman-installer-3.6.2-1.el8.noarch
foreman-installer-katello-3.6.2-1.el8.noarch
foreman-proxy-3.6.2-1.el8.noarch
foreman-proxy-content-4.8.4-1.el8.noarch
katello-certs-tools-2.9.0-1.el8.noarch
katello-client-bootstrap-1.7.9-1.el8.noarch
katello-common-4.8.4-1.el8.noarch
katello-debug-4.8.4-1.el8.noarch
katello-host-tools-3.5.7-5.el8.noarch
katello-host-tools-tracer-3.5.7-5.el8.noarch
pulpcore-selinux-1.3.2-1.el8.x86_64
python39-pulp-ansible-0.16.0-1.el8.noarch
python39-pulp-certguard-1.5.6-1.el8.noarch
python39-pulp-container-2.14.7-1.el8.noarch
python39-pulpcore-3.22.14-1.el8.noarch
python39-pulp-deb-2.20.4-1.el8.noarch
python39-pulp-file-1.12.0-1.el8.noarch
python39-pulp-python-3.8.0-1.el8.noarch
python39-pulp-rpm-3.19.9-1.el8.noarch
rubygem-foreman_maintain-1.3.5-1.el8.noarch
rubygem-smart_proxy_pulp-3.2.0-3.fm3_3.el8.noarch

Distribution and version:
AlmaLinux 8.8

Other relevant data:

Hi @gvde,

Can we get an idea of how much content you are syncing to your smart proxy?
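(Rough numbers straight from the proxy’s pulpcore database would be enough; a minimal sketch. Note that core_artifact only counts content that has actually been downloaded.)

pulpcore=# select count(*) from core_content;
pulpcore=# select count(*) from core_artifact;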

Does it look like it’s specifically Debian content that is taking longer to sync, or is it all repositories? Asking since you pointed out pulpcore + pulp_deb having z-updates.

It’s interesting that the query you noted seems to be looking up content, because there was a PR to improve content lookups in Pulpcore: https://github.com/pulp/pulpcore/issues/3969

That was added in Pulpcore 3.22.13.

The RPM repository in question was EPEL 9. The number of packages etc. is in the output above. No other repositories were syncing at that time (i.e. they had already finished).

I don’t have any deb content. Only rpm. I have removed all deb content a while ago because I wasn’t using it.

Yesterday, the sync of the same repository 43e7b651-8399-4bda-b54e-900dea17af7e even took 4 hours.

All other syncs before yesterday were before the update, and the longest one took 10 minutes, not a couple of hours.

Well, the EXPLAIN shows that it’s going through a lot of rows, which adds up to a high cost.

I have updated from 3.22.12 to 3.22.14, so that could be the reason…

I’ll refrain from pinging ATIX then :slight_smile:

Anyway, that’s very interesting. @dralley do you think there could’ve been a chance of regression on https://github.com/pulp/pulpcore/issues/3969 ? So far it’s the only issue that seems related to performance.

I have just run the query above with EXPLAIN ANALYZE. It took 65 minutes. I don’t know whether that’s because of the ANALYZE or because it ran outside the sync. Output is as follows:

pulpcore=# EXPLAIN analyze SELECT "core_content"."pulp_id" FROM "core_content" WHERE ("core_content"."pulp_id" IN ('496440d4-1388-4ba5-a1fc-16cbdc29da89'::uuid, 'f28544f0-b8f4-42b4-806b-b09b2f0eaa7a'::uuid, 'f5c40092-0c30-47f1-a213-acd7b25df32a'::uuid, 'a327d603-7c42-4243-96c8-6d0a0f422c78'::uuid) AND NOT ("core_content"."pulp_id" IN (SELECT V0."pulp_id" FROM "core_content" V0 WHERE V0."pulp_id" IN (SELECT U0."content_id" FROM "core_repositorycontent" U0 INNER JOIN "core_repositoryversion" U2 ON (U0."version_added_id" = U2."pulp_id") LEFT OUTER JOIN "core_repositoryversion" U3 ON (U0."version_removed_id" = U3."pulp_id") WHERE (U0."repository_id" = '43e7b651-8399-4bda-b54e-900dea17af7e'::uuid AND U2."number" <= 235 AND NOT (U3."number" <= 235 AND U3."number" IS NOT NULL))))));
                                                                                           QUERY PLAN                                                                                            
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using core_content_pkey on core_content  (cost=48176.71..48202.06 rows=2 width=16) (actual time=3948806.469..3948806.478 rows=0 loops=1)
   Index Cond: (pulp_id = ANY ('{496440d4-1388-4ba5-a1fc-16cbdc29da89,f28544f0-b8f4-42b4-806b-b09b2f0eaa7a,f5c40092-0c30-47f1-a213-acd7b25df32a,a327d603-7c42-4243-96c8-6d0a0f422c78}'::uuid[]))
   Filter: (NOT (hashed SubPlan 1))
   Rows Removed by Filter: 4
   Heap Fetches: 4
   SubPlan 1
     ->  Nested Loop Semi Join  (cost=4009.00..48176.29 rows=1 width=16) (actual time=7165.704..3948722.622 rows=43595 loops=1)
           Join Filter: (v0.pulp_id = u0.content_id)
           Rows Removed by Join Filter: 11214159825
           ->  Seq Scan on core_content v0  (cost=0.00..7129.81 rows=278881 width=16) (actual time=0.014..301.233 rows=279033 loops=1)
           ->  Materialize  (cost=4009.00..36863.27 rows=1 width=16) (actual time=0.000..6.863 rows=40190 loops=279033)
                 ->  Nested Loop  (cost=4009.00..36863.26 rows=1 width=16) (actual time=6.295..20096.469 rows=43595 loops=1)
                       Join Filter: (u0.version_added_id = u2.pulp_id)
                       Rows Removed by Join Filter: 48636032
                       ->  Hash Left Join  (cost=4009.00..36793.66 rows=1 width=32) (actual time=5.845..54.093 rows=43595 loops=1)
                             Hash Cond: (u0.version_removed_id = u3.pulp_id)
                             Filter: ((u3.number > 235) OR (u3.number IS NULL))
                             Rows Removed by Filter: 22
                             ->  Bitmap Heap Scan on core_repositorycontent u0  (cost=3943.86..36617.66 rows=42233 width=48) (actual time=4.750..32.623 rows=43617 loops=1)
                                   Recheck Cond: (repository_id = '43e7b651-8399-4bda-b54e-900dea17af7e'::uuid)
                                   Heap Blocks: exact=3240
                                   ->  Bitmap Index Scan on core_repositorycontent_repository_id_1946e61b  (cost=0.00..3933.30 rows=42233 width=0) (actual time=4.430..4.431 rows=43639 loops=1)
                                         Index Cond: (repository_id = '43e7b651-8399-4bda-b54e-900dea17af7e'::uuid)
                             ->  Hash  (cost=42.84..42.84 rows=1784 width=20) (actual time=1.084..1.085 rows=1887 loops=1)
                                   Buckets: 2048  Batches: 1  Memory Usage: 112kB
                                   ->  Seq Scan on core_repositoryversion u3  (cost=0.00..42.84 rows=1784 width=20) (actual time=0.037..0.602 rows=1887 loops=1)
                       ->  Seq Scan on core_repositoryversion u2  (cost=0.00..47.30 rows=1784 width=16) (actual time=0.002..0.257 rows=1117 loops=43595)
                             Filter: (number <= 235)
 Planning Time: 0.635 ms
 Execution Time: 3948807.779 ms
(30 rows)

I hope this helps…

What happens if you set “cursor_tuple_fraction” to some higher value, like 0.9 or 1.0?
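For a quick test it can be changed for the current psql session, or persistently via ALTER SYSTEM; a minimal sketch:

pulpcore=# SET cursor_tuple_fraction = 1.0;               -- current session only
pulpcore=# ALTER SYSTEM SET cursor_tuple_fraction = 1.0;  -- persists (requires superuser)
pulpcore=# SELECT pg_reload_conf();                       -- reload so the new value takes effect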

Set to 1.0:

pulpcore=# show cursor_tuple_fraction;
 cursor_tuple_fraction 
-----------------------
 1
(1 row)

No difference:

pulpcore=# EXPLAIN analyze SELECT "core_content"."pulp_id" FROM "core_content" WHERE ("core_content"."pulp_id" IN ('496440d4-1388-4ba5-a1fc-16cbdc29da89'::uuid, 'f28544f0-b8f4-42b4-806b-b09b2f0eaa7a'::uuid, 'f5c40092-0c30-47f1-a213-acd7b25df32a'::uuid, 'a327d603-7c42-4243-96c8-6d0a0f422c78'::uuid) AND NOT ("core_content"."pulp_id" IN (SELECT V0."pulp_id" FROM "core_content" V0 WHERE V0."pulp_id" IN (SELECT U0."content_id" FROM "core_repositorycontent" U0 INNER JOIN "core_repositoryversion" U2 ON (U0."version_added_id" = U2."pulp_id") LEFT OUTER JOIN "core_repositoryversion" U3 ON (U0."version_removed_id" = U3."pulp_id") WHERE (U0."repository_id" = '43e7b651-8399-4bda-b54e-900dea17af7e'::uuid AND U2."number" <= 235 AND NOT (U3."number" <= 235 AND U3."number" IS NOT NULL))))));
                                                                                           QUERY PLAN                                                                                            
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using core_content_pkey on core_content  (cost=48176.71..48202.06 rows=2 width=16) (actual time=4084078.606..4084078.615 rows=0 loops=1)
   Index Cond: (pulp_id = ANY ('{496440d4-1388-4ba5-a1fc-16cbdc29da89,f28544f0-b8f4-42b4-806b-b09b2f0eaa7a,f5c40092-0c30-47f1-a213-acd7b25df32a,a327d603-7c42-4243-96c8-6d0a0f422c78}'::uuid[]))
   Filter: (NOT (hashed SubPlan 1))
   Rows Removed by Filter: 4
   Heap Fetches: 4
   SubPlan 1
     ->  Nested Loop Semi Join  (cost=4009.00..48176.29 rows=1 width=16) (actual time=7261.857..4083991.655 rows=43595 loops=1)
           Join Filter: (v0.pulp_id = u0.content_id)
           Rows Removed by Join Filter: 11214159825
           ->  Seq Scan on core_content v0  (cost=0.00..7129.81 rows=278881 width=16) (actual time=0.008..218.406 rows=279033 loops=1)
           ->  Materialize  (cost=4009.00..36863.27 rows=1 width=16) (actual time=0.000..7.100 rows=40190 loops=279033)
                 ->  Nested Loop  (cost=4009.00..36863.26 rows=1 width=16) (actual time=6.145..20216.874 rows=43595 loops=1)
                       Join Filter: (u0.version_added_id = u2.pulp_id)
                       Rows Removed by Join Filter: 48636032
                       ->  Hash Left Join  (cost=4009.00..36793.66 rows=1 width=32) (actual time=5.686..52.092 rows=43595 loops=1)
                             Hash Cond: (u0.version_removed_id = u3.pulp_id)
                             Filter: ((u3.number > 235) OR (u3.number IS NULL))
                             Rows Removed by Filter: 22
                             ->  Bitmap Heap Scan on core_repositorycontent u0  (cost=3943.86..36617.66 rows=42233 width=48) (actual time=4.655..30.371 rows=43617 loops=1)
                                   Recheck Cond: (repository_id = '43e7b651-8399-4bda-b54e-900dea17af7e'::uuid)
                                   Heap Blocks: exact=3240
                                   ->  Bitmap Index Scan on core_repositorycontent_repository_id_1946e61b  (cost=0.00..3933.30 rows=42233 width=0) (actual time=4.333..4.334 rows=43639 loops=1)
                                         Index Cond: (repository_id = '43e7b651-8399-4bda-b54e-900dea17af7e'::uuid)
                             ->  Hash  (cost=42.84..42.84 rows=1784 width=20) (actual time=1.022..1.024 rows=1887 loops=1)
                                   Buckets: 2048  Batches: 1  Memory Usage: 112kB
                                   ->  Seq Scan on core_repositoryversion u3  (cost=0.00..42.84 rows=1784 width=20) (actual time=0.010..0.554 rows=1887 loops=1)
                       ->  Seq Scan on core_repositoryversion u2  (cost=0.00..47.30 rows=1784 width=16) (actual time=0.002..0.260 rows=1117 loops=43595)
                             Filter: (number <= 235)
 Planning Time: 0.747 ms
 Execution Time: 4084079.915 ms
(30 rows)

Can I get more information about CLI commands/API calls being used in this scenario? I am not familiar with Katello, so having a bash script with (plausible) API calls would be awesome. I would like to see execution plans for the old and new versions of pulpcore. The query itself does not look very suspicious.
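Something along these lines is what I have in mind (a purely hypothetical sketch; the IDs are placeholders and option names may differ between hammer versions):

#!/bin/bash
# Hypothetical reproduction sketch of the Katello-side flow described above.
ORG_ID=1        # placeholder: organization id
CV_ID=10        # placeholder: content view containing EPEL 9
CV_VERSION=241  # placeholder: newly published content view version
LCE_ID=2        # placeholder: target lifecycle environment
PROXY_ID=5      # smart_proxy_id from the task input above

# Publish a new content view version on the main server
hammer content-view publish --id "$CV_ID" --organization-id "$ORG_ID"

# Promote it to a lifecycle environment; Katello then schedules
# Actions::Pulp3::CapsuleContent::Sync for the content proxy
hammer content-view version promote --content-view-id "$CV_ID" --version "$CV_VERSION" \
  --organization-id "$ORG_ID" --to-lifecycle-environment-id "$LCE_ID"

# The capsule (content proxy) sync can also be triggered manually
hammer capsule content synchronize --id "$PROXY_ID"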

I have most likely discovered the line of code that is causing the trouble: https://github.com/pulp/pulpcore/blob/bc9ca511dc13fbae7c49b19f1ebe8b527c67484a/pulpcore/app/models/repository.py#L853. So I can experiment further with it. In the worst case, we may be required to revert the change.

@lubosmj I’m working with @qcjames53 to reproduce the issue – if we can we’ll send you some information and an address to the environment.

Thanks for the analysis @gvde !


A basic test syncing RHEL 8 AppStream + BaseOS to a smart proxy doesn’t seem to have revealed anything. The smart proxy synced the repos in 17 minutes with both pulpcore 3.22.12 and 3.22.14.

We can retest with EPEL 9 included just in case it’s specific to that repository somehow.


Today, the sync from the main server to the content proxy is slow again. I have checked, and every sync has the problem. I publish and promote the content views each workday. Each time it syncs a new version of EPEL 9, and each time it takes very long. Each time it’s only EPEL 9; all other repos in the same sync only need a few seconds.

On Monday last week and today it took more than 4 hours. Tuesday through Friday morning it was approx. 2 hours each. So it seems to be related to the amount of change in the repository.

It must be related to whatever is happening during “Associating Content”, as that’s the part it has been working on for a long time now:

{"smart_proxy_history_id"=>27041,
 "pulp_tasks"=>
  [{"pulp_href"=>"/pulp/api/v3/tasks/cbe66dc4-7e37-4a22-ba7f-53d6eaae62cc/",
    "pulp_created"=>"2023-09-25T02:35:04.787+00:00",
    "state"=>"running",
    "name"=>"pulp_rpm.app.tasks.synchronizing.synchronize",
    "logging_cid"=>"cfa4d081-1054-4eec-8da3-098a543191df",
    "started_at"=>"2023-09-25T02:35:04.895+00:00",
    "worker"=>"/pulp/api/v3/workers/cb65b093-abb8-40a7-b0ff-1fd7217f92ea/",
    "child_tasks"=>[],
    "progress_reports"=>
     [{"message"=>"Downloading Metadata Files",
       "code"=>"sync.downloading.metadata",
       "state"=>"completed",
       "done"=>6},
      {"message"=>"Downloading Artifacts",
       "code"=>"sync.downloading.artifacts",
       "state"=>"completed",
       "done"=>0},
      {"message"=>"Associating Content",
       "code"=>"associating.content",
       "state"=>"running",
       "done"=>182},
      {"message"=>"Skipping Packages",
       "code"=>"sync.skipped.packages",
       "state"=>"completed",
       "total"=>0,
       "done"=>0},
      {"message"=>"Parsed Packages",
       "code"=>"sync.parsing.packages",
       "state"=>"completed",
       "total"=>37449,
       "done"=>37449},
      {"message"=>"Parsed Advisories",
       "code"=>"sync.parsing.advisories",
       "state"=>"completed",
       "total"=>6417,
       "done"=>6417},
      {"message"=>"Parsed Comps",
       "code"=>"sync.parsing.comps",
       "state"=>"completed",
       "total"=>22,
       "done"=>22}],
    "created_resources"=>[""],
    "reserved_resources_record"=>
     ["/pulp/api/v3/repositories/rpm/rpm/43e7b651-8399-4bda-b54e-900dea17af7e/",
      "shared:/pulp/api/v3/remotes/rpm/rpm/6732d168-717d-47d1-9406-5fbb1fb52efd/"]}],
 "task_groups"=>[],
 "poll_attempts"=>{"total"=>854, "failed"=>0}}

Some stats (day / duration / associating content / parsed packages / un-associating content):

    Mon  14177s   259   37177  14
    Tue   8080s    59   37217   8
    Wed   5258s    34   37234  11
    Thu   6582s    28   37249   6
    Fri   6954s    79   37311   7
    Mon  13608s+  182+  37449  pending   (currently running)
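(The per-sync durations can also be read straight from the task table on the proxy; a sketch, assuming the usual core_task columns:)

pulpcore=# select started_at, finished_at, finished_at - started_at as duration from core_task where name = 'pulp_rpm.app.tasks.synchronizing.synchronize' order by started_at desc limit 10;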

One more sample of a currently running query:

 active | 2023-09-25 08:17:35.036825+02 | 2023-09-25 08:23:12.863966+02 | SELECT "core_content"."pulp_id" FROM "core_content" WHERE ("core_content"."pulp_id" IN ('a6be1308-1695-4f5f-8d39-e05934b2793b'::uuid, 'fc243bc6-7b02-45a5-9709-8abbfb8c8bc8'::uuid, '469827ab-902e-42b7-a0e8-80232ed12415'::uuid, 'e3873203-8bea-477f-9309-a2fe62ad3df7':
:uuid) AND NOT ("core_content"."pulp_id" IN (SELECT V0."pulp_id" FROM "core_content" V0 WHERE V0."pulp_id" IN (SELECT U0."content_id" FROM "core_repositorycontent" U0 INNER JOIN "core_repositoryversion" U2 ON (U0."version_added_id" = U2."pulp_id") LEFT OUTER JOIN "core_repositoryversion" U3 ON (U0."version_removed_id" = U3."pulp_id") WHE
RE (U0."repository_id" = '43e7b651-8399-4bda-b54e-900dea17af7e'::uuid AND U2."number" <= 239 AND NOT (U3."number" <= 239 AND U3."number" IS NOT NULL))))))

pulpcore=# explain SELECT "core_content"."pulp_id" FROM "core_content" WHERE ("core_content"."pulp_id" IN ('a6be1308-1695-4f5f-8d39-e05934b2793b'::uuid, 'fc243bc6-7b02-45a5-9709-8abbfb8c8bc8'::uuid, '469827ab-902e-42b7-a0e8-80232ed12415'::uuid, 'e3873203-8bea-477f-9309-a2fe62ad3df7'::uuid) AND NOT ("core_content"."pulp_id" IN (SELECT V0."pulp_id" FROM "core_content" V0 WHERE V0."pulp_id" IN (SELECT U0."content_id" FROM "core_repositorycontent" U0 INNER JOIN "core_repositoryversion" U2 ON (U0."version_added_id" = U2."pulp_id") LEFT OUTER JOIN "core_repositoryversion" U3 ON (U0."version_removed_id" = U3."pulp_id") WHERE (U0."repository_id" = '43e7b651-8399-4bda-b54e-900dea17af7e'::uuid AND U2."number" <= 239 AND NOT (U3."number" <= 239 AND U3."number" IS NOT NULL))))));
                                                                                           QUERY PLAN                                                                                            
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using core_content_pkey on core_content  (cost=49053.38..49082.73 rows=2 width=16)
   Index Cond: (pulp_id = ANY ('{a6be1308-1695-4f5f-8d39-e05934b2793b,fc243bc6-7b02-45a5-9709-8abbfb8c8bc8,469827ab-902e-42b7-a0e8-80232ed12415,e3873203-8bea-477f-9309-a2fe62ad3df7}'::uuid[]))
   Filter: (NOT (hashed SubPlan 1))
   SubPlan 1
     ->  Nested Loop Semi Join  (cost=4300.33..49052.96 rows=1 width=16)
           Join Filter: (v0.pulp_id = u0.content_id)
           ->  Seq Scan on core_content v0  (cost=0.00..7184.58 rows=280358 width=16)
           ->  Materialize  (cost=4300.33..37663.01 rows=1 width=16)
                 ->  Nested Loop  (cost=4300.33..37663.00 rows=1 width=16)
                       Join Filter: (u0.version_added_id = u2.pulp_id)
                       ->  Hash Left Join  (cost=4300.33..37592.75 rows=1 width=32)
                             Hash Cond: (u0.version_removed_id = u3.pulp_id)
                             Filter: ((u3.number > 239) OR (u3.number IS NULL))
                             ->  Bitmap Heap Scan on core_repositorycontent u0  (cost=4234.60..37403.04 rows=47232 width=48)
                                   Recheck Cond: (repository_id = '43e7b651-8399-4bda-b54e-900dea17af7e'::uuid)
                                   ->  Bitmap Index Scan on core_repositorycontent_repository_id_1946e61b  (cost=0.00..4222.79 rows=47232 width=0)
                                         Index Cond: (repository_id = '43e7b651-8399-4bda-b54e-900dea17af7e'::uuid)
                             ->  Hash  (cost=43.10..43.10 rows=1810 width=20)
                                   ->  Seq Scan on core_repositoryversion u3  (cost=0.00..43.10 rows=1810 width=20)
                       ->  Seq Scan on core_repositoryversion u2  (cost=0.00..47.62 rows=1810 width=16)
                             Filter: (number <= 239)
(21 rows)

Just wondering: slow queries are often due to missing indices. Could it be that a database migration went wrong and didn’t create an index, or maybe one was forgotten? How can I check whether I have all the indices I should have in the content proxy database?

@lubosmj and @dralley,

I noticed that @gvde’s EPEL 9 has over 37K packages. The repo upstream currently has about 17K. Maybe it makes no difference to Pulp, but I’m guessing that the repo is syncing additively and has collected quite a few extra RPMs over time.

Looks like indexes can be checked with something like

SELECT
    indexname,
    indexdef
FROM
    pg_indexes
WHERE
    tablename = 'table_name';
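Applied to the tables involved in the slow query, that would be something like this on the content proxy (run against the pulpcore database, e.g. as the postgres user):

    SELECT tablename, indexname, indexdef
    FROM pg_indexes
    WHERE tablename IN ('core_content', 'core_repositorycontent', 'core_repositoryversion')
    ORDER BY tablename, indexname;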

If you were asking about this for the Katello DB, I’d just ask for your list of indexes and check it against our code and a freshly installed system.

@lubosmj is on PTO for 2 weeks and I’m currently working on a customer case, but the reason for the slowness is at least pretty clear: lots of nested loops and sequential scans in that query.

I really don’t understand why such a query plan would be generated though, and certainly not as a result of the change mentioned earlier.

@gvde what happens if you reverse this change? Does it get faster again? And what does that query plan look like? The change in question: “Enhance get_content() by omitting the version_membership lookup” (pulp/pulpcore PR #4275, https://github.com/pulp/pulpcore/pull/4275).

(please try this before attempting a mirror sync or anything like that, which might fix the issue but make debugging problematic)
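If it helps, one hypothetical way to try the revert on an installed content proxy (assuming the python39-pulpcore files live under site-packages; the backported 3.22.z code may differ slightly from the PR, so hunks outside the package, e.g. changelog entries, may need to be skipped or the file edited by hand):

cd /usr/lib/python3.9/site-packages
# reverse-apply the PR diff to the installed files (dry run first to check it applies)
curl -sL https://github.com/pulp/pulpcore/pull/4275.diff | patch -p1 -R --dry-run
curl -sL https://github.com/pulp/pulpcore/pull/4275.diff | patch -p1 -R
# restart the Pulp services so the change is picked up
systemctl restart pulpcore-api pulpcore-content 'pulpcore-worker@*'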


How many “versions” of the EPEL repo do you have? It could also potentially scale primarily with that factor rather than the size of the repo itself.


O.K. I have reversed the change from pull request 4275, restarted everything on the content proxy, then published and promoted a new version of that CV which contains the repo 43e7b651-8399-4bda-b54e-900dea17af7e. EPEL 9 included a few new packages so I suppose it is a comparable situation.

The sync after the promote finished took 6 minutes:

smart_proxy_history_id: 27081
pulp_tasks:
- pulp_href: "/pulp/api/v3/tasks/0fcc1838-9fa1-476c-a8d1-05238125df88/"
  pulp_created: '2023-09-25T15:03:04.556+00:00'
  state: completed
  name: pulp_rpm.app.tasks.synchronizing.synchronize
  logging_cid: b76c167a-0e17-4ce4-9ee2-6490599f0630
  started_at: '2023-09-25T15:03:04.637+00:00'
  finished_at: '2023-09-25T15:09:04.001+00:00'
  worker: "/pulp/api/v3/workers/975cceef-38ee-45d7-a358-a9f4ccdee680/"
  child_tasks: []
  progress_reports:
  - message: Un-Associating Content
    code: unassociating.content
    state: completed
    done: 14
  - message: Parsed Comps
    code: sync.parsing.comps
    state: completed
    total: 22
    done: 22
  - message: Parsed Advisories
    code: sync.parsing.advisories
    state: completed
    total: 6421
    done: 6421
  - message: Downloading Metadata Files
    code: sync.downloading.metadata
    state: completed
    done: 6
  - message: Downloading Artifacts
    code: sync.downloading.artifacts
    state: completed
    done: 0
  - message: Associating Content
    code: associating.content
    state: completed
    done: 30
  - message: Skipping Packages
    code: sync.skipped.packages
    state: completed
    total: 0
    done: 0
  - message: Parsed Packages
    code: sync.parsing.packages
    state: completed
    total: 37461
    done: 37461
  created_resources:
  - "/pulp/api/v3/repositories/rpm/rpm/43e7b651-8399-4bda-b54e-900dea17af7e/versions/240/"
  - "/pulp/api/v3/publications/rpm/rpm/bb39ffd8-eb4a-4b35-bc05-82b21d133240/"
  reserved_resources_record:
  - "/pulp/api/v3/repositories/rpm/rpm/43e7b651-8399-4bda-b54e-900dea17af7e/"
  - shared:/pulp/api/v3/remotes/rpm/rpm/6732d168-717d-47d1-9406-5fbb1fb52efd/
task_groups: []
poll_attempts:
  total: 43
  failed: 0

It was difficult to capture any selects with pg_stat_activity during that time because they finished so quickly. Here are two I have captured:

SELECT "core_content"."pulp_id" FROM "core_content" WHERE ("core_content"."pulp_id" IN ('17a51ead-67c3-4bb1-9e9d-fa54cddcbb10'::uuid, 'd9330405-c55c-419a-bfe5-8c69e12737d6'::uuid) AND NOT ("core_content"."pulp_id" IN (SELECT V0."pulp_id" FROM "core_content" V0 INNER JOIN "core_repositorycontent" V1 ON (V0."pulp_id" = V1."content_id") WHERE V1."pulp_id" IN (SELECT U0."pulp_id" FROM "core_repositorycontent" U0 INNER JOIN "core_repositoryversion" U2 ON (U0."version_added_id" = U2."pulp_id") LEFT OUTER JOIN "core_repositoryversion" U3 ON (U0."version_removed_id" = U3."pulp_id") WHERE (U0."repository_id" = '43e7b651-8399-4bda-b54e-900dea17af7e'::uuid AND U2."number" <= 240 AND NOT (U3."number" <= 240 AND U3."number" IS NOT NULL))))))

SELECT "core_content"."pulp_id" FROM "core_content" WHERE ("core_content"."pulp_id" IN ('04f64a3d-4053-4a04-a188-cba6457cd47e'::uuid, 'e28159a1-af0e-4793-acee-4eaa81e529b2'::uuid, '5ecb1955-fc1e-469e-a818-468682c8bd7e'::uuid, 'c74d591a-1717-4f32-ae89-2c6d4ea7cf05'::uuid, '9eae4837-a1e1-45db-9e31-d42f5ff8bc49'::uuid, '36df29b8-78b5-4065-b886-6f6fc405ee93'::uuid, 'f4be72a0-73c3-46af-97c9-5e20ea9d7242'::uuid, 'd74f6195-9569-44cb-b5da-4958dee4c289'::uuid) AND NOT ("core_content"."pulp_id" IN (SELECT V0."pulp_id" FROM "core_content" V0 INNER JOIN "core_repositorycontent" V1 ON (V0."pulp_id" = V1."content_id") WHERE V1."pulp_id" IN (SELECT U0."pulp_id" FROM "core_repositorycontent" U0 INNER JOIN "core_repositoryversion" U2 ON (U0."version_added_id" = U2."pulp_id") LEFT OUTER JOIN "core_repositoryversion" U3 ON (U0."version_removed_id" = U3."pulp_id") WHERE (U0."repository_id" = '43e7b651-8399-4bda-b54e-900dea17af7e'::uuid AND U2."number" <= 240 AND NOT (U3."number" <= 240 AND U3."number" IS NOT NULL))))))

The EXPLAIN of the first one is:

                                                                       QUERY PLAN                                                                        
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on core_content  (cost=37676.04..37683.94 rows=1 width=16)
   Recheck Cond: (pulp_id = ANY ('{17a51ead-67c3-4bb1-9e9d-fa54cddcbb10,d9330405-c55c-419a-bfe5-8c69e12737d6}'::uuid[]))
   Filter: (NOT (hashed SubPlan 1))
   ->  Bitmap Index Scan on core_content_pkey  (cost=0.00..8.86 rows=2 width=0)
         Index Cond: (pulp_id = ANY ('{17a51ead-67c3-4bb1-9e9d-fa54cddcbb10,d9330405-c55c-419a-bfe5-8c69e12737d6}'::uuid[]))
   SubPlan 1
     ->  Nested Loop  (cost=37663.86..37667.18 rows=1 width=16)
           ->  Nested Loop  (cost=37663.43..37666.72 rows=1 width=16)
                 ->  HashAggregate  (cost=37663.01..37663.02 rows=1 width=16)
                       Group Key: u0.pulp_id
                       ->  Nested Loop  (cost=4300.33..37663.00 rows=1 width=16)
                             Join Filter: (u0.version_added_id = u2.pulp_id)
                             ->  Hash Left Join  (cost=4300.33..37592.75 rows=1 width=32)
                                   Hash Cond: (u0.version_removed_id = u3.pulp_id)
                                   Filter: ((u3.number > 240) OR (u3.number IS NULL))
                                   ->  Bitmap Heap Scan on core_repositorycontent u0  (cost=4234.60..37403.04 rows=47232 width=48)
                                         Recheck Cond: (repository_id = '43e7b651-8399-4bda-b54e-900dea17af7e'::uuid)
                                         ->  Bitmap Index Scan on core_repositorycontent_repository_id_1946e61b  (cost=0.00..4222.79 rows=47232 width=0)
                                               Index Cond: (repository_id = '43e7b651-8399-4bda-b54e-900dea17af7e'::uuid)
                                   ->  Hash  (cost=43.10..43.10 rows=1810 width=20)
                                         ->  Seq Scan on core_repositoryversion u3  (cost=0.00..43.10 rows=1810 width=20)
                             ->  Seq Scan on core_repositoryversion u2  (cost=0.00..47.62 rows=1810 width=16)
                                   Filter: (number <= 240)
                 ->  Index Scan using core_repositorycontent_pkey on core_repositorycontent v1  (cost=0.43..3.71 rows=1 width=32)
                       Index Cond: (pulp_id = u0.pulp_id)
           ->  Index Only Scan using core_content_pkey on core_content v0  (cost=0.42..0.46 rows=1 width=16)
                 Index Cond: (pulp_id = v1.content_id)
(27 rows)

Define “versions”. EPEL 9 is in four CVs, each CV is in two LEs, so a total of 8 “versions” of EPEL 9 are on the content proxy.

The repository 43e7b651-8399-4bda-b54e-900dea17af7e is the first one which gets published and promoted during my daily sync/publish/promotes, i.e. that’s the sync which “introduces” the new content to the proxy. The other ones following after that seem to be o.k. They even overlap and finish before the “big” sync ends. The content proxy is set to “On Demand”.


With the reverted patch the sync is working fast again. Yesterday and this morning, it was only a matter of minutes. Here is the output of the pulp task syncing the repo to the content proxy from this morning:

smart_proxy_history_id: 27392
pulp_tasks:
- pulp_href: "/pulp/api/v3/tasks/856bb6ff-ebce-43d2-84ae-6fb25a2cde88/"
  pulp_created: '2023-09-27T02:37:24.115+00:00'
  state: completed
  name: pulp_rpm.app.tasks.synchronizing.synchronize
  logging_cid: d332f87c-0d68-4334-8162-fe554f069da4
  started_at: '2023-09-27T02:37:24.208+00:00'
  finished_at: '2023-09-27T02:45:41.275+00:00'
  worker: "/pulp/api/v3/workers/3a526597-8d15-422a-ac75-eb5dce815e81/"
  child_tasks: []
  progress_reports:
  - message: Downloading Metadata Files
    code: sync.downloading.metadata
    state: completed
    done: 6
  - message: Downloading Artifacts
    code: sync.downloading.artifacts
    state: completed
    done: 0
  - message: Associating Content
    code: associating.content
    state: completed
    done: 322
  - message: Skipping Packages
    code: sync.skipped.packages
    state: completed
    total: 0
    done: 0
  - message: Parsed Packages
    code: sync.parsing.packages
    state: completed
    total: 37768
    done: 37768
  - message: Parsed Comps
    code: sync.parsing.comps
    state: completed
    total: 22
    done: 22
  - message: Parsed Advisories
    code: sync.parsing.advisories
    state: completed
    total: 6432
    done: 6432
  - message: Un-Associating Content
    code: unassociating.content
    state: completed
    done: 4
  created_resources:
  - "/pulp/api/v3/repositories/rpm/rpm/43e7b651-8399-4bda-b54e-900dea17af7e/versions/241/"
  - "/pulp/api/v3/publications/rpm/rpm/b42a6709-6f5c-4ab0-a89d-9307110fc600/"
  reserved_resources_record:
  - "/pulp/api/v3/repositories/rpm/rpm/43e7b651-8399-4bda-b54e-900dea17af7e/"
  - shared:/pulp/api/v3/remotes/rpm/rpm/6732d168-717d-47d1-9406-5fbb1fb52efd/
task_groups: []
poll_attempts:
  total: 48
  failed: 0

So it seems to be related to the patch. I guess the new query either didn’t consider some scenario where it can take long and requires a lot of sequential scans, or an index is missing?


@gvde When @dralley (or other Pulp devs) ask about the number of versions in the repo, they mean the number of versions in the underlying Pulp repo. Your previous post actually gives a hint:

  created_resources:
  - "/pulp/api/v3/repositories/rpm/rpm/43e7b651-8399-4bda-b54e-900dea17af7e/versions/241/"

The versions/241/ part suggests that this RPM repo is on its 241st version within Pulp. This does not actually mean there are 241 repo versions, since older versions may have long been deleted, so this is a ceiling rather than a floor. If you do not remove old CV versions, then it is very much possible that Katello needs to keep all 241 versions within the Pulp repository around. Not sure if that is part of the problem though.


@quba42 If you tell me the SQL query to check I can do that. My guess is this:

pulpcore=# select * from core_repositoryversion where repository_id = '43e7b651-8399-4bda-b54e-900dea17af7e';
               pulp_id                |         pulp_created          |       pulp_last_updated       | number | complete | base_version_id |            repository_id             | info 
--------------------------------------+-------------------------------+-------------------------------+--------+----------+-----------------+--------------------------------------+------
 30c14099-7e98-41e0-bd97-318f1190731e | 2023-09-25 17:03:04.831039+02 | 2023-09-25 17:08:45.265643+02 |    240 | t        |                 | 43e7b651-8399-4bda-b54e-900dea17af7e | {}
 03eaf1d7-0042-4e0f-9e27-e7aa3d09f88e | 2023-09-27 04:37:24.502317+02 | 2023-09-27 04:45:20.588415+02 |    241 | t        |                 | 43e7b651-8399-4bda-b54e-900dea17af7e | {}
 19c601b1-ede7-4083-810c-ae4a944a16ac | 2022-09-28 15:08:13.525584+02 | 2022-09-28 15:08:13.5256+02   |      0 | t        |                 | 43e7b651-8399-4bda-b54e-900dea17af7e | {}
 374e64bb-f0e4-48db-962b-5fc91f0a98e1 | 2023-09-25 04:35:05.281957+02 | 2023-09-25 08:24:34.560124+02 |    239 | t        |                 | 43e7b651-8399-4bda-b54e-900dea17af7e | {}
 804a78e0-e288-49d2-ada8-00c906ed4306 | 2023-09-22 04:41:25.028418+02 | 2023-09-22 06:36:51.144996+02 |    238 | t        |                 | 43e7b651-8399-4bda-b54e-900dea17af7e | {}
(5 rows)

Is that what you were looking for? @dralley


I think this is what @dralley was looking for, and I think “5 rows” proves that the number of repository versions is not the issue here. So yes, you are on your 241st repository version in this Pulp repository, but all but the latest four have been deleted at some point. Apparently the 0th version, which does not contain any content, has also been kept. Don’t ask me why every Pulp repository comes with an empty 0th repository version. :wink:
