Orphaned pulp content

Still on

foreman-3.9.1-1.el8.noarch
katello-4.11.1-1.el8.noarch

today it’s already running more than 11 hours. I don’t know why it takes so long. Last week it was 4 hours as the week before. I have only updated the kernel and some other system packages during the week…

I am wondering if using a “SELECT DISTINCT” in the subquery might improve the performance. katello_repository_rpms has 3897698 rows, but only 400642 distinct rpm_ids.

Otherwise, adding an index on "katello_repository_rpms"."rpm_id" should help.

EXISTS is much more efficient:

foreman=# EXPLAIN ANALYZE SELECT "katello_rpms".* FROM "katello_rpms" WHERE NOT EXISTS (SELECT 1 FROM "katello_repository_rpms" WHERE "katello_repository_rpms"."rpm_id" = "katello_rpms"."id" );
                                                                                                QUERY PLAN                                                                                                
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.43..146888.54 rows=184725 width=715) (actual time=384.882..1942.428 rows=36 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Nested Loop Anti Join  (cost=0.43..127416.04 rows=76969 width=715) (actual time=654.939..1898.840 rows=12 loops=3)
         ->  Parallel Seq Scan on katello_rpms  (cost=0.00..43801.28 rows=166828 width=715) (actual time=0.065..360.130 rows=133559 loops=3)
         ->  Index Only Scan using index_katello_repository_rpms_on_rpm_id_and_repository_id on katello_repository_rpms  (cost=0.43..1.17 rows=18 width=4) (actual time=0.009..0.009 rows=1 loops=400678)
               Index Cond: (rpm_id = katello_rpms.id)
               Heap Fetches: 21276
 Planning Time: 0.675 ms
 Execution Time: 1942.557 ms
(10 rows)

It takes 2 seconds for instead of hours, while the remove orphan query is still currently running…

Wasn’t this issue fixed in katello 4.12 ?

  • Cleanup orphans task generates inefficient queries consuming resources and taking long time to run (#37058, 799b4c0b)

Well, it hasn’t been backported to 4.11 (yet, I guess) so I am still stuck with it…