Publish and Promote slow with 3.11/4.13

@dralley @ggainey : Is this something we can back-port to 3.49?

Indexes can’t be backported, no.

Some of the query improvements we found were backported, though. And I believe Katello merged something to start using the task purge behavior. With less historical task records you don’t run into the scaling issues, so that was a valid workaround. I have no idea precisely which versions of Katello / Satellite have that in place though.

The task purge during orphan cleanup landed in Katello 4.13. Bug #37521: Pulp never purge the completed tasks - Katello - Foreman

We ran the task purge on gvde’s environment and it reduced the number of tasks and improved the performance, but tasks still seem considerably slow even with the lower number of tasks.

It would only need this migration: pulpcore/pulpcore/app/migrations/0118_task_core_task_unblock_2276a4_idx_and_more.py at main · pulp/pulpcore · GitHub

The dependency 0117 is already in 3.49. 0118 would just have to be pushed into 3.49. As there is nothing else in between, that should be possible?

Otherwise, I would manually add those indices, and would just have to remember to remove them before I upgrade Katello to a pulpcore version 3.54 or later.

Or I manually insert a row into django_migrations to register the migration. I suppose that should prevent another execution of that migration when it is time?

Either way: the state index significantly improves performance beyond what it was before the 3.49 upgrade and decreases the server load. Thus, I would really like to see it in there sooner rather than later.

How aggressively is Katello purging tasks? Is it possible to purge a little more aggressively?

It’s theoretically possible to backport the migration in cases where it’s the very next one and doesn’t break continuity, but we really, really don’t like doing it.

On my server, the purge reduced the core_task table from 2385487 to 96096 rows (post 15 above) with little performance gain. See the table in post 17 above, e.g. for the last CV “AlmaLinux 9 & EPEL 9” it went from 06:08 to 17:22 with the upgrade to Katello 4.13/pulpcore 3.49. The purge reduced the time to 15:28, still much higher than with 4.12. I guess that with the state index this will go down to less than 2 minutes.

The measurements for a copied CV during offload time are in post 20. Times went down to 1/5-1/10.

@dralley… For context, our task purging is now part of orphan cleanup cron job which by default runs every week. However, users can modify the cron and have it run per their schedule.

There’s a setting called ‘completed_pulp_task_protection_days’ which is set to 30 days by default. When purge runs, we use the setting to limit completed tasks to purge.
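To illustrate the purge criterion described above, here is a rough sketch using SQLite as a stand-in for pulpcore’s PostgreSQL database. The table layout is a simplified stand-in for core_task, and the assumption that the purge deletes only tasks in the ‘completed’ state older than the protection window is mine, not taken from the Katello source:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Simplified stand-in for pulpcore's core_task table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE core_task (pulp_id INTEGER PRIMARY KEY, state TEXT, finished_at TEXT)"
)

now = datetime.now(timezone.utc)
rows = [
    (1, "completed", (now - timedelta(days=90)).isoformat()),  # old completed -> purged
    (2, "completed", (now - timedelta(days=5)).isoformat()),   # recent completed -> kept
    (3, "failed",    (now - timedelta(days=90)).isoformat()),  # not completed -> kept
]
conn.executemany("INSERT INTO core_task VALUES (?, ?, ?)", rows)

# completed_pulp_task_protection_days: completed tasks younger than this survive.
protection_days = 30
cutoff = (now - timedelta(days=protection_days)).isoformat()
conn.execute(
    "DELETE FROM core_task WHERE state = 'completed' AND finished_at < ?", (cutoff,)
)

remaining = [r[0] for r in conn.execute("SELECT pulp_id FROM core_task ORDER BY pulp_id")]
print(remaining)  # tasks 2 and 3 survive
```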

I have manually added the migration to the pulpcore database. This is what core_task looked like before:

pulpcore=# \d core_task
                               Table "public.core_task"
          Column           |           Type           | Collation | Nullable | Default
---------------------------+--------------------------+-----------+----------+---------
 pulp_id                   | uuid                     |           | not null |
 pulp_created              | timestamp with time zone |           | not null |
 pulp_last_updated         | timestamp with time zone |           |          |
 state                     | text                     |           | not null |
 name                      | text                     |           | not null |
 started_at                | timestamp with time zone |           |          |
 finished_at               | timestamp with time zone |           |          |
 error                     | jsonb                    |           |          |
 worker_id                 | uuid                     |           |          |
 parent_task_id            | uuid                     |           |          |
 task_group_id             | uuid                     |           |          |
 logging_cid               | text                     |           | not null |
 reserved_resources_record | text[]                   |           |          |
 pulp_domain_id            | uuid                     |           | not null |
 versions                  | hstore                   |           | not null |
 enc_args                  | jsonb                    |           |          |
 enc_kwargs                | jsonb                    |           |          |
 unblocked_at              | timestamp with time zone |           |          |
Indexes:
    "core_task_pkey" PRIMARY KEY, btree (pulp_id)
    "core_task_logging_cid_0bc78a42" btree (logging_cid)
    "core_task_parent_task_id_07cf4230" btree (parent_task_id)
    "core_task_pulp_cr_10223f_idx" btree (pulp_created)
    "core_task_pulp_domain_id_ad2d9a92" btree (pulp_domain_id)
    "core_task_task_group_id_a45c142c" btree (task_group_id)
    "core_task_worker_id_ca31e694" btree (worker_id)
Foreign-key constraints:
    "core_task_parent_task_id_07cf4230_fk_core_task_pulp_id" FOREIGN KEY (parent_task_id) REFERENCES core_task(pulp_id) DEFERRABLE INITIALLY DEFERRED
    "core_task_pulp_domain_id_ad2d9a92_fk_core_domain_pulp_id" FOREIGN KEY (pulp_domain_id) REFERENCES core_domain(pulp_id) DEFERRABLE INITIALLY DEFERRED
    "core_task_task_group_id_a45c142c_fk_core_taskgroup_pulp_id" FOREIGN KEY (task_group_id) REFERENCES core_taskgroup(pulp_id) DEFERRABLE INITIALLY DEFERRED
    "core_task_worker_id_ca31e694_fk_core_worker__id" FOREIGN KEY (worker_id) REFERENCES core_worker(pulp_id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
    TABLE "ansible_collectionimport" CONSTRAINT "ansible_collectionimport_task_id_9d08fd06_fk_core_task_pulp_id" FOREIGN KEY (task_id) REFERENCES core_task(pulp_id) DEFERRABLE INITIALLY DEFERRED
    TABLE "core_createdresource" CONSTRAINT "core_createdresource_task_id_acf70fb7_fk_core_task__id" FOREIGN KEY (task_id) REFERENCES core_task(pulp_id) DEFERRABLE INITIALLY DEFERRED
    TABLE "core_export" CONSTRAINT "core_export_task_id_4947760b_fk_core_task_pulp_id" FOREIGN KEY (task_id) REFERENCES core_task(pulp_id) DEFERRABLE INITIALLY DEFERRED
    TABLE "core_import" CONSTRAINT "core_import_task_id_b927da56_fk_core_task_pulp_id" FOREIGN KEY (task_id) REFERENCES core_task(pulp_id) DEFERRABLE INITIALLY DEFERRED
    TABLE "core_progressreport" CONSTRAINT "core_progressreport_task_id_0c3fbc3b_fk_core_task_pulp_id" FOREIGN KEY (task_id) REFERENCES core_task(pulp_id) DEFERRABLE INITIALLY DEFERRED
    TABLE "core_task" CONSTRAINT "core_task_parent_task_id_07cf4230_fk_core_task_pulp_id" FOREIGN KEY (parent_task_id) REFERENCES core_task(pulp_id) DEFERRABLE INITIALLY DEFERRED
    TABLE "core_taskschedule" CONSTRAINT "core_taskschedule_last_task_id_5c1ee058_fk_core_task_pulp_id" FOREIGN KEY (last_task_id) REFERENCES core_task(pulp_id) DEFERRABLE INITIALLY DEFERRED

Then I ran:

CREATE INDEX "core_task_unblock_2276a4_idx" ON "core_task" ( "unblocked_at" );
CREATE INDEX "core_task_state_61f0ca_idx" ON "core_task" ( "state" );
CREATE INDEX "core_task_state_3742f2_idx" ON "core_task" ( "state", "pulp_created" );
INSERT INTO "django_migrations" ( app, name, applied ) VALUES ( 'core', '0117_task_unblocked_at', now() );
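As a side note on why these indexes matter: a filter on state (optionally ordered by pulp_created) goes from a full table scan to an index search once the composite index exists. A small demonstration of that effect, using SQLite’s EXPLAIN QUERY PLAN as a stand-in for PostgreSQL (the index name matches the migration, the rest is illustrative):

```python
import sqlite3

# Empty stand-in for core_task; only the columns relevant to the index.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE core_task (pulp_id INTEGER PRIMARY KEY, state TEXT, pulp_created TEXT)"
)

query = "SELECT pulp_id FROM core_task WHERE state = 'running' ORDER BY pulp_created"

# Plan without the index: a full scan of core_task.
plan_before = " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query))

conn.execute(
    "CREATE INDEX core_task_state_3742f2_idx ON core_task (state, pulp_created)"
)

# Plan with the index: a search using the composite index, which also
# satisfies the ORDER BY without a separate sort step.
plan_after = " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query))

print(plan_before)
print(plan_after)
```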

And then it’s like this:

pulpcore=# CREATE INDEX "core_task_unblock_2276a4_idx" ON "core_task" ( "unblocked_at" );
CREATE INDEX
pulpcore=# CREATE INDEX "core_task_state_61f0ca_idx" ON "core_task" ( "state" );
CREATE INDEX
pulpcore=# CREATE INDEX "core_task_state_3742f2_idx" ON "core_task" ( "state", "pulp_created" );
CREATE INDEX
pulpcore=# INSERT INTO "django_migrations" ( app, name, applied ) VALUES ( 'core', '0117_task_unblocked_at', now() );
INSERT 0 1
pulpcore=# \d core_task
                               Table "public.core_task"
          Column           |           Type           | Collation | Nullable | Default 
---------------------------+--------------------------+-----------+----------+---------
 pulp_id                   | uuid                     |           | not null | 
 pulp_created              | timestamp with time zone |           | not null | 
 pulp_last_updated         | timestamp with time zone |           |          | 
 state                     | text                     |           | not null | 
 name                      | text                     |           | not null | 
 started_at                | timestamp with time zone |           |          | 
 finished_at               | timestamp with time zone |           |          | 
 error                     | jsonb                    |           |          | 
 worker_id                 | uuid                     |           |          | 
 parent_task_id            | uuid                     |           |          | 
 task_group_id             | uuid                     |           |          | 
 logging_cid               | text                     |           | not null | 
 reserved_resources_record | text[]                   |           |          | 
 pulp_domain_id            | uuid                     |           | not null | 
 versions                  | hstore                   |           | not null | 
 enc_args                  | jsonb                    |           |          | 
 enc_kwargs                | jsonb                    |           |          | 
 unblocked_at              | timestamp with time zone |           |          | 
Indexes:
    "core_task_pkey" PRIMARY KEY, btree (pulp_id)
    "core_task_logging_cid_0bc78a42" btree (logging_cid)
    "core_task_parent_task_id_07cf4230" btree (parent_task_id)
    "core_task_pulp_cr_10223f_idx" btree (pulp_created)
    "core_task_pulp_domain_id_ad2d9a92" btree (pulp_domain_id)
    "core_task_state_3742f2_idx" btree (state, pulp_created)
    "core_task_state_61f0ca_idx" btree (state)
    "core_task_task_group_id_a45c142c" btree (task_group_id)
    "core_task_unblock_2276a4_idx" btree (unblocked_at)
    "core_task_worker_id_ca31e694" btree (worker_id)
Foreign-key constraints:
    "core_task_parent_task_id_07cf4230_fk_core_task_pulp_id" FOREIGN KEY (parent_task_id) REFERENCES core_task(pulp_id) DEFERRABLE INITIALLY DEFERRED
    "core_task_pulp_domain_id_ad2d9a92_fk_core_domain_pulp_id" FOREIGN KEY (pulp_domain_id) REFERENCES core_domain(pulp_id) DEFERRABLE INITIALLY DEFERRED
    "core_task_task_group_id_a45c142c_fk_core_taskgroup_pulp_id" FOREIGN KEY (task_group_id) REFERENCES core_taskgroup(pulp_id) DEFERRABLE INITIALLY DEFERRED
    "core_task_worker_id_ca31e694_fk_core_worker__id" FOREIGN KEY (worker_id) REFERENCES core_worker(pulp_id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
    TABLE "ansible_collectionimport" CONSTRAINT "ansible_collectionimport_task_id_9d08fd06_fk_core_task_pulp_id" FOREIGN KEY (task_id) REFERENCES core_task(pulp_id) DEFERRABLE INITIALLY DEFERRED
    TABLE "core_createdresource" CONSTRAINT "core_createdresource_task_id_acf70fb7_fk_core_task__id" FOREIGN KEY (task_id) REFERENCES core_task(pulp_id) DEFERRABLE INITIALLY DEFERRED
    TABLE "core_export" CONSTRAINT "core_export_task_id_4947760b_fk_core_task_pulp_id" FOREIGN KEY (task_id) REFERENCES core_task(pulp_id) DEFERRABLE INITIALLY DEFERRED
    TABLE "core_import" CONSTRAINT "core_import_task_id_b927da56_fk_core_task_pulp_id" FOREIGN KEY (task_id) REFERENCES core_task(pulp_id) DEFERRABLE INITIALLY DEFERRED
    TABLE "core_progressreport" CONSTRAINT "core_progressreport_task_id_0c3fbc3b_fk_core_task_pulp_id" FOREIGN KEY (task_id) REFERENCES core_task(pulp_id) DEFERRABLE INITIALLY DEFERRED
    TABLE "core_task" CONSTRAINT "core_task_parent_task_id_07cf4230_fk_core_task_pulp_id" FOREIGN KEY (parent_task_id) REFERENCES core_task(pulp_id) DEFERRABLE INITIALLY DEFERRED
    TABLE "core_taskschedule" CONSTRAINT "core_taskschedule_last_task_id_5c1ee058_fk_core_task_pulp_id" FOREIGN KEY (last_task_id) REFERENCES core_task(pulp_id) DEFERRABLE INITIALLY DEFERRED

I hope that’s everything that pulpcore/pulpcore/app/migrations/0118_task_core_task_unblock_2276a4_idx_and_more.py at main · pulp/pulpcore · GitHub really does. And if I am not mistaken, adding this to django_migrations will prevent it from being executed again once it makes it into the current version.

I’ll see on Monday morning how fast my weekday cron job runs.

O.K. Here is the total execution time of my work-daily cron job (6x CV promote, then publish and promote, and a purge).

It always starts at 03:53.

Without those indices the script finished at 05:58, 05:59, 05:57 on Wednesday through Friday last week.

This morning, with those indices in place, the script finished at 04:26, i.e. 0.5 hours instead of 2 hours. That’s even much better than before the update to 4.13, when it finished after approx. one hour.

So those indices improve performance beyond the performance with 4.12. I really hope that this is incentive enough to backport the index into either pulpcore or maybe only katello.

It also reduces the number of long running queries in the postgresql logs from ~50,000 lines down to a few hundred.


Pulp won’t backport the indexes, but we can do it in a patch for Katello 4.14. We got lucky that the migration is the first one that’ll run after the next Pulp upgrade – if that wasn’t the case we wouldn’t be able to backport the migration at all due to how Django works.


Excellent. That was what I was hoping.

Do you know if Django does anything beyond these four SQL statements, which I ran manually:

CREATE INDEX "core_task_unblock_2276a4_idx" ON "core_task" ( "unblocked_at" );
CREATE INDEX "core_task_state_61f0ca_idx" ON "core_task" ( "state" );
CREATE INDEX "core_task_state_3742f2_idx" ON "core_task" ( "state", "pulp_created" );
INSERT INTO "django_migrations" ( app, name, applied ) VALUES ( 'core', '0117_task_unblocked_at', now() );

Does it store the state of migrations elsewhere, too?

Django mainly records what migrations it applied. It’s the migrations themselves (in the source code) that imply a strict order they need to be applied in. If a migration was somehow applied out of order, Django will simply refuse to apply any further migrations. Also, if your migrations don’t imply a strict order in the source code Django will also refuse to apply any further migrations.

If none of that is the case, but you have already manually applied the next migration without using Django, Django will not have recorded applying that migration, and will try to apply it. If the DB accepts that as a noop (because it already happened) and returns a success status, Django will be happy and continue applying all remaining migrations.

Thanks. But is the django_migrations table the only place where Django records migrations, or does it have additional places? I have only found django_migrations, and it looks like it records the migration, thus I was hoping my four SQL statements are the equivalent of what Django does during a migration.

I am pretty sure django_migrations is the only place. As sure as one can be of a negative.
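The bookkeeping described above can be sketched in a few lines: django_migrations records which migrations were applied, and the planner applies, in source order, everything not yet recorded. This is a simplified illustration, not Django’s actual code, and the “0119” name is hypothetical:

```python
# Sketch of Django's migration bookkeeping: migrations on disk carry the
# order (via their dependencies); the recorder only stores applied names.
def pending_migrations(on_disk, recorded):
    """Return migrations that still need applying, in source order."""
    return [name for name in on_disk if name not in recorded]

on_disk = [
    "0117_task_unblocked_at",
    "0118_task_core_task_unblock_2276a4_idx_and_more",
    "0119_some_future_migration",  # hypothetical later migration
]

# After manually registering 0118 in django_migrations:
recorded = {
    "0117_task_unblocked_at",
    "0118_task_core_task_unblock_2276a4_idx_and_more",
}

print(pending_migrations(on_disk, recorded))  # only the future migration remains
```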

Well, you definitely know much more about Django than I do, and all I did was some database reverse engineering. I guess I’ll trust my changes and see what happens. I hope it doesn’t break the Katello backport, either. But if Django doesn’t like it, it seems it would abort the migrations, so I could always undo my changes and then it should go through again…

What is totally unclear to me at this point is how Django makes sure to maintain the complete chain of migrations. It has dependencies, but as far as I can see it would only notice a missing migration later, once there is a migration that depends on the missing one… Well, I am probably overthinking it.

Either way, it made me look into the database again and of course I made a mistake: I had registered the 0117 migration a second time instead of the 0118 migration.

I would edit my previous posts if I could, but anyway, the correct SQL statements are:

CREATE INDEX "core_task_unblock_2276a4_idx" ON "core_task" ( "unblocked_at" );
CREATE INDEX "core_task_state_61f0ca_idx" ON "core_task" ( "state" );
CREATE INDEX "core_task_state_3742f2_idx" ON "core_task" ( "state", "pulp_created" );
INSERT INTO "django_migrations" ( app, name, applied ) VALUES ( 'core', '0118_task_core_task_unblock_2276a4_idx_and_more', now() );

Now this should really be what pulpcore/pulpcore/app/migrations/0118_task_core_task_unblock_2276a4_idx_and_more.py at main · pulp/pulpcore · GitHub does.

If someone already inserted 0117, you can easily change it using the row id. And I guess it shouldn’t even matter if it was found twice in the table.

Having read your commands again, it looks like you are also marking the django migration as applied. In that case, Django will think it already applied this migration and simply go on to the next one (if there are any). This should be completely compatible with a Katello backport that adds this migration. Django will just be like: “yeah I already applied that, so I will do nothing and be happy”.

That sounds right to me. Everything happens at the point in time when you (or the foreman installer or whatever) tells Django to apply the migrations. At this point Django will check what migrations it already applied and also check the dependency definitions within the actual migrations as found in the currently installed Python sources. If it finds any kind of mismatch, or does not like the dependency tree in the Python sources it will throw an error and do nothing. If it finds migrations it has not yet applied that come after the migrations it already applied, it will apply them (and record having applied them).

If at any point in the future a new version of Pulp is installed with new migrations, and somebody tells it to migrate, same thing again. So future versions of Pulp must not ship migrations such that Django has already applied one of those migrations but out of the order defined in that latest set of shipped migrations. Hence, no backports.

Strictly speaking, backporting the very next migration is fine, because you are not changing the order. The main reason Pulp does not do that either is that Pulp users are not used to the idea they may need to apply migrations for a Z release upgrade. Forgetting to apply migrations that are in the sources is no good either. Also, it would be pretty hard to communicate: “If the backport you want happens to include the very next migration, we’ll backport it for you, otherwise no.” Much easier to understand: “We don’t backport DB migrations.”

Thanks for the explanation. So I guess I am on the safe side, and when the pulpcore upgrade comes, it’ll skip the migration I have already applied. Until then, I’ll enjoy the great speedup.

I’ll take snapshots before any upgrade anyway, so if it all falls apart because I did something wrong, I can always revert and remove my changes and try the upgrade again.

Update on this: We are in the process of getting the migration in as a patch for 3.49 pulpcore packages. Once this is merged: Add task indices to pulpcore 3.49 by sjha4 · Pull Request #1144 · theforeman/pulpcore-packaging · GitHub, the patch should be available with 4.13, which uses pulpcore 3.49.
