I have looked into the postgreql logs. They are logging all statements taking longer than 1s. Just counting lines already shows a much larger number:
# wc -l /var/lib/pgsql/data*/log/postgresql-Thu.log
34931 /var/lib/pgsql/data/log/postgresql-Thu.log
12447 /var/lib/pgsql/data-old/log/postgresql-Thu.log
Only counting unique statements in yesterday’s log I already find
SELECT "core_task"."pulp_id", "core_task"."pulp_created", "core_task"."pulp_last_updated", "core_task"."state", "core_task"."name", "core_task"."logging_cid", "core_task"."unblocked_at", "core_task"."started_at", "core_task"."finished_at", "core_task"."error", "core_task"."enc_args", "core_task"."enc_kwargs", "core_task"."worker_id", "core_task"."parent_task_id", "core_task"."task_group_id", "core_task"."reserved_resources_record", "core_task"."pulp_domain_id", "core_task"."versions" FROM "core_task" WHERE "core_task"."state" IN ('waiting', 'running', 'canceling') ORDER BY "core_task"."pulp_created" ASC
SELECT "core_task"."pulp_id", "core_task"."pulp_created", "core_task"."pulp_last_updated", "core_task"."state", "core_task"."name", "core_task"."logging_cid", "core_task"."unblocked_at", "core_task"."started_at", "core_task"."finished_at", "core_task"."error", "core_task"."enc_args", "core_task"."enc_kwargs", "core_task"."worker_id", "core_task"."parent_task_id", "core_task"."task_group_id", "core_task"."reserved_resources_record", "core_task"."pulp_domain_id", "core_task"."versions" FROM "core_task" WHERE ("core_task"."state" IN ('waiting', 'running', 'canceling') AND "core_task"."unblocked_at" IS NOT NULL) ORDER BY "core_task"."pulp_created" ASC
The first one 3566 times, the second one 28835 times, ranging with durations from 1s to 3.7s. Those statements appear during pulp actions, i.e. synchronize, publish, promote, etc.
The old logs only show the first statement for ~10.000 times, but not the second (searching for unblocked_at
).
If I sum up the durations for those statements in last week’s log I get 19495754ms ~ 325 minutes. Yesterday it was 56706922ms ~ 945 minutes. Currently I seem to have 30 postgresql worker processes for database pulpcore running.
Assuming the statements spread ideally over those 30 worker this would mean net 10,8 minutes waiting time last week and net 31,5 minutes waiting time yesterday. That would already explain 20 minutes. And seeing the load in top and pg_stat_activity output I can see that not all 30 workers are used at all times which could explain why in the end it’s more than 60 additional minutes now compared to last week.
Here is a EXPLAIN ANALYZE output from that statement which I just ran. Of course, currently there is no pulpcore operation active, thus it’s faster than 1s but still it is obvious that this statement is expensive as even now it takes 400ms. Now run this 28.000 times during the day…
pulpcore=# explain analyze SELECT "core_task"."pulp_id", "core_task"."pulp_created", "core_task"."pulp_last_updated", "core_task"."state", "core_task"."name", "core_task"."logging_cid", "core_task"."unblocked_at", "core_task"."started_at", "core_task"."finished_at", "core_task"."error", "core_task"."enc_args", "core_task"."enc_kwargs", "core_task"."worker_id", "core_task"."parent_task_id", "core_task"."task_group_id", "core_task"."reserved_resources_record", "core_task"."pulp_domain_id", "core_task"."versions" FROM "core_task" WHERE ("core_task"."state" IN ('waiting', 'running', 'canceling') AND "core_task"."unblocked_at" IS NOT NULL) ORDER BY "core_task"."pulp_created" ASC;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=253248.77..253248.78 rows=1 width=2177) (actual time=400.898..403.104 rows=0 loops=1)
Sort Key: pulp_created
Sort Method: quicksort Memory: 25kB
-> Gather (cost=1000.00..253248.76 rows=1 width=2177) (actual time=400.892..403.097 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on core_task (cost=0.00..252248.66 rows=1 width=2177) (actual time=397.755..397.756 rows=0 loops=3)
Filter: ((unblocked_at IS NOT NULL) AND (state = ANY ('{waiting,running,canceling}'::text[])))
Rows Removed by Filter: 795002
Planning Time: 0.130 ms
Execution Time: 403.129 ms
(11 rows)