Distribution and version:
Rocky Linux release 8.8 (Green Obsidian)
Other relevant data:
We noticed that the problem appears to be with these queries, which are always running and are very slow.
foreman=# select usename,client_hostname,query_start,state,query from pg_stat_activity where state <> 'idle';
usename | client_hostname | query_start | state | query
-----------+-----------------+-------------------------------+---------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------
candlepin | | 2023-06-28 12:42:08.937653+01 | idle in transaction | SET SEARCH_PATH TO public, "$user","public"
postgres | | 2023-06-28 12:50:07.984937+01 | active | select usename,client_hostname,query_start,state,query from pg_stat_activity where state <> 'idle';
foreman | | 2023-06-28 10:13:08.664127+01 | active | SELECT "katello_rpms".* FROM "katello_rpms" WHERE "katello_rpms"."id" NOT IN (SELECT "katello_repository_rpms"."rpm_id" FROM "katello_repository_rpms"
WHERE "katello_repository_rpms"."repository_id" IN (SELECT "katello_repositories"."id" FROM "katello_repositories"))
foreman | | 2023-06-28 10:13:08.664127+01 | active | SELECT "katello_rpms".* FROM "katello_rpms" WHERE "katello_rpms"."id" NOT IN (SELECT "katello_repository_rpms"."rpm_id" FROM "katello_repository_rpms"
WHERE "katello_repository_rpms"."repository_id" IN (SELECT "katello_repositories"."id" FROM "katello_repositories"))
foreman | | 2023-06-28 10:13:08.664127+01 | active | SELECT "katello_rpms".* FROM "katello_rpms" WHERE "katello_rpms"."id" NOT IN (SELECT "katello_repository_rpms"."rpm_id" FROM "katello_repository_rpms"
WHERE "katello_repository_rpms"."repository_id" IN (SELECT "katello_repositories"."id" FROM "katello_repositories"))
(5 rows)
Our katello_repository_rpms table has more than 2.7M rows.
foreman=# SELECT count("katello_repositories"."id") FROM "katello_repositories";
count
-------
747
(1 row)
foreman=# select count(*) from katello_repository_rpms;
count
---------
2756443
(1 row)
Are there any long-running Foreman tasks in the system at the time of this long-running query? We can narrow down the source of this query from the task in question. I can think of something like a sync task doing this, or orphan cleanup looking to delete RPMs that are not in any repositories anymore.
You’re right, there were some tasks syncing repos in the background.
Now there are no tasks running and no long-running queries on the database, and it still takes almost three minutes to load the list of Content Views.
The server is basically idle - only two puma processes running.
Just to clarify: even though the interface is still slow when that query is not running, I still think the query needs to be rewritten.
The use of the IN clause should be avoided, and there are two of them here.
From what I can tell, something like:
SELECT kr.*
FROM katello_rpms kr
LEFT JOIN (
    SELECT krr.*
    FROM katello_repository_rpms krr
    INNER JOIN katello_repositories repo ON krr.repository_id = repo.id
) subq ON kr.id = subq.rpm_id
WHERE subq.rpm_id IS NULL
should give the same results and is much faster. I have instances of the original query running for more than 5 hours now (we have a daily sync plan to synchronize repositories) and the newer version takes only a couple of seconds to run.
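For what it's worth, the same anti-join can also be expressed with NOT EXISTS, which PostgreSQL usually plans just as efficiently. This is only a sketch and assumes rpm_id is never NULL in katello_repository_rpms (with NULLs present, NOT IN and NOT EXISTS are not equivalent):

-- keep only rpms that have no matching row in katello_repository_rpms;
-- the join to katello_repositories mirrors the inner IN (...) of the original query
SELECT kr.*
FROM katello_rpms kr
WHERE NOT EXISTS (
    SELECT 1
    FROM katello_repository_rpms krr
    INNER JOIN katello_repositories repo ON krr.repository_id = repo.id
    WHERE krr.rpm_id = kr.id
);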
Unfortunately I do not know enough about Katello and Rails, so I am not able to find the correct place to replace the query…
If someone is able to point me in the right direction I can test it on my instance.
It’s probably the sync operation looking through the katello_repository_rpms table to associate RPMs with repositories on sync. I will look into the sync query that does the RPM lookup and see if we can optimize that.
Our katello_repository_rpms table has more than 2.7M rows.
Is this something you’re expecting? I am thinking these are coming from Content Views with lots of versions. Katello duplicates some of this data into repository snapshots for every version, and that may create the ~2.7M records in the table. You could try removing some of the larger unused versions if that’s the case; that will free up the table a little in the meantime…
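If you want to go that route, listing and deleting old versions can be done with hammer; the commands below are only a sketch (content view name, organization, and version ID are placeholders, so double-check the options against your hammer version before deleting anything):

# list the versions of one Content View to spot large, unused ones
hammer content-view version list --organization "MyOrg" --content-view "my-content-view"
# delete a version that is no longer promoted to any environment
hammer content-view version delete --id <version-id>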
I think I ran into the same issue, I just noticed it in a different way:
Repo syncs (in this case the Foreman client repos) have become very expensive on the CPU (-> database) since 4.9:
This is a sync of the EL8 client repo, and as you can see the Actions::Katello::Repository::IndexContent action is taking a lot longer now.
Later I also noticed the same thing when building a CV:
Interestingly for the CV build not all index actions are slow, but some are, and it’s always the index action.
Out of curiosity I also ran the two selects on the DB, and they look like this for me:
foreman=# select usename,client_hostname,query_start,state,query from pg_stat_activity where state <> 'idle';
usename | client_hostname | query_start | state | query
-----------+-----------------+-------------------------------+---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
candlepin | | 2023-06-29 21:10:29.52387+02 | idle in transaction | SET SEARCH_PATH TO public, "$user","public"
foreman | | 2023-06-29 22:08:59.616078+02 | active | SELECT "katello_rpms".* FROM "katello_rpms" WHERE "katello_rpms"."id" NOT IN (SELECT "katello_repository_rpms"."rpm_id" FROM "katello_repository_rpms" WHERE "katello_repository_rpms"."repository_id" IN (SELECT "katello_repositories"."id" FROM "katello_repositories"))
foreman | | 2023-06-29 22:09:10.555118+02 | active | SELECT "katello_rpms".* FROM "katello_rpms" WHERE "katello_rpms"."id" NOT IN (SELECT "katello_repository_rpms"."rpm_id" FROM "katello_repository_rpms" WHERE "katello_repository_rpms"."repository_id" IN (SELECT "katello_repositories"."id" FROM "katello_repositories"))
foreman | | 2023-06-29 22:09:15.642146+02 | active | SELECT "katello_rpms".* FROM "katello_rpms" WHERE "katello_rpms"."id" NOT IN (SELECT "katello_repository_rpms"."rpm_id" FROM "katello_repository_rpms" WHERE "katello_repository_rpms"."repository_id" IN (SELECT "katello_repositories"."id" FROM "katello_repositories"))
foreman | | 2023-06-29 22:09:10.555118+02 | active | SELECT "katello_rpms".* FROM "katello_rpms" WHERE "katello_rpms"."id" NOT IN (SELECT "katello_repository_rpms"."rpm_id" FROM "katello_repository_rpms" WHERE "katello_repository_rpms"."repository_id" IN (SELECT "katello_repositories"."id" FROM "katello_repositories"))
foreman | | 2023-06-29 22:09:10.555118+02 | active | SELECT "katello_rpms".* FROM "katello_rpms" WHERE "katello_rpms"."id" NOT IN (SELECT "katello_repository_rpms"."rpm_id" FROM "katello_repository_rpms" WHERE "katello_repository_rpms"."repository_id" IN (SELECT "katello_repositories"."id" FROM "katello_repositories"))
foreman | | 2023-06-29 22:08:59.616078+02 | active | SELECT "katello_rpms".* FROM "katello_rpms" WHERE "katello_rpms"."id" NOT IN (SELECT "katello_repository_rpms"."rpm_id" FROM "katello_repository_rpms" WHERE "katello_repository_rpms"."repository_id" IN (SELECT "katello_repositories"."id" FROM "katello_repositories"))
postgres | | 2023-06-29 22:41:55.942774+02 | active | select usename,client_hostname,query_start,state,query from pg_stat_activity where state <> 'idle';
foreman | | 2023-06-29 22:08:59.616078+02 | active | SELECT "katello_rpms".* FROM "katello_rpms" WHERE "katello_rpms"."id" NOT IN (SELECT "katello_repository_rpms"."rpm_id" FROM "katello_repository_rpms" WHERE "katello_repository_rpms"."repository_id" IN (SELECT "katello_repositories"."id" FROM "katello_repositories"))
(9 rows)
foreman=# SELECT count("katello_repositories"."id") FROM "katello_repositories";
count
-------
4724
(1 row)
foreman=# select count(*) from katello_repository_rpms;
count
---------
5065660
(1 row)
foreman=# select usename,client_hostname,query_start,state,query from pg_stat_activity where state <> 'idle';
usename | client_hostname | query_start | state | query
-----------+-----------------+-------------------------------+---------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------
candlepin | | 2023-06-29 21:10:29.52387+02 | idle in transaction | SET SEARCH_PATH TO public, "$user","public"
postgres | | 2023-06-30 16:06:42.834905+02 | active | select usename,client_hostname,query_start,state,query from pg_stat_activity where state <> 'idle';
foreman | | 2023-06-30 15:59:40.243977+02 | active | SELECT "katello_debs".* FROM "katello_debs" WHERE "katello_debs"."id" NOT IN (SELECT "katello_repository_debs"."deb_id" FROM "katello_repository_debs" WHERE "katello_repository_debs"."repository_id" IN (SELECT "katello_repositories"."id" FROM "katello_repositories"))
foreman | | 2023-06-30 15:59:40.243977+02 | active | SELECT "katello_debs".* FROM "katello_debs" WHERE "katello_debs"."id" NOT IN (SELECT "katello_repository_debs"."deb_id" FROM "katello_repository_debs" WHERE "katello_repository_debs"."repository_id" IN (SELECT "katello_repositories"."id" FROM "katello_repositories"))
foreman | | 2023-06-30 15:59:40.243977+02 | active | SELECT "katello_debs".* FROM "katello_debs" WHERE "katello_debs"."id" NOT IN (SELECT "katello_repository_debs"."deb_id" FROM "katello_repository_debs" WHERE "katello_repository_debs"."repository_id" IN (SELECT "katello_repositories"."id" FROM "katello_repositories"))
(5 rows)
I don’t think the number of rows in katello_repository_rpms is really the issue.
On our main Katello instance we have 29 Content Views with only 2 versions each, and we have
foreman=# select count(*) from katello_repository_rpms;
count
---------
2756444
(1 row)
On another Katello we have “only” 17 CVs, but each has 5 versions, so we have:
foreman=# select count(*) from katello_repository_rpms;
count
---------
3731712
(1 row)
which is about 1M more.
We updated both instances to katello 4.9 (and foreman 3.7) last weekend and both became very slow.
Yesterday we reverted the snapshot of the secondary one and upgraded it from 4.8.1 to 4.8.2 and it is working as fast as it was before.
So there really is an issue with the Foreman 3.7 / Katello 4.9 versions.
Is there anything I can test and/or try to debug this issue?
Thank you.
I have the same problem.
Red Hat 8
Foreman 3.7 / Katello 4.9
What I see is that the following SELECT in PostgreSQL takes several hours:
SELECT "katello_rpms".* FROM "katello_rpms" WHERE "katello_rpms"."id" NOT IN (SELECT "katello_repository_rpms"."rpm_id" FROM "katello_repository_rpms" WHERE "katello_repository_rpms"."repository_id" IN (SELECT "katello_repositories"."id" FROM "katello_repositories"));
All three of the nested selects work fine when run separately.
So do the two inner ones combined: SELECT "katello_repository_rpms"."rpm_id" FROM "katello_repository_rpms" WHERE "katello_repository_rpms"."repository_id" IN (SELECT "katello_repositories"."id" FROM "katello_repositories")
Is an index missing or wrong?
Is this SELECT statement new?
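One way to answer the index question yourself is with plain PostgreSQL catalog queries; nothing here is Katello-specific:

-- list the indexes that already exist on the tables involved in the slow statement
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE tablename IN ('katello_rpms', 'katello_repository_rpms', 'katello_repositories')
ORDER BY tablename, indexname;

-- show the plan PostgreSQL chooses for the slow statement
-- (plain EXPLAIN only; EXPLAIN ANALYZE would actually execute the hours-long query)
EXPLAIN
SELECT "katello_rpms".*
FROM "katello_rpms"
WHERE "katello_rpms"."id" NOT IN (
    SELECT "katello_repository_rpms"."rpm_id"
    FROM "katello_repository_rpms"
    WHERE "katello_repository_rpms"."repository_id" IN (
        SELECT "katello_repositories"."id" FROM "katello_repositories"));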
I think I found the issue with the slowness. We have a new commit to remove orphaned content units (RPMs in this case), which looks at all RPMs in the system and makes sure all of them are tied to some repository.
I will file an issue to optimize this operation in 4.9.
I can confirm that after commenting out line 938 of repository.rb (/usr/share/gems/gems/katello-4.9.0/app/models/katello/repository.rb) and then running foreman-installer, synchronizing repos and publishing Content Views works again.
Accessing https://foreman.mydomain.com/content_views is still slow, so this is probably not the only issue in Katello 4.9.
@sajha is it safe to remove that line? Or am I causing other issues somewhere else?
When is the new commit going to be available? I’m struggling with repository synchronization: a simple repo that only took 4 minutes before updating to Katello 4.9 is now taking around 5 hours.
And the tasks seem stuck at the IndexContent step in the Dynflow console.
From what I can tell, this workaround fixes syncing repositories and publishing new Content Views.
So far we have not had any other issues, but if possible wait for confirmation that this won’t break anything else.
I’ve applied the workaround and it improved the syncing, but it’s still taking a lot of time: the following tasks have been running for at least 40 minutes and should have finished by now, and it seems that some tasks have not even started yet.
It should be safe to remove that line for now. It does have a side effect where deleted content that was in CV filters might still apply, or something of the sort. But feel free to prioritize repo sync speed over the orphan deletion until a fix is released for the issue above. You should be able to run the orphan:delete rake task in the background if needed.
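If it helps anyone, that cleanup can likely be run out of band with something along these lines (I believe the task name is katello:delete_orphaned_content, but please verify it on your version first):

# kick off the orphan cleanup in the background so syncs and the UI are not blocked by it
nohup foreman-rake katello:delete_orphaned_content RAILS_ENV=production &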
Did the same workaround now, thanks for the hint; it brought the indexing times back down to the previously normal milliseconds-to-seconds range (which means repo syncs and CV builds are fast again).
Thanks @sajha for the ack that this doesn’t break other stuff that might not be reversible!
I can also confirm that publishing repositories is no longer an issue, but trying to list Content Views is still very slow.
Even when using hammer it takes several minutes, and sometimes it times out.
[PRD root@updates ~ - 16:40:07][1][130]# time hammer content-view list --organization-label MyCompany --fields "Content View ID" --noncomposite 1
Error: Timed out reading data from server
real 4m2.367s
user 0m1.658s
sys 0m0.259s
Whenever we need to make a change to a Content View, we query the database to get the Content View IDs and then go directly to the URL, for example: https://foreman.mycompany.com/content_views/<id>#/versions
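For reference, the lookup we run against the database is roughly the sketch below; the katello_content_views columns and the join to taxonomies for the organization are assumptions about the schema, so verify them with \d katello_content_views on your instance first:

-- grab the non-composite Content View IDs for one organization directly from the DB
SELECT cv.id, cv.name
FROM katello_content_views cv
JOIN taxonomies org ON org.id = cv.organization_id
WHERE org.label = 'MyCompany'
  AND cv.composite = false
ORDER BY cv.name;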