Because of the EL7 deprecation I am playing around with a new Katello 4.4 installation on AlmaLinux 8, using Ansible to configure products, repos, etc.
For testing, I have just removed all products, disabled all Red Hat repositories, removed the subscriptions, and deleted the manifest. So technically, the server should have no content at all.
After that, 40 GB of the 120 GB of content I had synced before were deleted. But there are still approx. 80 GB of content in /var/lib/pulp. I have looked into the database and, for instance, pulpcore.core_artifact still has 33894 rows. So it looks to me as if there is a lot of orphaned content which is not detected and cleaned up…
That command should create one or more tasks labeled Actions::Katello::OrphanCleanup::RemoveOrphans.
Could you please check on the WebUI: navigate to Monitor → Tasks, enter label = Actions::Katello::OrphanCleanup::RemoveOrphans into the search bar, and press Enter.
In the results, please locate the task(s) corresponding to the time when the rake script was run. Can you confirm that they have the state "stopped" with the result "success", or is there some other combination of state and result?
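(If you prefer the CLI, the same search should also work with hammer — a sketch, assuming the foreman_tasks hammer plugin is installed on your version:)

```
hammer task list --search 'label = Actions::Katello::OrphanCleanup::RemoveOrphans'
```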
Hi @gvde. I looked into what this task does. Its intent is to remove content which has been synced to external content proxies but is not published in any CV version currently promoted to a Lifecycle Environment assigned to that content proxy.
So for example, if you have an environment path like Library → Devel → QA → Prod and one content proxy providing content for a datacenter which only has Devel and QA, then packages in a CV version that is only promoted to Prod would be removed from that content proxy by this task.
Since the Katello primary server must have all content for all LCEs so that it can be synced out to any content proxies, this should explain why it didn’t seem to do too much in your case.
I believe what you are looking for instead is the 'Reclaim Space' job you can run for any content proxy, including the internal content proxy on the Katello primary. From the WebUI:
Infrastructure → Smart Proxies → Click on the Smart Proxy you wish to clean → Click on the ‘Reclaim Space’ button. It will create a task which you can follow at Monitor → Tasks.
This cleans up packages that were downloaded for repositories with the download policy set to 'on_demand'; they will not be cached and ultimately stored on disk again unless some content host requests them.
I don't think you understood the issue I had: it was a single server with no external proxies. I added some repos, synced and downloaded content, then deleted all products, repos etc. again. It has nothing to do with on-demand downloads.
I have removed repositories that were around 500 GB in size, but that doesn't seem to have cleared any space: the server is still using 1.8 TB when it should be down to around 1.3 TB.
I am running Katello 4.8.4 on AlmaLinux 8.
Check that there are no content views holding on to the content that you want to get rid of. Try publishing a new version of the content view, then delete all older versions. Log in to the Foreman server and run on the shell as root:
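(The command elided above is presumably Katello's orphaned-content cleanup rake task — a sketch; the exact task name may differ between Katello versions:)

```
# Trigger Katello's orphan cleanup manually; it runs as a background
# task which you can then follow under Monitor → Tasks.
foreman-rake katello:delete_orphaned_content
```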
Thank you @jruk, what you suggest is what I followed. If any products were still published to a content view, Katello would not let me delete them; in this case I made sure that all the CVs which had those products published were gone, and then I proceeded to delete all the unnecessary products. However, disk usage did not decrease, not even after running the foreman-rake command. I am not really sure why it isn't clearing the space.
Hi, does anyone know if there is a fix for this or a workaround to clear space?
I ran pulp repository list --limit 532 and I see repositories that are no longer visible in Katello; running the commands here doesn't really help to remove all those repositories.
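(If those repositories only exist on the Pulp side, it may be worth trying Pulp's own orphan cleanup directly via the pulp CLI — a sketch; --protection-time 0 bypasses the orphan protection window, so only use it when you are sure nothing still needs the content:)

```
# Ask Pulp to delete all orphaned content immediately,
# ignoring the default protection time.
pulp orphan cleanup --protection-time 0
```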
It's now been pegging my CPU at 100% for 7 hours; top shows it is PostgreSQL being hammered.
The task is stuck at 50%.
So, the issues with cleaning up orphaned content and reclaiming space were raised almost 2 years ago now, in this report, and they have been ignored for 2 years while the functionality has remained hopelessly broken.
Yep, no problem. I stopped the Foreman services over the weekend and have now restarted them, and the reclaim space job has been restarted, so in the dynflow console I can see it is in:
2: Actions::Pulp3::CapsuleContent::ReclaimSpace (waiting for Pulp to finish the task reclaim_space)
Another note, not fully related to the last posts: I just manually started the orphan removal. I have noticed that lately it takes a couple of hours on the main Foreman server. Three PostgreSQL processes are running at more or less 100% for hours. The query:
18140 | foreman | 587103 | 18139 | foreman | /usr/bin/sidekiq | | | -1 | 2024-02-12 16:12:04.092907+01 | 2024-02-12 16:12:04.174938+01 | 2024-02-12 16:12:04.175741+01 | 2024-02-12 16:12:04.175741+01 | | | active | | 397156863 | SELECT "katello_rpms".* FROM "katello_rpms" WHERE "katello_rpms"."id" NOT IN (SELECT "katello_repository_rpms"."rpm_id" FROM "katello_repository_rpms") | client backend
18140 | foreman | 587104 | 18139 | foreman | /usr/bin/sidekiq | | | | 2024-02-12 16:12:04.177155+01 | 2024-02-12 16:12:04.174938+01 | 2024-02-12 16:12:04.175741+01 | 2024-02-12 16:12:04.179351+01 | | | active | | 397156863 | SELECT "katello_rpms".* FROM "katello_rpms" WHERE "katello_rpms"."id" NOT IN (SELECT "katello_repository_rpms"."rpm_id" FROM "katello_repository_rpms") | parallel worker
18140 | foreman | 587105 | 18139 | foreman | /usr/bin/sidekiq | | | | 2024-02-12 16:12:04.177605+01 | 2024-02-12 16:12:04.174938+01 | 2024-02-12 16:12:04.175741+01 | 2024-02-12 16:12:04.180237+01 | | | active | | 397156863 | SELECT "katello_rpms".* FROM "katello_rpms" WHERE "katello_rpms"."id" NOT IN (SELECT "katello_repository_rpms"."rpm_id" FROM "katello_repository_rpms") | parallel worker
Current time is 20:53, so the query has been running for over four and a half hours… EXPLAIN:
foreman=# explain SELECT "katello_rpms".* FROM "katello_rpms" WHERE "katello_rpms"."id" NOT IN (SELECT "katello_repository_rpms"."rpm_id" FROM "katello_repository_rpms") ;
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Gather (cost=1000.00..12916336526.48 rows=212588 width=709)
Workers Planned: 2
-> Parallel Seq Scan on katello_rpms (cost=0.00..12916314267.68 rows=88578 width=709)
Filter: (NOT (SubPlan 1))
SubPlan 1
-> Materialize (cost=0.00..134234.43 rows=4633095 width=4)
-> Seq Scan on katello_repository_rpms (cost=0.00..92969.95 rows=4633095 width=4)
(7 rows)
If I understand correctly, it does a sequential scan over all 88578 rows of katello_rpms and, for each row, scans the materialized 4633095 rows of katello_repository_rpms…
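A common workaround for this plan shape is to rewrite the `NOT IN (subquery)` filter as a `NOT EXISTS` anti-join, which PostgreSQL can execute as a hashed anti-join instead of rescanning the materialized inner list per row. The two forms are only equivalent when `rpm_id` is never NULL. A small self-contained sketch with a toy schema (not the real Katello tables), using sqlite just to show the equivalence:

```python
# Toy schema to show that "NOT IN (subquery)" and a NOT EXISTS
# anti-join return identical rows, assuming rpm_id is never NULL
# (with NULLs in the subquery the two forms differ).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE katello_rpms (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE katello_repository_rpms (rpm_id INTEGER NOT NULL);
    INSERT INTO katello_rpms VALUES (1, 'bash'), (2, 'vim'), (3, 'orphan');
    INSERT INTO katello_repository_rpms VALUES (1), (2);
""")

# The shape of the slow query seen in the process list above:
not_in = conn.execute(
    "SELECT id FROM katello_rpms "
    "WHERE id NOT IN (SELECT rpm_id FROM katello_repository_rpms)"
).fetchall()

# Anti-join form that PostgreSQL can plan as a hashed anti-join
# instead of a per-row scan of the materialized inner table:
not_exists = conn.execute(
    "SELECT r.id FROM katello_rpms r "
    "WHERE NOT EXISTS (SELECT 1 FROM katello_repository_rpms rr "
    "WHERE rr.rpm_id = r.id)"
).fetchall()

print(not_in, not_exists)  # [(3,)] [(3,)] — only the orphaned row
```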
I was looking at the orphan cleanup workflow and remembered there's a setting:
setting 'orphan_protection_time',
type: :integer,
default: 1440,
full_name: N_('Orphaned Content Protection Time'),
description: N_('Time in minutes before content that is not contained within a repository and has not been accessed is considered orphaned.')
Pulp only considers the content orphaned once the above 1440 minutes have passed since all repositories containing the content were deleted. I wonder if a second cleanup task run later would pick those up for deletion and clean up more space.
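(To test that theory without waiting a day, the protection window can be lowered; one way that should work is the generic settings rake task — a sketch, so double-check the setting name on your version. It should also be editable in the WebUI settings:)

```
# Lower the orphaned-content protection window to 1 minute.
foreman-rake config -- -k orphan_protection_time -v 1
```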
Trying to reproduce the issue around some repositories not getting marked for deletion during orphan cleanup.