Orphaned pulp content

Because of the EL7 deprecation, I am experimenting with a new Katello 4.4 installation on AlmaLinux 8, using Ansible to configure products, repos, etc.

For testing, I have just removed all products, disabled all Red Hat repositories, removed the subscriptions and deleted the manifest. So technically, the server should have no content at all.

I also ran

foreman-rake katello:delete_orphaned_content RAILS_ENV=production

to clean up.

After that, 40G of the 120G of content I had synced before was deleted. But there is still approx. 80G of content in /var/lib/pulp. I have looked into the database and, for instance, pulpcore.core_artifact still has 33894 rows. So it looks to me as if there is a lot of orphaned content which is not detected and cleaned up…

Hi @gvde ,

That command should create 1 or more tasks labeled Actions::Katello::OrphanCleanup::RemoveOrphans

Could you please check in the WebUI: navigate to Monitor → Tasks, enter label = Actions::Katello::OrphanCleanup::RemoveOrphans into the search bar, and press Enter.

In the results, please locate the task(s) corresponding to the time when the rake script was run. Can you confirm that these have the state stopped with the result success, or is there some other combination of state and result?

Kind regards,

Yes. One, as it’s only the main server at the moment, with no external content proxies yet.

Yes, they are all stopped - success…

Hi @gvde. I looked into what this task does. The intent behind it is to remove content which has been synced to external content proxies but is not published in any CV version that is currently promoted to any Lifecycle Environment assigned to that content proxy.

So for example, if you have an environment path like Library → Devel → QA → Prod, and one content proxy providing content for a datacenter which only has Devel and QA, then packages in a CV version that is only promoted to Prod would be removed from that content proxy by this task.

Since the Katello primary server must have all content for all LCEs so that it can be synced out to any content proxies, this should explain why it didn’t seem to do too much in your case.
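To make the rule above concrete, here is a minimal sketch in Python. It is only an illustrative model of the behaviour as I described it, not Katello's actual code: the class, the function name and the package names are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentViewVersion:
    # Hypothetical model of a CV version, for illustration only.
    name: str
    packages: frozenset      # packages published in this version
    promoted_to: frozenset   # lifecycle environments it is currently promoted to


def removable_from_proxy(proxy_envs, synced_packages, versions):
    """Return the synced packages that no environment assigned to the proxy needs."""
    needed = set()
    for version in versions:
        # A version only counts if it is promoted to an environment the proxy serves.
        if version.promoted_to & set(proxy_envs):
            needed |= version.packages
    return set(synced_packages) - needed


versions = [
    ContentViewVersion("cv 1.0", frozenset({"bash-5.1", "vim-8.2"}), frozenset({"Prod"})),
    ContentViewVersion("cv 2.0", frozenset({"bash-5.2", "vim-8.2"}), frozenset({"Devel", "QA"})),
]
synced = {"bash-5.1", "bash-5.2", "vim-8.2"}

# A datacenter proxy that only serves Devel and QA.
datacenter_proxy = removable_from_proxy({"Devel", "QA"}, synced, versions)
# The primary server, which serves every environment.
primary_server = removable_from_proxy({"Library", "Devel", "QA", "Prod"}, synced, versions)

print(sorted(datacenter_proxy))  # ['bash-5.1']
print(sorted(primary_server))    # []
```

In this model the datacenter proxy loses only the package that exists solely in the Prod-only version, while the primary server, which serves every environment, has nothing to remove.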

I believe what you are looking for instead would be the ‘reclaim space’ job, which you can run for any content proxy, including the internal content proxy of the Katello primary. From the WebUI:

Infrastructure → Smart Proxies → Click on the Smart Proxy you wish to clean → Click on the ‘Reclaim Space’ button. It will create a task which you can follow at Monitor → Tasks.

This will clean up downloaded packages for repositories which have the download policy set to ‘on_demand’, so that they will not become cached and ultimately stored on disk again unless requested by some content host.

I don’t think you understood the issue I had: it was a single server, no external proxies, added some repos, synced and downloaded content, then deleted all products, repos etc. again. It has nothing to do with on-demand downloads.


Chris here, taking over the issue. I will create a Redmine issue to make sure our cleanup script is working correctly.
