Clean up /var/lib/pulp

Problem: The /var/lib/pulp file system on our satellite servers is starting to fill up. We’ve tried various methods to clean it up, but nothing seems to be working. Does anyone have a good practice for maintaining the /var/lib/pulp file system?

Expected outcome: File system does not fill up.

Foreman and Proxy versions: foreman 3.0.101; katello 4.2.1-1

Foreman and Proxy plugin versions:

Distribution and version: CentOS 7.9

Other relevant data:

@bradawk : There should be a cron job defined in /etc/cron.d/katello that invokes foreman-rake katello:delete_orphaned_content to clean up orphaned content if that is what’s causing your /var/lib/pulp to fill up. You can also run foreman-rake katello:delete_orphaned_content on your server directly to trigger some cleanup if required.
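To confirm whether the cron job is actually firing, something like the following could help. It is only a sketch: count_cron_runs is a hypothetical helper (not part of Foreman or Katello), and it assumes cron logs to /var/log/cron, as on a default CentOS 7 install, and that the logged command line contains the rake task name.

```shell
#!/bin/sh
# count_cron_runs is a hypothetical helper, not part of Foreman or Katello.
# It prints how many lines in a cron log mention a given string.
count_cron_runs() {
    pattern=$1
    log=$2
    # grep -c prints the number of matching lines; -F treats the pattern
    # as a fixed string. "|| true" keeps a zero count from being an error.
    grep -c -F -- "$pattern" "$log" || true
}

# Example (as root on the Foreman server; the log path is an assumption):
#   cat /etc/cron.d/katello
#   count_cron_runs katello:delete_orphaned_content /var/log/cron
```

A count of 0 would suggest the job is not being started by cron at all, which is a different problem from the job running and finding nothing to delete.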

If that doesn’t solve the issue, the content might still have references in katello/pulp preventing it from getting deleted. Could you provide some more context around your workflow with katello to understand what might be causing unexpected file system usage?

I see the katello job there, and also a foreman one. I can see the foreman job kicking off in /var/log/cron, but not the katello one. One of my associates has manually run the commands in the katello job and reports that it does nothing. Maybe there just isn’t any orphaned content? Most of our repositories are set to immediate download. I wonder if changing that to on_demand would make a significant difference?

I do periodically go through and clean up old content view versions. That does not seem to have any effect either.

I checked and there are no hung pulp workers.

We have a content view associated with each repository, which is published to library. We then have a composite content view that is published to the various life cycle environments. The developers want us to be able to strongly control when each life cycle gets patching. When we start a new cycle, we update any date-based content view filters to a new date. Then we make sure all repositories are synced and publish a new version of each content view to library. We then publish a new version of the composite content view to library and promote it to the base life cycle environment. When all of the servers in that life cycle have been patched, we promote the composite content view to the next life cycle level and patch those servers, and so on until the cycle is complete and we start over again.

I wonder if changing that to on_demand would make a significant difference?

That would make a difference on new repositories you sync.

Deleting old CV versions should also help clean up older repositories and any content that is deleted along with them. However, Pulp has a lot of de-duplication in place, so the number of CVs and CV versions shouldn’t increase disk space usage by much.

Have you noticed any particular task that eats into the space? Some growth would be expected, such as when creating and syncing new “immediate” repos. Others, like publishing/promoting CVs, shouldn’t add to the size of /var/lib/pulp, as all the downloaded artifacts are just re-used for CVs.
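If it is not obvious which task is responsible, one option is to record the size of /var/lib/pulp before and after each sync, publish, or cleanup and compare the numbers later. A minimal sketch, where log_usage and the log file path are hypothetical names chosen for illustration:

```shell
#!/bin/sh
# log_usage is a hypothetical helper, not part of Foreman or Katello.
# It appends "<timestamp> <size in KB>" for a directory to a log file.
log_usage() {
    dir=$1
    log=$2
    # du -sk prints the total size of the directory in kilobytes.
    size_kb=$(du -sk "$dir" | awk '{print $1}')
    printf '%s %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$size_kb" >> "$log"
}

# Example (as root; the log path is an assumption):
#   log_usage /var/lib/pulp /var/tmp/pulp-usage.log
```

Since cleanup may run as a background task, a reading taken right after the rake command returns may not show the final size; taking another reading a while later would be safer.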

Curious if others in the community have noticed this and have suggestions as well.

Do you follow that up with orphan cleanup?

I don’t think so. Do you mean something like: foreman-rake katello:delete_orphaned_content ?

@bradawk : Yes. That should clean out any unused data in /var/lib/pulp. Cleaning that up and using download on demand should make the space usage a little more manageable.

If you do notice something unusual taking up unexpected space, that would be helpful information to put on this thread.
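To see where the space is going, a per-directory breakdown is a reasonable starting point. This is a rough sketch assuming GNU du (as shipped with CentOS 7); top_dirs is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# top_dirs is a hypothetical helper, not part of Foreman or Katello.
# It lists the largest direct subdirectories of a directory, in KB.
top_dirs() {
    dir=$1
    count=${2:-10}
    # -x stays on one file system, -k reports kilobytes, and
    # --max-depth=1 (GNU du) lists only direct children plus the total.
    du -xk --max-depth=1 "$dir" | sort -rn | head -n "$count"
}

# Example (as root on the Foreman server):
#   top_dirs /var/lib/pulp 10
```

The first line of the output is the total for the directory itself; the lines after it are the largest subdirectories, which would be the useful part to post here.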