Katello backup is failing on Pulp content

Problem:
Hello,

Our Katello backup is failing on Pulp content. When backing up Pulp content, the backup script attempts to tar up directories inside var/lib/pulp/content/.snapshot/ that do not exist. I’m trying to run a full Katello backup.

Expected outcome:

Katello backup successful.

Foreman and Proxy versions:

Foreman and Proxy 1.16.0

Foreman and Proxy plugin versions:

bastion 6.1.5
foreman-tasks 0.10.9
foreman_docker 3.2.1
foreman_remote_execution 1.3.7
foreman_templates 5.0.1
foreman_virt_who_configure 0.1.5
katello 3.5.1.1

Other relevant data:
exact tar error:

Failed 'tar --selinux --create --file=/var/backup/katello/katello-backup/katello-backup-20180731110243/pulp_data.tar --exclude=var/lib/pulp/katello-export --listed-incremental=/var/backup/katello/katello-backup/katello-backup-20180731110243/.pulp.snar --transform 's,^,var/lib/pulp/,S' -S *' with exit code 2


Hello,
could you share with us the parameters you used for the backup script and
possibly the full log? From what you provided it seems you are using
snapshot type of backup. It seems the snapshots were not created. Can you
find any other error explaining what happened during the snapshot creation?
The most common cause is insufficient space on the disk. It would also be
good to know whether it worked before and suddenly started to fail, or
whether this is a first attempt to set up the backups.

Regards,
Martin

Hi Martin,

The command that was used was katello-backup /var/backup/katello/katello-backup/ -y

The backup ran for an extremely long time, about 30 hours (yes, 30 hours).

Can you specify which log file you want?

The directory to which Katello is backing up has enough space.

Looking at the snapshot directory, it appears that there are 3 daily snapshots, 6 “every4hours” snapshots and 2 snapmirrors. It appears that the daily and 4-hour snapshots are automatically rotated.

That is one of the things I noticed. Right now the Pulp content directory uses about 350G, yet the pulp_data.tar file grew to over 1TB. We still had enough space, but that does not seem right to me. Am I wrong?

Upon further thought, I wonder if this is what happened: the backup started on 07/31. On that date, there were daily snapshots for 07/29, 07/30 and 07/31. As it progressed into 08/01, something automatically rotated the snapshots, creating a snapshot for 08/01 and deleting the 07/29 snapshot. I specifically remember seeing tar error messages for missing files in the daily 07-29 snapshot directory and not seeing that directory in /var/lib/pulp/content/.snapshot. So tar essentially fails because it can’t get the files.

Does this scenario seem possible to you?

If this is what happened, how can we tweak Pulp snapshot management? Can we remove the snapshots altogether?

thanks,

Albert Sheynkman

Thanks for the details. By a log I meant the output of the backup script, btw.

Is it possible to compare the tar size and the time spent with previous runs, or was this the first attempt?

Regarding the tar size: tar is not very efficient with small files. Since Pulp content usually contains a huge number of symlinks, the size difference seems possible. I did a small test: I created a directory with a 2-byte file and a link to it (8 bytes). The tarball of this directory was 10,240 bytes.

The scenario you suggest seems possible. AFAIK tar scans the files at the beginning and then follows the result while archiving the data. If the scan included snapshots that were removed before tar was able to store them, it would probably fail on the missing source data. I’m not familiar with how the Pulp snapshots work, but from your description it seems that some cron job manages them periodically. Is that a default Katello setup or custom Pulp tuning? For the reason explained above, this job should be stopped during the backup. Another option would be to perform the backup with --skip-pulp-content and archive the Pulp content manually with optimizations for your use case (e.g. .snapshot skipped, --ignore-failed-read).
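A minimal sketch of that second option, assuming GNU tar. The function name archive_pulp_content and the destination path in the comments are hypothetical, and the sketch leaves out the --listed-incremental and --transform options that katello-backup passes, so it is not a drop-in replacement for the script's own Pulp step.

```shell
#!/bin/sh
# archive_pulp_content SRC DEST [extra tar options...]
# Archives SRC into DEST, skipping every directory named .snapshot and
# tolerating files that become unreadable while tar is running.
# (Hypothetical helper for illustration; assumes GNU tar.)
archive_pulp_content() {
    src=$1
    dest=$2
    shift 2
    tar --create --file="$dest" \
        --exclude=.snapshot \
        --ignore-failed-read \
        "$@" \
        -C "$src" .
}

# Intended use on the Katello host (not executed here; the tar path is
# an assumption, pick one that fits your backup layout):
#   katello-backup /var/backup/katello/katello-backup/ -y --skip-pulp-content
#   archive_pulp_content /var/lib/pulp /var/backup/katello/pulp_data.tar --selinux
```

Note that --exclude=.snapshot matches a directory of that name at any depth, not only the one under content/. Passing --selinux as an extra option mirrors the original tar command; Pulp services should still be stopped while the archive is written.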

HTH,
Martin

I realized that a test with one file is not very conclusive. I re-tested with 10000 empty files with links to them, and the tarball is 10MB. Somewhere I found that the metadata for each file takes 512B, which seems to match my findings. Any idea how many file objects are in your /var/lib/pulp?
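The re-test can be reproduced roughly as follows. This is a sketch assuming GNU tar and a POSIX shell; the directory and file names are made up for illustration. With one 512-byte header per entry, 10000 empty files plus 10000 symlinks should come to a little over 10 MB before any real data is stored.

```shell
#!/bin/sh
# Sketch: measure tar overhead for many empty files and symlinks.
set -eu
workdir=$(mktemp -d)
mkdir "$workdir/data"

i=1
while [ "$i" -le 10000 ]; do
    : > "$workdir/data/file$i"              # empty file
    ln -s "file$i" "$workdir/data/link$i"   # symlink pointing at it
    i=$((i + 1))
done

tar --create --file="$workdir/test.tar" -C "$workdir" data

# Count file objects (directory included) and the resulting archive size.
entries=$(find "$workdir/data" | wc -l)
tar_bytes=$(wc -c < "$workdir/test.tar")
echo "entries: $entries, tarball bytes: $tar_bytes"

rm -rf "$workdir"
```

The same find ... | wc -l pipeline pointed at /var/lib/pulp gives the number of file objects asked about above, though it may take a while on a large content store.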

Hi Martin,

I don’t have the log output yet because we run the backup weekly, but I will next week. Normally we just run the katello-backup -y command. Does it log somewhere by default? If yes, let me know where and I will be glad to get you the log.

I checked: we do not have any cron jobs that would create snapshots. We did not make any Pulp customizations ourselves. So everything Pulp is doing, it is doing on its own. I do have a question: I thought the katello-backup command was supposed to shut down Katello services prior to the backup. Do those include Pulp? Should we just shut down Pulp manually prior to backups?

I have no problem reconfiguring backups so that katello-backup backs up everything except Pulp data, and then just using tar and excluding the .snapshot directory. My question is this: if I exclude the .snapshot directory and then have to restore, would that work? Pulp or some DB might then contain references to snapshot directories that are no longer present.

thanks,

Albert Sheynkman

katello-backup does not write a log file; it just prints to STDOUT, so redirecting the output to a file should be sufficient.

katello-backup -y <backup_dir> 2>&1 |tee /tmp/backup_output.txt

katello-backup already stops Pulp services for you, which is why I suggested checking cron. The snapshots are a mystery to me. I’ve never heard of such a thing being part of a Katello installation. Hopefully someone more familiar with @katello or @pulp can shed some light on it.

To change the backup strategy we should find out what the /var/lib/pulp/content/.snapshot directory is used for and what is updating its content. Are there any traces in the system log? Checking the running processes while the snapshots are in progress may also reveal something.

Hi Albert,
we are not aware of anything in the foreman+katello+pulp stack that would create a .snapshot directory within /var/lib/pulp/content, and none of the instances I’ve seen did that.

Could you find out what filesystem is used for /var/lib/pulp/content? Also, could you tell us whether any systemd timers are enabled (systemctl list-timers)?
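One way to gather both pieces of information in a single run is sketched below, assuming GNU df and awk are available; the helper name fs_type is made up for illustration. If the type turns out to be a network filesystem such as nfs, the snapshots may be managed by the storage server rather than by anything on the Katello host, but that is only a possibility to check, not a diagnosis.

```shell
#!/bin/sh
# fs_type PATH: print the filesystem type backing PATH.
# (Hypothetical helper; relies on GNU df's -T option.)
fs_type() {
    df -PT "$1" | awk 'NR == 2 { print $2 }'
}

target=${1:-/var/lib/pulp/content}

if [ -e "$target" ]; then
    echo "filesystem type of $target: $(fs_type "$target")"
    # Show what is currently inside the snapshot directory, if it is visible.
    ls -la "$target/.snapshot" 2>/dev/null || echo "no .snapshot directory visible"
fi

# List all systemd timers, active or not, where systemd is present.
if command -v systemctl >/dev/null 2>&1; then
    systemctl list-timers --all --no-pager || true
fi
```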