Hey,
as some of you saw in “Do you still use rsync to mirror our repositories?”, we’re currently trying to lower our traffic bills by not transferring data we don’t need.
One of the takeaways from the above thread was that we transfer the whole Debian dists/ folder on every rsync run, as the folder has a timestamp in its name and is thus considered new by rsync. This timestamp is an artifact of how freight (our Debian repo management software) works and cannot be disabled by configuration. While I am not a huge freight fan, I also don’t really want to replace it just now, as it works fine otherwise and people are used to it.
My intent is for the repository served via http(s) and rsync to contain neither the timestamps nor the symlinks generated by freight. After a bit of playing around, I came to the following possible solution:
- make freight write its repository data not to /var/www/vhosts/deb/htdocs/ directly, but to a different folder on the same partition (probably something like /var/www/vhosts/deb/private/freight/)
- after freight has regenerated the repository (via cron), use rsync to copy (well, actually hardlink, as we don’t want to waste space and time copying) the repository to /var/www/vhosts/deb/htdocs/:
  - sync new packages, but don’t delete old (yet):
    rsync --archive --copy-links --hard-links --link-dest=/var/www/vhosts/deb/private/freight/pool/ /var/www/vhosts/deb/private/freight/pool/ /var/www/vhosts/deb/htdocs/pool/
  - sync new metadata, resolving symlinks (--copy-links) and excluding the timestamped folders (--exclude '/*-*/'):
    rsync --archive --delete --copy-links --hard-links --exclude '/*-*/' --exclude '/*/.refs/' --link-dest=/var/www/vhosts/deb/private/freight/dists/ /var/www/vhosts/deb/private/freight/dists/ /var/www/vhosts/deb/htdocs/dists/
  - sync packages once more (there should be no new ones at this point) and delete old:
    rsync --archive --delete --copy-links --hard-links --link-dest=/var/www/vhosts/deb/private/freight/pool/ /var/www/vhosts/deb/private/freight/pool/ /var/www/vhosts/deb/htdocs/pool/
As the first round of copying does not delete old packages, clients that use the repository while we sync still have the chance to get the packages based on the old metadata. Then we replace the metadata and can safely remove the files that aren’t referenced anymore.
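To make the ordering concrete, here is a rough sketch of how the three rsync runs could be wrapped in one shell function that cron calls after freight has finished. The function name `publish_repo` and its two arguments are made up for illustration, the staging path is only the "probably something like" location from above, and the rsync flags are copied unchanged from the commands in the list.

```shell
#!/bin/sh
# Sketch only: publish a freight-generated repository into the served docroot.
# publish_repo is a hypothetical helper name, not part of freight or rsync.
set -eu

publish_repo() {
    src=$1   # staging dir freight writes to, e.g. /var/www/vhosts/deb/private/freight
    dst=$2   # served dir, e.g. /var/www/vhosts/deb/htdocs (same partition, so hardlinks work)

    mkdir -p "$dst/pool" "$dst/dists"

    # 1. Add new packages, keep old ones so clients holding old metadata still work.
    rsync --archive --copy-links --hard-links \
        --link-dest="$src/pool/" "$src/pool/" "$dst/pool/"

    # 2. Replace metadata: resolve symlinks, skip timestamped dirs and .refs.
    rsync --archive --delete --copy-links --hard-links \
        --exclude '/*-*/' --exclude '/*/.refs/' \
        --link-dest="$src/dists/" "$src/dists/" "$dst/dists/"

    # 3. Sync packages again, this time deleting the ones no longer referenced.
    rsync --archive --delete --copy-links --hard-links \
        --link-dest="$src/pool/" "$src/pool/" "$dst/pool/"
}
```

The cron job would then end with something like `publish_repo /var/www/vhosts/deb/private/freight /var/www/vhosts/deb/htdocs`. Because of `set -eu`, a failure in step 1 or 2 aborts before anything is deleted from pool/.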
What do y’all think (esp. @mmoll, @Gwmngilfen and @ekohl, as I guess you’re the most familiar with the current setup)?