Completely remove old packaging branches, or move them to an archive repository, in order to reduce the overall size of foreman-packaging and thus the time it takes to clone it.
More specifically, create a foreman-packaging-archive repository and then delete all branches older than rpm/3.0 and deb/3.0 from the primary foreman-packaging repository.
Why and What
Our CI system clones the foreman-packaging repository for every RPM and Debian PR and then again whenever we merge a PR to build a package.
The current size of the repository:
597M	foreman-packaging/
Cloning it, for example, takes about 25 seconds on my local machine.
This implies that if we reduce the size of the repository, and thereby the time it takes to clone it, our CI jobs get faster; given how many jobs we run in this area, those savings add up.
I'm not sure this is a good idea. Most of the history is shared anyway. All large files (mostly the nodejs tarballs) are in rpm/develop, and that's part of the history. Git reuses object files, so you'd win very little.
When you clone only a single branch with no depth limit, you still end up with a large repository. For rpm/develop:
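As a rough sketch of how to check this (the canonical GitHub URL is assumed; the resulting size is not reproduced here):

# Clone only the rpm/develop branch, with full history (no --depth)
git clone --single-branch --branch rpm/develop https://github.com/theforeman/foreman-packaging.git
# Check the on-disk size of the result
du -sh foreman-packaging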
Good call-out on how we might optimize our CI fetching, if we are not doing so already. Let me ask two additional questions:
If we step back from the CI question: do we need all these old branches? Do they provide value by existing in the main repository? I find that I run into some overhead looking through all the branches, either locally or on GitHub, to find the most important and relevant ones.
Would there be value in moving to a model where we store those nodejs tarballs on one of our webservers or in Koji, and fetch them like any other source?
Where we package something directly (like nodejs-theforeman-vendor), it properly uses git-annex and we have no additional overhead.
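For context, a minimal sketch of that workflow (the tarball filename here is hypothetical); git-annex keeps only a pointer in git history and fetches the content on demand:

# Track the large tarball with git-annex instead of committing it directly
git annex add nodejs-theforeman-vendor.tgz
git commit -m "Add vendor tarball via git-annex"
# On a fresh clone, fetch the actual content only when needed
git annex get nodejs-theforeman-vendor.tgz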
We could get rid of the large tarballs if we can find a way to produce bundled NPM packages without actually calling NPM, or to make NPM behave properly without needing to reach out to the registry. Perhaps npm ci during the packaging phase could help?
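A minimal sketch of what that could look like, assuming a pre-populated cache directory is available at build time (the ./cache path is hypothetical):

# npm ci installs strictly from package-lock.json; with --offline and a local
# cache it should never reach out to the registry during the build
npm ci --offline --cache ./cache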
Another alternative is to remove the bundled NPM packages altogether, but that may be more controversial. With better automation it could perhaps be done.
If we can't, finding some large-file offloading mechanism would surely help reduce the growth of the repository size.
Or a short summary: there are content-v2 and index-v2 directories. There are 770 files for this particular install, and it takes 25M uncompressed, 6.7M compressed:
# du -sh cache/
25M cache/
# find cache/ -type f | wc -l
770
# tar --create --gzip --file cache.tar.gz cache
# ls -sh cache.tar.gz
6.7M cache.tar.gz
The tricky thing is that every single cache tarball also contains the gzipped tarball that is the actual package, which we already have in the sources section. We are essentially duplicating all sources.
One possible strategy is to create a more complex specfile where we reuse the git-annexed sources and extract them into the cache at the right place. The good thing is that the cache filename is derived from the content of the file (it is content-addressed):
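A rough sketch of what such a %prep step could look like, assuming SOURCE1 is the annexed upstream tarball and that the npm cache follows cacache's content-v2/sha512 layout (both are assumptions, not verified here):

# Recreate the cache entry for the package tarball from the annexed source,
# using its sha512 digest as the content-addressed path
hash=$(sha512sum %{SOURCE1} | cut -d' ' -f1)
d1=$(echo "$hash" | cut -c1-2)
d2=$(echo "$hash" | cut -c3-4)
rest=$(echo "$hash" | cut -c5-)
mkdir -p cache/content-v2/sha512/$d1/$d2
cp %{SOURCE1} cache/content-v2/sha512/$d1/$d2/$rest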
Cache size before (compressed):   6.7M  nodejs-theforeman-builder-10.1.7-registry.npmjs.org.tgz
Cache size before (uncompressed): 16M   /tmp/tmp.J9UjgTdVJF
Cache size after (uncompressed):  13M   /tmp/tmp.J9UjgTdVJF
Cache size after (compressed):    4.6M  nodejs-theforeman-builder-new-registry.npmjs.org.tgz
So we see a ~2 MB, or about 31%, reduction in the compressed size for this package.
The rest is simply because the API responses are very large and I don't see a way to reduce those with my limited NPM knowledge.
Still, this is a nice reduction and I'll see about coming up with a patch to npm2rpm. After that, I don't think it would be wise to replace all existing packages in our tree, but rather to see if we can do version bumps to get the reduced file size. While we're at it, we should also check whether there are unused packages that we still keep in our repository.