I wanted to understand the updated mirroring policies in Katello 4.3+. I found the thread “Katello 4.3+ mirroring policy”, which was useful.
In that thread, the user has gone with the following:
So I have reset all my recently added repos which I have set on “Complete Mirroring” back to “Content Only” leaving me only the EPEL repos with Additive and the RedHat repos with Complete.
If I understand correctly, EPEL removes old packages when an update is released. Historically, I set ‘mirror on sync’ to ‘false’ for EPEL - the idea being that I had sufficient space/bandwidth to store the packages. I was happy to keep the old packages in the repo - that way, if I had to rebuild a server, I knew I could download the same version of a package if I needed it. So I think ‘Additive’ is the equivalent policy for Katello 4.3+.
However, I’m still not clear about mirror_complete and mirror_content_only. Does someone have an example that explains when I should choose one of these over the other? For example, a repo such as HPE Software Delivery Repository contains many versions of the same package - so unlike EPEL I think I can mirror it (the equivalent of mirror on sync = true in older versions). How do I choose between mirror_complete and mirror_content_only - what attributes in the repo am I looking for?
I think the “history” aspect (of preserving old RPMs) is relevant; there’s also an aspect of incompatible content, where some repos publish drpms (for example) and some don’t. Strange pulp3 error - #19 by mhjacks is one of the threads that preceded that change in Katello to deal with the different potential challenges there.
So if I understand right, “Content Only” by default gets RPMs, comps, and bootable tree stuff but ignores some of the potentially problematic content (which would also affect metadata); “Complete Mirroring” pulls everything in the repo including metadata.
Choose mirror_complete if you want a repo that is as close to being a 100% copy of the original as possible - especially if you want yum/dnf to enable repo_gpgcheck, which verifies a signature of the repository metadata.
If you want exactly the same content as the original repo, but you don’t need the metadata to be 100% identical, choose mirror_content_only. This can save some disk space for a couple of reasons and it’s more tolerant of weird metadata.
If you want to collect packages over time, e.g. to retain older versions of packages in repositories where the older versions are regularly purged from the source, use additive.
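The three rules of thumb above can be condensed into a small decision helper. This is only a sketch: `choose_mirroring_policy` and its two boolean parameters are hypothetical names invented for illustration and are not part of Katello. Only the three policy strings come from this thread.

```python
def choose_mirroring_policy(keep_old_packages: bool,
                            need_identical_metadata: bool) -> str:
    """Return a policy name following the rules of thumb in this thread.

    Hypothetical helper for illustration only; not a Katello API.
    """
    if keep_old_packages:
        # Collect packages over time, even after the source purges them.
        # Retention and identical metadata conflict; this sketch lets
        # retention win.
        return "additive"
    if need_identical_metadata:
        # As close to a 100% copy as possible, e.g. when clients verify
        # the signed repository metadata with repo_gpgcheck.
        return "mirror_complete"
    # Same set of content as the source, but the metadata files
    # themselves are not copied.
    return "mirror_content_only"


if __name__ == "__main__":
    # EPEL-style repo where old versions disappear upstream.
    print(choose_mirroring_policy(True, False))
    # Repo whose clients check the metadata signature.
    print(choose_mirroring_policy(False, True))
    # Repo that already keeps many versions and needs no signature check.
    print(choose_mirroring_policy(False, False))
```

The order of the checks is a design choice of the sketch: wanting to keep purged packages rules out both mirroring modes, so it is tested first.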
I think you’ve answered the questions I had quite nicely. The only thing I’m still not clear on is “mirror_content_only”. Initially, I was thinking that for an RPM repo, this would just be the RPM packages. In my mind I think of ‘content’ and ‘metadata’ as two different things, so you wouldn’t be mirroring the latter. However, reading your comments, it sounds like some metadata, such as errata via updateinfo, would also be pulled in with “mirror_content_only”.
Taking EPEL-8 as a hypothetical example, would all of these be mirrored with “mirror_content_only”?
The literal metadata files themselves would not be mirrored. But the repository metadata, in the abstract sense, would be “mirrored” in the sense that whatever you had in the repository previously (if you had uploaded your own packages or synced with additive mode or something) would be replaced with the set of content present in the remote being mirrored.
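A toy model of that difference, treating a repository as nothing more than a set of package names. This is a sketch under that simplifying assumption: the `sync` function and the package names are made up for illustration, and a real sync involves far more than set arithmetic.

```python
from typing import Set


def sync(local: Set[str], remote: Set[str], policy: str) -> Set[str]:
    """Model the repository content after a sync (illustrative only)."""
    if policy == "additive":
        # Keep everything already present and add whatever the remote has.
        return local | remote
    if policy in ("mirror_complete", "mirror_content_only"):
        # Whatever was there before is replaced by the remote's content.
        return set(remote)
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    # Hypothetical content: one previously synced package, one uploaded by hand.
    local = {"foo-1.0-1.el8.noarch", "my-upload-0.1-1.el8.noarch"}
    # The remote has since replaced foo 1.0 with foo 1.1.
    remote = {"foo-1.1-1.el8.noarch"}

    print(sorted(sync(local, remote, "additive")))
    print(sorted(sync(local, remote, "mirror_content_only")))
```

In this model the two mirroring policies produce the same set; they differ in whether the metadata files themselves are copied, which the model does not represent.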
So if you were to re-publish the resulting repo, the metadata files wouldn’t be bitwise-identical to the original ones because the packages could be ordered differently, they might be in different locations, stuff like that - but in effect it should behave as the same repository.
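To make “not bitwise-identical, but effectively the same” concrete, here is a small sketch. It assumes a made-up, heavily simplified “metadata file” that is just a list of package names, one per line; real repodata is XML and much richer, and the helper names are hypothetical.

```python
import hashlib
from typing import List


def same_bytes(a: List[str], b: List[str]) -> bool:
    """Compare the checksums of the two listings as they would be written out."""
    def digest(lines: List[str]) -> str:
        return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()
    return digest(a) == digest(b)


def same_content(a: List[str], b: List[str]) -> bool:
    """Compare only which packages are listed, ignoring their order."""
    return set(a) == set(b)


# Hypothetical package names; the re-published listing has a different order.
upstream = ["bar-2.0-1.el8.x86_64", "foo-1.0-1.el8.noarch"]
republished = ["foo-1.0-1.el8.noarch", "bar-2.0-1.el8.x86_64"]

if __name__ == "__main__":
    print("bitwise identical:", same_bytes(upstream, republished))
    print("same packages:    ", same_content(upstream, republished))
```

A signature made over the original metadata file only verifies those exact bytes, which fits the earlier advice to choose mirror_complete when clients use repo_gpgcheck.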