With Katello 4.3 (I think) mirror_on_sync has been deprecated and replaced by mirroring_policy:
--mirroring-policy ENUM Policy to set for mirroring content. Must be one of additive.
Possible value(s): 'additive', 'mirror_complete', 'mirror_content_only'
This was introduced to handle some earlier sync issues, e.g. with EPEL repositories.
As far as I can tell, all repositories with mirror_on_sync=true get mirroring_policy “mirror_content_only”; the others get “additive”.
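To make that mapping concrete, here is a minimal sketch of the conversion as I understand it from the description above. It is not taken from the Katello migration code; the function name and the sample repository names are made up for illustration.

```python
# Sketch of the upgrade mapping as described in this post (an observation,
# not confirmed against the Katello source).

def policy_from_mirror_on_sync(mirror_on_sync: bool) -> str:
    """Return the mirroring_policy a repository appears to get after the upgrade."""
    return "mirror_content_only" if mirror_on_sync else "additive"


# Hypothetical repositories with their old mirror_on_sync setting.
old_settings = {"centos-baseos": True, "epel8": False}

migrated = {name: policy_from_mirror_on_sync(flag) for name, flag in old_settings.items()}
print(migrated)
```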
Reading the docs for the mirroring_policy it seems to me that “mirror_complete” would be preferable (faster) for yum repositories. So I was thinking to set the mirroring_policy to “mirror_complete” for all my yum repositories (including EPEL etc. which probably have issues with that) and then check the sync status to see if any throws exceptions during sync and then change it for those repositories with errors. That way I would get the fastest sync possible for all yum repositories which support it.
Would this work?
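Roughly, the plan would look like the sketch below: after a sync run with “mirror_complete” everywhere, collect the repositories whose sync failed and generate the commands to switch only those back. The repository IDs and the results dictionary are invented, and I am assuming that `hammer repository update` accepts `--id` together with the `--mirroring-policy` option quoted above. The script only prints the commands; it does not run them.

```python
from typing import Dict, List

# Policy to fall back to when a sync with mirror_complete fails (my choice here).
FALLBACK_POLICY = "mirror_content_only"


def plan_fallbacks(sync_ok: Dict[int, bool]) -> List[List[str]]:
    """Build one hammer command per repository whose sync failed."""
    commands = []
    for repo_id, ok in sorted(sync_ok.items()):
        if not ok:
            commands.append([
                "hammer", "repository", "update",
                "--id", str(repo_id),              # assumed option, not verified here
                "--mirroring-policy", FALLBACK_POLICY,
            ])
    return commands


# Hypothetical sync results: repository ID -> did the sync succeed?
results = {12: True, 17: False, 23: True}

for cmd in plan_fallbacks(results):
    print(" ".join(cmd))
```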
On a side note (beyond the bogus “Must be one of additive” in the description for the option): the default mirroring policy in the GUI for a new repository is “Additive”, simply because it’s the first in the list/alphabet. Many people starting with Foreman/Katello won’t spend much thought on the best selection there and will just take what’s offered. I think it would be good if the GUI offered “Complete Mirroring” as the default for a new repository. That would be a more reasonable default…
Yeah this seems like a typo / error in the description. I agree that it might make more sense if this part was removed.
Anyway, the difference between mirror_complete and mirror_content_only is that the former automatically creates a publication with the exact metadata files that Pulp downloaded, which makes a separate publication step unnecessary. This can be faster but it’s not a pure win. The main upside is that it keeps the metadata signatures intact because it’s the same file, so you can use repo_gpgcheck without setting up a signing service.
There are some downsides too. There are situations where it’s not supported:
The repo being synced contains deltarpms (like EPEL) - because Pulp doesn’t support them
The metadata in the repo being synced points to files outside of the repo - because mirroring it as-is would either break or defeat the purpose of having a local mirror in the first place
And: it might need to download extra files such as the sqlite metadata, which other modes don’t require. This uses a bit more bandwidth and storage if the alternative publish settings don’t require those files. So depending on your network bandwidth the extra downloads might offset part of the benefit of skipping the publication creation.
So in general mirror_content_only or additive are the best defaults. You can opt into mirror_complete when the repository supports it and it actually provides a benefit in your situation.
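The reasoning above can be summed up as a small decision helper. This is only a sketch of the rule of thumb from this post, not anything Katello or Pulp does internally; the `RepoTraits` fields are hypothetical names for the conditions listed above.

```python
from dataclasses import dataclass


@dataclass
class RepoTraits:
    """Hypothetical description of an upstream yum repository."""
    has_deltarpms: bool              # e.g. EPEL; not supported by Pulp
    references_external_files: bool  # metadata points to files outside the repo
    needs_upstream_signatures: bool  # e.g. repo_gpgcheck without a signing service


def can_mirror_complete(traits: RepoTraits) -> bool:
    """mirror_complete is not supported in the two situations listed above."""
    return not (traits.has_deltarpms or traits.references_external_files)


def suggest_policy(traits: RepoTraits) -> str:
    """Default to mirror_content_only; opt into mirror_complete only when it
    is both supported and actually beneficial."""
    if traits.needs_upstream_signatures and can_mirror_complete(traits):
        return "mirror_complete"
    return "mirror_content_only"


print(suggest_policy(RepoTraits(False, False, True)))
print(suggest_policy(RepoTraits(True, False, True)))
```

The helper leaves out “additive” on purpose, since choosing between it and mirror_content_only depends on whether removed upstream content should be kept, which is a separate question.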
Bandwidth and space aren’t a problem for me. But I see another advantage of “Content Only”: it reorganizes the repositories shown through /pulp/content/ into a consistent layout using Packages/x/… etc. Some external repositories have a really weird structure and some are simply flat (like the Foreman ones), and “Content Only” changes that. If some rpm isn’t available through the Foreman server, it’s much easier to debug when you know exactly where to look for it instead of searching…
So I have reset all the recently added repos that I had set to “Complete Mirroring” back to “Content Only”, leaving me with only the EPEL repos on Additive and the Red Hat repos on Complete.
I think the GUI should then suggest “Content Only” by default when creating a new repository? “Additive” as the default doesn’t seem very useful…