Koji disk space cleanup

As we need to clean up some disk space on Koji, I am proposing the following changes:

  1. Remove all locally synced Fedora repositories, and remove Fedora 27 and 28 from the external repos list in Koji. Convert Fedora 29 (only built for nightly) to point at an external URL.
  2. When Katello 4.0 is released, drop all locally synced EL 5 content.
  3. Remove all locally synced EPEL 6 content and update external repos to point at the /pub/archive/epel archive
  4. Convert puppetlabs external repositories to point at official Puppetlabs repositories and delete local content
  5. Remove foreman-rails* external repositories
  6. Remove tfm-ror51* external repository
  7. Remove rhel7-x86_64 as we use centos7

There is likely more conversion we can do, but this is my proposed starting list. @evgeni @pcreech please review

Can we make 2 “When Katello 4.0 is released, drop all locally synced EL5 content”?
We need EL5 only for client bits, and Katello is a heavy user of those, so I’d prefer not to block Katello 3.17 fixes while it is a supported target. (Quite possibly this will never be needed, but if we wait, we should wait until the right moment.)

Would replacing locally synced content with external repos mean we are saving disk space at the cost of slower builds and higher bandwidth fees (IIRC Koji is on AWS infra)? Do we have any data on how much we use these repos and how much the switch would cost us? In other words, could buying bigger storage be cheaper than increasing network costs, at least for some of the more commonly used repos?

I believe that since the connection is initiated from inside AWS reaching out, there is no network cost to pay, which is part of why @pcreech wanted to go this route. We have been using this strategy for all EL8 builds and have not noticed any issues (knock on wood). For at least the Fedora and CentOS repositories, their Koji takes the same strategy (that is where we got the idea), so in theory, if we point at the same repositories they point at, we can assume the same level of stability.

We also use mrepo today, which does the job but is a bit old. Perhaps, if we need a cached option, we could look at a tool that manages and syncs repositories, and fetches and caches content on demand.


We did see one issue with external repos: when the repo changes, but Koji didn’t regen the local metadata yet, it might end up trying to pull a package from the external one that isn’t there anymore (as those are replaced in CentOS, not added). But that can be easily fixed by manually regenerating the repo (and probably telling koji somewhere to do that itself more often). So not a big deal as such.
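The manual regeneration mentioned above could be scripted roughly like this. It is only a sketch: it assumes the `koji` CLI is installed and authenticated on the machine running it, and the tag names passed in are placeholders, not our real build tags. It defaults to a dry run so nothing is triggered by accident.

```python
import subprocess


def regen_repo_commands(build_tags):
    """Build one `koji regen-repo <tag>` command per build tag."""
    return [["koji", "regen-repo", tag] for tag in build_tags]


def regen_repos(build_tags, dry_run=True):
    """Print the commands (dry run) or execute them one after another."""
    for cmd in regen_repo_commands(build_tags):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first failing regeneration.
            subprocess.run(cmd, check=True)
```

Calling `regen_repos(["example-build-tag"])` only prints the command; pass `dry_run=False` to actually run it.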

I have made an initial pass at tackling #1 (minus the switch to external repositories). The update to the list of external repositories is in “Cleanup removed external Koji repositories” by ehelms (Pull Request #1581, theforeman/foreman-infra on GitHub).

We are now down to 701GB used on /mnt/koji. The external repos, /mnt/koji/external-repos/, are now down to 220GB total.

Poking around, we additionally have the following taking up space:

  • 16GB dating back to 2017 – /mnt/koji/backups/postgres
  • 32GB from duplicity – /mnt/koji/backups/ephemeral
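For anyone who wants to repeat this kind of survey, here is a minimal sketch that totals file sizes under a set of directories. It sums apparent file sizes, so the numbers can differ somewhat from what `du` reports, and the helper names are made up for illustration.

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                # File vanished or is unreadable; ignore it.
                pass
    return total


def largest_first(paths):
    """Return (path, size_in_bytes) pairs sorted from largest to smallest."""
    sizes = [(p, dir_size_bytes(p)) for p in paths]
    return sorted(sizes, key=lambda item: item[1], reverse=True)
```

It could be pointed at, for example, `/mnt/koji/backups/postgres`, `/mnt/koji/backups/ephemeral` and `/mnt/koji/external-repos/` to see which one is worth cleaning first.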

Looks like there’s a kojira.conf setting we can enable for this:


Koji doesn’t monitor external repositories for changes by default. Administrators can enable such behaviour by setting check_external_repos = true in kojira.conf (for details see Koji Utilities).
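A quick way to verify whether the option is actually switched on is to parse the file. This sketch assumes kojira.conf is INI-style with the option living in a `[kojira]` section; that section name is my assumption, so check it against the real file. A missing option is treated as disabled, matching the default described above.

```python
import configparser


def external_repo_check_enabled(conf_text):
    """Return True if check_external_repos is enabled in the given config text."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # Assumed section name "kojira"; a missing section or option means disabled.
    return parser.getboolean("kojira", "check_external_repos", fallback=False)
```

Feed it the contents of the config file, for example `external_repo_check_enabled(open(path).read())`, where `path` is wherever kojira.conf lives on the host.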

As Eric mentioned, all inbound data transfer into AWS is free. Completely, totally free. It’s traffic that leaves the AWS region that you pay for.

| Data Transfer IN To Amazon EC2 From Internet | |
| --- | --- |
| All data transfer in | $0.00 per GB |
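To put the storage-versus-bandwidth question in numbers, a back-of-the-envelope comparison could look like the sketch below. The inbound price of $0.00 per GB comes from the table above; the storage price and the monthly download volume are placeholders that would have to be filled in from our actual bill and usage.

```python
def local_mirror_monthly_cost(mirror_size_gb, storage_price_per_gb):
    """Cost of keeping a synced copy on disk for one month."""
    return mirror_size_gb * storage_price_per_gb


def external_repo_monthly_cost(inbound_gb, inbound_price_per_gb=0.0):
    """Cost of pulling packages from external repos for one month."""
    return inbound_gb * inbound_price_per_gb


def cheaper_option(mirror_size_gb, storage_price_per_gb, inbound_gb,
                   inbound_price_per_gb=0.0):
    """Return 'external' or 'local', whichever costs less per month."""
    local = local_mirror_monthly_cost(mirror_size_gb, storage_price_per_gb)
    external = external_repo_monthly_cost(inbound_gb, inbound_price_per_gb)
    return "external" if external <= local else "local"
```

With inbound transfer at $0.00 per GB, the external option costs nothing in transfer fees no matter how much the builders download, so the local mirror only wins on speed or reliability, not on price.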


Will that regen all repos/tags that use the external repo? I think all our old tags (like 1.x etc.) also refer to the now-external repos, and regenerating those would be a waste of resources.

Let me elevate the question: should we prune old tags off of Koji up to a release point?
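If we do go that way, picking the candidates could be as simple as filtering tag names by version. This sketch assumes tags follow a `<project>-<major>.<minor>-<suffix>` naming pattern; that pattern and the example names in the usage note are illustrative, not a dump of our real tags. It only selects tags and does not touch Koji.

```python
import re

# Assumed naming pattern: <project>-<major>.<minor>-<anything>
TAG_VERSION = re.compile(r"^[a-z]+-(\d+)\.(\d+)-")


def tags_older_than(tags, cutoff):
    """Return tags whose (major, minor) version is below the cutoff tuple.

    Tags without a parseable version (e.g. nightly) are never selected.
    """
    old = []
    for tag in tags:
        match = TAG_VERSION.match(tag)
        if match is None:
            continue
        version = (int(match.group(1)), int(match.group(2)))
        if version < cutoff:
            old.append(tag)
    return old
```

For example, `tags_older_than(["foreman-1.9-rhel7", "foreman-2.1-el8", "foreman-nightly-el8"], (2, 0))` would select only the 1.9 tag, and the resulting list could be reviewed by hand before anything is removed.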