Rearchitecting koji server

After working with Koji, first to add a dnf-supported builder and again recently on the upgrades to support el8 builds, I've concluded that there are things we can do to re-architect our koji infrastructure to make it more flexible and easier to manage. Below are some of the actions I'd like to perform in the coming months (if accepted, I hope to have this completed before 2.1 branching).

  • Downsizing raw hardware
    • We currently do not use most of what is available to us
      • can fit a more flexible architecture into what we are using currently
  • Hub + Separate builders model
    • Separate builders from the central hub
    • Have smaller builders that we can better utilize
    • improved scale-out ability, support different builder types
  • Upgrading the underlying storage type for better performance
    • /mnt/koji currently backed by spinning disk infrastructure, not SSDs
  • Utilize authoritative mirror URLs for external repos instead of storing them locally in their entirety (see the sketch after this list)
    • inbound data into the amazon environment is free
    • We have done this with the el8 client builders as a proof of concept
    • reduces storage requirements
    • no need to ensure syncing is working
  • Ansiblize koji infrastructure management
    • Existing modules that work with managing a koji environment
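As a rough sketch of the external-repo idea: instead of rsyncing an upstream repo into /mnt/koji, the build tag can point at the authoritative mirror directly. The hub URL, tag name, and mirror URL below are placeholders, and this assumes the standard koji Python API with its createExternalRepo/addExternalRepoToTag hub calls and GSSAPI auth; treat it as an illustration, not our exact procedure.

```python
# Sketch: point a koji build tag at an upstream mirror instead of a local sync.
# Hub URL, tag name, and mirror URL are hypothetical placeholders.
import koji

HUB_URL = "https://koji.example.com/kojihub"
MIRROR = "https://mirror.example.org/centos/7/os/x86_64/"

session = koji.ClientSession(HUB_URL)
session.gssapi_login()  # or ssl_login(), depending on hub auth config

# Register the remote repo once, then attach it to the build tag with a priority.
repo = session.createExternalRepo("centos7-os-upstream", MIRROR)
session.addExternalRepoToTag("el7-build", repo["name"], 10)
```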

The plan will be better formalized at a later date, but the current idea is to create a new clean koji environment, and then transfer our data over in a “backup/restore” type situation. Once the process is finalized and successful, we’ll pick a date to run it and switch the public koji elastic IP over to the new environment
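For the final cutover, moving the public Elastic IP is a single API call. A minimal boto3 sketch, where the region, allocation ID, and instance ID are placeholders rather than our real resources:

```python
# Sketch: switch the public koji Elastic IP over to the new hub instance.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is an assumption

ec2.associate_address(
    AllocationId="eipalloc-0123456789abcdef0",  # existing public koji EIP
    InstanceId="i-0123456789abcdef0",           # new hub instance
    AllowReassociation=True,                    # detach from the old hub first
)
```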


Recently AWS started to host internal mirrors for CentOS which makes this even less of an issue.
https://lists.centos.org/pipermail/centos-devel/2020-February/036529.html

Correct, but they are not publishing the internal mirror's direct URL. The way it works is that they intercept mirrorlist queries coming from AWS and respond with the AWS-internal mirror. Theoretically, this URL can change.

We are thinking of ways to still utilize this URL.
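One option would be to resolve the internal mirror at repo-definition time by querying the mirrorlist endpoint from inside AWS rather than hard-coding a URL that may change. A rough sketch, assuming the standard mirrorlist.centos.org query parameters (the release/repo values are just examples):

```python
# Sketch: resolve the AWS-internal CentOS mirror by asking the mirrorlist
# from inside the AWS environment, instead of hard-coding the internal URL.
import requests

resp = requests.get(
    "http://mirrorlist.centos.org/",
    params={"release": "7", "arch": "x86_64", "repo": "os"},  # example values
    timeout=10,
)
resp.raise_for_status()

# When queried from AWS, the returned list should lead with the internal mirror.
mirrors = [line for line in resp.text.splitlines() if line.startswith("http")]
print(mirrors[0] if mirrors else "no mirror returned")
```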

Big warning right there: the sizing was done for a reason. Remote builders need to mount several volumes via NFS, and for some reason on AWS this was slow as hell. The architecture was simplified so that we only have a single node, and builds have been blazing fast since then.

But if I am not mistaken, you don't need SSD for this volume; it mostly contains build artifacts plus the external and our own repository mirrors. Building is done in /mnt/koji/tmp or something like that, so you only need that working directory to be on SSD.

Also an architecture decision from when I was setting it up: I accidentally terminated the instance during a “cleanup” of the AWS account, and local SSDs are always ephemeral storage, so if an instance is terminated the data is gone. I think the locally attached SSDs are available to that instance and mounted as swap plus the temporary directory where the builds are performed.

If you mean storing repositories on remote SSDs via AWS block storage, that could possibly speed up repo creation, yeah.

Good stuff. This instance was created in a hurry after my misclick, so lots of things are probably very wrong. There are symlinks everywhere from when we were learning how Koji works and, as we grew, content was getting out of bounds.

I have a request: if it were possible to get FDI building in koji, we could get rid of the hack we use for upstream, where we currently spawn a VM and build the FDI there via Jenkins. If we could build the FDI the way downstream does, using the “koji livecd” command, that would be great. I tried to set it up once, but it needs some low-level koji know-how, as building livecds is a somewhat different workflow than simple packages.

Our dnf-based builder currently runs this way, and we haven't seen any major performance impact so far. Coupled with an increased ability to parallelize builds, I think we will be net-positive on most builds. If we determine there is a significant enough impact to warrant alternative options, we can utilize hub policies to ensure those builds get the performance they need.

From a performance standpoint, that remains to be seen from an analysis. But the benefits of upgrading from the current storage type seem, at least conceptually, worth it. The current storage type has a hard limit of 1 TB per volume; newer storage types don't have this limitation.
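For reference, standard (magnetic) EBS volumes top out at 1 TiB, while gp2/gp3 volumes go up to 16 TiB. A minimal boto3 sketch of moving an existing volume to gp3 and growing it past the old ceiling, with the volume ID and performance numbers as placeholders:

```python
# Sketch: migrate an EBS volume to gp3 and grow it beyond the 1 TiB magnetic limit.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.modify_volume(
    VolumeId="vol-0123456789abcdef0",  # placeholder
    VolumeType="gp3",
    Size=2048,        # GiB; gp3 supports up to 16 TiB per volume
    Iops=6000,
    Throughput=500,   # MiB/s, a gp3-only parameter
)
```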

Also, by utilizing EBS SSD storage instead of locally attached ephemeral storage, we will solve the data persistence issue and no longer have to rely on ephemeral disks. We can consolidate the need for SSD space across machines, as well as reduce our TCO by being able to use smaller instances (and thus potentially have more of them). We can then also utilize AWS EBS lifecycle management to ensure proper backups.
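On the backup side, the EBS lifecycle piece could be as simple as a tag-based daily snapshot policy. A hedged sketch using boto3's Data Lifecycle Manager client, where the role ARN, tag, schedule, and retention are placeholder assumptions rather than a finalized policy:

```python
# Sketch: daily EBS snapshots of koji volumes via AWS Data Lifecycle Manager.
import boto3

dlm = boto3.client("dlm", region_name="us-east-1")

dlm.create_lifecycle_policy(
    ExecutionRoleArn="arn:aws:iam::123456789012:role/AWSDataLifecycleManagerDefaultRole",
    Description="Daily snapshots of koji EBS volumes",
    State="ENABLED",
    PolicyDetails={
        "ResourceTypes": ["VOLUME"],
        "TargetTags": [{"Key": "backup", "Value": "koji"}],  # placeholder tag
        "Schedules": [{
            "Name": "daily",
            "CreateRule": {"Interval": 24, "IntervalUnit": "HOURS", "Times": ["03:00"]},
            "RetainRule": {"Count": 7},  # keep one week of snapshots
        }],
    },
)
```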

Yes, the i3.xlarge instance type comes with significant SSD storage attached via the NVMe bus. Upon further research, most m5 machines and newer can also attach EBS SSD storage over the NVMe bus, and even where they don't, it's still SSD-backed, and we are reducing our instance cost in favor of EBS storage. Looking over the performance history, we are nowhere near utilizing the capacity and performance provided by the local NVMe.

We have been utilizing an alternative here as well. As far as external repos go, we are looking at using master mirror web links instead. All inbound traffic to AWS is free, and this will help reduce our overall storage costs.

This can definitely be evaluated.