RFC: Distribution of netboot files via OCI registry

Hello there,

it’s been a while since I started a thread, but here I am, cheers! Recently, I was playing around with the ORAS tool and an idea came to my mind.

Problem: in order to prepare the PXE environment on a TFTP/HTTP smart proxy, Foreman downloads the installer image (kernel + init RAM disk) from the Linux distribution repository when a host is created. Then a bootloader, which is installed by foreman-installer, is used to load the installer into memory. There are a couple of problems with this approach:

  • If the host boots fast enough while the PXE files are still downloading, the boot can randomly freeze. Symptoms are usually hard to understand, ranging from “file not found” to the kernel freezing at various stages depending on how much has been downloaded.
  • When PXE files in the upstream repo change (e.g. an alpha/beta “compose” such as Fedora Rawhide), this amplifies the problem above as the files are redownloaded over and over again.
  • Some distributions do not offer installer images for direct download. For example, Ubuntu introduced the autoinstall/cloud-init method a few years ago, and users must manually download the netboot files onto the TFTP/HTTP proxy.
  • In order to reliably load the installer on SecureBoot-enabled systems, shim.efi and grubxxx.efi must match the netboot files. They must come from the same OS repository, otherwise signatures will not match and the boot will fail.
  • The TFTP/HTTP netboot directory grows over time, which is a problem specific to Katello deployments with many promotions.
  • Downloading files directly is not possible for image-based deployments (e.g. Anaconda liveimg image) or bootable containers (bootc).

Proposal: Start distributing boot files on our own instead of relying on Foreman automatically downloading them. This is a big change in the workflow and it moves the responsibility from Foreman to the Foreman developers, who would need to maintain such a source. Alternatively, discuss with upstream distributions (Fedora, Debian, Ubuntu) the possibility of them publishing those files separately.

Implementation: During installation, foreman-installer would perform a series of commands which would synchronize netboot files onto the TFTP/HTTP smart proxy. More details about how this would look are down below. Alongside the actual files, metadata would help Foreman understand what is available, and users would be able to associate netboot files with Operating Systems in Foreman (or this could be done automatically). The rest of the workflow stays the same, except that it would be more reliable and faster. The TFTP/HTTP smart proxy would only create hardlinks/symlinks instead of copying real files, which would be instant.

Concerns: First of all, this will require us (Foreman developers) to maintain and update these files. Every time a new stable version of a supported distribution comes out, we need to upload, sign and publish those files. In order to make this smooth, it needs to be automated as much as possible (except signing, which will always require manual confirmation) and it needs to be super easy for the community to suggest updates.

As I stated above, with coordination from upstream developers (e.g. Fedora), this process could be automated during OS build. Therefore Foreman devs would only need to maintain binaries for those distributions which will not do this for us.

Secondly, there might be legal issues - we would only accept binaries from open source distributions which are well known and legally safe. Most distros are clean and those netboot files typically contain code from grub, shim and the Linux kernel (GNU GPL or other open source licenses) so we should be fine.

Technical information:

I created a prototype which utilises the ORAS CLI tool and library for publishing content via OCI (Docker) repositories. In essence, this tool allows you to upload or download arbitrary files to and from container registries, add arbitrary metadata, and sync and sign them.

What foreman-installer (or smart-proxy) would do is simple:

oras pull ghcr.io/lzap/bootc-netboot-example:rhel-9.3.0-x86_64

This command downloads the netboot files for RHEL 9.3: shim, grub, the Linux kernel, the Anaconda init RAM disk and the stage2 image:

vmlinuz
initrd.img
install.img
shim.efi
grubx64.efi

Before any of this code is executed, the installer or smart proxy must validate the signature (the public key would be distributed with Foreman itself, or with the upstream OS in case the artifact is built by them):

cosign verify --key cosign.pub ghcr.io/lzap/bootc-netboot-example:rhel-9.3.0-x86_64

Only Foreman Core team members would be eligible to sign artifacts; this would ensure that nobody else can tamper with these binaries.
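To make the ordering explicit, here is a minimal sketch of how the verify and pull commands above could be combined so that nothing is downloaded unless the signature checks out. The function name `fetch_verified_netboot` and the destination directory are made up for illustration; only the `cosign verify --key` and `oras pull` invocations come from the prototype.

```shell
#!/usr/bin/env bash
# Sketch only: refuse to pull netboot files unless the signature verifies.
# The function name and directory layout are assumptions for illustration.

fetch_verified_netboot() {
    local ref="$1"    # e.g. ghcr.io/lzap/bootc-netboot-example:rhel-9.3.0-x86_64
    local key="$2"    # public key shipped with Foreman (or the upstream OS)
    local dest="$3"   # directory that receives the files

    # Verify the signature first and stop if it does not check out.
    if ! cosign verify --key "$key" "$ref" >/dev/null; then
        echo "signature verification failed for $ref" >&2
        return 1
    fi

    mkdir -p "$dest" || return 1
    # --output selects the directory the files are written to.
    oras pull --output "$dest" "$ref"
}
```

It could be called as `fetch_verified_netboot ghcr.io/lzap/bootc-netboot-example:rhel-9.3.0-x86_64 cosign.pub ./rhel-9.3.0-x86_64`. Since tags can be moved, a stricter variant would probably verify and pull by digest rather than by tag; that is left out here to keep the sketch short.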

A single registry can contain multiple files or whole directories; downloads can be either selective (one OS version, all versions of a particular OS) or full. A typical OS netboot artifact is around 1GB in size (for Red Hat or compatible systems), so the repository would probably grow to several gigabytes and require regular cleanup and garbage collection.
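A selective download could be as simple as filtering the tag list by prefix. The sketch below assumes tags keep following the `<os>-<version>-<arch>` pattern of the example tag; the function name `list_netboot_tags` is hypothetical. It also skips `*.sig` tags, because cosign by default stores signatures as tags in the same repository.

```shell
#!/usr/bin/env bash
# Sketch only: list the tags of a repository that belong to one OS.
# Assumes tags look like <os>-<version>-<arch>, as in the example above.

list_netboot_tags() {
    local repo="$1"     # e.g. ghcr.io/lzap/bootc-netboot-example
    local prefix="$2"   # e.g. "rhel-9." for all RHEL 9 releases, "" for all
    local tag

    oras repo tags "$repo" | while IFS= read -r tag; do
        case "$tag" in
            *.sig) ;;                                 # signature tag, not an OS
            "$prefix"*) printf '%s\n' "$tag" ;;       # tag matches the prefix
        esac
    done
}
```

Each printed tag can then be pulled individually, so “all versions of a particular OS” becomes a loop over `list_netboot_tags "$repo" rhel-9.` and a full sync is the same loop with an empty prefix.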

Then, during provisioning, the TFTP/HTTP smart-proxy would just create bootable files per operating system (or MAC address - this is being discussed in a different thread) as hardlinks or symlinks instead of downloading copies. This will be fast, more reliable, space efficient and safe.
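A rough sketch of the linking step, assuming the synced artifacts live under `netboot/<tag>/` inside the TFTP root and per-host files under `hosts/<id>/`. Both directory names and the function name are made up for illustration and are not Foreman's actual layout. Relative symlinks are used so that the links keep resolving when the TFTP daemon runs in a chroot.

```shell
#!/usr/bin/env bash
# Sketch only: expose already-synced netboot files to one host via symlinks.
# The "netboot" and "hosts" directory names are assumptions.

link_netboot_files() {
    local tftp_root="$1"   # e.g. /var/lib/tftpboot
    local tag="$2"         # e.g. rhel-9.3.0-x86_64
    local host_id="$3"     # e.g. 01-52-54-00-aa-bb-cc (must not contain "/")
    local src="$tftp_root/netboot/$tag"
    local dst="$tftp_root/hosts/$host_id"
    local file

    if [ ! -d "$src" ]; then
        echo "netboot files for $tag are not synced" >&2
        return 1
    fi
    mkdir -p "$dst" || return 1

    for file in "$src"/*; do
        [ -f "$file" ] || continue
        # Relative target: hosts/<id>/<name> -> ../../netboot/<tag>/<name>
        ln -sfn "../../netboot/$tag/${file##*/}" "$dst/${file##*/}" || return 1
    done
}
```

Creating five links takes milliseconds regardless of the artifact size, which is where the speed and space savings would come from.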

Why containers:

These files could be distributed as plain files or tarballs via HTTP, so why bother with container repositories? The Fedora community is currently experimenting with bootable containers (bootc for short); these are OCI/Docker compatible containers which also contain the Linux kernel and drivers and can run in a VM or on bare metal. The problem is that these images are only distributed via container-native channels, not an RPM repo or an HTTP directory. I am currently looking into how Foreman could support bootable containers, which is why the ORAS tool and this type of distribution came to my mind. In fact, I was made aware that this is an option, credits below.

Publishing process:

Is actually extremely easy, all you need is credentials with push permissions and a private key to sign artifacts. The details can be found in the registry itself. I am using the GitHub Container Registry, but for a production deployment we would probably need to use a different registry because there are size/traffic limitations for the open-source tier on GitHub.com:

pushd rhel-9.3.0-x86_64

oras push ghcr.io/lzap/bootc-netboot-example:rhel-9.3.0-x86_64 \
    --annotation-file ../annotations.json \
    --config ../empty.json:application/vnd.oci.empty.v1+json \
    --artifact-type $ARTIFACT_TYPE \
    vmlinuz:$MEDIA_TYPE \
    initrd.img:$MEDIA_TYPE \
    install.img:$MEDIA_TYPE \
    shim.efi:$MEDIA_TYPE \
    grubx64.efi:$MEDIA_TYPE

popd
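The push command relies on two variables and two helper files. The following sketch shows one way they might be prepared; every value in it is an illustrative assumption (the artifact type, the media type and the annotation keys are not an agreed format), and `prepare_push_inputs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: prepare the variables and helper files the push expects.
# All values are illustrative assumptions, not an agreed-upon format.

prepare_push_inputs() {
    local workdir="$1"   # parent directory of the per-OS directory

    # Hypothetical type identifiers; a real deployment would standardize these.
    ARTIFACT_TYPE="application/vnd.example.netboot.v1"
    MEDIA_TYPE="application/octet-stream"
    export ARTIFACT_TYPE MEDIA_TYPE

    # The OCI empty config (application/vnd.oci.empty.v1+json) is just "{}".
    printf '{}' > "$workdir/empty.json" || return 1

    # ORAS annotation files map "$manifest" (or a file name) to key/value pairs.
    cat > "$workdir/annotations.json" <<'EOF'
{
  "$manifest": {
    "org.opencontainers.image.title": "RHEL 9.3.0 netboot files",
    "org.opencontainers.image.version": "9.3.0",
    "org.example.netboot.architecture": "x86_64"
  }
}
EOF
}
```

Running `prepare_push_inputs .` before the `pushd` would leave `annotations.json` and `empty.json` where the relative paths in the push command expect them.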

Signing is super easy:

cosign sign --key cosign.key -y ghcr.io/lzap/bootc-netboot-example:rhel-9.3.0-x86_64

Updating files:

An update is no different from the initial sync - as new tags are introduced to the container registry, they will appear alongside metadata providing full OS names, versions, architecture and other info. In other words, clicking “synchronize netboot files” in the Foreman UI/CLI will do the trick.

Support for “other” OSes:

There should be a way to easily upload these artifacts for “other” OSes which are not open source or not uploaded for any reason. The easiest way is to allow defining a “secondary” registry where users could upload their own netboot files.

In fact, Foreman must support any number of registries. My hopes are that this will get adopted by Fedora / CentOS / Debian communities and their release engineering teams will push into their own official registries. Therefore there will be multiple sources of these files.

This also means there must be a good mechanism for publishing the public key. It must be part of the content the system is being installed from, in a form that is easy to download:

  • packages repository
  • OS image
  • bootable container

Support for rolling releases:

One problem I am unable to crack is rolling release distros (Fedora Rawhide, CentOS Stream, Arch Linux). These OSes typically refresh netboot files every day/week/month and this would mean the registry would grow quite a bit (100MB every day is not feasible). A couple of options:

  1. Ignore those distros. Users would need to manually put those files into the “secondary” registry themselves.
  2. Create a “rolling” registry where artifacts are uploaded every week automatically via CI/CD; this process would be fully automated and the binaries unsigned. Old tags would be deleted after a month and old commits removed from the registry (garbage collection).
  3. Other solution?
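For option 2, the expiry step could look roughly like the sketch below. It assumes a hypothetical tag scheme `<os>-<YYYYMMDD>-<arch>` (for example `rawhide-20240115-x86_64`), which is not something the prototype defines; tags without such a date are left alone.

```shell
#!/usr/bin/env bash
# Sketch only: print the tags (read from stdin) that are older than a cutoff.
# Assumes a hypothetical tag scheme <os>-<YYYYMMDD>-<arch>.

expired_rolling_tags() {
    local cutoff="$1"   # YYYYMMDD; tags dated before this are printed
    local tag stamp

    while IFS= read -r tag; do
        # Pull out an eight-digit field that sits between two dashes.
        stamp=$(printf '%s\n' "$tag" | sed -n 's/^.*-\([0-9]\{8\}\)-.*$/\1/p')
        [ -n "$stamp" ] || continue        # no date in the tag, keep it
        if [ "$stamp" -lt "$cutoff" ]; then
            printf '%s\n' "$tag"
        fi
    done
}
```

A CI job could feed it the output of `oras repo tags` and hand each printed tag to `oras manifest delete`, provided the registry allows deletions; reclaiming the disk space would still depend on the registry's own garbage collection.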

The ultimate goal is to remove how Foreman downloads PXE files today, so this is an important question to consider.

Katello:

Pulp already supports synchronizing container repositories and we have already tested this; therefore, netboot files could be synchronized via Pulp too, with all the goodies including distributing them to remote smart proxies. Pulp itself could theoretically host those files for HTTP boots, however, PXE is still very popular and it needs the TFTP protocol, which Pulp does not support. Therefore I suggest we keep using the current TFTP/HTTP smart proxy and only use Pulp as a cache.

Conclusion:

This change is a big one, but it goes hand in hand with the idea of improving the SecureBoot workflow. It solves several problems at once, making the overall experience smoother, specifically for Ubuntu users as well as Red Hat users who would like to use Secure Boot. On the other hand, it is an extra burden we would need to carry going forward as Foreman developers.

Credits:

  • Josh Boyer (the idea)
  • Colin Walters (the idea)
  • Ina Panova (prototype)
  • Andrew Bock (help)
  • You! (Try this out, make a comment)

Errata (edit):

  • Initially, the stage 2 image (install.img) was not included, however, I believe it is convenient to have it despite its size for RH systems (around 900MB). Otherwise, the original kickstart OS repository tree would still be needed.
  • Foreman must support arbitrary number of registries not just one or two.
  • Few typos and updates, unfinished sentence.

I see the convenience, but I think ultimately it would be better to:

  • Try to standardize the format of “PXE media as OCI artifact”
  • Have the operating system/distribution own the delivery and signing of this
  • Ensure that tooling used to build it is also shipped by the OS/distro (e.g. in theory perhaps Image Builder/bootc-image-builder would learn how to make these artifacts)

I would say this is the ultimate goal, but Foreman supports a huge variety of systems including Windows or networking gear and I do not expect those vendors to ever bother implementing this. What you say makes sense in the bootable containers context, however, Foreman's main provisioning goal is to deploy systems the traditional way: packages, OS images or whatever the OS supports.

So there will always be a Foreman community maintained “3rd party OS” registry I suppose. But you make a great point, Foreman must support any number of “other” registries to cover official ones, unofficial ones and the Foreman one.

This also highlights one thing I am missing from my proposal - the public key must be published alongside the official repository so PXE files can be verified. In the case of an RPM OS installation tree it can be placed in the “images/” directory.


I like the idea of bundling all files needed for a successful PXE boot (independent of the effort to maintain these).

Couldn’t this also happen with OCI artifacts? Where it definitely helps is to avoid the scenario where kernel/initrd files contain actual HTML telling you 404 - no kernel/initrd files found.

Yeah, for orcharhino we currently create a file repository containing the actual Ubuntu ISO content in order to retrieve kernel/initrd. I wouldn’t be sad if that was dropped.

As you wrote, for our SecureBoot proposal this would be a nice completion. And yes, there must definitely be a quite easy way to provide custom OCI artifacts anyway.

JFYI: Artifacts must be placed inside TFTP root directory in order to work with relative symlinks due to chroot constraints.

Well, my idea is that we would introduce a new resource in Foreman: netboot files. Users would need to synchronize them and associate these with OS resource first. So the synchronization would be moved a little bit earlier in the process.

But it is worth mentioning that the downloading bug can be solved by performing the download asynchronously during the “build confirmation” dialog. A finished download would be a prerequisite for a host to enter build mode.

Oh yeah, how many bugs and cases I solved by just resolving symlinks :slight_smile:

That HTML issue is something @magnus pointed me to last week and I wrote a patch: Fixes #37147 - Pass --fail option to curl by ekohl · Pull Request #885 · theforeman/smart-proxy · GitHub.

While looking at that code I also noticed that there’s no wait mechanism implemented.

This is the Smart Proxy API endpoint:

Then you can see the actual implementation here:

The relevant part is the HttpDownload class. That’s a task (implemented as a thread) that’s started without a wait.

I’ve opened a PR (Wait for TFTP downloads by ekohl · Pull Request #886 · theforeman/smart-proxy · GitHub) to enhance it a bit, but it needs further work to make it reliable.
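Independent of how the waiting ends up being implemented in the Smart Proxy (which is Ruby, and this is not its code), one common pattern for never exposing a half-downloaded file is to download into a temporary file and rename it into place only on success. A minimal shell sketch of that pattern, with a hypothetical function name:

```shell
#!/usr/bin/env bash
# Sketch only: download to a temporary file and rename it into place on
# success, so a booting client never sees a partially written file.
# This illustrates the pattern; it is not the Smart Proxy implementation.

fetch_atomically() {
    local url="$1"
    local dest="$2"
    local tmp

    # Keep the temp file next to the destination so mv stays on one filesystem.
    tmp=$(mktemp "$dest.XXXXXX") || return 1

    # --fail makes curl exit non-zero on HTTP errors instead of saving the
    # error page as if it were the kernel or initrd.
    if curl --fail --silent --show-error --location --output "$tmp" "$url"; then
        chmod 0644 "$tmp"      # mktemp creates 0600; the TFTP daemon must read it
        mv -f "$tmp" "$dest"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

With this, a client requesting the file either gets “file not found” or the complete file, never something in between.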
