RFC: Change TFTP file naming pattern

Hello,

We have long-standing issues with TFTP file deployment. Files get overwritten in some cases and corrupted when upstream content changes. When Katello is active, things can also break when the Content View, Source, or OS variant is changed, which again can lead to data corruption. Our users find out that an initramdisk is corrupted “the hard way”: the OS installer randomly crashes, freezes, or can’t find a kernel module (e.g. the XFS file system module is not found).

When a new host is created, the TFTP file name is built from a simple pattern of OS name, version, and architecture (e.g. RedHat-7.1-x86_64-initrd.img). The source URL is derived from the OS variant (Debian, Ubuntu, RedHat…). Both are passed to the Smart Proxy, which downloads the file using wget.
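To make the overwrite problem concrete, here is the naming pattern above written as a pure function. This is a hypothetical Python sketch: `legacy_tftp_name` is not Foreman's code, which is Ruby, and the URLs are made up. The source URL is not an input, so every source for the same OS, version, and architecture maps to the same file name. Whichever download runs last overwrites the file for all hosts.

```python
def legacy_tftp_name(os_name: str, major: str, minor: str, arch: str, kind: str) -> str:
    """Hypothetical sketch of the OS-based TFTP naming pattern.

    The source URL is deliberately absent: two different media or content
    sources for the same OS/version/arch end up with one shared file name.
    """
    return f"{os_name}-{major}.{minor}-{arch}-{kind}"


# Two different (hypothetical) sources for the same RHEL 7.1 initrd:
sources = [
    "http://mirror.example.com/rhel/7.1/os/x86_64/images/pxeboot/initrd.img",
    "http://katello.example.com/pulp/cv2/7.1/os/x86_64/images/pxeboot/initrd.img",
]
names = {url: legacy_tftp_name("RedHat", "7", "1", "x86_64", "initrd.img") for url in sources}
print(len(set(names.values())))  # 1: both sources collide on one TFTP file
print(legacy_tftp_name("RedHat", "7", "1", "x86_64", "initrd.img"))  # RedHat-7.1-x86_64-initrd.img
```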

Code-wise, the Operating System object is responsible for generating both the file names and the source URL. There is room for improvement here. The OS should only determine the source URL, because that needs the installation media, major and minor version, and OS variant. The file name pattern should probably move from the OS to the Host.

There is an attempt to fix this by changing the naming pattern to a hash of the source URL. The idea is that when the source URL changes, the file name changes too. It was based on my idea, but I no longer think it’s a good approach, sorry about that. It was actually the second iteration of my original idea, which I want to propose now. Sometimes the first idea is the better one.

https://github.com/theforeman/foreman/pull/5244
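For comparison, the URL-hash idea can be sketched like this. The helper name, the use of SHA-256, and truncation to 8 hex digits are all assumptions for illustration; this is not necessarily what the pull request does. The point is that two content view versions serving identical bytes still get two unrelated file names, and nothing ties either name to a host.

```python
import hashlib


def url_hash_tftp_name(source_url: str, kind: str, digits: int = 8) -> str:
    """Hypothetical: name the TFTP file after a hash of its *source URL*.

    The downloaded bytes play no part in the name, so a new URL always
    produces a new file, even when the content is unchanged.
    """
    digest = hashlib.sha256(source_url.encode("utf-8")).hexdigest()
    return f"{digest[:digits]}-{kind}"


# Same kernel published under two (hypothetical) content view versions:
v1 = url_hash_tftp_name("http://katello.example.com/pulp/cv/1.0/images/pxeboot/vmlinuz", "vmlinuz")
v2 = url_hash_tftp_name("http://katello.example.com/pulp/cv/2.0/images/pxeboot/vmlinuz", "vmlinuz")
print(v1, v2)  # two different names for what may be the same file
```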

The problem is that when Katello is enabled, the kickstart source URL can change rapidly as users publish and promote content. This ends up as a messy TFTP folder full of randomly named files. Users can’t tell which files are in use and which can be deleted, because there is no connection between files and hosts. The OS name is actually misleading: it does not represent content in Katello, only template associations. For this reason, I’d like to propose a reasonable solution to the problem, both short and long term.

The proposal

Foreman part:

  • Change the TFTP file naming pattern to the hostname (e.g. lukas-zapletal.domain.lan-vmlinuz).
  • Orchestrate file removal on host deletion.
  • (Optional) Move the naming pattern from the OS to a different model.

Smart proxy part:

  • Download the file from the source URL into a temporary file.
  • Calculate a SHA sum of the file contents and rename the file accordingly (e.g. XXXXXXXX-vmlinuz).
  • Create the target file as a symlink (e.g. lukas-zapletal.domain.lan-vmlinuz → XXXXXXXX-vmlinuz).
  • On a TFTP file removal request, delete only the symlink, so regular files are kept forever.
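The Smart Proxy steps above can be sketched as follows. This is a minimal Python illustration, not the proxy's Ruby code, and several things are assumed:

- `fetch_boot_file` and `remove_boot_file` are hypothetical names.
- A local file copy stands in for the wget download.
- The full SHA-256 hex digest stands in for the XXXXXXXX placeholder.

Because the temporary file is created inside the TFTP directory, the rename stays on one filesystem.

```python
import hashlib
import os
import tempfile


def fetch_boot_file(source_path: str, tftp_dir: str, hostname: str, kind: str) -> str:
    """Sketch of the proposed proxy flow; returns the per-host symlink path.

    Assumption: `source_path` is a local file standing in for the HTTP
    download that the real Smart Proxy performs with wget.
    """
    # 1. Copy into a temporary file in the TFTP directory itself, so the
    #    final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=tftp_dir, prefix=".download-")
    digest = hashlib.sha256()
    with os.fdopen(fd, "wb") as out, open(source_path, "rb") as src:
        for chunk in iter(lambda: src.read(64 * 1024), b""):
            digest.update(chunk)
            out.write(chunk)
    os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; TFTP daemons need read access

    # 2. Name the regular file after its content hash.
    blob_name = f"{digest.hexdigest()}-{kind}"
    blob_path = os.path.join(tftp_dir, blob_name)
    if os.path.exists(blob_path):
        os.remove(tmp_path)  # same bytes already stored: keep one copy only
    else:
        os.replace(tmp_path, blob_path)

    # 3. Point the per-host name at the content file (relative link target).
    link_path = os.path.join(tftp_dir, f"{hostname}-{kind}")
    if os.path.lexists(link_path):
        os.remove(link_path)  # stale link from an earlier build
    os.symlink(blob_name, link_path)
    return link_path


def remove_boot_file(tftp_dir: str, hostname: str, kind: str) -> None:
    """Teardown: delete only the per-host symlink, never the content file."""
    link_path = os.path.join(tftp_dir, f"{hostname}-{kind}")
    if os.path.islink(link_path):
        os.remove(link_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        tftp = os.path.join(work, "tftp")
        os.mkdir(tftp)
        src = os.path.join(work, "vmlinuz.download")
        with open(src, "wb") as f:
            f.write(b"fake kernel image")
        a = fetch_boot_file(src, tftp, "host1.example.com", "vmlinuz")
        b = fetch_boot_file(src, tftp, "host2.example.com", "vmlinuz")
        print(os.readlink(a) == os.readlink(b))  # True: two hosts, one content file
```

Two hosts booting the same kernel share one regular file. Deleting a host removes only its link, which matches the "no duplicates" and "files kept forever" points below.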

Key concepts:

  • Keeps the TFTP Proxy API and contract unchanged.
  • Keeps the TFTP file download in the host creation orchestration. There were ideas to move this to OS or Installation Media creation, but I think that would create more problems than we have today.
  • Solves all corruption problems, since every file name is unique to its content.
  • Solves issues when kickstart files are regenerated upstream (a new file is created).
  • Keeps the TFTP directory clean: symlinks eventually get deleted by the build exit orchestration.
  • No duplicate files: today we have many duplicates in PXE-busy environments.
  • The naming pattern no longer depends on OS name, version, and architecture.
  • Katello friendly: promotions, 7Server upgrades, and OS variant changes are all solved.
  • No radical changes to the current workflow.

Drawbacks:

  • Will lead to more and more regular files with SHA names in the TFTP directory. The pace will be quite slow, though: only new OS versions add files, so we are talking dozens of files, and the creation date tells you when each one appeared.
  • Possible SHA collisions between different contents, which I don’t think is a real problem.
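The collision drawback can be quantified with the standard birthday approximation:

p ≈ 1 − exp(−n(n−1) / 2^(b+1))

Here n is the number of distinct files and b is the number of hash bits kept in the name. The sketch assumes SHA-256. It also shows a hypothetical 32-bit name (8 hex digits), since the thread’s XXXXXXXX placeholder does not say whether names would be truncated. n = 10,000 is a deliberately generous upper bound compared with the "dozens of files" expected above.

```python
import math


def collision_probability(n_files: int, hash_bits: int) -> float:
    """Birthday-bound approximation: p ≈ 1 - exp(-n(n-1) / 2^(b+1))."""
    exponent = n_files * (n_files - 1) / 2 ** (hash_bits + 1)
    # expm1 keeps precision when the exponent is tiny
    return -math.expm1(-exponent)


print(collision_probability(10_000, 256))  # full SHA-256: about 4e-70
print(collision_probability(10_000, 32))   # hypothetical 8-hex-digit names: roughly 1%
```

With full digests the risk is negligible. Truncating names to a short prefix is the one choice that would make collisions a practical concern.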

At first sight, this might look like a “hack”, with the Proxy trying to “fix” something for Foreman core. The thing is, the Smart Proxy is what downloads the kernel and initramdisk; Foreman knows nothing about the content. As I showed, the source URL is not a good input for the hash, so the Smart Proxy needs to compute it. The alternative is for Foreman to download the boot files itself just to calculate the SHA sums. That’s also an option, although I prefer doing it on the Smart Proxy side.

I like it in general. Two concerns: will 10k symlinks in a single directory work? And if the host deletion orchestration fails for whatever reason, can we detect that an old symlink is left over and replace it when the host is recreated? That wouldn’t make the situation worse than it is today, and it may be something this proposal helps solve.
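For the stale-symlink concern, one option is to always overwrite the per-host link on recreation. The sketch below does this atomically by creating the new link under a temporary name and renaming it over the old one. `rename(2)` replaces the link itself rather than its target, so a TFTP client never sees a moment where the file is missing. `replace_symlink` is a hypothetical helper, and the sketch assumes a POSIX filesystem. It does not address the 10k-entries question.

```python
import os
import tempfile
import uuid


def replace_symlink(target: str, link_path: str) -> None:
    """Point link_path at target, atomically replacing any stale link."""
    tmp_link = f"{link_path}.{uuid.uuid4().hex}.tmp"
    os.symlink(target, tmp_link)
    try:
        os.replace(tmp_link, link_path)  # rename(2): atomic on POSIX
    except OSError:
        os.remove(tmp_link)  # don't leave the temporary link behind
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        link = os.path.join(d, "host.example.com-vmlinuz")
        os.symlink("old-vmlinuz", link)  # leftover from a failed deletion
        replace_symlink("new-vmlinuz", link)
        print(os.readlink(link))  # new-vmlinuz
```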

Does that mean a file would be downloaded every time we create a host?

I believe that most hosts are not in build mode. But if you have 10k hosts and put them all into build mode, that would mean a lot of symlinks, yes. I hope this is an edge case.

Good point, and now we actually can: since each symlink is per host, we can create a cleaner thread or cron job that removes all symlinks older than X days. Users could easily opt out of this.
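The cleaner job could look roughly like this. `clean_stale_links` and `max_age_days` are hypothetical, with `max_age_days` standing in for "X days". The link’s own mtime is read without following it, so regular content files are never touched. This assumes a link’s creation time is a reasonable proxy for when its host entered build mode.

```python
import os
import tempfile
import time
from typing import List, Optional

SECONDS_PER_DAY = 86400


def clean_stale_links(tftp_dir: str, max_age_days: float, now: Optional[float] = None) -> List[str]:
    """Remove per-host symlinks older than max_age_days; keep regular files."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    with os.scandir(tftp_dir) as it:
        links = [entry for entry in it if entry.is_symlink()]
    removed = []
    for entry in links:
        # follow_symlinks=False: age of the link itself, not of its target
        if entry.stat(follow_symlinks=False).st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "abc123-vmlinuz"), "wb") as f:
            f.write(b"kernel")
        os.symlink("abc123-vmlinuz", os.path.join(d, "h1.example.com-vmlinuz"))
        # Pretend the job runs 31 days later:
        print(clean_stale_links(d, 30, now=time.time() + 31 * SECONDS_PER_DAY))
```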

I am assuming we keep the current behavior: check the “last modified” flag, and if the source document is unchanged, don’t download it. If it has changed, download it to a different (temporary) file name and then rename it according to the content hash.
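The "download only if changed" decision can be sketched as a comparison between the HTTP Last-Modified header and the modification time of the previously fetched copy. This is an assumption-laden illustration: `needs_download` is hypothetical, and how the proxy would remember which local copy came from which URL is not specified in this thread. It mirrors the idea behind wget’s timestamp check, not wget’s exact rules.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def needs_download(local_mtime: Optional[float], last_modified: Optional[str]) -> bool:
    """Return True if the boot file should be (re)downloaded.

    local_mtime: mtime of the copy fetched earlier from the same URL (None if none).
    last_modified: the server's Last-Modified header value (None if not sent).
    """
    if local_mtime is None or last_modified is None:
        return True  # nothing cached, or no date to compare against
    remote = parsedate_to_datetime(last_modified)
    if remote.tzinfo is None:  # "-0000" yields a naive datetime; treat as UTC
        remote = remote.replace(tzinfo=timezone.utc)
    local = datetime.fromtimestamp(local_mtime, tz=timezone.utc)
    return remote > local


cached = datetime(2018, 5, 1, tzinfo=timezone.utc).timestamp()
print(needs_download(cached, "Mon, 30 Apr 2018 12:00:00 GMT"))  # False: unchanged upstream
print(needs_download(cached, "Wed, 02 May 2018 00:00:00 GMT"))  # True: newer upstream copy
```

When a download does happen, it goes to a temporary name and is renamed by content hash as in the proposal. An unchanged upstream file therefore never produces a new regular file.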

This still doesn’t solve all the problems with the current TFTP/media/OS architecture. In particular, the file download is part of the host CRUD orchestration, which is one of the main causes of many problems we are seeing.

Why not consider a different design altogether? Perhaps one where responsibility for managing OSes, media, and files is delegated to a single service, possibly running on the Smart Proxy.

I was hoping for a reasonable solution that does not break the Smart Proxy contract and still gives good results. We all agree that this area needs a major overhaul, but I am afraid we have been going around in circles for years. That is why I was trying to push a reasonable short-term solution.

But since we have passed the 1.18 branching point, feel free to work on new ideas and designs.