RFC: Drop -c flag from wget on TFTP proxy


#1

While we are working on improving naming conventions for TFTP boot files (https://github.com/theforeman/foreman/pull/5244) I have a complementary proposal. I hit this again when I was testing Atomic provisioning with 1.18 RC1 to verify one issue we have in kickstart repos.

Problems:

  • OS installer can end up downloading file which is being changed in parallel by wget which leads to corruption (various symptoms from Pane is dead to XFS module cant be loaded and kernel panic)
  • Flag -c corrupt files when installation media URL change or there is an update (kickstart repo “respin”)

Solution to both problems:

The proposed workflow of /tftp/fetch_boot_file proxy API endpoint with example parameters http://server/blah/7.5/vmlinuz and Redhat-7.5-vmlinuz:

  • Delete existing file/symlink: rm Redhat-7.5-vmlinuz to prevent race condition
  • Download into separate directory: mkdir Redhat-7.6-vmlinuz-orig; wget -N http://server/blah/7.5/vmlinuz Redhat-7.6-vmlinuz-orig/vmlinuz
  • Create a symlink after download is done: ln -s Redhat-7.6-vmlinuz-orig/vmlinuz Redhat-7.6-vmlinuz
  • PXELinux will continue trying over and over until file appears if the download is slower than the booting host

There are two changes worth noting. First, I propose to drop -c replacing it with -N. The former means “continue partially downloaded file” which is the issue, latter flag turns on “don’t download file with same timestamp” which is actually what we want from the beginning. Kernel/initramdisk files are relatively small and continue flag makes little sense compared to what it can break, however we should not be downloading them over and over again if there was no change and -N delivers here.

Second, flag -N cannot be used with -O (output file) so in order to achieve the correct behavior of wget, the proposal creates an extra directory for each file with -orig suffix, that is used as the working directory for download. I tried to find other tool like curl but it looks like all have similar behavior. Creating an extra directory for each TFTP file is not a big issue.

Final file is actually a relative symlink into the directory, we need to delete it prior redownload attempt so we are sure that PXELinux won’t try to pick the file up while it’s being redownloaded.

Important thing is nothing is changing in the API, everything works fine with our without changes which are currently being worked on and it will also work if we decide to refactor PXE files handling completely in core.

Finally, this will work out of box with current Foreman implementation as well with @Shimon_Shtein PR as well which adds ability to override TFTP filename conventions.


Foreman 1.18 test week
#2

Jeez I am hitting this so hard today during 1.18 RC1 testing because I switch between local mirror CentOS and remote mirror and initramdisk are different.


#3

When/how we decide whether we want to delete the file/symlink?


#4

No decision, just delete it. It was requested to download the file, so we make it unavailable until this is done. In reality when there was no timestamp change, this will be fraction of second when the symlink will not be available (we put it back). It’s “write one inode” operation, therefore this is fast.


#5

+1 to that, but I don’t understand how this would work out of the box, could you elaborate?


#6

What you mean out of box? Is there some particular concern you have?

Overall this is just switch from wget -c -O filename to cd work_dir; wget -N plus extra symlink stuff to prevent users accessing files being modified via TFTP. Symlinks do work fine over TFTP as long as they are relative in the chroot.


#7

I like the idea, so you get a +1 from me.
The naming scheme is completely unrelated, as you have already mentioned, and you are adding a “transactional” aspect to the action that is much needed here.


#8

For the record, this has been reported today: Bug #2412: Smartproxy TFTP using wget -c to pull new PXE defaults produces errors while booting - Foreman

I need to find time to work on this. Hopefully next month.


#9

No rush… The report is a few years old…


#10

And this bugs me for few years as well :slight_smile: