Discovery KExec going forward

Hello,

Discovery has a unique feature: it can kexec into an OS installer, which is useful on networks without PXE/DHCP. Recently we have had many issues with kexec'ed kernels and drivers not working properly, on both BIOS and UEFI. Most of these involved display drivers, and while we were able to provide workarounds for most of them, after talking with kernel developers from Red Hat I think we need a kexec replacement in the long term. I am not suggesting we drop the feature, only that we offer users an alternative.

My colleague Laszlo Ersek tells me that, to his understanding, kexec was created as a debugging tool for developers and for better troubleshooting: "In my personal opinion, kexec will never work reliably, period. Firmware doesn’t really matter. What matters is that you boot a new kernel while the hardware is in a dirty state, without devices having been re-set to their specified initial states – because an actual hardware platform reset was never performed. Some drivers can deal with this, some can’t.

To my understanding, kexec was conceived as a last-resort feature to help support teams. The main kernel is busted, you got an oops, there’s nothing “under” the kernel that could save the RAM contents for later analysis. So let’s launch another kernel one way or another, in a pre-allocated area; it need not work with full functionality, we just need it to dump the RAM contents to disk or network; done. However, you want to run a system installer, not a limited crash & burn memory dumper."

There is a way to boot an OS installer in a PXE/DHCP-less environment, but it is a little intrusive: download the kernel and initramdisk, destroy the disk contents by writing a new MBR/GPT, create a partition, put the files on it, write bootloader (GRUB2) entries, and reboot. The idea is simple: the hardware/VM is going to be reinstalled anyway, so the disk will be wiped eventually.
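
To make the steps above a bit more concrete, here is a rough Python sketch of what such a flow could look like. It only builds the list of commands and the GRUB2 menu entry; it does not run anything. Everything specific in it is an assumption on my side, not something discovery does today: a BIOS machine with an MBR (msdos) label, a first partition named `<disk>1` (true for `/dev/sda`, not for NVMe), `grub2-install` as the installer name (it is `grub-install` on some distributions), and the `example.com` URLs and kernel arguments, which are placeholders. UEFI would additionally need an EFI System Partition, which is left out here.

```python
import shlex


def grub_menuentry(title, kernel_path, initrd_path, kernel_args=""):
    """Render a GRUB2 menuentry that boots the downloaded installer.

    Paths are relative to the root of the partition GRUB boots from.
    """
    linux_line = f"    linux {kernel_path} {kernel_args}".rstrip()
    lines = [
        f"menuentry '{title}' {{",
        linux_line,
        f"    initrd {initrd_path}",
        "}",
    ]
    return "\n".join(lines) + "\n"


def install_plan(disk, kernel_url, initrd_url, mountpoint="/mnt/installer"):
    """Return the destructive command sequence as argv lists (dry run only).

    Assumes a BIOS machine, an MBR label and a '<disk>1' partition name.
    grub.cfg has to be written under <mountpoint>/boot before the reboot.
    """
    part = f"{disk}1"  # assumption: sdX-style naming, not nvme0n1p1
    return [
        ["parted", "-s", disk, "mklabel", "msdos"],
        ["parted", "-s", disk, "mkpart", "primary", "ext4", "1MiB", "100%"],
        ["mkfs.ext4", "-F", part],
        ["mkdir", "-p", mountpoint],
        ["mount", part, mountpoint],
        ["curl", "-fsSL", "-o", f"{mountpoint}/vmlinuz", kernel_url],
        ["curl", "-fsSL", "-o", f"{mountpoint}/initrd.img", initrd_url],
        ["grub2-install", f"--boot-directory={mountpoint}/boot", disk],
        ["reboot"],
    ]


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    plan = install_plan(
        "/dev/sda",
        "http://example.com/vmlinuz",
        "http://example.com/initrd.img",
    )
    for argv in plan:
        print(shlex.join(argv))
    print(grub_menuentry("OS installer", "/vmlinuz", "/initrd.img",
                         "inst.repo=http://example.com/os"))
```

Keeping the plan as data rather than executing it right away makes it easy to log or review what is about to happen to the disk before committing to it.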

There are non-intrusive options like detecting what’s currently installed and trying to build on top of that, but that will be too difficult and it won’t work for blank hardware/VMs anyway. If you have other ideas on how to approach this, let me know in the thread.
