Idea: Dracut/Anaconda-based discovery image

lzap · February 5, 2021, 10:28am

Hello,

Over the past couple of days, I was looking into dracut, a generic initramdisk generator for Linux used by many distributions, and Anaconda, the Red Hat installer. It was for a specific customer problem, but while working on it, I got an idea.

Discovery could be part of the Anaconda installer itself. We would not be building and maintaining our own image, which is a lot of work, I can tell you. The idea is simple: Anaconda is booted from the network or an official DVD (with our own kickstart included), and in the %pre section it gathers a few facts about the system, uploads them to Foreman and awaits instructions in a loop. Foreman could respond with either “here is your kickstart, continue” or, for different operating systems, simply “all set, reboot”.

Anaconda includes a pretty solid Linux environment; full bash and python are available. I think this could be a good time to get rid of puppet during the discovery process. We really don’t need that amount of facts: all that matters is the network configuration, and this can be easily parsed and sent in our own format. Or we could simulate puppet facter via uFacter, or even install the original facter at the cost of increased size if we conclude that’s the right thing. The Anaconda image can be overlaid with product.img, which is the file Anaconda always attempts to download from the tree (and it ends up with a 404).
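As a rough illustration of "parse the network configuration and send it in our own format", here is a minimal Python sketch. It assumes the `ip` tool in the image supports JSON output (`ip -j addr`), which I have not verified for the Anaconda environment; the function name `network_facts`, the output layout and the sample data are made up for the example.

```python
import json


def network_facts(ip_json: str) -> dict:
    """Reduce `ip -j addr` style JSON to a small {interface: facts} mapping."""
    facts = {}
    for link in json.loads(ip_json):
        name = link.get("ifname")
        if not name or name == "lo":
            continue  # loopback is not interesting for inventory
        ipv4 = [
            f'{addr["local"]}/{addr["prefixlen"]}'
            for addr in link.get("addr_info", [])
            if addr.get("family") == "inet"
        ]
        facts[name] = {
            "mac": link.get("address"),
            "state": link.get("operstate"),
            "ipv4": ipv4,
        }
    return facts


# Hypothetical sample shaped like `ip -j addr` output (documentation addresses).
SAMPLE = """[
 {"ifname": "lo", "address": "00:00:00:00:00:00", "operstate": "UNKNOWN",
  "addr_info": [{"family": "inet", "local": "127.0.0.1", "prefixlen": 8}]},
 {"ifname": "ens3", "address": "52:54:00:12:34:56", "operstate": "UP",
  "addr_info": [{"family": "inet", "local": "192.0.2.10", "prefixlen": 24},
                {"family": "inet6", "local": "fe80::1", "prefixlen": 64}]}
]"""

if __name__ == "__main__":
    print(json.dumps(network_facts(SAMPLE), indent=2))
```

In a real %pre script the input would come from running `ip -j addr` rather than from a sample string, and the resulting dictionary could be uploaded as-is.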

Here is a pseudo-example which “downloads” a kickstart from HTTP and includes it. I had to encode the kickstart content itself because it breaks parsing; normally we would use curl or something similar to get it.

%pre
# 1. gather some facts about the system via shell or python
# 2. upload it to the discovery endpoint via curl
# 3. wait until the endpoint responds with a HTTP code
# 4. the response can simply tell to "reboot" or
# 4b it can carry rendered kickstart itself which can be included:
#    (I had to encode it because it contains %pre which breaks the syntax)
base32 -d - >/tmp/discovered.ks <<'EOF'
ORSXQ5AKNZSXI53POJVSALJNMRSXM2LDMU6WK3TQGFZTAIBNFVQWG5DJOZQXIZJAFUWWE33PORYH
E33UN46WI2DDOAQC2LLON5UXA5RWEAWS22DPON2G4YLNMUQGW4ZNORSXG5AKMZUXEZLXMFWGYIBN
FVSW4YLCNRSWICTVOJWCALJNOVZGYPJCNB2HI4B2F4XTCOJSFYYTMOBOGEZDELRRF5RWK3TUN5ZS
6OBNON2HEZLBNUXUEYLTMVHVGL3YHA3F6NRUF5XXGLZCBJZG633UOB3SALJNOBWGC2LOORSXQ5BA
OJSWI2DBOQFGWZLZMJXWC4TEEAWS26DMMF4W65LUOM6XK4ZAFUWXMY3LMV4W2YLQHV2XGCTMMFXG
OIDFNZPVKUZOKVKEMLJYBJZWK3DJNZ2XQIBNFVSW4ZTPOJRWS3THBJWG6Z3HNFXGOIBNFVWGK5TF
NQ6WS3TGN4FHI2LNMV5G63TFEBKVGL2FMFZXIZLSNYFHEZLCN5XXICTCN5XXI3DPMFSGK4RAFUWW
Y33DMF2GS33OHVWWE4QKMNWGKYLSOBQXE5BAFUWWC3DMEAWS22LONF2GYYLCMVWAU4TFOFYGC4TU
BJYGC4TUEAXSALJNMZZXI6LQMU6SEZLYOQ2CEIBNFVZWS6TFHU2DAMBQBISXAYLDNNQWOZLTBJAG
G33SMUFCKZLOMQFA====
EOF
%end

# 5. once %pre block is finished, the kickstart is included
# 6. and anaconda continues with provisioning
%include /tmp/discovered.ks

Another, more flexible possibility is to install our own software into the product.img overlay: the discovery services would start before Anaconda (during dracut) and decide whether to continue or restart. A PXE-less workflow could be possible as long as kexec is available in the environment (Anaconda wants to remove it, but we can add it back in product.img).

Building on top of Anaconda has some pros and cons. First, all hardware certified for RHEL/CentOS will simply work. If there is a problem with drivers, we can reach out to Red Hat teams to fix it. Second, Red Hat users don’t actually need to reboot to start provisioning, so this would work even in a PXE-less environment. On the other hand, the Anaconda initramdisk is 45MB and install.img is about 600MB; that’s a slight increase for non-Red Hat users, who would be rebooting into the final OS installer. Also, it’s a bit tricky to get to interactive stuff in %pre; actually, I haven’t found a way.

Therefore, I am thinking about a second solution: building a dracut module that would do the discovery process. This is slightly more challenging to implement, since the dracut environment is more limited and complex (a ton of shell + systemd), but users would simply be able to generate a discovery image with dracut as long as we provided a dracut module (named base-foreman in the example):

dracut discovery-$(uname -r).img -m "base-foreman network terminfo debug udev-rules ssh-server"

There is even a dracut ssh server module, and Anaconda also supports an ssh server. Currently, Foreman Discovery is implemented via request in, request back, and I think we should either leverage Remote Execution or at least change this model to polling.

This would work on most Linux distributions, and the resulting image would be around 20-70MB depending on the number of modules and drivers included. Dracut is extremely flexible and fast to generate; it’s possible to include your own scripts and modify behavior with hooks. We would be giving a really powerful tool to Foreman users so they can experiment, easily add their own facts, scripts, drivers, memory tests or install any kind of software that is available for their OS and call it via SSH if we choose to go this way.

The dracut option would give us great PXE-less possibilities - we could actually “merge” the PXE-less discovery and bootdisk workflows into one. There would be only one (small) image that can be booted from the network or from CD/DVD. Dracut has many options: bintools can be replaced with busybox, and most unwanted firmware can easily be removed to bring the image size down to a few megabytes if needed, but I think we would distribute the standard-sized 40MB image for the best possible hardware support out of the box. The new discovery image could actually be generated per host, per subnet or as a generic image, effectively replacing bootdisk.

Another big advantage of both approaches (Anaconda uses dracut, so it’s the same) is the well-tested and documented networking support (those dhcp or ip= options), which includes bonds, bridges etc. More info is in the dracut.cmdline (and other) man pages - dracut is well documented.
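To make the `ip=` options a bit more concrete, here is a small Python sketch of how a per-host or per-subnet image generator might build that kernel argument. The two forms used (`ip=<interface>:dhcp` and the static `ip=<client-IP>:[<peer>]:<gateway-IP>:<netmask>:<hostname>:<interface>:none`) follow my reading of the dracut.cmdline man page; the helper name `dracut_ip_arg` and the addresses are just assumptions for the example.

```python
def dracut_ip_arg(iface, address=None, netmask=None, gateway=None, hostname=""):
    """Build a dracut `ip=` kernel argument.

    Without an address the interface is configured via DHCP; with one,
    the static form is produced (the peer field is left empty).
    """
    if address is None:
        return f"ip={iface}:dhcp"
    if netmask is None or gateway is None:
        raise ValueError("static configuration needs netmask and gateway")
    return f"ip={address}::{gateway}:{netmask}:{hostname}:{iface}:none"


if __name__ == "__main__":
    print(dracut_ip_arg("ens3"))
    print(dracut_ip_arg("ens3", "192.0.2.10", "255.255.255.0", "192.0.2.1", "node1"))
```

Bonds and bridges have their own arguments in the same man page, and they are left out here to keep the sketch short.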

And who knows, maybe kexec could also be more reliable for PXE-less workflows, since it’s an early stage of the boot process and not all drivers have been loaded yet. Either way, we would be ditching most of the discovery work, including the TUI, and we’d need to build something simpler, based on shell. I’d rather simplify things (maybe no facter + just ssh, no REST API) than try to be backward compatible.

If there was a way to write a module that would not require to replace “base” dracut module and that would hook into dracut in the correct way, we could even build Anaconda (or any OS installation) PXE/DVD images which are based on dracut and this would work the generic way covering both PXE and non-PXE environments (assuming there is a way to override kernel command line arguments for the installer):

boot dracut with foreman module
foreman code would pause the booting
it would perform discovery
foreman would trigger provisioning via response or ssh
OS installer (Debian, Red Hat) would start installing (without restart)

I know it’s a lot. If you did read it through, please give me a like so I know there’s someone reading these. And drop your thoughts. If you have some experience with writing dracut modules, I am interested. Or maybe there’s someone already doing this, I’d not be surprised.

Dirk · February 5, 2021, 4:16pm

Just to add another option which came to my mind: Anaconda has plugin capabilities. I am not sure if they could be used here, but perhaps this is an option which combines the strengths of Anaconda with flexibility. I have no deeper knowledge, but I stumbled over it while working with OpenSCAP in the past.

lzap · February 8, 2021, 10:01am

Interesting.

I slightly lean towards just the dracut path. It’s more flexible.

Over the weekend, I was able to write a small “99foreman” dracut module that simply drops into a shell, but the network is not up for some reason, although I included the “network” module. The image has network-manager but no systemd. I need to dig deeper.

I was able to run “uFacter” just fine, it returned everything except networking facts (since there was only localhost).

One disadvantage of this approach is that not much is initialized at this point. The kernel should have full network, should see memory, CPU and BIOS/EFI information, and I think it should see most regular storage devices. That should be just enough to put a node into inventory.

Any other hardware drivers can be easily put into the image via dracut command line options.
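For the "just enough to put a node into inventory" part, here is a minimal Python sketch of collecting memory, CPU count and firmware type without facter. It assumes a Python interpreter is present in the image (which a plain dracut image may not have), and the function names and the shape of the result are made up for illustration.

```python
import os


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo style text into {key: integer value}.

    Most values are reported in kB; the unit suffix is dropped.
    """
    result = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            result[key.strip()] = int(fields[0])
    return result


def basic_facts(meminfo_text: str, efi_dir: str = "/sys/firmware/efi") -> dict:
    """Collect a handful of inventory facts from kernel-provided sources."""
    mem = parse_meminfo(meminfo_text)
    return {
        "memory_mb": mem.get("MemTotal", 0) // 1024,
        "cpu_count": os.cpu_count(),
        # the kernel exposes this directory only when booted via EFI
        "firmware": "efi" if os.path.isdir(efi_dir) else "bios",
    }


if __name__ == "__main__":
    with open("/proc/meminfo") as handle:
        print(basic_facts(handle.read()))
```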

viwon · February 10, 2021, 1:20am

Let me throw out an idea I have actually implemented in the past.

I used to use anaconda/kickstart to do very much a similar build process that discovery does.

You can do this easily because anaconda lets you just have an empty kickstart file that is nothing but an %include which can be created later, and a %pre which runs before the %include and which can then provide the %included file.

What I would do is boot Anaconda with a kickstart file that just had a %pre script; a menu selection on the ISO let me choose which build server to download this “pre-ks” from. The kickstart would have nothing but an %include for a future file, TBD, and a %pre section that would gather all sorts of data about the system, such as the model, disks, network etc., upload it to the build server via a POST, and then go into a loop waiting to download the rest of the kickstart.

The loop would periodically try to curl, and if the result it got back said to wait, it would keep waiting.

And eventually, when it did get the rest of the kickstart file, it would drop it into temp under the same name as the %include, and the build would then go.
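The wait loop described above could be sketched in Python roughly like this. It is not the original implementation (which used curl from a shell %pre); the `fetch` callable, the three status strings and the function name are hypothetical, and the actual HTTP request is deliberately left to the caller.

```python
import os
import tempfile
import time


def wait_for_kickstart(fetch, dest, interval=10.0, max_tries=None, sleep=time.sleep):
    """Poll `fetch()` until the server stops answering "wait".

    `fetch` must return a (status, body) tuple where status is "wait",
    "reboot" or "kickstart" (a made-up protocol for this sketch).
    Returns the final status, or "timeout" when max_tries is exhausted.
    """
    tries = 0
    while max_tries is None or tries < max_tries:
        tries += 1
        status, body = fetch()
        if status == "kickstart":
            # Write to a temporary file and rename it, so the %include
            # target never exists in a half-written state.
            directory = os.path.dirname(dest) or "."
            fd, tmp = tempfile.mkstemp(dir=directory)
            with os.fdopen(fd, "w") as handle:
                handle.write(body)
            os.replace(tmp, dest)
            return status
        if status == "reboot":
            return status
        sleep(interval)
    return "timeout"
```

In a %pre script, `dest` would be the path named in the %include line (for example /tmp/discovered.ks from the first post), and `fetch` would wrap whatever HTTP client is available in the installer environment.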

It really did almost work like discovery and foreman, and it was pretty simple to do with a few simple webpages for people to manage host setups, and a cgi to dynamically provide a kickstart file based on the host data entered. I used this for almost 15 years before moving to Foreman.

In my experience, for Anaconda/kickstart-based systems it worked great, but of course Anaconda can’t be used for other distros.

I have used other tools in the past whose boot CD would actually download and create a small install boot disk on the final boot device, then reboot from that, load the installer into RAM, delete itself from the drive and do the final install. That always seemed a bit messy to me.

viwon · February 10, 2021, 1:32am

Just realizing I have done pretty much what you suggested in the first post. I would say it would work for anything kickstart based.

But what about non-kickstart based Distros? I did not know they could be installed by anaconda.

lzap · February 10, 2021, 10:48am

Thanks for sharing your thoughts and experience.

Yeah, we never want to leave other distros behind, this is the reason why I think that building our own dracut would be actually the way to go. I am thinking that if we included kexec into the image, since not many drivers are loaded yet, it could be actually a good experience to kexec into the OS installer without rebooting.

viwon · February 10, 2021, 2:15pm

Lukas, I thought you had an idea how to do the discovery process without kexec.

Everyone seems to say kexec is not reliable, but I have never personally run into issues with it, other than that it is a bit of a RAM hog and can’t be used on systems with less than 2GB of RAM.

I seem to remember you talking about maybe using remote execution?

lzap · February 11, 2021, 8:24am

Yes, however LiveCD-based discovery has more modules loaded, e.g. graphical drivers, which were mostly the ones having issues. My thought is that if we use a dracut image with fewer modules, kexec might be reliable.

We know that for some kexec works like a charm, but we are getting those calls for some specific systems and it’s tough to troubleshoot. This is more of a supportability question than a technical one. We will keep the kexec option there even if we decide not to support it.

Yes, that’s still on the table. I think replacing the REST API (reboot and kexec commands) with SSH would open doors for more flexibility.

lzap · February 11, 2021, 8:43am

For the record, this is how RHEL/CentOS dracut image is created:

dracut --nomdadmconf --nolvmconf --xz --install /.buildstamp --no-early-microcode \
--add fips --add "anaconda pollcdrom qemu qemu-net prefixdevname-tools" \
--force boot/initramfs-4.18.0-284.el8.x86_64.img 4.18.0-284.el8.x86_64

It uses the dracut-config-generic package to configure dracut.

viwon · February 12, 2021, 7:13pm

Ok, thinking this through a tiny bit more. If kexec was replaced with using remote exec, what exactly would discovery do in order to reboot into a new kernel/os?

I can see replacing the smartproxy components on the discovery image with remote exec, given that there are only a couple of functions the discovery image needs anyway like facts, and power etc.

But some method has to be devised to reboot into a new kernel without kexec.

Maybe something like:
a) creating a new boot partition on the local disk
b) installing a mini-OS with the new OS kernel / initrd
c) rebooting
d) the discovery ISO passes through to the hard drive boot after a few seconds of (hit key to discover)
e) the mini-OS install image on the local hard drive boots up into a ramdisk and kicks off the OS install process

I don’t know if you had gotten that far in the thinking or not, maybe I am getting ahead of things here.

lzap · February 16, 2021, 12:26pm

That’s exactly what we should implement to replace kexec. This does not need to be a “mini OS”, all we need is one small partition readable by grub2 (almost any) with anaconda kernel and initramdisk. On EFI systems this can be just a configuration change and two files dropped, on BIOS we need to create a partition.
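On the "configuration change and two files dropped" part: the configuration change could be a single GRUB 2 menu entry pointing at the installer kernel and initramdisk. Here is a minimal Python sketch that renders one. The file paths, the title and the URL are placeholders, `inst.ks=` is the Anaconda boot option for passing a kickstart location, and some EFI builds of grub2 use different command names than `linux`/`initrd`, so treat this as an assumption-laden illustration only.

```python
def grub_menuentry(title, kernel, initrd, cmdline=""):
    """Render a GRUB 2 menuentry that boots an installer kernel and initramdisk.

    The title is placed in single quotes, so it must not contain one itself.
    """
    linux_line = f"linux {kernel} {cmdline}".rstrip()
    return "\n".join([
        f"menuentry '{title}' {{",
        f"    {linux_line}",
        f"    initrd {initrd}",
        "}",
        "",
    ])


if __name__ == "__main__":
    # Placeholder paths and URL, not real Foreman endpoints.
    print(grub_menuentry(
        "Foreman provisioning",
        "/foreman-vmlinuz",
        "/foreman-initrd.img",
        "inst.ks=http://foreman.example.com/unattended/provision",
    ))
```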

lzap · January 12, 2022, 9:14am

So after fiddling around with dracut and anaconda last year, I kind of lean back to my initial idea, which was confirmed to be valid by @viwon - to do the whole discovery and provisioning start process as a single (python) script. This has one big advantage - if this works fine for Anaconda, it can also be implemented on a LiveCD - any kind of livecd will work (we will continue using CentOS Stream), and the only requirement would be Python.

This would mean that we would need to change the discovery API a bit - it would be a pull-based instead of a push-based API call. But that is quite an easy change - the discovered host call currently only returns the host record. Instead, it would return a JSON with some kind of state (discovered, managed), rendered templates (we are particularly interested in the kickstart template) and the host record for backward compatibility.
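To illustrate what the client side of such a pull-based call might look like, here is a small Python sketch that interprets a reply. The JSON layout (`state`, `templates`, `host` keys) is only a guess based on the description above; no such API exists yet, and all names are hypothetical.

```python
import json
from dataclasses import dataclass, field


@dataclass
class DiscoveryReply:
    state: str                                     # "discovered" or "managed"
    templates: dict = field(default_factory=dict)  # rendered templates by kind
    host: dict = field(default_factory=dict)       # host record (backward compat)

    @property
    def kickstart(self):
        """Rendered kickstart if the host is managed and one was sent, else None."""
        if self.state != "managed":
            return None
        return self.templates.get("kickstart")


def parse_reply(payload: str) -> DiscoveryReply:
    """Parse the hypothetical JSON reply and validate the state field."""
    data = json.loads(payload)
    state = data.get("state")
    if state not in ("discovered", "managed"):
        raise ValueError(f"unexpected state: {state!r}")
    return DiscoveryReply(state, data.get("templates") or {}, data.get("host") or {})
```

A script would keep polling while `kickstart` is `None` and hand the rendered text to the installer once the host becomes managed.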

viwon · January 13, 2022, 4:27pm

That might make it possible to kickoff other installers as well in addition to anaconda for other distro types or OS types even.

lzap · January 14, 2022, 9:13am

Exactly my thinking. If the script is simple enough, it might even be possible to do the same in other OS installers; I guess Anaconda might not be the only one to support some kind of pre-install scripts. I really like the idea, I want to SIMPLIFY discovery badly