RFC - Systemd first boot service for host provisioning

Hi,
Right now we are having an issue with provisioning RHEL 9 machines, where provisioned hosts go into emergency mode after the initial reboot.

The problem is caused by running a package upgrade in the %post section of the Kickstart template. Luckily we have a temporary workaround for it - disable the package upgrade and do it after the host is rebooted and provisioned.

This leads us to this RFC, where I would like to propose adding a new oneshot systemd service, foreman_first_boot, that would run on the first machine start and execute a template defined by the user.

The service template could contain package upgrades and other steps that we currently run in the %post section.

systemd adoption (source)

OS      Version  Year
CentOS  7.0      2014
RHEL    7.0      2014
Debian  8.0      2015
Ubuntu  15.04    2015

Service example

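[Unit]
Description=foreman_first_boot
# Dependency, ordering, and condition directives belong in [Unit], not [Service]
# After= only orders units, so network-online.target is also listed in Wants=
Wants=basic.target network-online.target
After=basic.target network-online.target nss-lookup.target
# Skip the unit once the marker file exists
ConditionPathExists=!/etc/foreman/foreman_first_boot_done

[Service]
Type=oneshot
ExecStart=/etc/foreman/foreman_first_boot.sh
RemainAfterExit=false

[Install]
# Needed so that "systemctl enable" can hook the unit into the boot
WantedBy=multi-user.target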
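
For illustration only, here is a minimal sketch of what the script behind ExecStart could look like. The function name run_first_boot and the command passed to it are hypothetical and not part of the proposal; the only thing taken from the unit above is the marker file that ConditionPathExists checks. The idea is that the marker is written only after the user-defined steps succeed, so a failed run is retried on the next boot.

```shell
#!/bin/sh
# Hypothetical helper for a first boot script (a sketch, not Foreman's code).
# Usage: run_first_boot MARKER COMMAND [ARGS...]
run_first_boot() {
    marker=$1
    shift

    # A previous boot already finished the work: nothing to do.
    if [ -e "$marker" ]; then
        return 0
    fi

    # Run the user-defined steps; on failure leave the marker absent
    # so the service is started again on the next boot.
    "$@" || return 1

    # Record success so ConditionPathExists=! skips the unit from now on.
    mkdir -p "$(dirname "$marker")" && : > "$marker"
}
```

A script could then end with something like `run_first_boot /etc/foreman/foreman_first_boot_done dnf -y upgrade`, where the upgrade command stands in for whatever the rendered template contains.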

My first impression is that we're building more hacks. Instead I'd prefer a completely different approach.

I think we should avoid using %post as much as possible. It was always a hack and we can blame RHEL subscriptions for most of it.

The backstory is that when you provision RHEL you need a subscription. However, there was no way to get that subscription enabled in the regular install. So there's a hack in %post. Then you can install additional packages instead of using %packages. This was never needed for Fedora and CentOS, but for compatibility we used a single way.

The good news is that the support has landed in Anaconda:

So things will improve a lot if we start using those macros when it's available.

I think this is the route to take. Once you have that, you don't need upgrade at all: it simply fetches the latest version on installation.

See "Performing an advanced RHEL 8 installation" (Red Hat Enterprise Linux 8, Red Hat Customer Portal) as well.

Also, should this be in the RFC section?


I took a stab at writing the very initial example:

It's far from useful now and nowhere near complete, but I think it shows a much more elegant way. This has been on my radar for the past 3 years, but I never got around to it.

@ekohl are there some other things we do in %post that perhaps don't have native Anaconda support? What I like about the proposal here is that the call-home curl happens only after the host reboots. So in Foreman, the host is considered built only when it has really been provisioned successfully.

The rhsm command looks promising; that would actually solve the issue. My question is whether we should somehow check that the installed Anaconda actually supports the rhsm command. Is there a way to do it? Or are we fine with expecting users to always have the latest version of Anaconda?

I think this is the route to take. Once you have that, you don't need upgrade at all: it simply fetches the latest version on installation.

I agree, for RHEL 9 it seems to be a perfect fit, but users provisioning Debian / Ubuntu systems might still find the proposed service useful for doing things after the reboot.

Also, should this be in the RFC section?

Well, it should be, but RFC is under Development, where it (maybe) won't reach the same number of readers as here.
Plus, with the past few RFCs I've been asked to move the posts to the Community section, so this time I just posted it here right away.

I've reached out via email to the Anaconda developer I had contact with back in 2019 about this. I recall that it was RHEL 8.2 or 8.3 that introduced it. Within Foreman we do have the OS major and minor versions and we can do version checks. That's how we also dealt with RHEL 5 vs 6 vs 7 etc.
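
As a rough sketch of the kind of major/minor comparison meant here, written in shell purely for illustration (in Foreman the check would live in the provisioning template itself): the function name rhsm_supported is hypothetical, and the default threshold of 8.2 is an assumption based on the "8.2 or 8.3" recollection above, so it would need to be confirmed before use.

```shell
# Hypothetical version gate (a sketch, not Foreman's implementation).
# Usage: rhsm_supported MAJOR MINOR [MIN_MAJOR MIN_MINOR]
# Returns 0 when MAJOR.MINOR is at least MIN_MAJOR.MIN_MINOR.
rhsm_supported() {
    major=$1
    minor=$2
    # ASSUMPTION: 8.2 is a placeholder threshold, not a confirmed fact.
    min_major=${3:-8}
    min_minor=${4:-2}

    # Any newer major release qualifies regardless of its minor version.
    if [ "$major" -gt "$min_major" ]; then
        return 0
    fi
    # Same major release: the minor version decides.
    if [ "$major" -eq "$min_major" ] && [ "$minor" -ge "$min_minor" ]; then
        return 0
    fi
    return 1
}
```

A template could use such a check to emit the rhsm command on new enough releases and fall back to the existing %post registration otherwise.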

What you describe sounds a lot like the Ansible Tower provisioning callback:

To avoid going off topic I've opened a new post:

After discussing this further with @Marek_Hulan we agreed that %post is too large. However, it isn't obvious where exactly things should go.

I noted that the yum/dnf update part should go into the proper sections (when possible), like more native integration. There are probably more.

However, marking as built does make sense at first boot. That ensures the network config is actually correct. Or at least, it can route to Foreman so it could be fixed with REX/configuration management. But just moving it isn't everything.

@TimoGoebel has in the past suggested adding more steps in between. Today the host status is building or built, but it would be nice if there were steps in between, for example if you could somehow find out that the host made it past %post (or the Debian equivalent) and should have been rebooted. I don't recall if this was made into an RFC or otherwise, but it would give users more insight into how far provisioning has progressed.

So in summary:

  • %post is too large
  • Some things should move to steps before %post
  • Some things should move beyond %post
  • More fine grained statuses would be nice

This will also solve some issues I had to troubleshoot in the past:

  • A VM with small memory would not survive dnf update due to low memory, specifically installations without swap.
  • User-defined post template code which assumes that it is executed on a fully booted system, while that's not the case for %post (e.g. firewall-cmd vs firewall-offline-cmd, or whatever the command is, I do not remember).
  • Slow provisioning, specifically when there are a lot of updates (this can definitely be done after first boot).

This does not feel like a hack. A firstboot script or action is a well-known term in the industry, and we have seen it implemented in many OSes in the past. There is also the systemd-firstboot software, which performs additional settings. That probably means the service should start after systemd-firstboot, just in case something would otherwise be uninitialized.


Thanks everybody for the comments and insights. I created a Redmine tracker for the changes. In summary:

  1. Fix the RHEL 9 issue with the rhsm command
  2. Implement the first boot systemd service and clean up the %post section (and the Debian equivalent)
  3. Implement new host statuses that better reflect the host's provisioning progress

I have created a PR implementing some of the suggestions mentioned in this thread. Feel free to check it out and give me feedback.

There already is support for kicking off an ansible playbook callback after provisioning. That is what I do for some final configuration and setup and that is where we do subscription manager registration and make sure the system is updated.

How about users without Ansible? Not everybody has it.

I don't see cloud-init being mentioned as an option. Foreman already has support for a userdata template, and cloud-init can be leveraged to run that on boot, which could be just a bash script.

I suppose there is a bit of overlap between what Foreman does during provisioning and the features of cloud-init. A bit off-topic: I wish there was a solid image-based provisioning flow that covers bare metal and VMs. That would be awesome.

I am actually working on a small prototype of exactly this and will show it off at DevConf 2023 in Brno. I have an idea of a separate small project dedicated only to (image-based) provisioning and, at some point in the future, maybe a Foreman plugin for it. It is going to be based heavily around Anaconda (installer) and some of its capabilities (MAC HTTP headers, image-based provisioning from tarballs, UEFI HTTP Boot, SecureBoot, EFI HTTPS x509 enrollment), but any contributions for other installers (perhaps some sort of liveimage with a shell script/Python) will be welcome.


Thanks for the recommendation. I have looked into cloud-init, but I am not very familiar with it, so please tell me if I got something wrong.

In the current state, the built callback happens before the machine restarts; my PR solves this by creating a systemd service. I tested cloud-init by replacing the post section with this piece of code from an existing cloud-init template.

phone_home:
  url: <%= foreman_url('built') %>
  post: []
  tries: 10

The result is the same as with my solution, but there is one thing I am a bit worried about.

Cloud-init requires systemd. This is a problem, as we still need to support RHEL 6, which does not have systemd. In the case of a systemd service, you can solve that by taking the script used for the callback and making it remove itself, move to a different location, or rename itself. It can then be added to a crontab so that it runs after reboot. I am not sure how to solve this issue with cloud-init.
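
The crontab idea could look roughly like the sketch below. The helper name disable_after_success is hypothetical, and the sketch assumes the cron daemon on the target system supports @reboot entries. The script runs its callback and renames itself only when the callback succeeds, so later reboots find nothing left to run.

```shell
# Hypothetical helper for systems without systemd (a sketch only).
# Usage: disable_after_success SCRIPT_PATH COMMAND [ARGS...]
disable_after_success() {
    script=$1
    shift

    # Run the callback; keep the script in place on failure so the
    # next reboot tries again.
    "$@" || return 1

    # Success: rename the script so the cron entry no longer finds it.
    mv "$script" "$script.done"
}
```

Inside the callback script this might be called as `disable_after_success "$0" curl -fsS -o /dev/null "$BUILT_URL"`, with BUILT_URL standing in for the rendered built URL, and a cron entry such as `@reboot root [ -x /etc/foreman/foreman_first_boot.sh ] && /etc/foreman/foreman_first_boot.sh` would trigger it on boot. Both the variable name and the cron line are illustrative assumptions.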

I thought there was a version of cloud-init for EL6, but that lived in EPEL which is now retired.

Technically Red Hat cares about RHEL 6. In the upstream community there is less need since most EL6 is now end of life anyway.

One possible solution is to not support this feature on EL6. Depending on how far you want to take it, it's certainly an option to use version conditionals.

After actually trying to implement cloud-init and talking to @lstejska, I am not sure cloud-init is the way anymore. It requires another dependency in the form of the cloud-init package. Also mixing cloud-init in the kickstart template seems like a really bad idea.

If you are familiar with cloud-init, could you draft the kickstart template changes that would allow for cloud-init to be used?

I have some experience with cloud-init; VMware uses cloud-init in their vRealize Automation tool for spinning up clones. I am very much not impressed with it at all. It's overly complex, and not very reliable.
