RFC - Systemd first boot service for host provisioning

Hi,
Right now we have an issue with provisioning RHEL 9 machines: provisioned hosts go into emergency mode after the initial reboot.

The problem is caused by running the package upgrade in the %post section of the Kickstart template. Luckily we have a temporary workaround for it: disable the package upgrade and run it after the host is rebooted and provisioned.

This leads us to this RFC, where I would like to propose adding a new oneshot systemd service, foreman_first_boot, that would run on the first machine start and execute a user-defined template.

The service template could contain package upgrades and other things that we currently do in the %post section.

systemd adoption (source)

OS      Version  Year
CentOS  7.0      2014
RHEL    7.0      2014
Debian  8.0      2015
Ubuntu  15.04    2015

Service example

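[Unit]
Description=foreman_first_boot
# Dependencies, ordering and conditions are [Unit] options; they are ignored under [Service].
# After=network-online.target only has an effect if something also pulls that target in.
Wants=basic.target network-online.target
After=basic.target network-online.target nss-lookup.target
ConditionPathExists=!/etc/foreman/foreman_first_boot_done

[Service]
Type=oneshot
ExecStart=/etc/foreman/foreman_first_boot.sh
RemainAfterExit=false

[Install]
# Assumed target, needed so that "systemctl enable" can hook the unit into the boot.
WantedBy=multi-user.target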
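
For illustration, the script behind ExecStart could look roughly like the sketch below. It is a minimal sketch under assumptions, not the proposed implementation: the user template path /etc/foreman/foreman_first_boot_user.sh and the FIRST_BOOT_* variables are hypothetical, and only the marker path is taken from the service example. The marker is written only when the user template succeeds, so a failed run is retried on the next boot.

```shell
#!/bin/sh
# Hypothetical sketch of /etc/foreman/foreman_first_boot.sh.
# Both paths can be overridden through the environment for dry runs.
FIRST_BOOT_USER_SCRIPT="${FIRST_BOOT_USER_SCRIPT:-/etc/foreman/foreman_first_boot_user.sh}"  # assumed name
FIRST_BOOT_MARKER="${FIRST_BOOT_MARKER:-/etc/foreman/foreman_first_boot_done}"

run_first_boot() {
    # Nothing to do if a previous run already completed.
    if [ -e "$FIRST_BOOT_MARKER" ]; then
        return 0
    fi
    # Run the user-defined template. On failure the marker stays absent,
    # so ConditionPathExists lets the service run again on the next boot.
    if sh "$FIRST_BOOT_USER_SCRIPT"; then
        touch "$FIRST_BOOT_MARKER"
    else
        echo "first boot script failed" >&2
        return 1
    fi
}
```

A real script would end with a call to run_first_boot; it is left out here so the function can be sourced and tried against temporary paths first.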

My first impression is that we’re building more hacks. Instead I’d prefer a completely different approach.

I think we should avoid using %post as much as possible. It was always a hack and we can blame RHEL subscriptions for most of it.

The backstory is that when you provision RHEL you need a subscription. However, there was no way to get that subscription enabled during the regular install, so there’s a hack in %post. Once subscribed, additional packages are then installed there instead of through %packages. This was never needed for Fedora and CentOS, but for consistency we used a single approach everywhere.

The good news is that the support has landed in Anaconda:

So things will improve a lot if we start using those macros when they’re available.

I think this is the route to take. Once you have that, you don’t need upgrade at all: it simply fetches the latest version on installation.

See "Performing an advanced RHEL 8 installation" (Red Hat Enterprise Linux 8, Red Hat Customer Portal) as well.

Also, should this be in the RFC section?


I took a stab at writing the very initial example:

It’s far from useful now and nowhere near complete, but I think it shows a much more elegant way. This has been on my radar for the past 3 years, but I never got around to it.

@ekohl, are there other things we do in %post that perhaps don’t have native Anaconda support? What I like about the proposal here is that the call-home curl happens only after the host reboots. So in Foreman, a host is considered built only when it has really been provisioned successfully.

The rhsm command looks promising; that would actually solve the issue. My question is whether we should somehow check that the installed Anaconda actually supports the rhsm command. Is there a way to do that? Or are we fine with expecting users to always have the latest version of Anaconda?

> I think this is the route to take. Once you have that, you don’t need upgrade at all: it simply fetches the latest version on installation.

I agree, for RHEL 9 it seems to be a perfect fit, but users provisioning Debian / Ubuntu systems might still find the proposed service useful for doing things after the reboot.

> Also, should this be in the RFC section?

Well, it should be, but RFC is under Development, where it (maybe) won’t reach the same number of readers as here.
Plus, for the past few RFCs I’ve been asked to move the posts here, to the Community section, so this time I just posted it here right away.

I’ve reached out via email to the Anaconda developer I had contact with back in 2019 about this. I recall that it was RHEL 8.2 or 8.3 that introduced it. Within Foreman we do have the OS major and minor versions and we can do version checks. That’s how we also dealt with RHEL 5 vs 6 vs 7 etc.
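
The version check itself is simple. Foreman templates would express it in their own template language; the shell sketch below only illustrates the comparison logic. The function name supports_rhsm is hypothetical, and the 8.2 threshold is an assumption taken from the "8.2 or 8.3" recollection above, to be adjusted once confirmed.

```shell
# Minimum RHEL version assumed to ship the rhsm Kickstart command.
# 8.2 is an unconfirmed assumption, not a verified value.
RHSM_MIN_MAJOR=8
RHSM_MIN_MINOR=2

# Usage: supports_rhsm MAJOR [MINOR]
# Returns 0 if the given OS version is at least the assumed minimum.
supports_rhsm() {
    major="$1"
    minor="${2:-0}"
    if [ "$major" -gt "$RHSM_MIN_MAJOR" ]; then
        return 0
    fi
    [ "$major" -eq "$RHSM_MIN_MAJOR" ] && [ "$minor" -ge "$RHSM_MIN_MINOR" ]
}
```

A template could then branch between the rhsm command and the existing %post logic depending on the result, for example `supports_rhsm 9 0` for a RHEL 9.0 host.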

What you describe sounds a lot like the Ansible Tower provisioning callback:

To avoid going off topic I’ve opened a new post:

After discussing this further with @Marek_Hulan we agreed that %post is too large. However, it isn’t obvious where exactly things should go.

I noted that the yum/dnf update part should move into the proper sections (when possible), in favor of more native integration. There are probably more such cases.

However, marking as built does make sense at first boot. That ensures the network config is actually correct. Or at least, it can route to Foreman so it could be fixed with REX/configuration management. But just moving it isn’t everything.
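
Marking the host as built from the first boot script could be sketched as below. This is only an illustration: the function name call_home, the retry count, the delay, and the CALL_HOME_CMD override (there only so the function can be dry-run without a network) are all hypothetical, and the URL has to come from the provisioning template.

```shell
# Usage: call_home URL [ATTEMPTS] [DELAY_SECONDS]
# Retries the request so a network that comes up late does not
# leave the host stuck in the "building" state.
call_home() {
    url="$1"
    attempts="${2:-5}"
    delay="${3:-10}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # --fail makes curl exit non-zero on HTTP error responses.
        # CALL_HOME_CMD is deliberately unquoted so it splits into words.
        if ${CALL_HOME_CMD:-curl --silent --fail --output /dev/null} "$url"; then
            return 0
        fi
        i=$((i + 1))
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

Because the call only succeeds when the booted host can actually reach Foreman, a successful return also confirms that the network configuration works.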

@TimoGoebel has in the past suggested adding more steps in between. Today the host status is either building or built, but it would be nice to have intermediate steps, for example if you could somehow find out that the host made it past %post (or the Debian equivalent) and should have rebooted. I don’t recall whether this was made into an RFC or otherwise, but it would give users more insight into how far provisioning has progressed.

So in summary:

  • %post is too large
  • Some things should move to steps before %post
  • Some things should move beyond %post
  • More fine grained statuses would be nice

This will also solve some issues I had to troubleshoot in the past:

  • A VM with little memory would not survive dnf update, specifically installations without swap.
  • User-defined post template code that assumes it runs on a fully booted system, which is not the case for %post (e.g. firewall-cmd vs. firewall-offline-cmd, if I remember the command name correctly).
  • Slow provisioning, specifically when there are a lot of updates (these can definitely be done after first boot).

This does not feel like a hack: a firstboot script or action is a well-known concept in the industry, and we have seen it implemented in many OSes in the past. There is also the systemd-firstboot tool, which performs additional setup. That probably means the service should start after systemd-firstboot, just in case something would otherwise be uninitialized.


Thanks everybody for the comments and insights. I created a Redmine tracker for the changes. In summary:

  1. Fix the RHEL 9 issue by using the rhsm command
  2. Implement a first boot systemd service and clean up the %post section (and the Debian equivalent)
  3. Implement new host statuses that better reflect the host’s provisioning progress