Provisioning hosts doesn’t work well with host parameters

Shimon_Shtein · September 9, 2025, 10:26am

Recently I investigated a bug where a host could not be provisioned if the kickstart_liveimg parameter was specified.
The steps to reproduce the bug:

1. Set up a live image so that it is accessible by the newly created host.
2. Define all the needed properties for the host, like compute resources, content, networking, etc.
3. Set the provisioning to pxe-grub2.
4. Add a host parameter with the name kickstart_liveimg and set the value to the URL of the live image you set up in step 1.
5. Submit the host form.

The console of the provisioned host will show an error that the inst.stage2 kernel parameter is not set.

Long story short, this is happening because the parameter is not saved yet during the provisioning orchestration steps.
There is a workaround: set the parameter on the hostgroup or any other grouping object related to the host. The issue still needs fixing, though.

The ideal fix would be of course separating the provisioning from the active record callbacks, as was discussed in Foreman provisioning strategy.
I would like to hear suggestions about how to try and solve it without refactoring the whole provisioning structure.

ekohl · September 9, 2025, 10:42am

This is an extremely short version of it. Slightly longer:

The Foreman orchestration has been around for a long time. At my request @ehelms recently experimented with AI summarizing the code we have so you can take a look at foreman-context/foreman/foreman-provisioning-orchestration-design.md at develop · ehelms/foreman-context · GitHub.

I’ll give the relevant bits. The orchestration relies on Active Record callbacks. This specific part happens on the Host model. It gets triggered after the host is created. So the Orchestration::TFTP part creates a TFTP config on the Smart Proxy. This can’t be rendered dynamically. At least, not in the current form.

The problem appears to be that after the host is created it will create the host parameters. By that time the TFTP orchestration already ran and isn’t triggered again.

That’s why creating them on another layer works as a workaround. Another workaround is to move the host out of build mode and back into it.
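The ordering can be illustrated with a toy model in plain Ruby. This is not Foreman code: ToyHost, render_tftp and the output format are made up for illustration, and the only thing it tries to show is that a lookup made during orchestration sees hostgroup values but not host parameters that are stored afterwards.

```ruby
# Toy model, not Foreman code: all names here are hypothetical.
class ToyHost
  attr_reader :tftp_config, :saved_params

  def initialize(hostgroup_params: {}, pending_params: {})
    @hostgroup_params = hostgroup_params # already persisted on the hostgroup
    @pending_params = pending_params     # submitted together with the host form
    @saved_params = {}
    @tftp_config = nil
  end

  # Only persisted values are visible to the lookup.
  def host_param(name)
    @saved_params.fetch(name) { @hostgroup_params[name] }
  end

  def save
    render_tftp                           # orchestration runs first...
    @saved_params.merge!(@pending_params) # ...host parameters are stored afterwards
    @pending_params = {}
    true
  end

  private

  def render_tftp
    @tftp_config = "kickstart_liveimg: #{host_param('kickstart_liveimg') || 'unset'}"
  end
end

url = 'http://example.com/live.img'

on_host = ToyHost.new(pending_params: { 'kickstart_liveimg' => url })
on_host.save
puts on_host.tftp_config  # kickstart_liveimg: unset

on_group = ToyHost.new(hostgroup_params: { 'kickstart_liveimg' => url })
on_group.save
puts on_group.tftp_config # kickstart_liveimg: http://example.com/live.img
```

In the first case the parameter does end up stored on the host, but only after the config was already written.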

I considered invoking some orchestration steps after parameters change, but I worry it’ll be way too expensive.

Looking at Active Record Callbacks — Ruby on Rails Guides I think this may be the solution. If I have it right then we already trigger it on after_commit:

github.com/theforeman/foreman

app/models/host/managed.rb

7027d014c

          end
          
          # Define custom hook that can be called in model by magic methods (before, after, around)
          define_model_callbacks :build, :only => :after
          define_model_callbacks :provision, :only => :before
          prepend Hostext::UINotifications
          
          before_validation :refresh_build_status, :if => :build_changed?
          
          # Custom hooks will be executed after_commit
          after_commit :build_hooks
          before_save :clear_data_on_build
          
          include PxeLoaderValidator
          
          def initialize(attributes = nil, &block)
            super(apply_inherited_attributes(attributes, false), &block)
          end
          
          def build_hooks
            return if previous_changes['build'].nil?

That suggests the host isn’t created in a single transaction. That’s where I’d suggest looking next.

Shimon_Shtein · September 9, 2025, 10:48am

We need to be careful with the after_commit hook, since by then some transient information will already be stored, such as the MAC address for VMs (which is added when a VM is created). Of course, we can try to add more compensation code for the case where the orchestration eventually fails.
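As a rough idea of what such compensation could look like, here is a minimal sketch in plain Ruby. It is not Foreman code: Step, run_with_compensation and the step names are hypothetical, and the real orchestration has its own structure. Each step carries an undo action, and a failure undoes the completed steps in reverse order.

```ruby
# Toy sketch, not Foreman code: Step and run_with_compensation are hypothetical.
Step = Struct.new(:name, :run, :undo)

def run_with_compensation(steps, log)
  done = []
  steps.each do |step|
    step.run.call
    log << "ran #{step.name}"
    done << step
  end
  true
rescue StandardError => e
  log << "failed: #{e.message}"
  # Undo the steps that already completed, newest first.
  done.reverse_each do |step|
    step.undo.call
    log << "undid #{step.name}"
  end
  false
end

store = {}
log = []
steps = [
  Step.new('store MAC address',
           -> { store[:mac] = 'aa:bb:cc:dd:ee:ff' },
           -> { store.delete(:mac) }),
  Step.new('write TFTP config',
           -> { raise 'proxy unreachable' },
           -> {}),
]

ok = run_with_compensation(steps, log)
puts log
puts "succeeded: #{ok}, stored: #{store.inspect}"
```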

ekohl · September 9, 2025, 12:29pm

We’ve been using after_commit since forever so that was a pleasant surprise.

Shimon_Shtein · September 9, 2025, 3:24pm

OK, did some digging: most of our provisioning is done in an around_save hook, but we can also perform actions in the after_commit scope:

github.com/theforeman/foreman

app/models/concerns/orchestration/ssh_provision.rb

7027d014c

          
          protected
          
          def queue_ssh_provision
            return unless ssh_provision? && errors.empty?
            new_record? ? queue_ssh_provision_create : queue_ssh_provision_update
          end
          
          # I guess this is not going to happen on create as we might not have an ip address yet.
          def queue_ssh_provision_create
            post_queue.create(:name   => _("Prepare post installation script for %s") % self, :priority => 2000,
              :action => [self, :setSSHProvisionScript])
            post_queue.create(:name   => _("Wait for %s to come online") % self, :priority => 2001,
              :action => [self, :setSSHWaitForResponse])
            post_queue.create(:name   => _("Configure instance %s via SSH") % self, :priority => 2003,
              :action => [self, :setSSHProvision])
          end
          
          def queue_ssh_provision_update
          end

Maybe just moving the save queue into the post queue will do the trick (or just moving the TFTP actions there).
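A toy sketch of that idea, in plain Ruby and not Foreman code: ToyQueue and QueuedHost are hypothetical stand-ins, and the assumption is simply that the save queue runs before the host parameters are stored while the post queue runs after the commit.

```ruby
# Toy model, not Foreman code: ToyQueue and QueuedHost are hypothetical stand-ins.
class ToyQueue
  Task = Struct.new(:name, :priority, :action)

  def initialize
    @tasks = []
  end

  def create(name:, priority:, action:)
    @tasks << Task.new(name, priority, action)
  end

  # Run the queued tasks in ascending priority order.
  def run
    @tasks.sort_by(&:priority).each { |task| task.action.call }
  end
end

class QueuedHost
  attr_reader :queue, :post_queue, :tftp_config

  def initialize(pending_params)
    @pending_params = pending_params
    @saved_params = {}
    @queue = ToyQueue.new
    @post_queue = ToyQueue.new
  end

  def set_tftp
    @tftp_config = "kickstart_liveimg: #{@saved_params['kickstart_liveimg'] || 'unset'}"
  end

  def save
    queue.run                             # during save: parameters not stored yet
    @saved_params.merge!(@pending_params) # parameters get stored
    post_queue.run                        # after commit: parameters are visible
  end
end

params = { 'kickstart_liveimg' => 'http://example.com/live.img' }

early = QueuedHost.new(params)
early.queue.create(name: 'Deploy TFTP', priority: 20, action: early.method(:set_tftp))
early.save

late = QueuedHost.new(params)
late.post_queue.create(name: 'Deploy TFTP', priority: 20, action: late.method(:set_tftp))
late.save

puts early.tftp_config # kickstart_liveimg: unset
puts late.tftp_config  # kickstart_liveimg: http://example.com/live.img
```

The trade-off is the one raised earlier in this thread: a task that runs after the commit can no longer prevent the save, so a failure there needs its own cleanup.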

Shimon_Shtein · September 11, 2025, 9:22am

Just thought of another option to fix this: we can teach the host_parameter macro to take unsaved parameters into account.
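Roughly what I mean, as a plain-Ruby sketch. It is not Foreman code: Param, ParamHost and host_param_with_unsaved are hypothetical names, and letting an unsaved value win over a stored one is an assumption of the sketch.

```ruby
# Toy sketch, not Foreman code: Param and ParamHost are hypothetical.
Param = Struct.new(:name, :value, :persisted)

class ParamHost
  def initialize(params)
    @params = params
  end

  # Behaviour as described in this thread: only stored values are seen.
  def host_param(name)
    @params.find { |p| p.persisted && p.name == name }&.value
  end

  # Proposed variant: prefer a not-yet-stored value, fall back to the stored one.
  def host_param_with_unsaved(name)
    unsaved = @params.find { |p| !p.persisted && p.name == name }
    unsaved ? unsaved.value : host_param(name)
  end
end

host = ParamHost.new([
  Param.new('ntp-server', 'ntp.example.com', true),
  Param.new('kickstart_liveimg', 'http://example.com/live.img', false),
])

p host.host_param('kickstart_liveimg')              # nil
p host.host_param_with_unsaved('kickstart_liveimg') # "http://example.com/live.img"
p host.host_param_with_unsaved('ntp-server')        # "ntp.example.com"
```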

Bernhard_Suttner · September 29, 2025, 1:13pm

Was there a solution for this issue? If yes, can you please link the PR?

cody_c · October 3, 2025, 5:35pm

I’m also curious whether there has been a change or fix for this, as I’m seeing it with EC2 builds that set a lot of custom host parameters on provision. It looks like the same issue: viewing the Finish script in the UI shows all the parameters, but on the host itself any logic that depends on a host_parameter added on build behaves as if the parameter is not set, so values are missing. The logic fails, and the UI and API return Failed to launch script on {fqdn}: undefined method ’ for nil:NilClass` which may be a separate issue, but the fact that none of the host_parameters set on build are getting consumed doesn’t help.

cody_c · October 3, 2025, 9:14pm

This does seem like a bug in 3.16. I just rolled back to 3.15 and I don’t see the failed to launch script error, nor do I need to put all the host parameters into the host_group, so something must have changed around this in 3.16. This could be two different issues: the host_parameters not getting rendered, and SSH provisioning with a finish script returning that failed to launch message (even though the script actually executes fully just fine once the parameters are set in the host_group instead of on the host on provision).

ekohl · October 3, 2025, 10:29pm

No, in https://github.com/theforeman/foreman-documentation/pull/3958 it was worked around by applying the parameters to the hostgroup.

ekohl · October 3, 2025, 10:29pm

That looks like an unrelated issue. Please open a new thread with the full stack trace included.

cody_c · October 6, 2025, 4:14pm

Are there any plans to revert the change that requires the hostgroup workaround for this? It’s a change in 3.16 that doesn’t seem to be required in 3.15.

ekohl · October 7, 2025, 10:29am

@cody_c like I said, you have an unrelated issue, so please open a new thread for it.

cody_c · October 7, 2025, 3:14pm

Oh, I did that. I’m talking about the need, in this thread, to move all the build parameters into the hostgroup in order for them to be properly consumed on provision. I’m wondering if the expectation is that this is the new norm, and that host build parameters are no longer a functional method for building and provisioning hosts because a workaround is available.

ekohl · October 7, 2025, 3:38pm

That workaround has always been needed for this particular bug. Parameters that affect how the TFTP record is created need to exist already when the TFTP orchestration runs. It’s just that very few parameters are used in that template. For example, I bet you can also trigger it with these two:

github.com/theforeman/foreman

app/views/unattended/provisioning_templates/PXEGrub2/kickstart_default_pxegrub2.erb

c976c4b6b

          set default=<%= host_param('default_grub_install_entry') || 0 %>
          set timeout=<%= host_param('loader_timeout') || 10 %>

It’s just very rare to use parameters to write out the PXE configuration. The actual files are usually rendered dynamically, like the actual kickstart, and that’s unaffected by this.
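To see why a missing parameter goes unnoticed in those two lines, they can be rendered with Ruby’s stdlib ERB and a stand-in for host_param. The lambda below is an assumption for illustration, not Foreman’s helper: a parameter that is not visible at render time silently falls back to the default.

```ruby
require 'erb'

# Renders the two template lines quoted above with a toy parameter lookup.
# The host_param lambda is a stand-in, not Foreman's real helper.
def render_grub(params)
  host_param = ->(name) { params[name] }
  template = <<~GRUB
    set default=<%= host_param.call('default_grub_install_entry') || 0 %>
    set timeout=<%= host_param.call('loader_timeout') || 10 %>
  GRUB
  ERB.new(template).result(binding)
end

# Parameter not visible at render time: the defaults are written out.
puts render_grub({})
# set default=0
# set timeout=10

# Parameter visible at render time: the value is used.
puts render_grub({ 'loader_timeout' => 30 })
# set default=0
# set timeout=30
```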

cody_c · October 7, 2025, 3:57pm

I don’t think that’s true at all. In Foreman 3.15 and earlier, when building out an AWS based EC2 host, we can pass in host parameters on the build for things like volume mapping, keys, IAM roles, etc. With Foreman 3.16, the template shows it rendered the values correctly when viewing the template in the build, but the actual SSH finish script doesn’t have any of the values, and any logic based on host parameters that are not global settings is ignored.

If, however, I put all these host parameters in the hostgroup instead, they get included properly in the template. This has never been necessary since, I think, 2.x, when we started with Foreman; we have only seen it with this release.

cody_c · October 7, 2025, 3:59pm

Perhaps this is a different issue, as I’m seeing this with all builds: libvirt, discovery and AWS (our 3 build methods). I’m not sure if this thread was referring to a specific scenario, but what we are seeing in 3.16 is that none of the host_params set on build are included in the finish scripts.

Bernhard_Suttner · November 28, 2025, 10:00pm

I think the described issue nowadays happens with all kinds of image based deployment (finish + user-data / cloud_init deployment).

It can be “solved” (= workaround) by using a hostgroup parameter or any other parameter besides a host parameter.

So, if you set an activation key, enable-puppet8 or ntp-server host parameter and try to create a host using image based deployment, it will not work. It is also pretty bad if you try to create a host with subscription-manager registration and set the kt_activation_keys host parameter for that: the host will simply not register.

I wonder how to fix this without moving everything to foreman-tasks or any other huge re-implementation task.

@Shimon_Shtein can you let me know what you think about handling unsaved parameters?

ekohl · November 28, 2025, 11:48pm

You’re right that the problem is probably quite a bit bigger than we thought.

I think this may actually be the safest option, and something we’ve been wanting for a long time already. But also a huge thing. First of all, foreman-tasks is a plugin and provisioning is in core. That’s why we have talked about moving the functionality into core.

This whole issue is a can of worms. So many things can go wrong if you start trying to fix it.

Bernhard_Suttner · November 29, 2025, 8:36am

I know exactly what you mean. Unfortunately, we noticed that with Foreman 3.14 and the related Katello, image based deployment with an activation key as a host parameter fully worked. That is, the host was provisioned and registered using the activation key given in the host parameter.

With Foreman 3.16 this is now broken. Even with the latest fix from Fixes #38910 - Return correct template kind without compute attrs · theforeman/foreman@4e9798e · GitHub the template renders again, but the host parameters are not available => no activation key => no registration.