Recently I investigated a bug where a host could not be provisioned if the kickstart_liveimg parameter was specified.
The steps to reproduce the bug:
1. Set up a live image so that it is accessible to the newly created host.
2. Define all the needed properties for the host, such as compute resources, content, networking, etc.
3. Set the provisioning to pxe-grub2.
4. Add a host parameter named kickstart_liveimg and set its value to the URL of the live image you set up in step 1.
5. Submit the host form.
The console of the provisioned host will show an error saying that the inst.stage2 kernel parameter is not set.
Long story short, this happens because the parameter has not been saved yet when the provisioning orchestration steps run.
There is a workaround: set the parameter on the hostgroup or on any other grouping object related to this host. The issue still needs fixing, though.
The ideal fix would of course be to separate the provisioning from the Active Record callbacks, as was discussed in Foreman provisioning strategy.
I would like to hear suggestions about how to try and solve it without refactoring the whole provisioning structure.
I’ll give the relevant bits. The orchestration relies on Active Record callbacks. This specific part happens on the Host model. It gets triggered after the host is created. So the Orchestration::TFTP part creates a TFTP config on the Smart Proxy. This can’t be rendered dynamically. At least, not in the current form.
The problem appears to be that the host parameters are only created after the host itself has been created. By that time the TFTP orchestration has already run, and it isn’t triggered again.
That’s why creating them on another layer works as a workaround. Another workaround is to move the host out of build mode and back into it.
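To make the ordering concrete, here is a toy model in plain Python. It is not Foreman code: the class names, the `liveimg=` output format and the example URL are all made up for illustration. It only mimics the sequence described above (render the TFTP config once, then persist the host parameters) and both workarounds.

```python
# Toy model of the ordering problem. Nothing here is Foreman's real API.

class HostGroup:
    def __init__(self, parameters=None):
        self.parameters = dict(parameters or {})


class Host:
    def __init__(self, name, hostgroup=None):
        self.name = name
        self.hostgroup = hostgroup
        self.saved_parameters = {}  # host parameters that are already persisted
        self.tftp_config = None     # what was written to the pretend Smart Proxy

    def effective_parameters(self):
        # Host-level values override values inherited from the host group.
        merged = dict(self.hostgroup.parameters) if self.hostgroup else {}
        merged.update(self.saved_parameters)
        return merged

    def render_tftp_config(self):
        # Stand-in for rendering the PXE template (hypothetical output format).
        liveimg = self.effective_parameters().get("kickstart_liveimg")
        return f"liveimg={liveimg}" if liveimg else "liveimg=<unset>"

    def create(self, host_parameters):
        # 1. The host is created and the TFTP step runs immediately.
        self.tftp_config = self.render_tftp_config()
        # 2. Only afterwards are the host parameters persisted.
        self.saved_parameters.update(host_parameters)

    def rebuild(self):
        # Models leaving and re-entering build mode: the config is rendered again.
        self.tftp_config = self.render_tftp_config()


URL = "http://example.com/live.img"  # placeholder URL

broken = Host("a.example.com")
broken.create({"kickstart_liveimg": URL})

workaround = Host("b.example.com", HostGroup({"kickstart_liveimg": URL}))
workaround.create({})

print(broken.tftp_config)      # parameter was not visible at render time
print(workaround.tftp_config)  # inherited parameter already existed

broken.rebuild()
print(broken.tftp_config)      # second render sees the saved parameter
```

In this model the host-level parameter only shows up after the second render, while the host group value is there from the start because it existed before the host was created.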
I considered invoking some orchestration steps after parameters change, but I worry it’ll be way too expensive.
We need to be careful with the after_commit hook, since by then some of the transient information will already be stored, such as the MAC address for VMs (which is added when the VM is created). Of course, we can try to add more compensation code in case the orchestration eventually fails.
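As a rough illustration of what "compensation code" could mean here, the sketch below runs a list of steps and undoes the completed ones in reverse order when a later step fails. It is a generic pattern in plain Python, not Foreman's orchestration queue; the `Step` class, the step names and the MAC address are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], None]
    compensate: Callable[[], None]


class OrchestrationFailed(Exception):
    pass


def run_with_compensation(steps: List[Step]) -> None:
    """Run steps in order; on failure, undo the completed ones in reverse."""
    completed = []
    for step in steps:
        try:
            step.action()
        except Exception as exc:
            for finished in reversed(completed):
                finished.compensate()
            raise OrchestrationFailed(step.name) from exc
        completed.append(step)


# Pretend host record that was already committed before orchestration ran.
record = {"name": "a.example.com", "mac": None}
log = []


def create_vm():
    record["mac"] = "52:54:00:00:00:01"  # made-up MAC, stored once the VM exists
    log.append("vm created")


def destroy_vm():
    record["mac"] = None  # compensation: drop the transient data again
    log.append("vm destroyed")


def write_tftp():
    raise RuntimeError("smart proxy unreachable")  # simulated failure


def remove_tftp():
    log.append("tftp removed")


try:
    run_with_compensation([
        Step("compute", create_vm, destroy_vm),
        Step("tftp", write_tftp, remove_tftp),
    ])
except OrchestrationFailed as exc:
    log.append(f"failed at {exc}")

print(record)
print(log)
```

The point of the sketch is that with after_commit the record already exists when a step fails, so every piece of transient state a step stores needs a matching undo.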
I’m also curious whether there has been a change or fix for this, as I’m seeing it with EC2 builds that involve a lot of custom host parameters on provision. It looks like the same issue: viewing the Finish script in the UI shows all the parameters, but on the host itself any logic that requires a host_parameter added on build doesn’t see it, so values are missing. The logic fails, and the UI and API return Failed to launch script on {fqdn}: undefined method ’ for nil:NilClass` which may be another issue, but the fact that none of the host_parameters set on build are getting consumed doesn’t help anything.
This does seem like a bug in 3.16. I just rolled back to 3.15 and I don’t see the failed to launch script error, nor do I need to put all the host parameters into the host_group, so something must have changed around this in 3.16. These could be two different issues: the host_parameters not getting rendered, and SSH provisioning with a finish script returning that failed to launch message (even though the script actually executes fully just fine once the parameters are included in the host_group instead of on the host on provision).
Are there any plans to revert the change that requires the hostgroup workaround for this? It’s a change in 3.16 that doesn’t seem to be required in 3.15.
Oh, I did that. I’m talking about the need, described in this thread, to move all the build parameters into the hostgroup in order for them to be properly consumed on provision. I’m wondering whether the expectation is that this is the new norm, and that host build parameters are no longer a functional way to build and provision hosts because a workaround is available.
That workaround has always been needed for this particular bug. Parameters that affect how the TFTP record is created need to exist before it is rendered. It’s just that very few parameters are used in that template. For example, I bet you can also trigger it with these 2.
It’s just very rare to use parameters to write out the PXE configuration. The actual files are usually rendered dynamically, like the actual kickstart, and that’s unaffected by this.
I don’t think that’s true at all. In Foreman 3.15 and earlier, when building out an AWS-based EC2 host, we can pass in host parameters on the build for things like volume mapping, keys, IAM roles, etc. With Foreman 3.16, viewing the template in the build shows the values rendered correctly, but the actual SSH finish script doesn’t have any of the values, and any logic based on host parameters that are not global settings is ignored.
If, however, I put all these build-time host parameters in the hostgroup instead, they get included properly in the template. This has never been the case since, I think, 2.x, when we started with Foreman; we have only seen it with this release.
Perhaps this is a different issue, as I’m seeing this with all builds: libvirt, discovery and AWS (our 3 build methods). I’m not sure whether this thread was referring to a specific scenario, but what we are seeing in 3.16 is that none of the host_params set on build are included in the finish scripts.