Foreman UEFI Provisioning Grub Failure

Problem:
I’m provisioning bare-metal HP hardware that uses UEFI boot. I’ve not had to provision UEFI with this Foreman host before, but I’m using a process that has worked fine in the past. I have assigned the Kickstart PXEGrub2 template to the OS and set the bootloader to Grub2 UEFI. This normally works fine on other Foreman hosts. When the new node is provisioned, it boots, begins the PXE process and then drops to a grub shell.

Expected outcome:
The EL installation process starts from Grub after the initial PXE boot.

Foreman and Proxy versions:
3.4.0

Foreman and Proxy plugin versions:

Distribution and version:
Foreman host: CentOS Stream 8

Other relevant data:
While reading through the logs and trying to troubleshoot the behaviour, I came across this community thread: Configuration Help - UEFI PXE Boot/Provision - #4 by Dirk
It describes a really similar problem and is backed up by the logs.

When I looked at the foreman host I see two grubx64.efi files in the correct locations (the foreman host is also the pxe server on this network)

/boot/efi/EFI/centos/grubx64.efi
/var/lib/tftpboot/grub2/grubx64.efi

When I look at the file sizes of these two files

-rwx------. 1 root root 2295576 Jul 19 15:30 /boot/efi/EFI/centos/grubx64.efi
-rw-r--r--. 1 root root 1893144 Nov 25 2020 /var/lib/tftpboot/grub2/grubx64.efi

there is a fairly substantial size difference between them (similar to the thread quoted above)

I’m not sure how this could happen, as my understanding was that the grubx64.efi in the tftp root came from /boot on the foreman host. Perhaps it was copied in at a point in time, and a later CentOS update then put a new one in /boot that was never copied across.

Based on the thread quoted, this does seem a likely place to start for the behaviour, but before I start replacing files, I’d like to try to understand how and why these two files are so different in size.
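
Before replacing anything, it may help to confirm that the two copies really differ in content and not just in size. The following is only a minimal Ruby sketch of that check; the helper names `file_fingerprint` and `same_file_contents?` are made up for this example and are not part of Foreman.

```ruby
require "digest"

# Return the size and SHA-256 digest of a file, or nil if it is not a regular file.
def file_fingerprint(path)
  return nil unless File.file?(path)

  { size: File.size(path), sha256: Digest::SHA256.file(path).hexdigest }
end

# True only when both files exist and have identical size and digest.
def same_file_contents?(path_a, path_b)
  a = file_fingerprint(path_a)
  b = file_fingerprint(path_b)
  !a.nil? && a == b
end
```

Calling `same_file_contents?("/boot/efi/EFI/centos/grubx64.efi", "/var/lib/tftpboot/grub2/grubx64.efi")` as root (the copy under /boot is only readable by root) would show whether the TFTP copy is a stale version of the one under /boot.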

Do you see anything in the TFTP logs? Few threads that may be helpful

They are good links and I can see some value in them. I’m trying to get better debug output from the PXE server’s request process; at the moment I can only see the smart proxy interactions, which look ‘fine’:

Sep 29 13:02:09 jarvis smart-proxy[986]: 10.11.216.4 - - [29/Sep/2022:13:02:09 BST] "POST /tftp/PXELinux/f4:03:43:03:c4:28 HTTP/1.1" 200 0
Sep 29 13:02:09 jarvis smart-proxy[986]: - -> /tftp/PXELinux/f4:03:43:03:c4:28
Sep 29 13:02:09 jarvis smart-proxy[986]: 10.11.216.4 - - [29/Sep/2022:13:02:09 BST] "POST /tftp/PXEGrub2/f4:03:43:03:c4:28 HTTP/1.1" 200 0
Sep 29 13:02:09 jarvis smart-proxy[986]: - -> /tftp/PXEGrub2/f4:03:43:03:c4:28
Sep 29 13:02:10 jarvis smart-proxy[986]: 10.11.216.4 - - [29/Sep/2022:13:02:10 BST] "POST /tftp/PXEGrub/f4:03:43:03:c4:28 HTTP/1.1" 200 0
Sep 29 13:02:10 jarvis smart-proxy[986]: - -> /tftp/PXEGrub/f4:03:43:03:c4:28
Sep 29 13:02:10 jarvis smart-proxy[986]: 10.11.216.4 - - [29/Sep/2022:13:02:10 BST] "POST /tftp/iPXE/f4:03:43:03:c4:28 HTTP/1.1" 200 0
Sep 29 13:02:10 jarvis smart-proxy[986]: - -> /tftp/iPXE/f4:03:43:03:c4:28
Sep 29 13:02:17 jarvis smart-proxy[986]: 10.11.216.4 - - [29/Sep/2022:13:02:17 BST] "POST /tftp/PXELinux/f4:03:43:03:c4:28 HTTP/1.1" 200 0
Sep 29 13:02:17 jarvis smart-proxy[986]: - -> /tftp/PXELinux/f4:03:43:03:c4:28
Sep 29 13:02:17 jarvis smart-proxy[986]: 10.11.216.4 - - [29/Sep/2022:13:02:17 BST] "POST /tftp/PXEGrub2/f4:03:43:03:c4:28 HTTP/1.1" 200 0
Sep 29 13:02:17 jarvis smart-proxy[986]: - -> /tftp/PXEGrub2/f4:03:43:03:c4:28

This does align with the idea of a missing grub config (wrong location, as your links say), as that would explain why it’s dropping back to the grub shell: there is no config to load the boot menu options.
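
To check whether a per-host config actually exists, the MAC address from the smart proxy log can be turned into the file name a PXE client would look for. This is only a rough sketch: the `01-` prefix with dash-separated octets is the usual PXE convention, but the `grub2/grub.cfg-<suffix>` location under the TFTP root is an assumption about this setup, and both helper names are hypothetical.

```ruby
# Convert a MAC address such as "f4:03:43:03:c4:28" into the PXE-style
# "01-f4-03-43-03-c4-28" form (01 is the Ethernet hardware type).
def pxe_mac_suffix(mac)
  "01-" + mac.downcase.tr(":", "-")
end

# ASSUMPTION: the per-host Grub2 config lives under <tftp_root>/grub2/ and is
# named grub.cfg-<suffix>. Adjust this to match your own TFTP layout.
def grub2_config_candidates(mac, tftp_root: "/var/lib/tftpboot")
  [File.join(tftp_root, "grub2", "grub.cfg-#{pxe_mac_suffix(mac)}")]
end
```

For example, `grub2_config_candidates("f4:03:43:03:c4:28").select { |p| File.exist?(p) }` returns an empty list if no such config was written where the sketch expects it.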

One thing that’s worrying me (along with the inconsistent size of grubx64.efi already stated) is that in the linked posts and the Redmine reference, the suggestion is to create symlinks.

The first example

/var/lib/tftpboot/EFI/redhat

I don’t have an EFI directory on my tftp server

I do have

/boot/efi/EFI

on the foreman host (which is also the tftp pxe server) so the host itself has this, but it’s not in the tftp root

What part of the setup process should put this in place in the tftp root?

We had the same symptoms. Configs were actually in the right place but there was an unescaped “&” in the kickstart url.

I had to apply this patch to the kickstart_kernel_options.erb snippet:

--- kickstart_kernel_options.erb.bak    2022-10-05 12:45:51.881297101 +0200
+++ kickstart_kernel_options.erb        2022-10-05 14:20:02.506937027 +0200
@@ -35,19 +35,19 @@
   # both current and legacy syntax provided
   if (is_fedora && os_major >= 33) || (rhel_compatible && os_major >= 9)
     if subnet4 && !subnet4.dhcp_boot_mode?
-      options.push("inst.ks=#{foreman_url('provision', static: '1')}")
+      options.push("inst.ks=#{foreman_url('provision', static: '1').gsub("&", "\\\\&")}")
     elsif subnet6 && !subnet6.dhcp_boot_mode?
-      options.push("inst.ks=#{foreman_url('provision', static6: '1')}")
+      options.push("inst.ks=#{foreman_url('provision', static6: '1').gsub("&", "\\\\&")}")
     else
-      options.push("inst.ks=#{foreman_url('provision')}", "inst.ks.sendmac")
+      options.push("inst.ks=#{foreman_url('provision').gsub("&", "\\\\&")}", "inst.ks.sendmac")
     end
   else
     if subnet4 && !subnet4.dhcp_boot_mode?
-      options.push("ks=#{foreman_url('provision', static: '1')}")
+      options.push("ks=#{foreman_url('provision', static: '1').gsub("&", "\\\\&")}")
     elsif subnet6 && !subnet6.dhcp_boot_mode?
-      options.push("ks=#{foreman_url('provision', static6: '1')}")
+      options.push("ks=#{foreman_url('provision', static6: '1').gsub("&", "\\\\&")}")
     else
-      options.push("ks=#{foreman_url('provision')}", "kssendmac", "ks.sendmac")
+      options.push("ks=#{foreman_url('provision').gsub("&", "\\\\&")}", "kssendmac", "ks.sendmac")
     end
   end
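
For anyone wondering why the patch needs four backslashes: in a `gsub` replacement string, a backslash followed by `&` is a back-reference to the whole match, so the backslash itself has to be escaped once more. Here is a small standalone Ruby illustration; the host name and token in the URL are placeholders, not real values.

```ruby
# Placeholder URL, only for illustration.
url = "http://foreman.example.com/unattended/provision?static=1&token=abc"

# The literal "\\&" is the two characters backslash + ampersand. gsub reads
# that as "insert the whole match", so the string comes back unchanged.
unchanged = url.gsub("&", "\\&")

# The literal "\\\\&" is two backslashes + ampersand. gsub reads the two
# backslashes as one literal backslash, followed by a literal "&".
escaped = url.gsub("&", "\\\\&")

# The block form avoids replacement-string escaping: the block's return
# value is inserted as-is.
escaped_block = url.gsub("&") { "\\&" }

puts unchanged
puts escaped
```

The block form gives the same result as the patched line and may be easier to read inside an ERB template.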

Can you share where the & was? Was it in some value?

Our Grub entries are looking like this with the patch applied (omitted some network config). The issue became apparent after updating Foreman from 3.2 to 3.4.

menuentry 'Kickstart default PXEGrub2' {
  linuxefi boot/rocky-8-local-G2mekSQiNgu5-vmlinuz  BOOTIF=01-b4-2e-99-b4-c6-58 ks=http://<foreman-fqdn>/unattended/provision?static=1\&token=31c659f0-7338-4c01-a62a-817c30729df5
  initrdefi boot/rocky-8-local-G2mekSQiNgu5-initrd.img
}

Had the same problem (post here: Cant provision linux hosts after upgrade to 3.3.1 or 3.4).

Tried to update and tested your patch - provisioning now works after update.

Would be fantastic if this could be fixed properly in official channels

Thanks for the patch! (Your Github PR led me here)
Instead of escaping just the ampersand, would it be more future proof if we just quote the whole foreman_url? Also see ekohl’s message on foreman_url_renderer.rb - maybe there is a cleaner and more future proof way to fix it?

Is there a Redmine issue tracking this problem?

I’m not aware but then again I’m new to this community so I could be wrong.
I’ve tried (single) quoting the foreman_url() output in kickstart_kernel_options.erb, only to find that it breaks PXELinux. Maybe .gsub() is not such a bad idea after all? It would need to be applied to all the Grub2 templates, however.
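
One way to picture a per-loader approach is a small helper that only escapes for Grub2 and leaves everything else alone. This is purely a sketch: `kickstart_url_for` is a hypothetical name, not something Foreman provides, and how a shared snippet would learn which template kind it is being rendered for is left open here. Whether PXELinux tolerates the backslash is not established in this thread, so the sketch conservatively leaves that URL untouched.

```ruby
# HYPOTHETICAL helper, not part of Foreman: escape "&" only for Grub2 output.
# template_kind is assumed to be a string such as "PXEGrub2" or "PXELinux".
def kickstart_url_for(url, template_kind)
  case template_kind
  when "PXEGrub2"
    # Block form inserts a literal backslash before each ampersand.
    url.gsub("&") { "\\&" }
  else
    # Other loaders get the URL unmodified.
    url
  end
end
```

With a placeholder URL ending in `?static=1&token=abc`, the PXEGrub2 branch would yield `?static=1\&token=abc`, matching the menu entry shown earlier, while the PXELinux branch returns the URL as it was.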

Another report: Boot disk with UEFI not working in Katello - #3 by angry_yak_shaver