Provisioning bare metal via PXE Grub2, config file issue

Problem:

When I create a host in Foreman to be provisioned with the PXE UEFI method, the installation hangs at a grub menu. In /var/log/messages on the Foreman smart proxy I see the following messages:

XX in.tftpd: RRQ from 10.0.1.2 filename grub2/grubx64.efi
xx in.tftpd: Error code 8: User aborted the transfer
xx in.tftpd: RRQ from 10.0.1.2 filename grub2/grubx64.efi
xx in.tftpd: Client 10.0.1.2 finished grub2/grubx64.efi
xx in.tftpd: RRQ from 10.0.1.2 filename /EFI/centos/grub.cfg-XX-XX-XX-XX-XX-XX-XX
XX in.tftpd: Client 10.0.1.2 finished /EFI/centos/grub.cfg-XX-XX-XX-XX-XX-XX-XX
XX in.tftpd: RRQ from 10.0.1.2 filename /EFI/centos/grub.cfg-XX-XX-XX-XX-XX-XX-XX
XX in.tftpd: Client 10.0.1.2 File not found /EFI/centos/grub.cfg-XX-XX-XX-XX-XX-XX-XX

If I manually create the path /var/lib/tftpboot/EFI/centos and copy the grub.cfg-MAC file from /var/lib/tftpboot/grub2/ to /var/lib/tftpboot/EFI/centos/, the install works. Symlinking grub2 into TFTPPATH/EFI, however, doesn’t work.

Expected outcome:

PXE boot and KS install should proceed.

Foreman and Proxy versions:

1.24.2

Foreman and Proxy plugin versions:

Distribution and version:

CentOS Linux release 7.7.1908

Other relevant data:

Hello and welcome. Weird, I haven’t seen this behavior yet. Grub has an incorrect root path set. Can you share the DHCP response, specifically the next-server and filename options? I’d assume these would simply be set by Foreman to 10.0.1.2 and grub2/grubx64.efi respectively, but let’s check.

Also, do you use the default DHCP and TFTP servers (ISC) from CentOS? Any custom configuration flags?

Finally, where is that grubx64.efi coming from? Have you deployed it with our installer (which copies it from /boot/EFI/centos) or some other way? Can you compare the two files with md5sum? Ideally you should have a copy which works and is also signed, so that SecureBoot works. Alternatively, grub can be “compiled” via grub-mknetdir; that command allows specifying a root directory on the server, which could override what we expect.
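
For the md5sum check, here is a minimal Python sketch of the same comparison. The function names are made up for illustration, and the two paths in the trailing comment are just the ones mentioned in this thread, so adjust them to your system. It only hashes both files and compares the digests.

```python
import hashlib


def md5_of(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True when both files hash to the same MD5 digest."""
    return md5_of(path_a) == md5_of(path_b)


# Hypothetical usage with the paths discussed in this thread:
# same_content("/boot/EFI/centos/grubx64.efi",
#              "/var/lib/tftpboot/grub2/grubx64.efi")
```

A matching digest only says the TFTP copy is byte-identical to the installed one; it says nothing about which prefix is baked into the binary.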

I’ve recently worked with grub developers on fixing some bugs related to similar (but not the same) problems. Can you download the latest grub from Fedora Rawhide, extract grubx64.efi from the grub2-efi package and put it into your TFTP folder:

https://koji.fedoraproject.org/koji/buildinfo?buildID=1488130

Then try again.

A symlink would only work if it’s relative, because the TFTP server chroots the environment. If you do this, you will have a good workaround until we figure this out.

Grub (at least the one from Red Hat systems) is expected to try to load its configuration from the /grub2/ path as well as from /EFI/centos. Can you check your logs carefully? Is there a chance you have missed those lines?

Those errors you pasted can be ignored; this is how grub2 behaves. We use the grub2 which is signed by CentOS / Red Hat to enable SecureBoot. If you don’t use SB, you can build grub2 using grub-mkimage and provide a correct base path so it will not try to load from this directory.

Apr 14 12:28:46 xxxx in.tftpd[1499657]: RRQ from 192.168.122.107 filename grub2/grub.cfg-01-52-54-00-2c-dc-9e
Apr 14 12:28:46 xxxx in.tftpd[1499657]: Client 192.168.122.107 File not found grub2/grub.cfg-01-52-54-00-2c-dc-9e
Apr 14 12:28:46 xxxx in.tftpd[1499658]: RRQ from 192.168.122.107 filename grub2/grub.cfg-C0A87A6B
Apr 14 12:28:46 xxxx in.tftpd[1499658]: Client 192.168.122.107 File not found grub2/grub.cfg-C0A87A6B
Apr 14 12:28:46 xxxx in.tftpd[1499659]: RRQ from 192.168.122.107 filename grub2/grub.cfg-C0A87A6
Apr 14 12:28:46 xxxx in.tftpd[1499659]: Client 192.168.122.107 File not found grub2/grub.cfg-C0A87A6
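
The lookup order visible in TFTP logs like the one above (MAC-based name first, then the client IP in hex with one character dropped per attempt, then plain grub.cfg) can be sketched in Python. This is only an illustration reconstructed from the logs in this thread, not grub's actual code; the function name and the default prefix are assumptions.

```python
import ipaddress


def grub_config_candidates(mac, ip, prefix="/grub2"):
    """List the config file names a netbooted grub appears to try, in order.

    Reconstructed from the TFTP logs in this thread; `prefix` is whatever
    directory grub believes its files live in (an assumption here).
    """
    # 1. MAC-based name, prefixed with the ARP hardware type 01 (Ethernet).
    names = ["grub.cfg-01-" + mac.lower().replace(":", "-")]
    # 2. IPv4 address as 8 uppercase hex digits, shortened one digit at a time.
    hex_ip = "%08X" % int(ipaddress.IPv4Address(ip))
    for length in range(len(hex_ip), 0, -1):
        names.append("grub.cfg-" + hex_ip[:length])
    # 3. Plain grub.cfg as the last resort.
    names.append("grub.cfg")
    return [prefix.rstrip("/") + "/" + name for name in names]


for candidate in grub_config_candidates("52:54:00:3c:3c:3c", "192.168.199.12"):
    print(candidate)
```

Calling it with a different prefix, for example "/EFI/redhat", gives the list of names to look for in the log when grub is searching in the wrong directory.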

Yes, I can confirm that Foreman is returning the IP 10.0.1.2 and telling the client to pull grub2/grubx64.efi.

That file is the one from /boot/EFI/centos. I ran md5sum and it has the same hash.

I will try extracting that grubx64.efi file from the link.

What happens is that it attempts to grab the cfg files from /EFI/centos, doesn’t find them, gives up and halts at a grub> prompt.

I am provisioning RHEL 7.7 but the last time I tried this with RHEL 7.6 or CentOS 7.6 it fetched the config file from /grub2/ correctly.

So here you have it, RHEL 7.7 running Satellite 6.7 which is essentially CentOS 7.7 running Foreman 1.24:

Apr 16 13:30:15 sat67.nat.lan dhcpd[6164]: DHCPDISCOVER from 52:54:00:3c:3c:3c via eth0
Apr 16 13:30:16 sat67.nat.lan dhcpd[6164]: DHCPOFFER on 192.168.199.12 to 52:54:00:3c:3c:3c via eth0
Apr 16 13:30:19 sat67.nat.lan dhcpd[6164]: DHCPREQUEST for 192.168.199.12 (192.168.199.14) from 52:54:00:3c:3c:3c via eth0
Apr 16 13:30:19 sat67.nat.lan dhcpd[6164]: DHCPACK on 192.168.199.12 to 52:54:00:3c:3c:3c via eth0
Apr 16 13:30:19 sat67.nat.lan in.tftpd[12234]: RRQ from 192.168.199.12 filename grub2/shim.efi
Apr 16 13:30:19 sat67.nat.lan in.tftpd[12234]: Error code 8: User aborted the transfer
Apr 16 13:30:19 sat67.nat.lan in.tftpd[12235]: RRQ from 192.168.199.12 filename grub2/shim.efi
Apr 16 13:30:19 sat67.nat.lan in.tftpd[12235]: Client 192.168.199.12 finished grub2/shim.efi
Apr 16 13:30:19 sat67.nat.lan in.tftpd[12236]: RRQ from 192.168.199.12 filename grub2/grubx64.efi
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12236]: Client 192.168.199.12 finished grub2/grubx64.efi
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12237]: RRQ from 192.168.199.12 filename /grub2/grub.cfg-01-52-54-00-3c-3c-3c
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12237]: Client 192.168.199.12 File not found /grub2/grub.cfg-01-52-54-00-3c-3c-3c
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12238]: RRQ from 192.168.199.12 filename /grub2/grub.cfg-C0A8C70C
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12238]: Client 192.168.199.12 File not found /grub2/grub.cfg-C0A8C70C
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12239]: RRQ from 192.168.199.12 filename /grub2/grub.cfg-C0A8C70
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12239]: Client 192.168.199.12 File not found /grub2/grub.cfg-C0A8C70
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12240]: RRQ from 192.168.199.12 filename /grub2/grub.cfg-C0A8C7
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12240]: Client 192.168.199.12 File not found /grub2/grub.cfg-C0A8C7
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12241]: RRQ from 192.168.199.12 filename /grub2/grub.cfg-C0A8C
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12241]: Client 192.168.199.12 File not found /grub2/grub.cfg-C0A8C
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12242]: RRQ from 192.168.199.12 filename /grub2/grub.cfg-C0A8
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12242]: Client 192.168.199.12 File not found /grub2/grub.cfg-C0A8
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12243]: RRQ from 192.168.199.12 filename /grub2/grub.cfg-C0A
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12243]: Client 192.168.199.12 File not found /grub2/grub.cfg-C0A
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12244]: RRQ from 192.168.199.12 filename /grub2/grub.cfg-C0
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12244]: Client 192.168.199.12 File not found /grub2/grub.cfg-C0
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12245]: RRQ from 192.168.199.12 filename /grub2/grub.cfg-C
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12245]: Client 192.168.199.12 File not found /grub2/grub.cfg-C
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12246]: RRQ from 192.168.199.12 filename /grub2/grub.cfg
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12246]: Client 192.168.199.12 finished /grub2/grub.cfg
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12247]: RRQ from 192.168.199.12 filename /EFI/redhat/x86_64-efi/command.lst
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12247]: Client 192.168.199.12 File not found /EFI/redhat/x86_64-efi/command.lst
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12248]: RRQ from 192.168.199.12 filename /EFI/redhat/x86_64-efi/fs.lst
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12248]: Client 192.168.199.12 File not found /EFI/redhat/x86_64-efi/fs.lst
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12249]: RRQ from 192.168.199.12 filename /EFI/redhat/x86_64-efi/crypto.lst
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12249]: Client 192.168.199.12 File not found /EFI/redhat/x86_64-efi/crypto.lst
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12250]: RRQ from 192.168.199.12 filename /EFI/redhat/x86_64-efi/terminal.lst
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12250]: Client 192.168.199.12 File not found /EFI/redhat/x86_64-efi/terminal.lst
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12251]: RRQ from 192.168.199.12 filename /grub2/grub.cfg
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12251]: Client 192.168.199.12 finished /grub2/grub.cfg
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12252]: RRQ from 192.168.199.12 filename /httpboot/grub2/grub.cfg-52:54:00:3c:3c:3c
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12252]: Client 192.168.199.12 File not found /httpboot/grub2/grub.cfg-52:54:00:3c:3c:3c
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12253]: RRQ from 192.168.199.12 filename /grub2/grub.cfg-52:54:00:3c:3c:3c
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12253]: Client 192.168.199.12 File not found /grub2/grub.cfg-52:54:00:3c:3c:3c
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12254]: RRQ from 192.168.199.12 filename grub2/grub.cfg-52:54:00:3c:3c:3c
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12254]: Client 192.168.199.12 File not found grub2/grub.cfg-52:54:00:3c:3c:3c
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12255]: RRQ from 192.168.199.12 filename grub.cfg-52:54:00:3c:3c:3c
Apr 16 13:30:20 sat67.nat.lan in.tftpd[12255]: Client 192.168.199.12 File not found grub.cfg-52:54:00:3c:3c:3c
Apr 16 13:30:24 sat67.nat.lan in.tftpd[12259]: RRQ from 192.168.199.12 filename boot/fdi-image/vmlinuz0
Apr 16 13:30:26 sat67.nat.lan in.tftpd[12259]: Client 192.168.199.12 finished boot/fdi-image/vmlinuz0
Apr 16 13:30:26 sat67.nat.lan in.tftpd[12269]: RRQ from 192.168.199.12 filename boot/fdi-image/initrd0.img

As you can see, grub loads /grub2/grub.cfg just fine. The default template then has a couple of insmod commands for Debian users (on Red Hat systems we install grub as a single monolithic binary, so no modules need to be loaded), and grub does try to find the listing files in /EFI/redhat. However, each of those requests just ends with a file not found error and grub continues from there.

What I can offer at this point is I took grub from RHEL 7.7 and uploaded it here for you:

http://people.redhat.com/~lzapleta/temp/grubx64.efi

Can you try once again with this build? Please pastebin the whole TFTP log from top to bottom.

Thanks Lukas, I grabbed the .efi file from the URL, replaced the existing file and ran a PXE boot again. This is what I see in my tftp log. This is all I can share with you from that.

Apr 17 09:11:08 foreman02 dhcpd: DHCPDISCOVER from XX:XX:XX:XX:XX:XX via eth0
Apr 17 09:11:08 foreman02 dhcpd: DHCPOFFER on 10.0.0.231 to XX:XX:XX:XX:XX:XX via eth0
Apr 17 09:11:40 foreman02 dhcpd: Dynamic and static leases present for 10.0.0.231.
Apr 17 09:11:40 foreman02 dhcpd: Remove host declaration anna-keemer.nuc.local or remove 10.0.0.231
Apr 17 09:11:40 foreman02 dhcpd: from the dynamic address pool for 10.0.44.224/27
Apr 17 09:11:40 foreman02 dhcpd: DHCPREQUEST for 10.0.0.231 (10.0.44.226) from XX:XX:XX:XX:XX:XX via eth0
Apr 17 09:11:40 foreman02 dhcpd: DHCPACK on 10.0.0.231 to XX:XX:XX:XX:XX:XX via eth0
Apr 17 09:11:40 foreman02 in.tftpd[29774]: RRQ from 10.0.0.231 filename grub2/grubx64.efi
Apr 17 09:11:40 foreman02 in.tftpd[29774]: Error code 8: User aborted the transfer
Apr 17 09:11:40 foreman02 in.tftpd[29775]: RRQ from 10.0.0.231 filename grub2/grubx64.efi
Apr 17 09:11:41 foreman02 in.tftpd[29775]: Client 10.0.0.231 finished grub2/grubx64.efi
Apr 17 09:12:20 foreman02 in.tftpd[29776]: RRQ from 10.0.0.231 filename /EFI/redhat/grub.cfg-01-XX-XX-XX-XX-XX-XX
Apr 17 09:12:20 foreman02 in.tftpd[29776]: Client 10.0.0.231 File not found /EFI/redhat/grub.cfg-01-XX-XX-XX-XX-XX-XX
Apr 17 09:12:20 foreman02 in.tftpd[29777]: RRQ from 10.0.0.231 filename /EFI/redhat/grub.cfg-0A002CE7
Apr 17 09:12:20 foreman02 in.tftpd[29777]: Client 10.0.0.231 File not found /EFI/redhat/grub.cfg-0A002CE7
Apr 17 09:12:20 foreman02 in.tftpd[29778]: RRQ from 10.0.0.231 filename /EFI/redhat/grub.cfg-0A002CE
Apr 17 09:12:20 foreman02 in.tftpd[29778]: Client 10.0.0.231 File not found /EFI/redhat/grub.cfg-0A002CE
Apr 17 09:12:20 foreman02 in.tftpd[29779]: RRQ from 10.0.0.231 filename /EFI/redhat/grub.cfg-0A002C
Apr 17 09:12:20 foreman02 in.tftpd[29779]: Client 10.0.0.231 File not found /EFI/redhat/grub.cfg-0A002C
Apr 17 09:12:20 foreman02 in.tftpd[29780]: RRQ from 10.0.0.231 filename /EFI/redhat/grub.cfg-0A002
Apr 17 09:12:20 foreman02 in.tftpd[29780]: Client 10.0.0.231 File not found /EFI/redhat/grub.cfg-0A002
Apr 17 09:12:20 foreman02 in.tftpd[29781]: RRQ from 10.0.0.231 filename /EFI/redhat/grub.cfg-0A00
Apr 17 09:12:20 foreman02 in.tftpd[29781]: Client 10.0.0.231 File not found /EFI/redhat/grub.cfg-0A00
Apr 17 09:12:20 foreman02 in.tftpd[29782]: RRQ from 10.0.0.231 filename /EFI/redhat/grub.cfg-0A0
Apr 17 09:12:20 foreman02 in.tftpd[29782]: Client 10.0.0.231 File not found /EFI/redhat/grub.cfg-0A0
Apr 17 09:12:20 foreman02 in.tftpd[29783]: RRQ from 10.0.0.231 filename /EFI/redhat/grub.cfg-0A
Apr 17 09:12:20 foreman02 in.tftpd[29783]: Client 10.0.0.231 File not found /EFI/redhat/grub.cfg-0A
Apr 17 09:12:20 foreman02 in.tftpd[29784]: RRQ from 10.0.0.231 filename /EFI/redhat/grub.cfg-0
Apr 17 09:12:20 foreman02 in.tftpd[29784]: Client 10.0.0.231 File not found /EFI/redhat/grub.cfg-0
Apr 17 09:12:20 foreman02 in.tftpd[29785]: RRQ from 10.0.0.231 filename /EFI/redhat/grub.cfg
Apr 17 09:12:20 foreman02 in.tftpd[29785]: Client 10.0.0.231 File not found /EFI/redhat/grub.cfg
Apr 17 09:12:20 foreman02 in.tftpd[29786]: RRQ from 10.0.0.231 filename /EFI/redhat/x86_64-efi/command.lst
Apr 17 09:12:20 foreman02 in.tftpd[29786]: Client 10.0.0.231 File not found /EFI/redhat/x86_64-efi/command.lst
Apr 17 09:12:20 foreman02 in.tftpd[29787]: RRQ from 10.0.0.231 filename /EFI/redhat/x86_64-efi/fs.lst
Apr 17 09:12:20 foreman02 in.tftpd[29787]: Client 10.0.0.231 File not found /EFI/redhat/x86_64-efi/fs.lst
Apr 17 09:12:20 foreman02 in.tftpd[29788]: RRQ from 10.0.0.231 filename /EFI/redhat/x86_64-efi/crypto.lst
Apr 17 09:12:20 foreman02 in.tftpd[29788]: Client 10.0.0.231 File not found /EFI/redhat/x86_64-efi/crypto.lst
Apr 17 09:12:20 foreman02 in.tftpd[29789]: RRQ from 10.0.0.231 filename /EFI/redhat/x86_64-efi/terminal.lst
Apr 17 09:12:20 foreman02 in.tftpd[29789]: Client 10.0.0.231 File not found /EFI/redhat/x86_64-efi/terminal.lst

As you can see, it doesn’t look in grub2/ for the grub cfg files.
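
To make this easier to see in a long log, here is a rough Python sketch that reduces in.tftpd syslog lines to one outcome per file and lists the directories the client probed. The function names are hypothetical, and the regular expression only covers the three message shapes that appear in this thread (RRQ, finished, File not found).

```python
import posixpath
import re

# Matches "RRQ from <ip> filename <file>" and
# "Client <ip> finished|File not found <file>", with or without a [pid].
EVENT = re.compile(
    r"in\.tftpd(?:\[\d+\])?: "
    r"(?:RRQ from \S+ filename|Client \S+ (finished|File not found)) (\S+)"
)


def summarize_tftp_log(lines):
    """Map each requested file name to its last recorded outcome."""
    outcome = {}
    for line in lines:
        match = EVENT.search(line)
        if not match:
            continue  # e.g. dhcpd lines or "Error code 8" aborts
        result, filename = match.groups()
        if result is None:
            # A bare RRQ: remember the request until an outcome shows up.
            outcome.setdefault(filename, "requested")
        else:
            outcome[filename] = result
    return outcome


def probed_directories(outcome):
    """Return the sorted set of directories the client asked for files in."""
    return sorted({posixpath.dirname(name) for name in outcome})
```

Feeding it the lines from /var/log/messages and checking whether "grub2" or "/grub2" shows up in probed_directories() for anything other than grubx64.efi is a quick way to confirm where grub is searching.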

On the machine I am trying to PXE boot, it goes to a grub> prompt. If I switch to the legacy PXE method using pxelinux.*, it works just fine.

What kind of hardware is this? I assume you are trying with EFI mode, PXE boot method (not HTTP UEFI Boot). Can you confirm?

An Intel NUC. Yes, EFI mode, PXE, and it’s NOT HTTP.

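At this point, I can only recommend upgrading the firmware on the NUC. If that does not help, create a relative symlink on the TFTP server, /var/lib/tftpboot/EFI/redhat -> ../grub2, to work around the issue. The target is resolved from the directory that holds the link (/var/lib/tftpboot/EFI), so ../grub2 points at /var/lib/tftpboot/grub2.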
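
A small Python sketch of the symlink workaround follows, assuming the usual layout with grub2/ directly under the TFTP root. The function name and the vendor argument are made up for illustration. It computes the link target relative to the directory that holds the link, since a relative target is resolved from there and an absolute one would point outside the chroot the TFTP server runs in. For a link at EFI/redhat this works out to ../grub2.

```python
import os


def link_vendor_dir(tftp_root, vendor="redhat"):
    """Create <tftp_root>/EFI/<vendor> as a relative symlink to <tftp_root>/grub2.

    Returns the link path and the relative target that was written.
    """
    efi_dir = os.path.join(tftp_root, "EFI")
    os.makedirs(efi_dir, exist_ok=True)
    link_path = os.path.join(efi_dir, vendor)
    # Relative targets are resolved from the directory containing the link,
    # so compute the path to grub2 as seen from <tftp_root>/EFI.
    target = os.path.relpath(os.path.join(tftp_root, "grub2"), start=efi_dir)
    os.symlink(target, link_path)
    return link_path, target


# Hypothetical usage on the smart proxy (needs write access to the TFTP root):
# link_vendor_dir("/var/lib/tftpboot", "redhat")
```

After that, a request for /EFI/redhat/grub.cfg-01-... is served from the same files Foreman writes into grub2/.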

I talked to a grub developer and he said there was a race condition recently fixed.

Can you try with grub2 from Rawhide?

Bump, would you mind spending some more time with this? It looks like it can be an environmental issue we are not able to reproduce virtually.

Hi @lzap,

I confirm a race condition of type grub.cfg between httpboot and the other one; if this linked …
This test was carried out today on a bare-metal machine:

tcpdump -i enp94s0f0np0 port tftp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on enp94s0f0np0, link-type EN10MB (Ethernet), capture size 262144 bytes
22:01:44.926498 IP 10.3.8.14.1806 > maas.sas.local.tftp: 47 RRQ "grub2/grubx64.efi" octet tsize 0 blksize 1468
22:01:44.935408 IP 10.3.8.14.1807 > maas.sas.local.tftp: 39 RRQ "grub2/grubx64.efi" octet blksize 1468
22:01:45.027284 IP 10.3.8.14.25300 > maas.sas.local.tftp: 58 RRQ "grub2/x86_64-efi/command.lst" octet blksize 1024 tsize 0
22:01:45.027961 IP 10.3.8.14.25301 > maas.sas.local.tftp: 53 RRQ "grub2/x86_64-efi/fs.lst" octet blksize 1024 tsize 0
22:01:45.028671 IP 10.3.8.14.25302 > maas.sas.local.tftp: 57 RRQ "grub2/x86_64-efi/crypto.lst" octet blksize 1024 tsize 0
22:01:45.029419 IP 10.3.8.14.25303 > maas.sas.local.tftp: 59 RRQ "grub2/x86_64-efi/terminal.lst" octet blksize 1024 tsize 0
22:01:45.030126 IP 10.3.8.14.25304 > maas.sas.local.tftp: 44 RRQ "grub2/grub.cfg" octet blksize 1024 tsize 0
22:01:45.039045 IP 10.3.8.14.25305 > maas.sas.local.tftp: 72 RRQ "/httpboot/grub2/grub.cfg-4c:d9:8f:ba:36:3b" octet blksize 1024 tsize 0
22:01:45.047262 IP 10.3.8.14.25306 > maas.sas.local.tftp: 63 RRQ "/grub2/grub.cfg-4c:d9:8f:ba:36:3b" octet blksize 1024 tsize 0
22:01:45.056114 IP 10.3.8.14.25307 > maas.sas.local.tftp: 72 RRQ "/httpboot/grub2/grub.cfg-4c:d9:8f:ba:36:3b" octet blksize 1024 tsize 0
22:01:45.064412 IP 10.3.8.14.25308 > maas.sas.local.tftp: 63 RRQ "/grub2/grub.cfg-4c:d9:8f:ba:36:3b" octet blksize 1024 tsize 0
22:01:45.073312 IP 10.3.8.14.25309 > maas.sas.local.tftp: 72 RRQ "/httpboot/grub2/grub.cfg-4c:d9:8f:ba:36:3b" octet blksize 1024 tsize 0
22:01:45.081445 IP 10.3.8.14.25310 > maas.sas.local.tftp: 63 RRQ "/grub2/grub.cfg-4c:d9:8f:ba:36:3b" octet blksize 1024 tsize 0
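
To show the repetition in a capture like this at a glance, here is a hedged Python sketch that counts how often each file name appears in the RRQ lines. The function names are hypothetical, and the pattern accepts both straight and typographic quotes, since pasted output sometimes has the latter. It only counts requests; it does not prove what causes the loop.

```python
from collections import Counter
import re

# File name between quotes after "RRQ"; allows " as well as the curly
# quotes that forum software may substitute.
RRQ_NAME = re.compile('RRQ ["\u201c](.+?)["\u201d]')


def count_requests(capture_lines):
    """Count TFTP read requests per file name in tcpdump text output."""
    counts = Counter()
    for line in capture_lines:
        match = RRQ_NAME.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def repeated(counts, threshold=2):
    """Return only the file names requested at least `threshold` times."""
    return {name: n for name, n in counts.items() if n >= threshold}
```

Saving the capture with `tcpdump ... > capture.txt` and passing `open("capture.txt")` to count_requests() should be enough to try it.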

I confirm that HTTP Boot Proxy is disabled in my subnet.

I know how to hack this issue but would prefer not to … lol

Regards,

@Gueug78400

I still want to deploy Ubuntu 18.04.5 …
No other DHCP server on subnet …

You can deploy what you want with Grub2 from Red Hat. It’s just that I am from Red Hat and my colleague was able to confirm a version that has the problem fixed. The fix will likely take some time to get into grub2 git and/or Debian/Ubuntu. It might not get fixed in the released versions.

Hi @lzap,

Thanks for your update!!
Looking forward to getting a new version …

It’s nice to be able to provision all Linux distributions!!!

Best regards,

@Gueug78400