TFTP failure on Foreman 3.8.0

brianudwa · August 15, 2024, 8:17pm

Installing a fresh instance with the latest version sets the ownership of ‘/var/lib/tftpboot’ to foreman-proxy:root, which the other system does not have. I will try that there while I build a parallel system.
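
A quick way to compare the two systems is to print the owner and group of the TFTP root on each. The sketch below assumes GNU stat (as shipped on EL systems); the function names are hypothetical, and foreman-proxy:root is simply the ownership reported above for the fresh install.

```shell
#!/bin/sh
# Print owner:group of a directory (defaults to the TFTP root from the post).
tftp_owner() {
    stat -c '%U:%G' "${1:-/var/lib/tftpboot}"
}

# Compare the actual ownership with an expected value.
# Returns 0 on match, 1 on mismatch, 2 if the directory cannot be read.
check_tftp_owner() {
    dir="${1:-/var/lib/tftpboot}"
    expected="${2:-foreman-proxy:root}"
    actual="$(tftp_owner "$dir")" || return 2
    if [ "$actual" = "$expected" ]; then
        echo "ok: $dir is owned by $actual"
    else
        echo "mismatch: $dir is owned by $actual, expected $expected"
        return 1
    fi
}
```

Running check_tftp_owner with no arguments on both hosts shows whether they differ. If the older system reports a mismatch, chown foreman-proxy:root /var/lib/tftpboot would bring it in line with the fresh install, though whether that is the right fix for a given setup still needs verifying.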

brianudwa · August 15, 2024, 10:35pm

I grabbed an unused spare server and installed Foreman without Katello as a barebones install. I did the basic GUI setup, added a host, had it authenticate and create its host record in IPA, and got an IP from Infoblox. I verified that it created a PXE file in ‘/var/lib/tftpboot’ and that all of the data in the PXE file is there. Then I hit the console and tried to kick it, with the same result. Is there any specific log or anything else I can post to get more help with this?

brianudwa · August 16, 2024, 1:29am

OK, I found that the config is actually working, at least partly. The issue was that an overworked network engineer forgot to look for a ‘next-server’ description in their device. Thanks, I am moving on now, so this is likely closed. Next I should be able to see whether our legacy templates failing to parse is what causes the failure.

2024-08-15T16:10:57 66558720 [I] Finished GET /unattended/templateServer with 200 (0.17 ms)
2024-08-15T16:10:58 66558720 [I] Started POST /tftp/PXEGrub/e0:4f:43:e6:7e:b7
2024-08-15T16:10:58 66558720 [I] Finished POST /tftp/PXEGrub/e0:4f:43:e6:7e:b7 with 200 (0.76 ms)
2024-08-15T16:10:58 66558720 [I] Started GET /unattended/templateServer
2024-08-15T16:10:58 66558720 [I] Finished GET /unattended/templateServer with 200 (0.22 ms)
2024-08-15T16:10:58 66558720 [I] Started GET /unattended/templateServer
2024-08-15T16:10:58 66558720 [I] Finished GET /unattended/templateServer with 200 (0.22 ms)
2024-08-15T16:10:58 66558720 [I] Started POST /tftp/iPXE/e0:4f:43:e6:7e:b7
2024-08-15T16:10:58 66558720 [I] Finished POST /tftp/iPXE/e0:4f:43:e6:7e:b7 with 200 (0.68 ms)
2024-08-15T16:10:58 66558720 [I] Started POST /tftp/fetch_boot_file
2024-08-15T16:10:58 66558720 [I] Finished POST /tftp/fetch_boot_file with 200 (0.68 ms)
2024-08-15T16:10:58 66558720 [I] [145619] Started task /usr/bin/curl\ --silent\ --show-error\ --connect-timeout\ 10\ --retry\ 3\ --retry-delay\ 10\ --max-time\ 3600\ --remote-time\ --time-cond\ /var/lib/tftpboot/boot/centostestv2-UPd1pzG0DVwJ-vmlinuz\ --write-out\ Task\ done,\ result:\ %{http_code},\ size\ downloaded:\ %{size_download}b,\ speed:\ %{speed_download}b/s,\ time:\ %{time_total}ms\ --output\ /var/lib/tftpboot/boot/centostestv2-UPd1pzG0DVwJ-vmlinuz\ --location\ fake.net
2024-08-15T16:10:58 66558720 [I] Started POST /tftp/fetch_boot_file
2024-08-15T16:10:58 66558720 [I] Finished POST /tftp/fetch_boot_file with 200 (0.82 ms)
2024-08-15T16:10:58 66558720 [I] [145622] Started task /usr/bin/curl\ --silent\ --show-error\ --connect-timeout\ 10\ --retry\ 3\ --retry-delay\ 10\ --max-time\ 3600\ --remote-time\ --time-cond\ /var/lib/tftpboot/boot/centostestv2-UPd1pzG0DVwJ-initrd.img\ --write-out\ Task\ done,\ result:\ %{http_code},\ size\ downloaded:\ %{size_download}b,\ speed:\ %{speed_download}b/s,\ time:\ %{time_total}ms\ --output\ /var/lib/tftpboot/boot/centostestv2-UPd1pzG0DVwJ-initrd.img\ --location\ http://centosmedia.gld.fake.net/CentOSTestv2/7.9/x86_64/images/pxeboot/initrd.img
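
When reading a log like this, it can help to reduce each fetch_boot_file task to "where the file lands" and "where it comes from". The following is a minimal sketch, assuming the smart proxy log holds the curl arguments with literal double hyphens and backslash-escaped spaces; the function name is hypothetical, and /var/log/foreman-proxy/proxy.log is assumed to be the log location.

```shell
#!/bin/sh
# Print "<output path> <- <source URL>" for every curl boot-file task
# found in the given log files (or on stdin when no file is given).
list_boot_fetches() {
    sed -n 's/.*--output\\ \([^\\]*\)\\ --location\\ \(.*\)$/\1 <- \2/p' "$@"
}

# Example (path is an assumption):
#   list_boot_fetches /var/log/foreman-proxy/proxy.log
```

If the source URL printed for the kernel or initrd is not what the installation media should point at, the download itself is worth checking before looking at TFTP.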

brianudwa · August 20, 2024, 1:07am

The logging, even on debug, tells me almost nothing, so I went back to editing the tftp.service file to get more debugging output about what is going on.

Aug 19 18:04:56 pulp3 systemd[1]: Received notify message exceeded maximum size. Ignoring.
Aug 19 18:04:57 pulp3 in.tftpd[422152]: RRQ from ::ffff:100.110.25.26 filename boot/fdi-image/vmlinuz0
Aug 19 18:04:57 pulp3 in.tftpd[422152]: Client ::ffff:100.110.25.26 File not found boot/fdi-image/vmlinuz0
Aug 19 18:04:57 pulp3 systemd[1]: Received notify message exceeded maximum size. Ignoring.
Aug 19 18:05:01 pulp3 systemd[1]: message repeated 4 times: [Received notify message exceeded maximum size. Ignoring.]

So it is looking for the discovery image, but I don't know why, and from what I understand that is not what I want (I could be wrong). This is week 2 of a group not being able to work. I could enable discovery, and that would perhaps resolve it, but I don't think we want discovery; we want to provision specific nodes. Is anyone able to comment on this?
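
For anyone wanting the same in.tftpd request logging without editing the packaged unit file, a systemd drop-in is one option. This is only a sketch, assuming the stock unit starts /usr/sbin/in.tftpd -s /var/lib/tftpboot and that the daemon is tftp-hpa, whose -v flag can be repeated for more verbosity; the function name and the drop-in file name are hypothetical.

```shell
#!/bin/sh
# Print a drop-in that clears the packaged ExecStart and replaces it
# with one that adds -vvv (assumed: tftp-hpa in.tftpd at /usr/sbin).
tftp_debug_dropin() {
    root="${1:-/var/lib/tftpboot}"
    cat <<EOF
[Service]
ExecStart=
ExecStart=/usr/sbin/in.tftpd -s -vvv ${root}
EOF
}

# Possible installation, run as root (file name is an assumption):
#   mkdir -p /etc/systemd/system/tftp.service.d
#   tftp_debug_dropin > /etc/systemd/system/tftp.service.d/debug.conf
#   systemctl daemon-reload && systemctl restart tftp.socket
```

Removing the drop-in file and reloading systemd returns the service to its packaged configuration, which keeps the change easy to undo.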

brianudwa · August 20, 2024, 3:30pm

By changing the tftp service, even though it was recommended that I not do so, I can see that even on real hardware the request is partially reaching the hardware, but the hardware is not picking up the DHCP address from Infoblox.

i.e.

Aug 20 08:27:25 pulp3 in.tftpd[444586]: RRQ from ::ffff:100.110.67.250 filename grub2/grubx64.efi
Aug 20 08:27:25 pulp3 systemd[1]: Received notify message exceeded maximum size. Ignoring.

^C
[root@pulp3 ~]# ping 100.110.67.250
PING 100.110.67.250 (100.110.67.250) 56(84) bytes of data.
^C
--- 100.110.67.250 ping statistics ---
42 packets transmitted, 0 received, 100% packet loss, time 41995ms

[root@pulp3 ~]#

Since Infoblox hands out DHCP, I will recheck whether TFTP still works in Foreman 3.8.0 when the DHCP parameter is set to off. This might be the issue, or the issue might be Infoblox.

brianudwa · August 20, 2024, 11:00pm

OK, I figured out my issue; the only way was enabling a lot more debugging. Certain vendor machines seem to have an issue negotiating speed with a specific switch in our network. We tried a different vendor class (older, which is why we didn't try it first) and it fired right up. The bad combination for now appears to be Lenovo and Arista, while HP had no issues.

brianudwa · August 29, 2024, 10:29pm

Figured out a workaround in case anyone has a similar issue.

The hardware is Lenovo with Marvell 10G adapters. This solution even allows the hardware to do GRUB2 UEFI HTTP.

Set up an NFS volume on the Foreman server and export it locally to an EL7 PXE server.
Set up a cron job that runs cp -var /var/lib/tftpboot/grub2/grub.cfg-* exportdir

On an EL7 PXE server running xinetd and a mapfile, i.e. -m /etc/tftpd.map:

Mount the Foreman export dir and copy it with cron every 3 min.

Reasoning: this allows Foreman to continue to manage the dir, so nodes can kick and then go to localboot when done.

Since we have Infoblox as well, I set up a script for our team to toggle the next-server and bootp server each node gets from Foreman, to force hardware that is having issues with the new EL8 and higher tftp-server to an EL7 server. (I also have a Red Hat bug ticket open for this.)

brianudwa · September 12, 2024, 6:06pm

The support ticket with Red Hat was acknowledged. I am not sure when they will patch, but the bug is here:

https://issues.redhat.com/browse/RHEL-58738

brianudwa · September 16, 2024, 6:36pm

Red Hat provided a fix. The release to prod, which should trickle down to other distros, should happen soon. This was a patch in EL7 that was missed when porting to EL8 and higher.

Original RFC