Had a working installation of foreman (1.24.x, foreman discovery image 3.5.1, do not remember the version of foreman discovery plugin). With that version I discovered and provisioned 30/40 bare metal servers.
Upgraded to v2.0 (+ installed foreman datacenter plugin) a couple of months ago.
Tried to discover new bare metal servers in the last week.
Initially, discovery image was booting, but facts were not sent to foreman. Had this error: Validation failed: Name has already been taken
While trying to troubleshoot that error, I decided to delete /var/lib/tftpboot and let foreman recreate the directory on both the foreman master AND the foreman proxy servers (did this by re-running the foreman-installer command with the same parameters used during the installation/upgrade).
Since then, I have no longer been able to boot the discovery image. I see the Grub2 boot screen, select Foreman Discovery Image EFI, the screen goes black, and after a few seconds this error appears: error: timeout reading 'boot/fdi-image/initrd0.img'. Press any key to continue. The boot process continues but then fails.
On the smart proxy TFTP server, I see this in the logs:
Aug 10 12:13:28 s157p dhcpd: DHCPDISCOVER from 48:df:37:4d:f0:6c via enp70s0f0
Aug 10 12:13:29 s157p dhcpd: none: host unknown.
Aug 10 12:13:29 s157p dhcpd: DHCPOFFER on 192.168.240.254 to 48:df:37:4d:f0:6c via enp70s0f0
Aug 10 12:13:31 s157p dhcpd: DHCPREQUEST for 192.168.240.254 (192.168.240.157) from 48:df:37:4d:f0:6c via enp70s0f0
Aug 10 12:13:31 s157p dhcpd: DHCPACK on 192.168.240.254 to 48:df:37:4d:f0:6c via enp70s0f0
Aug 10 12:13:31 s157p in.tftpd[2499]: RRQ from 192.168.240.254 filename grub2/shim.efi
Aug 10 12:13:31 s157p in.tftpd[2499]: Error code 8: User aborted the transfer
Aug 10 12:13:31 s157p in.tftpd[2500]: RRQ from 192.168.240.254 filename grub2/shim.efi
Aug 10 12:13:31 s157p in.tftpd[2500]: Client 192.168.240.254 finished grub2/shim.efi
Aug 10 12:13:31 s157p in.tftpd[2501]: RRQ from 192.168.240.254 filename grub2/grubx64.efi
Aug 10 12:13:31 s157p in.tftpd[2501]: Client 192.168.240.254 finished grub2/grubx64.efi
Aug 10 12:13:32 s157p in.tftpd[2502]: RRQ from 192.168.240.254 filename grub2/grub.cfg-01-48-df-37-4d-f0-6c
Aug 10 12:13:32 s157p in.tftpd[2502]: Client 192.168.240.254 File not found grub2/grub.cfg-01-48-df-37-4d-f0-6c
Aug 10 12:13:32 s157p in.tftpd[2503]: RRQ from 192.168.240.254 filename grub2/grub.cfg-8BBFF0FE
Aug 10 12:13:32 s157p in.tftpd[2503]: Client 192.168.240.254 File not found grub2/grub.cfg-8BBFF0FE
Aug 10 12:13:32 s157p in.tftpd[2504]: RRQ from 192.168.240.254 filename grub2/grub.cfg-8BBFF0F
Aug 10 12:13:32 s157p in.tftpd[2504]: Client 192.168.240.254 File not found grub2/grub.cfg-8BBFF0F
Aug 10 12:13:32 s157p in.tftpd[2505]: RRQ from 192.168.240.254 filename grub2/grub.cfg-8BBFF0
Aug 10 12:13:32 s157p in.tftpd[2505]: Client 192.168.240.254 File not found grub2/grub.cfg-8BBFF0
Aug 10 12:13:32 s157p in.tftpd[2506]: RRQ from 192.168.240.254 filename grub2/grub.cfg-8BBFF
Aug 10 12:13:32 s157p in.tftpd[2506]: Client 192.168.240.254 File not found grub2/grub.cfg-8BBFF
Aug 10 12:13:32 s157p in.tftpd[2507]: RRQ from 192.168.240.254 filename grub2/grub.cfg-8BBF
Aug 10 12:13:32 s157p in.tftpd[2507]: Client 192.168.240.254 File not found grub2/grub.cfg-8BBF
Aug 10 12:13:32 s157p in.tftpd[2508]: RRQ from 192.168.240.254 filename grub2/grub.cfg-8BB
Aug 10 12:13:32 s157p in.tftpd[2508]: Client 192.168.240.254 File not found grub2/grub.cfg-8BB
Aug 10 12:13:32 s157p in.tftpd[2509]: RRQ from 192.168.240.254 filename grub2/grub.cfg-8B
Aug 10 12:13:32 s157p in.tftpd[2509]: Client 192.168.240.254 File not found grub2/grub.cfg-8B
Aug 10 12:13:32 s157p in.tftpd[2510]: RRQ from 192.168.240.254 filename grub2/grub.cfg-8
Aug 10 12:13:32 s157p in.tftpd[2510]: Client 192.168.240.254 File not found grub2/grub.cfg-8
Aug 10 12:13:32 s157p in.tftpd[2511]: RRQ from 192.168.240.254 filename grub2/grub.cfg
Aug 10 12:13:32 s157p in.tftpd[2511]: Client 192.168.240.254 finished grub2/grub.cfg
Aug 10 12:13:32 s157p in.tftpd[2512]: RRQ from 192.168.240.254 filename /EFI/centos/x86_64-efi/command.lst
Aug 10 12:13:32 s157p in.tftpd[2512]: Client 192.168.240.254 File not found /EFI/centos/x86_64-efi/command.lst
Aug 10 12:13:32 s157p in.tftpd[2513]: RRQ from 192.168.240.254 filename /EFI/centos/x86_64-efi/fs.lst
Aug 10 12:13:32 s157p in.tftpd[2513]: Client 192.168.240.254 File not found /EFI/centos/x86_64-efi/fs.lst
Aug 10 12:13:32 s157p in.tftpd[2514]: RRQ from 192.168.240.254 filename /EFI/centos/x86_64-efi/crypto.lst
Aug 10 12:13:32 s157p in.tftpd[2514]: Client 192.168.240.254 File not found /EFI/centos/x86_64-efi/crypto.lst
Aug 10 12:13:32 s157p in.tftpd[2515]: RRQ from 192.168.240.254 filename /EFI/centos/x86_64-efi/terminal.lst
Aug 10 12:13:32 s157p in.tftpd[2515]: Client 192.168.240.254 File not found /EFI/centos/x86_64-efi/terminal.lst
Aug 10 12:13:32 s157p in.tftpd[2516]: RRQ from 192.168.240.254 filename grub2/grub.cfg
Aug 10 12:13:32 s157p in.tftpd[2516]: Client 192.168.240.254 finished grub2/grub.cfg
Aug 10 12:13:32 s157p in.tftpd[2517]: RRQ from 192.168.240.254 filename /httpboot/grub2/grub.cfg-48:df:37:4d:f0:6c
Aug 10 12:13:32 s157p in.tftpd[2517]: Client 192.168.240.254 File not found /httpboot/grub2/grub.cfg-48:df:37:4d:f0:6c
Aug 10 12:13:32 s157p in.tftpd[2518]: RRQ from 192.168.240.254 filename /grub2/grub.cfg-48:df:37:4d:f0:6c
Aug 10 12:13:32 s157p in.tftpd[2518]: Client 192.168.240.254 File not found /grub2/grub.cfg-48:df:37:4d:f0:6c
Aug 10 12:13:32 s157p in.tftpd[2519]: RRQ from 192.168.240.254 filename grub2/grub.cfg-48:df:37:4d:f0:6c
Aug 10 12:13:32 s157p in.tftpd[2519]: Client 192.168.240.254 File not found grub2/grub.cfg-48:df:37:4d:f0:6c
Aug 10 12:13:32 s157p in.tftpd[2520]: RRQ from 192.168.240.254 filename grub.cfg-48:df:37:4d:f0:6c
Aug 10 12:13:32 s157p in.tftpd[2520]: Client 192.168.240.254 File not found grub.cfg-48:df:37:4d:f0:6c
Aug 10 12:13:39 s157p in.tftpd[2521]: RRQ from 192.168.240.254 filename boot/fdi-image/vmlinuz0
Aug 10 12:13:39 s157p in.tftpd[2521]: Client 192.168.240.254 finished boot/fdi-image/vmlinuz0
Aug 10 12:13:39 s157p in.tftpd[2522]: RRQ from 192.168.240.254 filename boot/fdi-image/initrd0.img
Aug 10 12:14:16 s157p in.tftpd[2522]: Client 192.168.240.254 finished boot/fdi-image/initrd0.img
The timeout error appears when the message Client 192.168.240.254 finished boot/fdi-image/initrd0.img is written in the server log.
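To put a number on the stall, the transfer times can be read straight off the in.tftpd lines above. The following is a minimal Python sketch, not something from the original post: the function name transfer_durations and the regular expression are illustrative assumptions about this particular syslog format, and the year is a placeholder because syslog timestamps omit it.

```python
import re
from datetime import datetime

# Assumed syslog layout: "Aug 10 12:13:39 host in.tftpd[PID]: message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+"
    r"in\.tftpd\[(?P<pid>\d+)\]:\s+(?P<msg>.*)$"
)

# The last four in.tftpd lines from the log quoted above.
SAMPLE = [
    "Aug 10 12:13:39 s157p in.tftpd[2521]: RRQ from 192.168.240.254 filename boot/fdi-image/vmlinuz0",
    "Aug 10 12:13:39 s157p in.tftpd[2521]: Client 192.168.240.254 finished boot/fdi-image/vmlinuz0",
    "Aug 10 12:13:39 s157p in.tftpd[2522]: RRQ from 192.168.240.254 filename boot/fdi-image/initrd0.img",
    "Aug 10 12:14:16 s157p in.tftpd[2522]: Client 192.168.240.254 finished boot/fdi-image/initrd0.img",
]


def transfer_durations(lines):
    """Return (filename, seconds) for every request that logged 'finished'."""
    started = {}  # tftpd PID -> (filename, time of the RRQ line)
    results = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        # Syslog omits the year, so an arbitrary one is prepended for parsing.
        ts = datetime.strptime("2000 " + match["ts"], "%Y %b %d %H:%M:%S")
        pid, msg = match["pid"], match["msg"]
        if msg.startswith("RRQ from ") and " filename " in msg:
            started[pid] = (msg.rsplit(" filename ", 1)[1], ts)
        elif " finished " in msg and pid in started:
            filename, start = started.pop(pid)
            results.append((filename, (ts - start).total_seconds()))
    return results


if __name__ == "__main__":
    for filename, seconds in transfer_durations(SAMPLE):
        print(f"{filename}: {seconds:.0f} s")
```

On the four lines taken from the log, this reports 0 seconds for vmlinuz0 and 37 seconds for initrd0.img, which is the gap between 12:13:39 and 12:14:16.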
In addition, I have tried the following things:
I tried to upgrade foreman/smart-proxy to v2.1.1. Nothing changed.
I made sure TFTP works by booting a live CD on the server to be discovered and, once it was loaded, retrieving the image from the smart proxy over TFTP. The download worked.
I tried to network boot using a different port of the same NIC. Nothing changed.
I tried to discover another new server. Same timeout issue.
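The manual TFTP check in the list above can also be scripted, so that the same download is repeatable from a live system. What follows is a rough sketch of a read-only TFTP client using only the Python standard library. It assumes the default 512-byte block size with no option negotiation, it does not retransmit the initial request, and it does not verify that replies come from a single server port; the names tftp_get, build_rrq and build_ack are made up for this illustration.

```python
import socket
import struct

OP_RRQ, OP_DATA, OP_ACK, OP_ERROR = 1, 3, 4, 5
BLOCK_SIZE = 512  # TFTP default when no blksize option is negotiated


def build_rrq(filename, mode="octet"):
    """Read request: opcode 1, filename, NUL, transfer mode, NUL."""
    return (struct.pack("!H", OP_RRQ) + filename.encode("ascii") + b"\x00"
            + mode.encode("ascii") + b"\x00")


def build_ack(block):
    """Acknowledgement: opcode 4 followed by the 16-bit block number."""
    return struct.pack("!HH", OP_ACK, block & 0xFFFF)


def tftp_get(host, filename, port=69, timeout=5.0):
    """Download a file and return the number of payload bytes received.

    Raises socket.timeout (an OSError) if the server stops answering.
    """
    received = 0
    expected = 1  # block numbers start at 1 and wrap around after 65535
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_rrq(filename), (host, port))
        while True:
            packet, peer = sock.recvfrom(4 + BLOCK_SIZE)
            if len(packet) < 4:
                continue  # too short to be a valid TFTP packet
            opcode, block = struct.unpack("!HH", packet[:4])
            if opcode == OP_ERROR:
                raise OSError("TFTP error %d: %r" % (block, packet[4:]))
            if opcode != OP_DATA:
                continue
            # Acknowledge every data block, including retransmitted duplicates.
            sock.sendto(build_ack(block), peer)
            if block != expected & 0xFFFF:
                continue  # duplicate of a block that was already counted
            received += len(packet) - 4
            expected += 1
            if len(packet) - 4 < BLOCK_SIZE:
                return received  # a short block marks the end of the file
```

Calling tftp_get with the smart proxy address and "boot/fdi-image/initrd0.img" from a live system should return the file size in bytes if the whole transfer goes through, or raise a timeout if the data stops arriving.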
What could be the issue? Any suggestion? Many thanks in advance
Expected outcome: Discovery image boot
Foreman and Proxy versions:
Foreman master 2.1.1
Foreman proxy (in a different subnet): 2.1.1
This usually indicates a firewall misconfiguration. TFTP is a stateless protocol that runs over UDP, so it cannot go through NAT. But in your case you can download small files, while the discovery initramdisk, which is 300 MB, fails. This indicates a network problem: UDP packets are getting lost.
Ditch TFTP once and for all and start using UEFI HTTP boot if you can. We added support in 2.1. If you still need BIOS, then what you can do is iPXE chainbooting.
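For a sense of scale of the TFTP transfer discussed above, here is a back-of-the-envelope calculation of how many datagrams it involves. It is a sketch under assumptions: the 300 MB figure is the one quoted above (taken here as 300 MiB), 512 bytes is the TFTP default block size when no larger blksize is negotiated, and the function names are invented for this example. The block number field in TFTP is 16 bits wide, so the counter has to wrap around on a file of this size; whether a given client handles the wrap correctly depends on its implementation.

```python
TFTP_DEFAULT_BLOCK = 512      # bytes per data block without a blksize option
BLOCK_NUMBER_SPACE = 1 << 16  # the block number is a 16-bit field


def tftp_block_count(file_size, block_size=TFTP_DEFAULT_BLOCK):
    """Number of data blocks needed to send file_size bytes.

    A transfer ends with a block shorter than block_size, so a file that is
    an exact multiple of block_size needs one extra, empty block.
    """
    return file_size // block_size + 1


def counter_wraps(file_size, block_size=TFTP_DEFAULT_BLOCK):
    """How many times the 16-bit block number rolls over during the transfer."""
    return tftp_block_count(file_size, block_size) // BLOCK_NUMBER_SPACE


if __name__ == "__main__":
    size = 300 * 1024 * 1024  # assumed size of the discovery initrd
    print("data blocks:", tftp_block_count(size))
    print("counter wraps:", counter_wraps(size))
```

With the defaults this comes to 614,401 data blocks, each needing its own acknowledgement, and nine wraps of the block counter. A larger negotiated block size reduces both numbers, which can be checked by passing a different block_size.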
I'm experiencing exactly the same issue after upgrading Foreman 1.24.x to 2.0 on Fujitsu hardware.
So far I was only able to install 1 server out of the 4 I tried, and have another 8 to go.
I am wondering if you found the reason and what you applied as a solution?
Weird, we haven't changed anything in this regard. Can you compare TFTP server versions before/after the upgrade? But I doubt there were any changes in this ancient package.
I did not manage to find the root cause of why the discovery image stopped working.
I tried, as suggested by @lzap, to use UEFI HTTP boot, but without success.
We have many HP DL380 Gen9 and Gen10. I tried to use the "Boot from URL" feature, with a URL like http://IP_SMART_PROXY/httpboot/grub2/grubx64.efi.
This returns a grub> terminal with
Minimal BASH-like line editing is supported. For the first word, TAB lists possible command completions. Anywhere else TAB lists possible device or file completions.
httpboot was enabled on both master and smart-proxy. I had to add a firewall rule redirecting port 8000 to 80.
As a reference, I followed (and found interesting) the following guides/tutorial:
My workaround (for now) was to use the "Create Host" option in foreman, manually type in all the MAC addresses of the server, configure OS/partitioning and then run the provisioning directly. But this is a bit annoying, since discovery saves a lot of time and manual work.
For the record, Red Hat QA have identified a regression in (likely) a grub2 update in RHEL7. We are currently investigating the problem. The suggestion is to either downgrade or upgrade to the Fedora Rawhide grub until we find out which patch caused this.
Edit 2: The RH bootloader team identified the regression; it was caused by one of the CVE fixes in grub2. Here is a WIP patch that includes all security patches as well as a correction of the TFTP sizing problem:
I also had one server that constantly ended up at the GRUB prompt.
Only UEFI PXE on the first 1GB network interface was configured as a boot option.
By configuring UEFI HTTP Boot as an extra boot option for the same interface, as the first option with PXE as the second, I got rid of the GRUB prompt and got the expected menu to select the desired Foreman Discovery Image option.
This was the case for my Fujitsu RX-2530-M5 System, no idea how it is for HP.
Besides this, we also have HP Apollo 4200 Gen 9 and HP DL-360 Gen9 & 10 servers.
But they were installed in the past with cobbler. The idea is to build them from scratch in the future with Foreman.
I also tried to use the grubx64.efi from Fedora Rawhide, but that did not solve our problem either.
A tcpdump from our Foreman server shows connection loss as of packet 66678 for the initrd0.img file.
So, I will try the workaround of adding/configuring the host manually in Foreman and building the provisioning file. Hope this might be a temporary solution for me as well.
Anyway, thnx for your feedback.
I hope they will soon find and provide a solution for this issue.
Thanks for this feedback. This is wonderful news and I will apply these actions as first job, next Monday.
I also had the "FAILURE" status during the first discovery in version 1.24.x of Foreman.
As a solution I applied "Resend" and it then went well.
Give it a try, hope it will work for you as well.
@lzap
I have configured one organization with only one location and one domain.
There are 2 subnets, no VLANs, no network restrictions.
In the first subnet there is the foreman master with one smart proxy.
In the second subnet just the smart proxy (this was needed for the dhcp of the discovered hosts).
Host is discovered in 2nd subnet, facts are sent directly to foreman master in 1st subnet.