Random VM fails to download pxelinux.0 over TFTP

Problem:
I have an issue where 3 out of 4 VMs provision successfully. The VM that fails to provision times out when attempting to pull the pxelinux.0 file during bootup from the Foreman (foreman-proxy) server.
Expected outcome:
Successful transfer of the pxelinux.0 file to the newly created VM.
Foreman and Proxy versions:
Foreman: 3.1.2
Foreman-Proxy: 3.1.2
Foreman and Proxy plugin versions:
No plugins are installed.
Distribution and version:
Rocky 8.5
Other relevant data:
If I run tcpdump on the Foreman server and then power cycle the VM, I can see that the DHCP portion of the process succeeds. I can also see that the Foreman server receives a packet from the client on port 69 requesting the pxelinux.0 file, but nothing is sent back to the client.

I can also confirm that tftp.service and tftp.socket are running and listening.

I can GET the pxelinux.0 file from the Foreman server using a tftp client utility on any other client machine on the network. So the tftp service on the Foreman server appears to be working; it’s just ignoring the new VM that I kicked off with Foreman.

I’m not sure what else to try to troubleshoot TFTP.

Further googling brought me to this post on serverfault.com: “TFTP Server working, PXE Boot is not” (Server Fault)

I’m seeing the exact same symptoms. It appears the new VM that I’m trying to provision is sending the tftp read request packet in exactly the same way as the machine in the linked thread above.

How do I ensure that the newly provisioned VM requests the file using netascii rather than octet?

This is on a Libvirt hypervisor. Do I need to do something to change the VM’s PXE ROM?

New development. I just provisioned 4 VMs one after another. The 1st and 4th VMs failed to provision (failed to pull the pxelinux.0 file over tftp), while the 2nd and 3rd succeeded.

The VMs were issued the following IP addresses from the DHCP server on their provision interfaces:

  1. 100.99.97.35/24
  2. 100.99.97.49/24
  3. 100.99.97.43/24
  4. 100.99.97.44/24

It looks like they all sent their TFTP read request using a tftp mode of “octet”. So, the cause of the problem doesn’t appear to be related to the tftp mode.

It looks like the two VMs that failed are sending their tftp read requests to the address of the other interface on my Foreman server.

The Foreman server has two network interfaces. The main one has the IP address 100.99.97.7/24 and should be the interface that provisions the VMs. The second one is on my storage VLAN and has the IP address 100.99.98.7/24.

I can see from the tcpdump output below that the VMs that succeed send their tftp read requests to the correct destination IP address, 100.99.97.7, while the VMs that fail send theirs to the incorrect destination IP address, 100.99.98.7.

I suspect this could possibly be a DNS or routing issue in my environment.

Here’s the tcpdump of the events:

tcpdump -i any -nntl port 69
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
IP 100.99.97.35.55174 > 100.99.98.7.69:  40 RRQ "pxelinux.0" octet blksize 1432 tsize 0
IP 100.99.97.35.55174 > 100.99.98.7.69:  40 RRQ "pxelinux.0" octet blksize 1432 tsize 0
IP 100.99.97.35.55174 > 100.99.98.7.69:  40 RRQ "pxelinux.0" octet blksize 1432 tsize 0
IP 100.99.97.35.55174 > 100.99.98.7.69:  40 RRQ "pxelinux.0" octet blksize 1432 tsize 0
IP 100.99.97.35.55174 > 100.99.98.7.69:  40 RRQ "pxelinux.0" octet blksize 1432 tsize 0
IP 100.99.97.35.55174 > 100.99.98.7.69:  40 RRQ "pxelinux.0" octet blksize 1432 tsize 0
IP 100.99.97.49.29345 > 100.99.97.7.69:  40 RRQ "pxelinux.0" octet blksize 1432 tsize 0
IP 100.99.97.49.49152 > 100.99.97.7.69:  41 RRQ "ldlinux.c32" octet tsize 0 blksize 1408
IP 100.99.97.49.49153 > 100.99.97.7.69:  79 RRQ "pxelinux.cfg/3c12b6ea-a2a0-db4f-8353-e502936cda30" octet tsize 0 blksize 1408
IP 100.99.97.49.49154 > 100.99.97.7.69:  63 RRQ "pxelinux.cfg/01-52-54-00-74-c1-2b" octet tsize 0 blksize 1408
IP 100.99.97.49.49155 > 100.99.97.7.69:  38 RRQ "menu.c32" octet tsize 0 blksize 1408
IP 100.99.97.49.49156 > 100.99.97.7.69:  38 RRQ "menu.c32" octet tsize 0 blksize 1408
IP 100.99.97.49.49157 > 100.99.97.7.69:  41 RRQ "libutil.c32" octet tsize 0 blksize 1408
IP 100.99.97.49.49158 > 100.99.97.7.69:  63 RRQ "pxelinux.cfg/01-52-54-00-74-c1-2b" octet tsize 0 blksize 1408
IP 100.99.97.49.49159 > 100.99.97.7.69:  64 RRQ "boot/rocky8-5-Zl1UrKUuwIid-vmlinuz" octet tsize 0 blksize 1408
IP 100.99.97.49.49160 > 100.99.97.7.69:  67 RRQ "boot/rocky8-5-Zl1UrKUuwIid-initrd.img" octet tsize 0 blksize 1408
IP 100.99.97.43.56097 > 100.99.97.7.69:  40 RRQ "pxelinux.0" octet blksize 1432 tsize 0
IP 100.99.97.43.49152 > 100.99.97.7.69:  41 RRQ "ldlinux.c32" octet tsize 0 blksize 1408
IP 100.99.97.43.49153 > 100.99.97.7.69:  79 RRQ "pxelinux.cfg/d1f08e86-8f38-5d4d-b5b7-ed62c376621a" octet tsize 0 blksize 1408
IP 100.99.97.43.49154 > 100.99.97.7.69:  63 RRQ "pxelinux.cfg/01-52-54-00-f4-0c-c5" octet tsize 0 blksize 1408
IP 100.99.97.43.49155 > 100.99.97.7.69:  38 RRQ "menu.c32" octet tsize 0 blksize 1408
IP 100.99.97.43.49156 > 100.99.97.7.69:  38 RRQ "menu.c32" octet tsize 0 blksize 1408
IP 100.99.97.43.49157 > 100.99.97.7.69:  41 RRQ "libutil.c32" octet tsize 0 blksize 1408
IP 100.99.97.43.49158 > 100.99.97.7.69:  63 RRQ "pxelinux.cfg/01-52-54-00-f4-0c-c5" octet tsize 0 blksize 1408
IP 100.99.97.43.49159 > 100.99.97.7.69:  64 RRQ "boot/rocky8-5-Zl1UrKUuwIid-vmlinuz" octet tsize 0 blksize 1408
IP 100.99.97.44.2880 > 100.99.98.7.69:  40 RRQ "pxelinux.0" octet blksize 1432 tsize 0
IP 100.99.97.43.49160 > 100.99.97.7.69:  67 RRQ "boot/rocky8-5-Zl1UrKUuwIid-initrd.img" octet tsize 0 blksize 1408
IP 100.99.97.44.2880 > 100.99.98.7.69:  40 RRQ "pxelinux.0" octet blksize 1432 tsize 0
IP 100.99.97.44.2880 > 100.99.98.7.69:  40 RRQ "pxelinux.0" octet blksize 1432 tsize 0
IP 100.99.97.44.2880 > 100.99.98.7.69:  40 RRQ "pxelinux.0" octet blksize 1432 tsize 0
IP 100.99.97.44.2880 > 100.99.98.7.69:  40 RRQ "pxelinux.0" octet blksize 1432 tsize 0
IP 100.99.97.44.2880 > 100.99.98.7.69:  40 RRQ "pxelinux.0" octet blksize 1432 tsize 0
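
To make the pattern in a capture like this easier to spot, the RRQ lines can be tallied per client and destination. The following is a minimal sketch, assuming the capture text was saved from `tcpdump -nn` in the one-line format shown above; the function name `summarize_rrq` and the file name `tftp-capture.txt` are made up for illustration.

```shell
# Summarize TFTP read requests (RRQ) from `tcpdump -nn` text output.
# Reads capture lines on stdin and prints "<client-ip> -> <server-ip> <count>".
summarize_rrq() {
    awk '
        $1 == "IP" && /RRQ/ {
            src = $2; dst = $4
            sub(/\.[0-9]+$/, "", src)    # strip the source port
            sub(/\.[0-9]+:$/, "", dst)   # strip the destination port and colon
            count[src " -> " dst]++
        }
        END {
            for (pair in count) print pair, count[pair]
        }
    ' | sort
}

# Example (hypothetical file holding the saved capture text):
#   summarize_rrq < tftp-capture.txt
```

Run against the capture above, this should show 100.99.97.35 and 100.99.97.44 sending all of their requests to 100.99.98.7, and 100.99.97.49 and 100.99.97.43 sending theirs to 100.99.97.7.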

More details. Once again looking at tcpdump but this time only for a VM that’s failing to provision.

I can see that the DHCP server is sending “Server-IP 100.99.98.7”. So that’s why the client is attempting to pull its file using that destination address.

Is there a way to ensure that my DHCP server always sends the “Server-IP 100.99.97.7” option?

My /etc/dhcp/dhcpd.conf file has the line: “next-server 100.99.97.7;”
It appears that this parameter is being ignored.

Once again here’s the output of the tcpdump:

tcpdump -i any -nntl host 100.99.97.35 -vvv
dropped privs to tcpdump
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
IP (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 329)
    100.99.97.7.67 > 100.99.97.35.68: [udp sum ok] BOOTP/DHCP, Reply, length 301, xid 0x78373379, secs 4, Flags [none] (0x0000)
	  Your-IP 100.99.97.35
	  Server-IP 100.99.98.7
	  Client-Ethernet-Address 52:54:00:52:7e:31
	  file "pxelinux.0"
	  Vendor-rfc1048 Extensions
	    Magic Cookie 0x63825363
	    DHCP-Message Option 53, length 1: Offer
	    Server-ID Option 54, length 4: 100.99.97.7
	    Lease-Time Option 51, length 4: 43200
	    Subnet-Mask Option 1, length 4: 255.255.255.0
	    Default-Gateway Option 3, length 4: 100.99.97.1
	    Domain-Name-Server Option 6, length 8: 100.99.97.4,100.99.97.5
	    Hostname Option 12, length 12: "test1.jnk.sys"
	    Domain-Name Option 15, length 7: "jnk.sys"
	    END Option 255, length 0
IP (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 329)
    100.99.97.7.67 > 100.99.97.35.68: [udp sum ok] BOOTP/DHCP, Reply, length 301, xid 0x78373379, secs 10, Flags [none] (0x0000)
	  Your-IP 100.99.97.35
	  Server-IP 100.99.98.7
	  Client-Ethernet-Address 52:54:00:52:7e:31
	  file "pxelinux.0"
	  Vendor-rfc1048 Extensions
	    Magic Cookie 0x63825363
	    DHCP-Message Option 53, length 1: Offer
	    Server-ID Option 54, length 4: 100.99.97.7
	    Lease-Time Option 51, length 4: 43200
	    Subnet-Mask Option 1, length 4: 255.255.255.0
	    Default-Gateway Option 3, length 4: 100.99.97.1
	    Domain-Name-Server Option 6, length 8: 100.99.97.4,100.99.97.5
	    Hostname Option 12, length 12: "test1.jnk.sys"
	    Domain-Name Option 15, length 7: "jnk.sys"
	    END Option 255, length 0
IP (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 329)
    100.99.97.7.67 > 100.99.97.35.68: [udp sum ok] BOOTP/DHCP, Reply, length 301, xid 0x78373379, secs 18, Flags [none] (0x0000)
	  Your-IP 100.99.97.35
	  Server-IP 100.99.98.7
	  Client-Ethernet-Address 52:54:00:52:7e:31
	  file "pxelinux.0"
	  Vendor-rfc1048 Extensions
	    Magic Cookie 0x63825363
	    DHCP-Message Option 53, length 1: ACK
	    Server-ID Option 54, length 4: 100.99.97.7
	    Lease-Time Option 51, length 4: 43200
	    Subnet-Mask Option 1, length 4: 255.255.255.0
	    Default-Gateway Option 3, length 4: 100.99.97.1
	    Domain-Name-Server Option 6, length 8: 100.99.97.4,100.99.97.5
	    Hostname Option 12, length 12: "test1.jnk.sys"
	    Domain-Name Option 15, length 7: "jnk.sys"
	    END Option 255, length 0
ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 100.99.97.35 tell 100.99.97.35, length 28
ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 100.99.97.1 tell 100.99.97.35, length 28
IP (tos 0x0, ttl 63, id 1049, offset 0, flags [none], proto UDP (17), length 68)
    100.99.97.35.10896 > 100.99.98.7.69: [udp sum ok]  40 RRQ "pxelinux.0" octet blksize 1432 tsize 0
IP (tos 0x0, ttl 63, id 1306, offset 0, flags [none], proto UDP (17), length 68)
    100.99.97.35.10896 > 100.99.98.7.69: [udp sum ok]  40 RRQ "pxelinux.0" octet blksize 1432 tsize 0
IP (tos 0x0, ttl 63, id 1563, offset 0, flags [none], proto UDP (17), length 68)
    100.99.97.35.10896 > 100.99.98.7.69: [udp sum ok]  40 RRQ "pxelinux.0" octet blksize 1432 tsize 0
IP (tos 0x0, ttl 63, id 1820, offset 0, flags [none], proto UDP (17), length 68)
    100.99.97.35.10896 > 100.99.98.7.69: [udp sum ok]  40 RRQ "pxelinux.0" octet blksize 1432 tsize 0
IP (tos 0x0, ttl 63, id 2078, offset 0, flags [none], proto UDP (17), length 68)
    100.99.97.35.10896 > 100.99.98.7.69: [udp sum ok]  40 RRQ "pxelinux.0" octet blksize 1432 tsize 0
IP (tos 0x0, ttl 63, id 2321, offset 0, flags [none], proto UDP (17), length 68)
    100.99.97.35.10896 > 100.99.98.7.69: [udp sum ok]  40 RRQ "pxelinux.0" octet blksize 1432 tsize 0
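
One possible explanation for the `next-server` line in dhcpd.conf appearing to be ignored: in ISC dhcpd, a statement in a host declaration takes precedence over the same statement at subnet or global scope, and Foreman registers each new host with its own next-server value through OMAPI. Such dynamic host entries are recorded in the leases file, where the address is typically written as colon-separated hex (for example `supersede server.next-server = 64:63:62:07;`). The sketch below decodes that form. The leases path /var/lib/dhcpd/dhcpd.leases and the exact statement layout are assumptions about a default Rocky 8 setup, and `hex_to_ip` is a hypothetical helper name.

```shell
# Decode a colon-separated hex IPv4 address, as it may appear in
# "supersede server.next-server = 64:63:62:07;" lines of dhcpd.leases.
hex_to_ip() {
    old_ifs=$IFS
    IFS=:
    # Split the argument on colons into $1..$4.
    set -- $1
    IFS=$old_ifs
    printf '%d.%d.%d.%d\n' "0x$1" "0x$2" "0x$3" "0x$4"
}

# Example (assumed default leases path on Rocky 8):
#   grep 'next-server' /var/lib/dhcpd/dhcpd.leases
#   hex_to_ip 64:63:62:07
```

If a failing host's entry decodes to 100.99.98.7, the per-host value is what the client receives, regardless of the `next-server` line at subnet level.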

I’m getting closer and closer to solving this.

I found this Foreman thread and it seems to indicate what I need to do. I’m just not sure how to do it.

It seems to indicate that I need to set the tftp_servername setting as reported by the proxy API. It also says that it can be either a hostname or an IP address. I’d like to make it an IP address so that it doesn’t rely on DNS name resolution.

Can anyone tell me how to define that setting? I don’t see it anywhere in the Foreman WebUI.

Thanks!

I’m so close!

I’ve found the Smart Proxy API documentation here: API - Smart Proxy - Foreman

I can glean the value of the tftp_serverName property with the following:

curl -k -u admin:XXXXX  -H "Accept: version=2,application/json" https://foreman.jnk.sys:8443/tftp/serverName --cacert /etc/foreman/proxy_ca.pem --cert /etc/foreman/client_cert.pem --key /etc/foreman/client_key.pem
{"serverName":""}

The property is not set, so I’d like to set it to 100.99.97.7. I’ve tried the following but it’s failing:

curl -k -u admin:XXXXX  -H "Accept: version=2,application/json" -X PUT -d '{"serverName": "100.99.97.7"}' https://foreman.jnk.sys:8443/tftp/serverName --cacert /etc/foreman/proxy_ca.pem --cert /etc/foreman/client_cert.pem --key /etc/foreman/client_key.pem
Requested url was not found

Is it possible that this property is “read only” and can’t be written to via the API?
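
It appears that the smart proxy only exposes `/tftp/serverName` for reading, which would explain the "Requested url was not found" response to the PUT. The value it reports comes from the `:tftp_servername:` key in the proxy's TFTP settings file, normally /etc/foreman-proxy/settings.d/tftp.yml. The sketch below reads that key; `get_tftp_servername` is a hypothetical helper, and the path is an assumption about a standard package install.

```shell
# Print the configured TFTP server name from a smart proxy settings file.
# Prints nothing if the key is absent or commented out.
get_tftp_servername() {
    sed -n 's/^:tftp_servername:[[:space:]]*//p' "$1"
}

# Example (assumed default path):
#   get_tftp_servername /etc/foreman-proxy/settings.d/tftp.yml
```

On installer-managed systems that file is generated by Puppet, so the value should be settable with `foreman-installer --foreman-proxy-tftp-servername 100.99.97.7` rather than by editing the file by hand.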

Well, it’s fixed now. The fix was to get DNS working correctly, so that the Foreman server’s DNS record is always returned correctly and consistently. I did not find a way to set the tftp_serverName property via the proxy API.
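
A way to check for this kind of inconsistency is to resolve the Foreman server's name and flag any address other than the provisioning one. This is a sketch under the assumption that the name was, at times, resolving to the storage interface address 100.99.98.7. The helper name `unexpected_addresses` is hypothetical, and it expects `getent ahostsv4`-style lines (address in the first column) on stdin.

```shell
# Read resolver output (address in the first column) on stdin and print
# each distinct address that differs from the expected one.
unexpected_addresses() {
    awk -v want="$1" 'NF > 0 && $1 != want { print $1 }' | sort -u
}

# Example:
#   getent ahostsv4 foreman.jnk.sys | unexpected_addresses 100.99.97.7
```

An empty result across repeated runs suggests the name resolves only to the provisioning address.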