Hi,
while playing with network provisioning on my machine, I put together a list of useful tips on how to investigate and debug network provisioning issues. It’s not a complete list in any way, and it probably won’t fit every environment, but I hope some of you will find some of the tips helpful.
Before provisioning
Minimum requirements
Having less RAM or disk space can lead to unexpected errors in any phase of provisioning. Be sure that your machine meets at least the minimum requirements for the OS.
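If you can get a Linux shell on the target machine (a live image, for example), the RAM part can be checked quickly. This is only a rough sketch: the function names are mine, and the threshold is whatever your OS documentation says.

```shell
# Print total memory in MB, read from a meminfo-style file
# (default: /proc/meminfo).
total_mem_mb() {
    awk '/^MemTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Succeed if total memory is at least <min_mb>.
# Usage: has_min_memory <min_mb> [meminfo_file]
has_min_memory() {
    [ "$(total_mem_mb "${2:-/proc/meminfo}")" -ge "$1" ]
}
```

For example, `has_min_memory 2048 || echo "not enough RAM"` (2048 is just a placeholder value). For disk space, `df -h` gives a similar quick overview.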
Networking
Host is asking the DHCP server for an IP address
DNS is working
Firewall is not blocking any connections
Can the host reach the Foreman?
Can the host reach the Smart Proxy?
Build token is not expired (404 error, bug?)
Foreman configuration
Debug logs enabled for easier investigation:
foreman-installer --foreman-logging-level "debug" --foreman-proxy-log-level "DEBUG"
Correctly assigned templates to OS
Supported boot loader for OS
Smart Proxy configuration
Required modules are enabled & configured properly
If the templates module is enabled, check the template_url
Plugins
Plugin is correctly configured
Smart proxy module for plugin is enabled and configured
Creating host in Foreman
Logs
When you create a host in Foreman, the log entries for the host creation action carry the same request ID as the Smart Proxy log entries for activities associated with that host.
Foreman:
2023-02-22T14:35:04 [I|app|3f57f995] Started POST "/hosts" for ::1 at 2023-02-22 14:35:04 +0100
2023-02-22T14:35:04 [I|app|3f57f995] Processing by HostsController#create as */*
Smart Proxy:
2023-02-22T14:35:04 3f57f995 [I] Started GET /tftp/serverName
2023-02-22T14:35:04 3f57f995 [I] Finished GET /tftp/serverName with 200 (0.24 ms)
With a simple grep you can easily match logs and actions between Foreman and Smart Proxy.
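A minimal sketch of that matching, assuming the log layout shown in the samples above. The function names are made up for illustration; the log paths in the usage note are the ones used elsewhere in this post.

```shell
# Print the request IDs of host-creation requests found in a Foreman log.
# Relies on the '[I|app|<id>] Started POST "/hosts"' layout shown above.
host_create_ids() {
    grep 'Started POST "/hosts"' "$1" |
        sed -n -E 's/.*\[[A-Z]\|[a-z]+\|([0-9a-f]+)\].*/\1/p' |
        sort -u
}

# Print every line carrying the given request ID from both logs,
# each prefixed with the name of the file it came from.
# Usage: trace_request <id> <foreman_log> <proxy_log>
trace_request() {
    grep -H -- "$1" "$2" "$3"
}
```

Usage would look like `host_create_ids /var/log/foreman/production.log`, followed by `trace_request <ID> /var/log/foreman/production.log /var/log/foreman-proxy/proxy.log` for the ID you are interested in.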
Provisioning files
Be sure that <os>-initrd.img and <os>-vmlinuz files in /var/lib/tftpboot/boot/ are fully downloaded.
Sometimes provisioning starts before the files are fully downloaded and it can result in unexpected behavior and problems. You can use md5sum to verify checksums of files.
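A small sketch of the checksum check. The helper name is hypothetical, and the expected checksum has to come from somewhere you trust, for example the same file downloaded manually from the installation media.

```shell
# Succeed if the MD5 checksum of <file> equals <expected_md5>.
# A missing or unreadable file counts as a mismatch.
# Usage: verify_md5 <file> <expected_md5>
verify_md5() {
    actual=$(md5sum "$1" | cut -d ' ' -f 1)
    [ "$actual" = "$2" ]
}
```

For example: `verify_md5 /var/lib/tftpboot/boot/<os>-initrd.img <expected> || echo "initrd is incomplete or corrupted"`.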
Check content of files (paths, URLs, menu options …)
/var/lib/tftpboot/pxelinux.cfg/<MAC>
/var/lib/tftpboot/pxelinux.cfg/<MAC>.ipxe
/var/lib/tftpboot/pxelinux.cfg/default
/var/lib/tftpboot/grub.cfg
/var/lib/tftpboot/grub.cfg/grub.cfg-<MAC>
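To find the files for one particular host, something like the sketch below can help. How exactly `<MAC>` is written in the file name depends on the boot loader (PXELINUX, for instance, looks for the lowercase, dash-separated MAC with a `01-` prefix), so this hypothetical helper simply searches for both the colon and the dash notation, ignoring case.

```shell
# Print files under <tftp_root> whose names contain the given MAC address,
# written either with colons or with dashes (case-insensitive match).
# Usage: boot_configs_for_mac <mac> [tftp_root]
boot_configs_for_mac() {
    colon=$(printf '%s' "$1" | sed 's/-/:/g')
    dash=$(printf '%s' "$1" | sed 's/:/-/g')
    find "${2:-/var/lib/tftpboot}" -type f \
        \( -iname "*${colon}*" -o -iname "*${dash}*" \) | sort
}
```

`boot_configs_for_mac aa:bb:cc:dd:ee:ff` then lists the candidates, and you can `cat` each of them to check paths, URLs and menu options.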
After the host reboot
Was the built status callback to the Foreman successful?
grep "/unattended/built" /var/log/foreman/production.log
grep "<ID>" /var/log/foreman/production.log
Response code should be 201
Was the built status callback to Smart Proxy successful?
grep "/unattended/built" /var/log/foreman-proxy/proxy.log
Response code should be 200
Note: There is unexpected behavior with invalid (expired) build tokens, where the endpoint returns a 404 on the Foreman side and a 500 on the Smart Proxy side. This is a known issue; the correct response code should be 401.
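To see only the response codes on the Smart Proxy side, the grep can be narrowed down a bit. This sketch assumes the "Finished <METHOD> <path> with <code>" layout from the Smart Proxy log sample earlier in this post; the function name is made up.

```shell
# Print the response codes of "/unattended/built" calls found in a
# Smart Proxy log, one per line, in the order they were logged.
# Usage: built_status_codes <proxy_log>
built_status_codes() {
    sed -n -E 's#.*Finished [A-Z]+ /unattended/built[^ ]* with ([0-9]+).*#\1#p' "$1"
}
```

`built_status_codes /var/log/foreman-proxy/proxy.log | sort | uniq -c` gives a quick count per code: 200 is what you want, and 500 may point to the expired token problem described in the note.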
Files in /var/lib/tftpboot have been updated, so that the host does not boot from the network over and over
Anaconda
/root/anaconda-ks.cfg - KS file used by Anaconda
/root/original-ks.cfg - Original generated by Foreman
/tmp/anaconda - Logs when provisioning failed
/var/log/anaconda - Logs when provisioning was successful
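Since the log location depends on how the installation ended, a tiny helper can pick whichever directory exists. This is just a sketch with a hypothetical function name; the optional prefix argument is my addition for the case where you inspect the host's disk mounted somewhere else.

```shell
# Print the first Anaconda log directory that exists:
# /var/log/anaconda (successful run) or /tmp/anaconda (failed run).
# An optional prefix is prepended to both paths.
# Usage: anaconda_log_dir [prefix]
anaconda_log_dir() {
    for dir in "${1:-}/var/log/anaconda" "${1:-}/tmp/anaconda"; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}
```

Then `grep -rniE 'error|traceback' "$(anaconda_log_dir)"` is a reasonable first pass over the logs.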
Watching network
Wireshark is your friend.
sudo wireshark - select an interface and see what is going on there
If you can’t use Wireshark, use tcpdump to capture traffic and write it to a pcap file, which you can then open in Wireshark.
sudo tcpdump --list-interfaces
sudo tcpdump --interface <interface> -w output.pcap
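Capturing everything can get noisy, so a capture filter helps. The sketch below builds one; the function name is hypothetical and the port choice is my assumption about what is usually interesting during a PXE boot (DHCP on ports 67 and 68, TFTP requests on port 69).

```shell
# Build a tcpdump filter expression for watching a PXE boot.
# With a MAC address: everything to and from that host.
# Without: DHCP (ports 67, 68) and TFTP requests (port 69) from any host.
# Note: TFTP data packets use other ports, so the port filter only
# shows the initial requests, not the whole transfer.
# Usage: pxe_capture_filter [mac]
pxe_capture_filter() {
    if [ -n "${1:-}" ]; then
        printf 'ether host %s\n' "$1"
    else
        printf 'port 67 or port 68 or port 69\n'
    fi
}
```

It is used as the last argument: `sudo tcpdump --interface <interface> -w output.pcap "$(pxe_capture_filter <MAC>)"`.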
Discovery
VM has at least 1200 MB of memory
SSH for PXE: pxelinux_discovery - APPEND fdi.ssh=1 fdi.rootpw=changeme
SSH PXEless: ./discovery-remaster <iso> "fdi.ssh=1 fdi.rootpw=changeme" <output-iso>
sudo mount -o loop remastered.iso /mnt/fdi - to check the content of the image
discovery-debug - Useful script printing information about the host
Logs
journalctl --boot - Logs from the current boot
journalctl --unit nm-prepare - boot script which pre-configures Network Manager
journalctl --unit NetworkManager - Networking information
That’s all folks
If you have anything that you think should be part of this checklist, please feel free to share. I’m also thinking of making it part of the official provisioning documentation one day, but that’s a story for another time.