Hi,
While playing with network provisioning on my machine, I put together a list of useful tips on how to investigate and debug network provisioning issues. It's not a complete list by any means, and it probably won't fit every environment, but I hope some of you will find some of the tips helpful.
Before provisioning
Minimum requirements
Having less RAM or disk space can lead to unexpected errors in any phase of provisioning. Be sure that your machine meets at least the minimum requirements for the OS.
Networking
Asking DHCP server for IP
DNS is working
Firewall is not blocking any connections
Can the host reach the Foreman?
Can the host reach the Smart Proxy?
Build token is not expired (404 error, bug?)
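A few of the checks above can be scripted from a shell on the host (or from any machine on the same network). This is only a rough sketch: the host names and the Smart Proxy URL below are made-up placeholders, the port depends on your installation, and it assumes getent and curl are available. It does not cover the DHCP or build token checks.

```shell
#!/bin/sh
# Hypothetical values: replace them with your own Foreman and Smart Proxy.
FOREMAN_HOST="${FOREMAN_HOST:-foreman.example.com}"
PROXY_URL="${PROXY_URL:-https://proxy.example.com:8443/features}"

# Run a command quietly and print OK or FAIL next to a label.
report() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "OK   $label"
    else
        echo "FAIL $label"
    fi
}

# Not called automatically; run it by hand once the variables are set.
check_provisioning_network() {
    # DNS: can the Foreman host name be resolved?
    report "DNS resolves $FOREMAN_HOST" getent hosts "$FOREMAN_HOST"
    # Reachability only: any HTTP response counts, certificates are not verified.
    report "Foreman reachable" \
        curl --silent --insecure --max-time 5 --output /dev/null "https://$FOREMAN_HOST/"
    report "Smart Proxy reachable" \
        curl --silent --insecure --max-time 5 --output /dev/null "$PROXY_URL"
}
```

Calling check_provisioning_network prints one OK/FAIL line per check. A FAIL on the curl lines can mean either a firewall or a routing problem, so it only tells you where to look next.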
Foreman configuration
Debug logs enabled for easier investigation:
foreman-installer --foreman-logging-level "debug" --foreman-proxy-log-level "DEBUG"
Correctly assigned templates to OS
Supported boot loader for OS
Smart Proxy configuration
Required modules are enabled & configured properly
If the templates module is enabled, check the template_url
Plugins
Plugin is correctly configured
Smart proxy module for plugin is enabled and configured
Creating host in Foreman
Logs
When you create a host in Foreman, logs for host creation action have the same ID as logs in Smart Proxy for activities associated with that host.
Foreman:
2023-02-22T14:35:04 [I|app|3f57f995] Started POST "/hosts" for ::1 at 2023-02-22 14:35:04 +0100
2023-02-22T14:35:04 [I|app|3f57f995] Processing by HostsController#create as */*
Smart Proxy:
2023-02-22T14:35:04 3f57f995 [I] Started GET /tftp/serverName
2023-02-22T14:35:04 3f57f995 [I] Finished GET /tftp/serverName with 200 (0.24 ms)
With a simple grep you can easily match logs and actions between Foreman and Smart Proxy.
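For example, a small helper like the one below pulls every line with a given request ID out of both logs. The function name trace_request is just something I made up for illustration, and the default paths assume the usual log locations (/var/log/foreman/production.log and /var/log/foreman-proxy/proxy.log), so adjust them if yours differ.

```shell
# Print every line carrying the given request ID from both logs,
# prefixed with the log it came from.
trace_request() {
    id=$1
    foreman_log=${2:-/var/log/foreman/production.log}
    proxy_log=${3:-/var/log/foreman-proxy/proxy.log}
    # -F treats the ID as a fixed string, not as a regular expression
    grep -F -- "$id" "$foreman_log" | sed 's/^/foreman: /'
    grep -F -- "$id" "$proxy_log" | sed 's/^/proxy:   /'
}
```

With the logs above, trace_request 3f57f995 would list the two Foreman lines followed by the two Smart Proxy lines.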
Provisioning files
Be sure that <os>-initrd.img and <os>-vmlinuz files in /var/lib/tftpboot/boot/ are fully downloaded.
Sometimes provisioning starts before the files are fully downloaded and it can result in unexpected behavior and problems. You can use md5sum to verify checksums of files.
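A minimal sketch of such a check follows. The helper name verify_md5 is hypothetical, and the expected checksum has to come from somewhere you trust, for example from running md5sum on a known-good copy of the same file from the installation media.

```shell
# Succeed only when FILE exists, is not empty and matches the EXPECTED md5 sum.
verify_md5() {
    file=$1
    expected=$2
    if [ ! -s "$file" ]; then
        echo "MISSING or empty: $file"
        return 1
    fi
    # md5sum prints "<hash>  -" when reading stdin; keep only the hash
    actual=$(md5sum < "$file" | cut -d ' ' -f 1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual)"
        return 1
    fi
}
```

Usage would look like verify_md5 /var/lib/tftpboot/boot/<os>-vmlinuz <expected-md5>, and the same for the initrd file. The empty-file check catches the case where the download has only just started.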
Check content of files (paths, URLs, menu options …)
/var/lib/tftpboot/pxelinux.cfg/<MAC>
/var/lib/tftpboot/pxelinux.cfg/<MAC>.ipxe
/var/lib/tftpboot/pxelinux.cfg/default
/var/lib/tftpboot/grub2/grub.cfg
/var/lib/tftpboot/grub2/grub.cfg-<MAC>
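To look at the per-host PXELinux files quickly, something like the sketch below can help. It relies on the standard PXELINUX naming convention, where the per-host file is named after the MAC address in lowercase, with dashes, prefixed by the hardware type 01; I'm assuming your files follow that convention. Both function names are made up for illustration, and the sketch only looks at the pxelinux.cfg files, not the GRUB ones.

```shell
# Convert AA:BB:CC:DD:EE:FF into the PXELINUX per-host name 01-aa-bb-cc-dd-ee-ff.
pxe_mac_name() {
    printf '01-%s\n' "$1" | tr 'A-Z:' 'a-z-'
}

# Show the kernel, initrd, append and URL lines of the PXELinux files for one host.
show_boot_config() {
    name=$(pxe_mac_name "$1")
    root=${2:-/var/lib/tftpboot}
    for f in "$root/pxelinux.cfg/$name" \
             "$root/pxelinux.cfg/$name.ipxe" \
             "$root/pxelinux.cfg/default"; do
        if [ -f "$f" ]; then
            echo "== $f"
            # -n adds line numbers so the lines are easy to find in an editor
            grep -n -i -E 'kernel|linux|initrd|append|https?://' "$f"
        else
            echo "== $f (missing)"
        fi
    done
    return 0
}
```

Running show_boot_config AA:BB:CC:DD:EE:FF prints the interesting lines of each file that exists and marks the rest as missing, which makes wrong paths and URLs easier to spot.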
After the host reboot
Was the built status callback to the Foreman successful?
grep "/unattended/built" /var/log/foreman/production.log
grep "<ID>" /var/log/foreman/production.log
Response code should be 201
Was the built status callback to Smart Proxy successful?
grep "/unattended/built" /var/log/foreman-proxy/proxy.log
Response code should be 200
Note: There is unexpected behavior with invalid (expired) build tokens: the endpoint returns a 404 on the Foreman side and a 500 on the Smart Proxy side. This is a known issue; the correct response code should be 401.
Files in /var/lib/tftpboot have been updated and the host is not booting from the network again and again
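The two greps for the Foreman side of the built callback can be combined into one step. The sketch below first finds the request IDs of the /unattended/built calls and then prints the response code logged for each of them. It assumes the log tag format shown earlier ([I|app|<ID>]) and the usual Rails "Completed <code>" line; the function name built_status is hypothetical.

```shell
# Print "<request ID> <response code>" for every /unattended/built call in the Foreman log.
built_status() {
    log=${1:-/var/log/foreman/production.log}
    # Step 1: collect the request IDs from the [I|app|<ID>] tag of matching lines
    grep -F '/unattended/built' "$log" |
        sed -n 's/.*\[[A-Z]|[a-z]*|\([0-9a-f]*\)\].*/\1/p' |
        sort -u |
        while read -r id; do
            # Step 2: find the "Completed <code>" line that belongs to the same request
            grep -F "|$id]" "$log" |
                sed -n "s/.*Completed \([0-9][0-9][0-9]\).*/$id \1/p"
        done
}
```

A line ending in 201 means the callback went through. A 404 here points at the expired build token case described in the note above.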
Anaconda
/root/anaconda-ks.cfg KS file used by Anaconda
/root/original-ks.cfg Original generated by Foreman
/tmp/anaconda Logs when provisioning failed
/var/log/anaconda Logs when provisioning was successful
Watching network
Wireshark is your friend.
sudo wireshark - select an interface and see what is going on there
If you can't use Wireshark, use tcpdump to capture the traffic and export it to a pcap file, which you can then open in Wireshark.
sudo tcpdump --list-interfaces
sudo tcpdump --interface <interface> -w output.pcap
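Capturing everything on a busy interface gets noisy, so it can help to pass tcpdump a capture filter for the protocols involved in network boot. Below is a sketch with two made-up helper functions: it keeps DHCP (UDP ports 67 and 68) and TFTP requests (UDP port 69), and optionally all traffic to or from one host such as your Foreman or Smart Proxy.

```shell
# Build a capture filter for DHCP and TFTP requests, optionally plus one host.
provisioning_filter() {
    filter='udp port 67 or udp port 68 or udp port 69'
    # TFTP data packets use other ports after the initial request, so add
    # the server as a host filter if you want to see the whole transfer.
    if [ -n "${1:-}" ]; then
        filter="$filter or host $1"
    fi
    echo "$filter"
}

# Capture filtered traffic on an interface into a pcap file (default output.pcap).
capture_provisioning() {
    iface=$1
    outfile=${2:-output.pcap}
    sudo tcpdump --interface "$iface" -w "$outfile" "$(provisioning_filter "${3:-}")"
}
```

For example, capture_provisioning <interface> output.pcap <foreman-ip> records the boot-related traffic plus everything exchanged with that address. Stop it with Ctrl+C and open the file in Wireshark.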
Discovery
VM has at least 1200 MB of memory
- SSH for PXE: pxelinux_discovery - APPEND fdi.ssh=1 fdi.rootpw=changeme
- SSH PXEless: ./discovery-remaster <iso> "fdi.ssh=1 fdi.rootpw=changeme" <output-iso>
- sudo mount -o loop remastered.iso /mnt/fdi to check the content of the image
- discovery-debug - Useful script printing information about the host
Logs
- journalctl --boot - Logs from the current boot
- journalctl --unit nm-prepare - Boot script which pre-configures Network Manager
- journalctl --unit NetworkManager - Networking information
That’s all folks
If you have anything that you think should be part of this checklist, please feel free to share. I'm also thinking of making it part of the official provisioning documentation one day, but that's a story for another time.