Debugging provisioning

Hi,
while playing with network provisioning on my machine, I put together a list of tips on how to investigate and debug network provisioning issues. It's not a complete list in any way, and it probably won't fit every environment, but I hope some of you will find it useful.

Before provisioning

Minimum requirements
Having less RAM or disk space than required can lead to unexpected errors in any phase of provisioning. Make sure that your machine meets at least the minimum requirements for the OS.

Networking
Host is asking the DHCP server for an IP address
DNS is working
Firewall is not blocking any connections
Can the host reach Foreman?
Can the host reach the Smart Proxy?
Build token has not expired (an expired token shows up as a 404 error, bug?)

Foreman configuration
Debug logs enabled for easier investigation:

foreman-installer --foreman-logging-level "debug" --foreman-proxy-log-level "DEBUG"

Correctly assigned templates to OS
Supported boot loader for OS

Smart Proxy configuration
Required modules are enabled & configured properly
If the templates module is enabled, check the template_url

Plugins
Plugin is correctly configured
Smart proxy module for plugin is enabled and configured

Creating host in Foreman

Logs
When you create a host in Foreman, the log entries for the host creation action carry the same ID as the Smart Proxy log entries for activities associated with that host.

Foreman:
2023-02-22T14:35:04 [I|app|3f57f995] Started POST "/hosts" for ::1 at 2023-02-22 14:35:04 +0100
2023-02-22T14:35:04 [I|app|3f57f995] Processing by HostsController#create as */*

Smart Proxy:
2023-02-22T14:35:04 3f57f995 [I] Started GET /tftp/serverName
2023-02-22T14:35:04 3f57f995 [I] Finished GET /tftp/serverName with 200 (0.24 ms)

With a simple grep for that ID you can easily match logs and actions between Foreman and Smart Proxy.
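As a minimal sketch of that grep, the helper below prints every line that mentions one ID from both logs. The function name match_request_id is made up for this example, and the default paths are the log locations used elsewhere in this post; adjust them if your logs live somewhere else.

```shell
# Hypothetical helper: show all lines for one request ID in both logs.
# Usage: match_request_id <ID> [foreman_log] [proxy_log]
match_request_id() {
    id="$1"
    # Default paths are assumptions taken from this post.
    foreman_log="${2:-/var/log/foreman/production.log}"
    proxy_log="${3:-/var/log/foreman-proxy/proxy.log}"
    # -H prefixes each match with its file name, -F treats the ID as a plain string.
    grep -HF -- "$id" "$foreman_log" "$proxy_log"
}
```

For the example above that would be: match_request_id 3f57f995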

Provisioning files
Make sure that the <os>-initrd.img and <os>-vmlinuz files in /var/lib/tftpboot/boot/ are fully downloaded.
Sometimes provisioning starts before the download has finished, which can result in unexpected behavior and problems. You can use md5sum to verify the checksums of the files.
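One way to do the md5sum check is to compare the file in the TFTP directory with a copy you trust, for example one downloaded by hand from the installation media. This is only a sketch: the function name same_md5 and the reference copy are assumptions, not something Foreman provides.

```shell
# Hypothetical helper: succeeds only if both files have the same MD5 checksum.
# Usage: same_md5 <downloaded_file> <reference_file>
same_md5() {
    # Read via stdin so md5sum prints only the checksum and "-".
    sum_a=$(md5sum < "$1" | cut -d' ' -f1)
    sum_b=$(md5sum < "$2" | cut -d' ' -f1)
    # An empty checksum means the file could not be read.
    [ -n "$sum_a" ] && [ "$sum_a" = "$sum_b" ]
}
```

Example: same_md5 /var/lib/tftpboot/boot/<os>-vmlinuz /path/to/reference/vmlinuz && echo "download is complete"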

Check the content of these files (paths, URLs, menu options …):
/var/lib/tftpboot/pxelinux.cfg/<MAC>
/var/lib/tftpboot/pxelinux.cfg/<MAC>.ipxe
/var/lib/tftpboot/pxelinux.cfg/default
/var/lib/tftpboot/grub.cfg
/var/lib/tftpboot/grub.cfg/grub.cfg-<MAC>
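To find the per-host PXELinux file from the list above, keep in mind that PXELINUX looks for a file named after the MAC address in lowercase, with dashes instead of colons and prefixed with the hardware type 01. The sketch below builds that path; the function name is made up, and it assumes the TFTP root is /var/lib/tftpboot as in this post. It covers only the pxelinux.cfg entry, not the iPXE or Grub files.

```shell
# Hypothetical helper: print the PXELinux config path for a MAC address.
# Usage: pxelinux_cfg_path AA:BB:CC:DD:EE:FF
pxelinux_cfg_path() {
    # Lowercase the hex digits, then replace colons with dashes.
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f' | tr ':' '-')
    # "01-" is the ARP hardware type for Ethernet used by PXELINUX.
    printf '/var/lib/tftpboot/pxelinux.cfg/01-%s\n' "$mac"
}
```

Example: cat "$(pxelinux_cfg_path AA:BB:CC:DD:EE:FF)"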

After the host reboot

Was the built status callback to the Foreman successful?

grep "/unattended/built" /var/log/foreman/production.log
grep "<ID>" /var/log/foreman/production.log

Response code should be 201
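The two greps above can be combined into one step. The sketch below finds the ID of every request to /unattended/built and prints all log lines carrying that ID, so the response code shows up next to the request. It assumes the [I|app|<ID>] prefix shown in the Foreman log sample earlier in this post; the function name built_callbacks is made up.

```shell
# Hypothetical helper: print all log lines of requests that hit /unattended/built.
# Usage: built_callbacks [foreman_log]
built_callbacks() {
    log="${1:-/var/log/foreman/production.log}"
    grep -F '/unattended/built' "$log" |
        # Extract the ID from a "[I|app|3f57f995]" style prefix.
        sed -n 's/.*|\([0-9a-f][0-9a-f]*\)].*/\1/p' |
        sort -u |
        while read -r id; do
            # Print every line that belongs to this request.
            grep -F "|$id]" "$log"
        done
}
```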

Was the built status callback to Smart Proxy successful?

grep "/unattended/built" /var/log/foreman-proxy/proxy.log

Response code should be 200

Note: There is unexpected behavior with invalid (expired) build tokens, where the endpoint returns a 404 on the Foreman side and a 500 on the Smart Proxy side. This is a known issue; the correct response code should be 401.

Check that the files in /var/lib/tftpboot have been updated, so that the host is not booting from the network again and again.

Anaconda

/root/anaconda-ks.cfg - KS file used by Anaconda
/root/original-ks.cfg - Original KS file generated by Foreman
/tmp/anaconda - Logs when provisioning failed
/var/log/anaconda - Logs when provisioning was successful

Watching network

Wireshark is your friend.
sudo wireshark - select an interface and see what is going on there

If you can't use Wireshark, use tcpdump to capture the traffic and write it to a pcap file, which you can then open in Wireshark.

sudo tcpdump --list-interfaces
sudo tcpdump --interface <interface> -w output.pcap
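A full capture gets large quickly, so it can help to pass tcpdump a filter expression limited to the traffic that matters for provisioning. The sketch below builds such an expression for DHCP (ports 67 and 68), TFTP (port 69) and DNS (port 53), plus any extra ports you pass in. The function name is made up and the port choice is an assumption: add the ports your Foreman and Smart Proxy actually listen on, and note that TFTP data transfers move to other ports after the initial request on port 69.

```shell
# Hypothetical helper: build a tcpdump filter for provisioning traffic.
# Usage: provisioning_filter [extra_port ...]
provisioning_filter() {
    # DHCP server/client, TFTP requests and DNS.
    filter='port 67 or port 68 or port 69 or port 53'
    for p in "$@"; do
        filter="$filter or port $p"
    done
    printf '%s\n' "$filter"
}
```

Example: sudo tcpdump --interface <interface> -w output.pcap "$(provisioning_filter 80 443)"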

Discovery

VM has at least 1200 MB of memory

  • SSH for PXE: pxelinux_discovery - APPEND fdi.ssh=1 fdi.rootpw=changeme
  • SSH PXEless: ./discovery-remaster <iso> "fdi.ssh=1 fdi.rootpw=changeme" <output-iso>
  • sudo mount -o loop remastered.iso /mnt/fdi to check the content of the image
  • discovery-debug - Useful script printing information about the host

Logs

  • journalctl --boot - Logs from the current boot
  • journalctl --unit nm-prepare - Boot script which pre-configures NetworkManager
  • journalctl --unit NetworkManager - Networking information

That’s all folks

If you have anything that you think should be part of this checklist, please feel free to share. I'm also thinking of making it part of the official provisioning documentation one day, but that's a story for another time.
