Foreman discovery image initial DHCP not working

Problem:

Using Foreman 1.24.2 with an external DHCP server, when I boot a machine using the Foreman Discovery Image (3.5.7), the automatic network configuration that starts with a countdown fails to obtain a lease from DHCP. After the failure, the log output shows two network devices: 1: lo with 127.0.0.1, and 2: ens192, which doesn’t obtain an IP.
Then:

  • DHCP LEASES *
    cat: /var/lib/NetworkManager/dhclient-*.lease: No such file or directory
  • DNS *
    cat: /etc/resolv.conf: No such file or directory

However, if I interrupt the automatic network configuration countdown (by pressing any key), choose Discover with DHCP, and select the only NIC option available (ens192), the network is configured correctly. Confirming the following screens about the Foreman server location etc. all works, and the facts are published to the Foreman server. My understanding is that this is exactly what should happen during the countdown stage anyway, so I’m not sure what the difference is here. Perhaps it is a race condition, but extending the timeout doesn’t help…

Using (what I believe are) exactly the same settings on an existing Foreman 1.23.1 server (FDI 3.5.1) yields different results and automatic network configuration of the FDI image works first time without any manual intervention. I have tried copying the 3.5.1 FDI image to the 1.24.2 foreman server just in case and still no joy with that either.

Expected outcome:

Network configuration for DHCP shouldn’t require any manual intervention and the initial network configuration should work automatically.

Foreman and Proxy versions:

1.24.2

Foreman and Proxy plugin versions:

Discovery 1.0.5
TFTP: 1.24.2

Distribution and version:

CentOS 7 and Ubuntu 18.04

Other relevant data:

Using OOTB templates, no changes here.

Hello, I am not aware of any changes in this regard. I believe this must be an infrastructure issue. I’ve recently fixed the fdi.timeout option, since you reported on IRC that setting it explicitly does not work due to an oversight:

http://downloads.theforeman.org/discovery/nightly/

Can you increase the timeout to 120 seconds and try again with this build?

Can you compare DHCP server configuration on both networks?

Can you check via tcpdump/wireshark that you are getting a response from the correct server? (E.g. a situation with two DHCP servers on one network - we’ve been there.)

Also investigate and pastebin the discovery journal; there must be a reason why NetworkManager failed to bring the interface up.

Thanks @lzap

The timeout option didn’t seem to help.

The hosts are actually on the same network, so using the same DHCP server.

DHCP requests are seen when I interrupt the countdown and select to configure by DHCP. But the automatic countdown doesn’t generate a DHCP request unless I add fdi.initnet=all. That does initialise a network interface, but it is the secondary NIC rather than the primary, so it still fails.

So the difference is that the BOOTIF isn’t being detected correctly.

Then I tried hard-coding the BOOTIF MAC address on the kernel command line instead of relying on variable substitution of BOOTIF=01-$mac.
It looks like variable substitution isn’t working correctly for some reason, although all the templates are default and left untouched :man_shrugging:

To clarify, hard-coding the BOOTIF value as a kernel option worked and the NIC was detected as the primary and reported back to foreman during the initial automatic network configuration countdown stage, so it’s related to templates (MAC variable substitution somehow)
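
One way to confirm this kind of diagnosis from a shell on the booted image is to look at what BOOTIF actually contains on the kernel command line. The following is only a minimal sketch: get_bootif and bootif_looks_valid are hypothetical helper names (not part of FDI), and it assumes that a failed template substitution leaves BOOTIF missing or as a bare 01- prefix.

```shell
# Hypothetical helper: print the value of BOOTIF= from a kernel command
# line passed as a single string; returns non-zero if BOOTIF is absent.
get_bootif() {
  local arg
  for arg in $1; do
    case "$arg" in
      BOOTIF=*) printf '%s\n' "${arg#BOOTIF=}"; return 0 ;;
    esac
  done
  return 1
}

# Hypothetical helper: succeed only if the value looks like a MAC address,
# optionally with the "01-" prefix, using dashes or colons as separators.
bootif_looks_valid() {
  printf '%s\n' "$1" | grep -Eiq '^(01-)?([0-9a-f]{2}[:-]){5}[0-9a-f]{2}$'
}

# Sample command line where the MAC variable was not substituted (assumed shape).
sample='BOOTIF=01- fdi.initnet=all'
value="$(get_bootif "$sample")"
if bootif_looks_valid "$value"; then
  echo "BOOTIF ok: $value"
else
  echo "BOOTIF missing or malformed: '$value'"
fi
```

On a real host the sample string could be replaced with the contents of /proc/cmdline.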

Hello, thanks for getting back to me. Are you using PXELinux or Grub2? Keep in mind that only when you have “IPAPPEND 2” in PXELinux configuration, it will add the entry. Grub2 does not add this automatically.

Note that the $mac variable contains a colon-separated MAC address, while BOOTIF should be separated by dashes. However, our shell code takes this into account and converts it:

function normalizeHwAddr() {
  /usr/bin/tr 'A-F-' 'a-f:' <<< "${1}"
}

The 01- prefix is also stripped out in that file (root/usr/share/fdi/commonfunc.sh).
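
To illustrate the combined effect of stripping the prefix and normalizing the separators, here is a minimal sketch. bootif_to_mac is a hypothetical name used only for this example, and the actual code in commonfunc.sh may be structured differently; only the tr translation is taken from the function quoted above.

```shell
# Hypothetical helper: turn a BOOTIF value into a lowercase,
# colon-separated MAC address.
bootif_to_mac() {
  local value="$1"
  # Drop the leading "01-" prefix if it is present.
  value="${value#01-}"
  # Lowercase the hex digits A-F and turn dashes into colons,
  # the same translation normalizeHwAddr applies.
  printf '%s\n' "$value" | tr 'A-F-' 'a-f:'
}

bootif_to_mac "01-00-50-56-AB-CD-EF"
bootif_to_mac "00:50:56:ab:cd:ef"
```

Both calls print 00:50:56:ab:cd:ef, so the dash-separated PXELinux form and an already colon-separated address end up in the same format.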

What do you have in the /etc/NetworkManager/system-connections/primary file as the mac= option? This should be written by our nm-configure script, which runs before NetworkManager starts up. It parses the kernel command line and normalizes the MAC address.

Hmm that’s interesting…you’re correct that PXE works (I just tried that).

Grub2 used to substitute the MAC, at least in 1.23.1 as it works from my 1.23.1 server - I’ll compare the templates and see what’s different. Is there any reason Grub2 shouldn’t do MAC substitution?
My understanding is that Grub2 is required for EFI hosts…is this correct? Although I’m unclear how to automate the selection of which loader to use:

  • if the host is BIOS it needs to use pxelinux.0
  • if EFI then it needs to use grubx64.efi

Is this something usually handled/determined at the DHCP server?

Okay long story, this all started in:

Red Hat carries a grub2 patch which makes it load MAC-based grub config files. That ugly regexp hack converts the MAC from the : character to the - character.

Then we updated Foreman so it creates PXE configuration files with both characters, so the regexp could be removed:

This actually removed the variable that was introduced by that snippet, but I left one instance in the repo which was found and fixed by our community user:

It looks like you are suffering from this bug:

Since this is the second time a user has hit this, can we backport this one-liner into a stable branch? @tbrisker it’s just the one-line change to the discovery template in templates. It has no issue number though :slight_smile: I can do a backport PR if you agree. Thanks.

Thanks @lzap, that $net_default_mac PR at the end fixes it

Thanks, cherry-pick request: https://github.com/theforeman/community-templates/pull/692