VM systems do not boot and I can't find the cause

crashdog · January 31, 2019, 7:10pm

Problem:
As stated in the subject. I have set up several virtual machines: one Hyper-V Server 2016 Gen 1 VM, one Gen 2 VM, and a VMware Player system. None of them boot, and each one fails in a different way.
Expected outcome:
VMs should boot
Foreman and Proxy versions:
1.20.1
Foreman and Proxy plugin versions:
not sure
Other relevant data:
OK, let me explain a bit more in prose. I have two HP ProLiant DL380 Gen8 servers that I use for educational purposes. At the moment I’m trying to learn “unattended provisioning”. Until now I’ve always installed the VMs using ISO images (DVD). I’ve set up Foreman according to the installation guide, and I’ve read quite a bit of the documentation.

tftp runs via xinetd. The /etc/xinetd.d/tftp config:

service tftp
{
        socket_type             = dgram
        protocol                = udp
        wait                    = yes
        user                    = root
        server                  = /usr/sbin/in.tftpd
        server_args             = -vvv -s /var/lib/tftpboot/ -m /etc/tftpd.map
        disable                 = yes
        per_source              = 11
        cps                     = 100 2
        flags                   = IPv4
}
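
One way to check the tftp service independently of any PXE firmware is to send it a plain read request from another machine. The following is a minimal sketch, not a full TFTP client: it assumes the server listens on UDP port 69, and the function names build_rrq and fetch_first_block are made up for illustration. It never acknowledges the data block, so the server will retransmit a few times and then give up, which is fine for a quick probe.

```python
import socket
import struct


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Build a TFTP read request (opcode 1) as laid out in RFC 1350."""
    return (
        struct.pack("!H", 1)
        + filename.encode("ascii") + b"\x00"
        + mode.encode("ascii") + b"\x00"
    )


def fetch_first_block(server: str, filename: str, timeout: float = 3.0) -> bytes:
    """Request a file and return the payload of the first DATA block.

    Raises RuntimeError if the server answers with an ERROR packet,
    and a socket timeout error if nothing answers at all.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_rrq(filename), (server, 69))
        # The reply comes from a new server-side port (the transfer ID).
        packet, _addr = sock.recvfrom(4 + 512)
    opcode, number = struct.unpack("!HH", packet[:4])
    if opcode == 5:  # ERROR packet: 'number' is the error code
        text = packet[4:].split(b"\x00", 1)[0].decode("ascii", "replace")
        raise RuntimeError(f"TFTP error {number}: {text}")
    if opcode != 3 or number != 1:  # expect DATA, block 1
        raise RuntimeError(f"unexpected reply: opcode={opcode} block={number}")
    return packet[4:]
```

Calling fetch_first_block("192.168.178.86", "pxelinux.0") from a second host on the same LAN would show whether the file is reachable under the configured tftp root at all, without involving DHCP or the VM firmware.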

I installed ISC dhcpd, and /etc/dhcp/dhcpd.conf currently looks like this:

 DHCPDARGS="eth0";
allow booting;
allow bootp;

option domain-name-servers 192.168.178.41;

subnet 192.168.178.0 netmask 255.255.255.0 {
        option subnet-mask 255.255.255.0;
        option routers 192.168.178.1;
}

host satellite.md80.ch {
 hardware ethernet 00:15:5d:ee:f9:03;
 fixed-address 192.168.178.86;
 ddns-hostname "satellite.md80.ch";
}

host svmqhub3.md80.ch {
 hardware ethernet 00:15:5d:01:27:0d;
 fixed-address 192.168.178.88;
 next-server 192.168.178.86;
 filename "pxelinux.0";
}

host testsystem1.md80.ch {
 hardware ethernet 00:0C:29:9E:E7:40;
 fixed-address 192.168.178.89;
 next-server 192.168.178.86;
 filename "pxelinux.0";
}

I disabled firewalld and selinux for now. There is no hardware firewall in the LAN.
logs
All I really see when booting the Gen 2 VM (svmqhub3.md80.ch) is this in /var/log/messages:

Jan 31 19:28:59 satellite dhcpd: DHCPDISCOVER from 00:15:5d:01:27:0d via eth0
Jan 31 19:28:59 satellite dhcpd: DHCPOFFER on 192.168.178.88 to 00:15:5d:01:27:0d via eth0
Jan 31 19:29:03 satellite dhcpd: DHCPREQUEST for 192.168.178.88 (192.168.178.86) from 00:15:5d:01:27:0d via eth0
Jan 31 19:29:03 satellite dhcpd: DHCPACK on 192.168.178.88 to 00:15:5d:01:27:0d via eth0
Jan 31 19:29:03 satellite systemd: Started Tftp Server.
Jan 31 19:29:03 satellite systemd: Starting Tftp Server...
Jan 31 19:29:03 satellite in.tftpd[55265]: Error code 8: User aborted the transfer
Jan 31 19:29:03 satellite in.tftpd[55266]: Client ::ffff:192.168.178.88 finished pxelinux.0

When I start the Gen 1 VM or the VMware Player system, nothing is written to /var/log/messages. On the Hyper-V console of the Gen 2 system I only see “There was an unexpected network error”.

I’ve been googling “Error code 8: User aborted the transfer” and other tftp- and dhcpd-related topics for days now without any luck.
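
A note on that log line: TFTP error code 8 was added by the option extension (RFC 2347) and signals that a transfer was terminated during option negotiation. PXE firmware commonly sends a first request just to learn the file size, aborts it with code 8, and then requests the file again, so an "Error code 8" line that is immediately followed by "finished pxelinux.0" is usually harmless on its own. The sketch below decodes such an ERROR packet; the function name parse_tftp_error and the short description strings are illustrative and not taken from any library.

```python
import struct

# Codes 0-7 are defined in RFC 1350, code 8 in RFC 2347.
TFTP_ERRORS = {
    0: "Not defined, see error message",
    1: "File not found",
    2: "Access violation",
    3: "Disk full or allocation exceeded",
    4: "Illegal TFTP operation",
    5: "Unknown transfer ID",
    6: "File already exists",
    7: "No such user",
    8: "Option negotiation terminated",
}


def parse_tftp_error(packet: bytes):
    """Return (code, meaning, message) for a TFTP ERROR packet (opcode 5)."""
    opcode, code = struct.unpack("!HH", packet[:4])
    if opcode != 5:
        raise ValueError(f"not an ERROR packet (opcode {opcode})")
    # The message is a NUL-terminated ASCII string after the 4-byte header.
    message = packet[4:].split(b"\x00", 1)[0].decode("ascii", "replace")
    return code, TFTP_ERRORS.get(code, "unknown code"), message
```

Fed the bytes of the abort seen in the log, this returns code 8 together with the client's own message text, "User aborted the transfer".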

I can iso boot the GEN 2 VM and change the ks to my satellite.md80.ch… config and it will then install as expected. But the boot part just won’t work.

crashdog · January 31, 2019, 9:50pm

ok, replying to myself here…appears that I can’t modify my first post for some reason.
I followed this tutorial: https://www.unixmen.com/install-pxe-server-and-configure-pxe-client-on-centos-7/
The DHCP configuration from the example didn’t work for me, though, so I changed it to:

 DHCPDARGS="eth0";
allow booting;
allow bootp;
option option-128 code 128 = string;
option option-129 code 129 = text;

subnet 192.168.178.0 netmask 255.255.255.0 {
        option subnet-mask 255.255.255.0;
        option routers 192.168.178.1;
}

host testsystem1.md80.ch {
 hardware ethernet 08:00:27:18:44:79;
 fixed-address 192.168.178.89;
 next-server 192.168.178.86;
 filename "pxelinux.0";
}

host svmqhub3.md80.ch {
 hardware ethernet 00:15:5d:01:27:0d;
 fixed-address 192.168.178.88;
 next-server 192.168.178.86;
 filename "pxelinux.0";
}

The rest worked, although so far only with a VirtualBox VM running on my main PC. Hyper-V still reports the same issue. I then changed tftp to point to my “Foreman tftp” directory, and the VirtualBox VM now PXE boots and installs the system that I created in Foreman. I guess I have to keep digging on the tftp side to find out why the Hyper-V VMs don’t boot. I did find a bug report on tftp 2.21, so I tried downgrading to 2.13, but that just gave me a different error: “tftp: client does not accept options”. So I went back to the previously installed 5.22.

thinkitdata · January 31, 2019, 9:57pm

Hey man, I am actually building something very similar. I’m using Hyper-V and running Foreman/Katello as well as Foreman-discovery and remote-command-execution. One thing different for me is that I’m running BIND and DHCPD as docker containers (host is a server running Win10Pro). I should have everything configured over the next 2 days and I’ll share my configs w/you if you’re still having probs.

crashdog · January 31, 2019, 10:07pm

Hello,
greatly appreciated. I just checked my virtual switch, since I wasn’t 100% sure that it isn’t a NAT switch. It appears to be external. Also, if I boot off an ISO and change the ks parameter to point to the Foreman URL, it installs fine.

Cheers

crashdog · February 1, 2019, 7:42am

I got one step further on this after finding the following article: https://docs.oracle.com/cd/E52668_01/E54695/html/ol7-install-pxe-dhcp-tftp.html
I altered my /etc/dhcp/dhcpd.conf as follows:

DHCPDARGS="eth0";
allow booting;
allow bootp;
set vendorclass = option vendor-class-identifier;
option pxe-system-type code 93 = unsigned integer 16;
set pxetype = option pxe-system-type;

option domain-name "md80.ch";

#option option-128 code 128 = string;
#option option-129 code 129 = text;
#option tftp-server-name "192.168.178.86";
#option bootfile-name "/var/lib/tftpboot/pxelinux.0";
#option domain-name-servers 192.168.178.41, 192.168.178.1;

subnet 192.168.178.0 netmask 255.255.255.0 {
        option subnet-mask 255.255.255.0;
        option domain-name-servers 192.168.178.41;
        option routers 192.168.178.1;
       default-lease-time 14400;
        max-lease-time 28800;
  if substring(vendorclass, 0, 9)="PXEClient" {
    if pxetype=00:06 or pxetype=00:07 {
        filename "grub2/grubx64.efi";
    } else {
        filename "pxelinux.0";
    }
  }
  pool {
    range 192.168.178.101 192.168.178.200;
  }
}

host testsystem1.md80.ch {
 hardware ethernet 08:00:27:18:44:79;
 fixed-address 192.168.178.87;
 next-server 192.168.178.86;
 filename "pxelinux.0";
}

host svmqhub3.md80.ch {
 hardware ethernet 00:15:5d:01:27:0d;
 fixed-address 192.168.178.88;
 next-server 192.168.178.86;
 #filename "pxelinux.0";
}

host testsystem2.md80.ch {
 hardware ethernet 00:0c:29:9e:e7:40;
 fixed-address 192.168.178.189;
 next-server 192.168.178.86;
 filename "pxelinux.0";
}

Now my Gen 2 system (svmqhub3) reads the grub2/grubx64.efi file over tftp and continues with grub2/grub.cfg-01-00-15-5d-01-27-0d.

From /var/log/messages:

Feb  1 08:32:52 satellite dhcpd: DHCPDISCOVER from 00:15:5d:01:27:0d via eth0
Feb  1 08:32:52 satellite dhcpd: DHCPOFFER on 192.168.178.88 to 00:15:5d:01:27:0d via eth0
Feb  1 08:32:56 satellite dhcpd: DHCPREQUEST for 192.168.178.88 (192.168.178.86) from 00:15:5d:01:27:0d via eth0
Feb  1 08:32:56 satellite dhcpd: DHCPACK on 192.168.178.88 to 00:15:5d:01:27:0d via eth0
Feb  1 08:32:56 satellite in.tftpd[89332]: Error code 8: User aborted the transfer
Feb  1 08:32:57 satellite in.tftpd[89333]: Client 192.168.178.88 finished grub2/grubx64.efi
Feb  1 08:32:57 satellite in.tftpd[89334]: Client 192.168.178.88 finished /grub2/grub.cfg-01-00-15-5d-01-27-0d
Feb  1 08:32:57 satellite in.tftpd[89339]: Client 192.168.178.88 finished /grub2/grub.cfg-01-00-15-5d-01-27-0d
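
The per-host file name requested in the log above follows the usual PXE convention: the prefix 01 (the ARP hardware type for Ethernet) followed by the client MAC address in lowercase, with dashes instead of colons. As a rough sketch, assuming the grub2 directory used in this setup, the expected name for any host can be computed like this (grub_cfg_name is a hypothetical helper, not part of Foreman):

```python
def grub_cfg_name(mac: str, prefix: str = "grub2") -> str:
    """Return the per-host GRUB2 config path a PXE client with this MAC requests."""
    octets = mac.lower().replace("-", ":").split(":")
    if len(octets) != 6 or any(len(o) != 2 for o in octets):
        raise ValueError(f"not a MAC address: {mac!r}")
    # "01" is the ARP hardware type for Ethernet.
    return f"{prefix}/grub.cfg-01-" + "-".join(octets)
```

Comparing this name with what actually exists under the tftp root is a quick way to tell whether a host is getting its own config or falling back to a default one.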

It displays the boot menu:

Chainload Grub2 EFI from ESP <-
Chainload into BIOS bootloader on first disk
Chainload into BIOS bootloader on second disk

It then fails with:

Chainloading Grub2 EFI from ESP, available devices:
(hd0) error: failure reading sector 0x0 from 'hd0'.

probing ESP partition ... error: failure reading sector 0x0 from 'hd0'.
error no such device: /EFI/BOOT/BOOTX64.EFI.
found
File grubx64.efi not found on ESP
Update pxegrub2_chainload path array with:
error : disk `' not found.
The system will halt in 2 minutes or press ESC to halt immediately.

Cheers

crashdog · February 1, 2019, 8:25am

so, during trial and error I had changed the tftp config to /tftpboot, where I had mounted a CentOS 7 ISO. I have now changed the tftp config in /etc/xinetd.d/tftp back to:
server_args = -s /var/lib/tftpboot

it all works as I would expect it to !!! WOW !

crashdog · February 1, 2019, 8:31am

I think the most important piece of information was on this page: https://docs.oracle.com/cd/E52668_01/E54695/html/ol7-install-pxe-dhcp-tftp.html

namely :

 if substring(vendorclass, 0, 9)="PXEClient" {
    if pxetype=00:06 or pxetype=00:07 {
        filename "efi/grubx64.efi";
    } else {
        filename "pxelinux/pxelinux.0";
    }
  }
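
The decision that conditional makes can be written out as a small function. This is only an illustration of the logic, not something dhcpd runs. The architecture numbers come from DHCP option 93 (RFC 4578): 0 is a BIOS client, 6 and 7 are the EFI types the snippet checks for. The default file names are the ones from my own dhcpd.conf above, and select_boot_file is a made-up name.

```python
from typing import Optional


def select_boot_file(
    vendor_class: str,
    arch_type: int,
    efi_file: str = "grub2/grubx64.efi",
    bios_file: str = "pxelinux.0",
) -> Optional[str]:
    """Mirror the dhcpd.conf conditional: choose a boot file for a PXE client."""
    # substring(vendorclass, 0, 9) = "PXEClient"
    if not vendor_class.startswith("PXEClient"):
        return None  # not a PXE client, so no filename is offered
    # pxetype = 00:06 or pxetype = 00:07
    if arch_type in (6, 7):
        return efi_file
    return bios_file
```

One thing this makes visible: a client that reports any other type, for example 00:09, which some UEFI firmware is said to use, falls into the else branch and is offered pxelinux.0.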

Now I can really start learning how to use Foreman

Cheers

crashdog · February 1, 2019, 9:43am

After the installation of my first VM I had to run the following manually for Foreman to understand that the install was complete. Otherwise the host would have remained in the “pending installation” state.
/usr/bin/puppet agent --config /etc/puppet/puppet.conf --onetime --tags no_such_tag --server satellite.md80.ch --no-daemonize

wget -q -O /dev/null --no-check-certificate "http://satellite.md80.ch/unattended/built?token=1d581781-c124-4383-a229-fb58fbd1a63d"
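
The wget call is the part that tells Foreman the build has finished; as far as I understand, the provisioning template normally makes this request itself at the end of the install. For reference, the same request as a short Python sketch. The URL layout is taken from the command above, and built_url and notify_built are names invented for this example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def built_url(foreman_host: str, token: str) -> str:
    """Build the 'built' callback URL in the same form as the wget call."""
    return f"http://{foreman_host}/unattended/built?" + urlencode({"token": token})


def notify_built(foreman_host: str, token: str, timeout: float = 10.0) -> int:
    """Send the callback and return the HTTP status code."""
    with urlopen(built_url(foreman_host, token), timeout=timeout) as response:
        return response.status
```

The token is specific to one host and one build, so the value shown here is only good for the VM it was issued to.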

Cheers

lzap · February 1, 2019, 12:41pm

Hello,

how did you solve:

Chainload Grub2 EFI from ESP

actually? We have seen some issues in the past, the template has been vastly improved in Foreman 1.21 RC.

(Please consider using formatting for snippets; this is unreadable.)

crashdog · February 2, 2019, 9:30am

Hi,
I’m not sure if I really solved that. This menu appeared during trial and error, when I had pointed my tftp to a CentOS ISO mount. I then changed it back to the directory where I have the Foreman files deployed. There I got the expected installation menu.
Cheers

(Thank you for fixing the formatting. I was rushing it a bit and have to get used to this editor.)

lzap · February 5, 2019, 9:17am

I would love to help but this report is a mess. Try to isolate separate problems and provide useful information. I understand that Foreman is a complex project, you need to spend time with it unfortunately. There are companies providing consultancy and there are several commercial products if you don’t have time however.

crashdog · February 5, 2019, 11:47am

I’ve sorted out all issues that I had. Feel free to delete my two posts. Thank you for your patience.