Having issue with foreman provisioning

Ok so back to square one. What I see in the foreman production logs when I hit build is the following:

Rendering template "Kickstart_network_setting"
Rendering template "Kickstart_ifcfg_get_identifier_names"
Rendering template "Kickstart_ifcfg_generate_interface"

On the host side, in /var/log/messages, I did see a message saying that “no certs were received”.

I did not see anything that said /unattended/provision

There is an error that says that the puppet_agent did not receive the cert (1105)

I also see that there is an error on the agent that is a goferd[903] [Error] (qd:no-route-to-dest)

I get clean runs when I type in puppet agent -t. The agent is speaking to the puppet master.

Something does not add up here again. You say provisioning isn’t working, yet you’re getting as far as running puppet agent on the host - which implies the provisioning is complete. If we’re going to be doing this blind, then you’re going to have to be a lot clearer about what the issue is.

You won’t see anything on the host when you change a host to Build mode. All the changes happen on Foreman and on the proxy. You’ll see the request to /unattended/provision when the host boots into the Fedora installer from PXE, in /var/log/foreman/production.log as I said here
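To check for that request without reading the whole log by eye, something like the following could work. This is only a sketch: the function name is made up for illustration, and it simply looks for the path as a substring rather than parsing the log format.

```python
def find_provision_requests(log_text, path="/unattended/provision"):
    """Return every log line that mentions the given request path."""
    return [line for line in log_text.splitlines() if path in line]
```

On the Foreman server you would feed it the contents of /var/log/foreman/production.log, for example `find_provision_requests(open("/var/log/foreman/production.log").read())`. An empty list after the host has attempted to PXE boot means the installer never asked Foreman for its kickstart.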

I checked the foreman production logs and did not see anything that said /unattended/provision

The issue is that the host does not rebuild, i.e. does not pick up the kickstart from foreman.

Dumb question, is there some place in foreman that sets the host to be able to pxe boot?

If the host hasn’t rebuilt, then testing things like puppet agent is pointless, since you’re still on the old cert. Foreman takes care of setting the content of the appropriate file on the TFTP server (which you confirmed did change content from Localboot to an installer in post 7). Foreman also tells the DHCP server to include the “nextserver” directive in the DHCP lease, which points to the IP of the TFTP server (assuming Foreman is managing DHCP of course).

After that, it’s up to the host, which is why I was asking for what happens at PXE time. I know it can pass by pretty fast, but that’s the only way to know what’s going wrong. It will say whether it got an IP (and what that IP was, which you can check is what it should be). If it does, then it will also say which server it’s querying, and which file it’s loading (which should be the one you checked in post 7). Without that information, we’re at an impasse - knowing what is blocking PXE from booting the installer is key.

Ok, I went into the system setup to verify the host was set to PXE (it was).

I looked at the boot sequence and set it to boot from the NIC.

When I tried a reboot, I got a DHCP timeout: no offers were received.

The machine was set to boot from local disk so I don’t think it tried to go to PXE.

I tried to look at the BIOS messages; it looks like there are four lines, but I can’t read them.

Thanks for your help with this.

Ruth

DHCP timeout definitely sounds like a PXE attempt. Check the logs on the DHCP server, you should see DHCPDISCOVER from the host’s MAC, hopefully followed by why no DHCPOFFER is made.
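If the DHCP log is busy, a small filter can pull out just the conversation for one MAC. A rough sketch, assuming the server writes the MAC in colon-separated form on each relevant line (lines that do not include the MAC are skipped); the function name is hypothetical:

```python
import re

# DHCP message type names as defined in RFC 2131.
DHCP_TYPES = re.compile(
    r"\b(DHCP(?:DISCOVER|OFFER|REQUEST|ACK|NAK|INFORM|DECLINE|RELEASE))\b"
)


def dhcp_events_for_mac(log_text, mac):
    """Return the DHCP message types logged for one MAC, in log order."""
    mac = mac.lower()
    events = []
    for line in log_text.splitlines():
        if mac not in line.lower():
            continue  # line is about some other client
        match = DHCP_TYPES.search(line)
        if match:
            events.append(match.group(1))
    return events
```

A healthy PXE attempt should produce DHCPDISCOVER, DHCPOFFER, DHCPREQUEST, DHCPACK in that order. An empty list suggests the DISCOVER never reached the server at all, which points at the network between the host and the DHCP server.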

I’m not seeing a DHCPDISCOVER. I am seeing a DHCPINFORM, which says that <IP address> is not authoritative for subnet.
I checked this via journalctl.

I’m also a little confused. Won’t Foreman handle the boot sequence for me, assuming that the host is set to use PXE? It seems a bit cumbersome to have to reset the boot order on the host.

(Side note, consider single replies with all your points in, it makes for nicer reading and is less spammy to our email-based users)

From RFC 2131:

DHCPINFORM - Client to server, asking only for local configuration
parameters; client already has externally configured
network address.

My reading of the RFC suggests that to issue a DHCPINFORM requires the NIC to already have an IP, so either it’s statically configured within the NIC itself (BIOS config option, I would guess), or the DHCPDISCOVER/OFFER conversation happened further up the log. Worth investigating.

As to the boot order, the answer is yes-and-no. I’m assuming we’re talking about a physical machine here, not a VM - in this case, no, Foreman cannot directly control the BIOS of your host. Many physical hosts don’t even have the option to set the BIOS from the OS, remotely or locally, so we cannot build a workflow around that.

What we do recommend is that the machine is permanently configured to boot from PXE first, and then local disk second. As you’ve already seen for yourself, when the host is not in Build mode, the file on the TFTP server contains a LOCALBOOT directive. Thus a PXE request will result in the host skipping to the next device, the local disk. When in Build mode, the file is rewritten for reinstallation - this way a host will always boot correctly, based on its Build state, and you don’t have to keep altering the BIOS settings.
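To tell at a glance which of the two states a PXE file is in, you could check it for the LOCALBOOT directive. This is a deliberately simple sketch with a made-up function name; it assumes the local-boot version of the file uses LOCALBOOT as described above and does not try to follow labels or defaults:

```python
def boots_from_disk(pxe_cfg_text):
    """Return True if the PXE config contains a LOCALBOOT directive."""
    for line in pxe_cfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # ignore comments
        if stripped.upper().startswith("LOCALBOOT"):
            return True
    return False
```

If this returns True while the host is supposedly in Build mode, the file on the TFTP server was not rewritten and the host will fall through to its local disk.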

Thanks for your reply. So when I hit Build via Foreman, it should go out and build the server, correct? What should I see via Foreman when I hit the Build button? As it is now, it just changes from Build to Cancel Build.

If I look at the production logs on the Foreman server I do see quite a bit of activity.

What complicates matters is that you’ve not said what features are enabled on Foreman - it can manage PuppetCA, DHCP, DNS, and TFTP, or just a subset of these - and of course expectations depend on configuration.

Assuming a complete configuration of controlling everything, then I would expect clicking “Build” to cause just the change to the PXE file, but that’s because all the other stuff is done at host creation time. If this were a brand new host, you’d see:

  • A DHCP reservation created for the MAC/IP combination
  • A DNS A-record and PTR-record created for the IP/name combination
  • A PXElinux config file created for the MAC

This would mean that when the host boots for the first time, it can get an IP from the DHCP server, and be told where the TFTP server is (‘nextserver’ option in the lease). It then queries said IP for TFTP/PXE, and is given a PXE file, which it then uses to load an initrd/kernel over the network.

Again, I’m describing generics here. For example, if your provisioning network has not got Foreman managing DHCP, then you’d be responsible for ensuring the leases give out the right nextserver IP, and so forth.

As an example of creating a new host, the logs in @trinaryouroboros’s recent post give a nice example:

This is host creation rather than just flipping the Build flag, but you get the idea - here you see it creating a DHCP lease, creating the PXElinux cfg files, and checking if the initrd/vmlinuz files need downloading to the proxy. This is the kind of output you’re looking for in production.log. I encourage you to read that thread as there are other log examples that may help you make sense of what you’re seeing, since you can’t share it for us to see.

To answer your question, yes, you’d only expect to see the button change from Build to Cancel Build in the UI - the rest happens behind the scenes. You’d then go reboot the server at your leisure, and it should pick up the changed PXElinux cfg file.

To try to help a little, here’s a shot of one of my VMs booting TFTP - I stopped my TFTP server so it would hang while I got the shot :stuck_out_tongue:

You can see that it records the MAC, confirms that it got IP 172.20.10.22, and that the “nextserver” is 172.20.10.1 (which is correct). It then loads pxelinux.0 (which fails as I stopped the TFTP server), and would then go on to load pxelinux.cfg/01-52-54-00-1a-ca-61 which corresponds to the MAC. Since this host is not in Build mode, that file contains LOCALBOOT 0, and the host would then boot from disk. Hope it helps.
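If you want to work out which file your own host will ask for, the name is just the MAC in lowercase with dashes, prefixed with 01 (the ARP hardware type for Ethernet). A small sketch, with a function name of my own choosing:

```python
def pxelinux_cfg_name(mac):
    """Map a MAC address to the per-host PXELINUX config path."""
    octets = mac.strip().replace("-", ":").lower().split(":")
    if len(octets) != 6 or any(len(octet) != 2 for octet in octets):
        raise ValueError(f"not a MAC address: {mac!r}")
    for octet in octets:
        int(octet, 16)  # raises ValueError if an octet is not hex
    return "pxelinux.cfg/01-" + "-".join(octets)
```

For the VM above, `pxelinux_cfg_name("52:54:00:1A:CA:61")` gives `pxelinux.cfg/01-52-54-00-1a-ca-61`, which is the file to look for under the TFTP root.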

Ok, so now, after getting our network engineer to work on a switch, I’m getting messages as follows:

DHCPACK
DHCPDISCOVER
DHCPOFFER
DHCPREQUEST
DHCPACK

However on the agent, I’m getting a PXE-M01 (I think) no existing boot agent.

Are you sure you are getting a DHCP answer from the correct DHCP server? We’ve seen many times users running multiple DHCP servers on one segment, leading to incorrect behavior.

We only have one DHCP server. I was wondering how to tell if the boot agent exists on the DHCP server. The messages seem to indicate it’s communicating.

We found that the TFTP daemon was not running. Now the error is that it is exiting the Intel boot agent with a PXE-M0F error.
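For anyone who lands here with the same symptom, a quick way to confirm from another machine that the TFTP daemon is answering is to send it a read request and see whether anything comes back. This is a rough sketch, not a full TFTP client: the function names are made up, the default filename pxelinux.0 is taken from earlier in the thread, and it never acknowledges the reply, so the server will simply give up on the transfer.

```python
import socket
import struct


def build_rrq(filename, mode="octet"):
    """Build a TFTP read request (opcode 1) as laid out in RFC 1350."""
    return (
        struct.pack("!H", 1)
        + filename.encode("ascii") + b"\x00"
        + mode.encode("ascii") + b"\x00"
    )


def tftp_responds(server, filename="pxelinux.0", timeout=3.0):
    """Return the reply type from the TFTP server, or None if it is silent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_rrq(filename), (server, 69))
            reply, _ = sock.recvfrom(1024)
        except OSError:  # covers timeouts and unreachable hosts
            return None
    if len(reply) < 2:
        return None
    opcode = struct.unpack("!H", reply[:2])[0]
    # 3 = DATA, 5 = ERROR (e.g. file not found), 6 = OACK (option ack)
    return {3: "DATA", 5: "ERROR", 6: "OACK"}.get(opcode, f"opcode {opcode}")
```

A result of None means nothing is listening (or a firewall is in the way), ERROR means the daemon is up but could not serve that file, and DATA or OACK means the file is being served.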