Having issue with foreman provisioning

I’m going for 1 or 3. It does not appear to be getting the jump start file. It skips pretty quickly to the OS load.

Ruth

Sadly these two statements are contradictory:

This implies we got as far as the Fedora installer, as the autosign changes aren’t deployed until after the host has successfully PXE-booted and downloaded its Kickstart file. So to say it’s not PXE booting would not stack up. My guess is that when you say it “skips pretty quickly to the os load”, it’s actually in the installer, but tailing the logs on Foreman will prove it (you’ll see a call to /unattended/provision from the host to get the Kickstart as it goes into the installer).

That said, your error code is documented, and usually results from permission errors on the proxy. See:

http://projects.theforeman.org/projects/foreman/wiki/ERF12-0104

for details. Let us know if that helps :slight_smile:
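If it helps with checking those permissions, here is a minimal sketch of how I’d look at them. The paths and the foreman-proxy user name in the usage comments are assumptions about a typical install, not something taken from your system, and the stat flags are the GNU ones.

```shell
# Show octal mode, owner:group and path for each argument (GNU stat syntax).
show_perms() {
    stat -c '%a %U:%G %n' "$@"
}

# Usage (paths and the "foreman-proxy" user are assumptions; adjust them):
#   show_perms /etc/puppet /etc/puppet/autosign.conf
#   sudo -u foreman-proxy test -w /etc/puppet/autosign.conf && echo writable
```

The second usage line checks access as the proxy user itself, which is usually more telling than reading the mode bits by eye.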

I read the error code info a while back and changed the permission accordingly. The current autosign.conf file does not appear to be being updated. Changing the permissions allowed me to cancel the build. That’s about all.

The BIOS messages go by so quickly I cannot catch the message. Is there a place I might be able to read what is said? I believe that the jumpstart file is not available for whatever reason. The boot never goes to jumpstart.

Actually meant Kickstart

Could you perhaps be a bit more specific about which Foreman log you are trying to get me to look at? I don’t see the message, but there are a number of logs.

Ruth

Ah I see - that would make sense then. The changes aren’t happening to the autosign file because you’re not getting that far, and getting ERF12-0104 would happen on cancel (when it tries to remove the host from the autosign file). At least that’s one issue down :slight_smile:
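If you want to confirm whether the host ever made it into the autosign file, something like the sketch below would do. The host name and file path in the usage comment are hypothetical, and it only looks for an exact line match, so a wildcard entry such as *.example.com would not count as a hit.

```shell
# Report whether a hostname has an exact-match line in an autosign file.
# Returns non-zero when the host is not listed.
in_autosign() {
    host=$1
    file=$2
    if grep -Fxq -- "$host" "$file"; then
        echo "$host is listed in $file"
    else
        echo "$host is NOT listed in $file"
        return 1
    fi
}

# Usage (hypothetical host name and path; the location varies by Puppet version):
#   in_autosign client01.example.com /etc/puppet/autosign.conf
```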

So, it’d be /var/log/foreman/production.log and yes, if it’s a busy Foreman instance it can be pretty noisy. You could always do something like tail -f /var/log/foreman/production.log | grep -C1 GET to at least see the requests as they come in.

If you do see a hit for /unattended/provision from this host, then you can note the timestamp and go digging in the logs a bit further. If the logs note it’s a 404 (or anything other than 200 really), then it should log why it’s doing that, and we can hopefully progress…
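To narrow that down after the fact, here is a rough sketch that pulls out only the provisioning requests and the status line that follows each one. It assumes Rails-style log lines (Started GET "/unattended/provision..." and Completed 200 OK ...), so adjust the patterns if your log looks different. The function name is just something I made up.

```shell
# Print each request for the unattended provisioning URL, plus the next
# "Completed <status>" line seen after it.
# Assumption: Rails-style "Started GET ..." / "Completed NNN ..." log lines.
provision_requests() {
    awk '
        /Started GET "\/unattended\/provision/ { print; pending = 1; next }
        pending && /Completed [0-9]+/          { print; pending = 0 }
    '
}

# Usage (log path as given in this thread):
#   provision_requests < /var/log/foreman/production.log
```

On a busy instance requests interleave, so the Completed line printed may belong to a different request. Treat the output as a pointer to the right timestamp rather than as proof.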

Thanks. It seems that the underlying issue, or at least part of why the systems are hanging, is that the part of the Kickstart that notifies Foreman that the build is complete is not triggering.

There’s a whole ton of possible reasons for that - are you able to share the error logs? Feel free to anonymise IPs, names, etc. Having a look at what’s happening would make it much easier to help you, although it may be necessary to enable debug logs to get a true picture of what’s going on.

I am getting another issue that is stopping me from testing at the moment. Something keeps resetting the permissions on the /etc/dhcp folder, which throws an ERF12-0635 error. I have reset the perms on the folder but seem to remember that we need to restart a process to get this fixed.

I’ll have to ask “the powers that be” about the logs. I know it’s hard when you’re blind; I’m just as frustrated. I was thinking about debug too. It kind of looks like the puppet master may not be signing the certs.

You could edit the appropriate template (get there via Hosts > your host > templates tab) and comment out the call to puppet agent .... That would establish if that’s really your cause. It would usually be here:

But you’ll need to trace your own templates to be sure. The template preview should be helpful in checking you’ve got it right.

Ok so back to square one. What I see in the foreman production logs when I hit build is the following:

Rendering template "Kickstart_network_setting
" " Kickstart_ifcfg_get_identifier_names
" " kickstart_ifcfg_generate_interface

On the host side, in /var/log/messages, I did see a message saying that “no certs were received”.

I did not see anything that said /unattended/provision

There is an error that says that the puppet_agent did not receive the cert (1105)

I also see that there is an error on the agent that is a goferd[903] [Error] (qd:no-route-to-dest)

I get clean runs when I type in puppet agent -t. The agent is speaking to the puppet master.

Something does not add up here again. You say provisioning isn’t working, yet you’re getting as far as running puppet agent on the host - which implies the provisioning is complete. If we’re going to be doing this blind, then you’re going to have to be a lot clearer about what the issue is.

You won’t see anything on the host when you change a host to Build mode. All the changes happen on Foreman and on the proxy. You’ll see the request to /unattended/provision when the host boots into the Fedora installer from PXE, in /var/log/foreman/production.log, as I said earlier.

I checked the foreman production logs and did not see anything that said /unattended/provision

The issue is that the host does not rebuild, i.e. does not pick up the kickstart from foreman.

Dumb question, is there some place in foreman that sets the host to be able to pxe boot?

If the host hasn’t rebuilt, then testing things like puppet agent is pointless, since you’re still on the old cert. Foreman takes care of setting the content of the appropriate file on the TFTP server (which you confirmed did change content from Localboot to an installer in post 7). Foreman also tells the DHCP server to include the “next-server” directive in the DHCP lease, which points to the IP of the TFTP server (assuming Foreman is managing DHCP of course).
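If you want to re-check that TFTP file, PXELINUX names the per-host config after the MAC address: 01- followed by the MAC in lowercase with dashes. A small sketch to build that name is below. The TFTP root in the usage comment is an assumption, the MAC is a made-up example, and this only applies to BIOS/PXELINUX booting, not to other loaders.

```shell
# Build the per-host PXELINUX config file name for a MAC address:
# "01-" (Ethernet hardware type) + the MAC in lowercase, with ":" turned into "-".
pxe_cfg_name() {
    printf '01-%s\n' "$(printf '%s' "$1" | tr 'A-F:' 'a-f-')"
}

# Usage (TFTP root is an assumption; the MAC is a made-up example):
#   cat "/var/lib/tftpboot/pxelinux.cfg/$(pxe_cfg_name 00:11:22:AA:BB:CC)"
```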

After that, it’s up to the host, which is why I was asking for what happens at PXE time. I know it can pass by pretty fast, but that’s the only way to know what’s going wrong. It will say whether it got an IP (and what that IP was, which you can check is what it should be). If it does, then it will also say which server it’s querying, and which file it’s loading (which should be the one you checked in post 7). Without that information, we’re at an impasse - knowing what is blocking PXE from booting the installer is key.
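If the screen goes by too fast, the DHCP server’s own log tells much the same story from the other side. Here is a minimal sketch, assuming ISC dhcpd logging to syslog with its usual DHCPDISCOVER/DHCPOFFER lines. The log path and MAC in the usage comment are placeholders.

```shell
# Filter syslog text on stdin down to DHCP exchange lines for one MAC address.
# Assumption: ISC dhcpd style messages such as
#   "DHCPDISCOVER from <mac> via <iface>" and "DHCPOFFER on <ip> to <mac> via <iface>".
dhcp_events_for() {
    grep -E 'DHCP(DISCOVER|OFFER|REQUEST|ACK|NAK)' | grep -i -F -- "$1"
}

# Usage (log path and MAC are placeholders):
#   dhcp_events_for 00:11:22:aa:bb:cc < /var/log/messages
```

A DISCOVER with no OFFER after it suggests the server saw the host but had nothing to hand out. No DISCOVER at all suggests the request never reached the server, or the host never tried to PXE boot.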

Ok, I went into the system setup to verify the host was set to PXE (it was).

I looked at the boot sequence and set it to boot from the NIC.

When I tried a reboot, I got a DHCP timeout: no offers were received.

The machine was set to boot from local disk so I don’t think it tried to go to PXE.

I tried to look at the BIOS messages. It looks like there are four lines, but I can’t read them.

Thanks for your help with this.

Ruth