Having an issue with Foreman provisioning

I was wondering if I could get some suggestions on the following problem.
I have taken over a system and the provisioning function does not work.
I hit the Build button for a host, and the button changes to Cancel Build.

I try to reboot and build the server, but it does not pick up the PXE boot. I believe the certs are not getting passed, but at this point I’m unsure where to look. I did notice that /etc/puppet/autosign.conf did not have the correct permissions set; the mode is now 644. The file does not appear to be getting updated.

I am pretty new to Puppet and Foreman, so I would be grateful for any help given.

Welcome to the community! I’ve deleted the blank topic, I guess you created it by accident :slight_smile:

The provisioning workflow can be complex depending on your site config. In order, please check:

  1. That the host has a provisioning interface set (Host > Edit)
  2. That the host has a Subnet set (Host > Edit)
  3. That the Subnet has a Smart Proxy set for TFTP (Subnet > Edit)
  4. That the Smart Proxy has the TFTP feature (Proxies > Refresh Features > check TFTP is still listed)

If that all checks out, log on to the proxy and go to the TFTP dir (location varies by OS, usually in /var/lib or /srv), and look for a file named for the MAC address of the host’s provisioning interface. If you check the contents, and then flip the Build flag and check again, you should see the content change from LOCALBOOT to the netboot installer of the desired OS.

I’m guessing one of those steps won’t check out, and we can take it from there :slight_smile:


Hi, thanks for the input. I checked the first three and they check out. I’m not quite sure where to look for the last one (4). I’m running version 1.16-1.el7.

So the next step is to log on to the proxy. Under /var/lib I found a directory called tftpboot. In its grub dir there are files named by MAC address. I listed them and I don’t see a build flag, so I’m guessing I’m looking in the wrong place. I see rootnoverify and chainloader listed.

Does that sound right?

Thanks Ruth

Sorry, I’ve rushed my comment and been unclear. When I say “build flag” I mean the “Build” button in the UI. Clicking that will toggle the internal state of the Host, hence why I say “flipping”.

So, what I mean is something like:

cat /var/lib/tftpboot/pxelinux.cfg/01-{your mac}
# toggle build state from the UI
cat /var/lib/tftpboot/pxelinux.cfg/01-{your mac}

The second dump of file contents should have changed after the build state has changed. Feel free to paste the outputs if you want to be sure :wink:
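If it saves some squinting, here is a rough shell sketch (not something that ships with Foreman) that works out the PXELinux file name from a MAC and prints the file. It assumes the usual PXELinux naming convention (01- followed by the lowercase, dash-separated MAC) and a TFTP root of /var/lib/tftpboot; the TFTP_ROOT variable and both function names are made up for illustration. Files under grub/ use a different naming scheme, so this only covers pxelinux.cfg.

```shell
# Hypothetical helper: turn a MAC into the file name PXELinux looks for under
# pxelinux.cfg/. "01" is the ARP hardware type for Ethernet; PXELinux expects
# lowercase hex separated by dashes.
pxe_cfg_name() {
    printf '01-%s\n' "$(printf '%s' "$1" | tr 'ABCDEF:' 'abcdef-')"
}

# Assumed TFTP root; override it if your proxy uses /srv/tftp or similar.
TFTP_ROOT=${TFTP_ROOT:-/var/lib/tftpboot}

# Hypothetical helper: print the PXELinux config for a MAC, or complain on
# stderr and return 1 if the file does not exist.
show_pxe_cfg() {
    cfg="$TFTP_ROOT/pxelinux.cfg/$(pxe_cfg_name "$1")"
    if [ -f "$cfg" ]; then
        cat "$cfg"
    else
        echo "no such file: $cfg" >&2
        return 1
    fi
}
```

For example, run show_pxe_cfg 00:1e:06:33:83:84 > before.txt, click Build, run it again into after.txt, and diff the two files.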

I did not see a change in the file /var/lib/tftpboot/grub/(MAC Address).

When I hit the Build button, it changes to Cancel Build. I can reboot the host; however, the certs do not appear to be passed from the proxy. The machine will not rebuild. The autosign.conf in /etc/puppet does not appear to update.

I did modify the mode to 644 as instructed in the Foreman documentation.
Sorry, I can’t give you files or logs from these boxes.

Ruth

I think we’re talking about different things. Let’s back up slightly. You’re saying that provisioning does not work and that a host does not PXE boot; that has nothing to do with autosign and certificates. The changes to the Puppet autosign file and the delivery of certificates happen during the install, so if your host is not PXE booting into the installer, then I would expect no changes in the certs/autosign area. From what you’ve said, PXE/TFTP is the place to focus.

In normal operation I would expect to see something like this:

$ cat /var/lib/tftpboot/pxelinux.cfg/01-00-1e-06-33-83-84 
DEFAULT menu
PROMPT 0
MENU TITLE PXE Menu
TIMEOUT 200
TOTALTIMEOUT 6000
ONTIMEOUT local


LABEL local
  MENU LABEL Default local boot
  MENU DEFAULT
  LOCALBOOT 0
...

That’s for a host not in build mode. Once you click “Build”, the file should be rewritten to contain the installer for the host’s OS. So, for a Debian host it might look like:

$ cat /var/lib/tftpboot/pxelinux.cfg/01-00-1e-06-33-83-84 
DEFAULT linux

LABEL linux
    KERNEL boot/Debian-8.6-x86_64-linux
    APPEND initrd=boot/Debian-8.6-x86_64-initrd.gz interface=auto url=http://my.foreman.server/unattended/provision ramdisk_size=10800 root=/dev/rd/0 rw auto hostname=newhost.my.domain auto=true domain=my.domain locale=en_US
    IPAPPEND 2
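A quick way to tell the two states apart without reading the whole file is to look for the LOCALBOOT and KERNEL lines shown above. This is only a heuristic sketch with a hypothetical function name; a customised template could easily contain both or neither.

```shell
# Hypothetical helper: guess the mode of a PXELinux config file.
# LOCALBOOT is checked first because a local-boot menu may still list
# other entries with KERNEL lines.
pxe_cfg_mode() {
    if grep -qi '^[[:space:]]*LOCALBOOT' "$1"; then
        echo local
    elif grep -qi '^[[:space:]]*KERNEL' "$1"; then
        echo build
    else
        echo unknown
    fi
}
```

Running it against the same file before and after clicking Build should flip from local to build if the proxy is rewriting the file.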

If that isn’t happening, then we’re going to need to dig into the logs to find out why the call to rewrite the file isn’t happening. If the change is happening, then the next question is why the host isn’t picking that up on PXE boot. Let me know which path we’re on here :slight_smile:

Thanks so much for your help! I took a look; I am using Fedora, but the above is happening.

I am seeing the file being rewritten. BTW, is there something you could point me to that describes this process? I’m not sure where to look.

Thanks Ruth

Depends on your learning preferences :slight_smile:

Foreman :: Manual will give you the written version, but you may prefer to check out some of the videos on Foreman :: Media - the Screencasts section might be up your street if you prefer video-based stuff.

OK, so the file is set up to boot into Fedora, that’s good. If I understand you correctly, PXE isn’t loading into the Fedora installer? Can you capture what fails? Does it load the Fedora installer and fail, or not get that far? Do you see an attempt to PXE at all? There are quite a few steps that can fail here…

If you watch the BIOS carefully while the host is booting, you should see it try to PXE, request DHCP, get an IP, and then download the cfg file from the TFTP server, and then boot into the Fedora installer, in that order. The task is to spot which step is failing:

  1. The host might not be set to PXE (say, if disk is first in the boot order), in which case you may not see the PXE attempt at all.
  2. It may not get an IP from the DHCP server, which would probably manifest as a timeout.
  3. It might fail to find the TFTP cfg file, which would imply an issue with the DHCP leases.
  4. It might load the Fedora installer and then bomb, in which case we’ll need to know what it actually complained about (perhaps a bad mirror).

So, back to the console, and watch carefully: PXE stuff can flash past very quickly :slight_smile:. Let me know which you think it is, and we can dig into it.
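For step 3, it can also help to check whether the DHCP server holds an entry for the host’s MAC at all. The sketch below assumes ISC dhcpd with its leases file at /var/lib/dhcpd/dhcpd.leases (the path varies by distro); the function name is hypothetical. It just prints any host or lease block that mentions the MAC.

```shell
# Hypothetical helper: print every "host" or "lease" block in an ISC dhcpd
# leases (or config) file whose "hardware ethernet" line matches the MAC.
# Second argument overrides the assumed default leases path.
dhcp_entries_for_mac() {
    mac=$(printf '%s' "$1" | tr 'ABCDEF' 'abcdef')
    awk -v mac="$mac" '
        # A block starts at a host/lease line and ends at its closing brace.
        /^[ \t]*(host|lease) / { block = ""; inblock = 1 }
        inblock { block = block $0 "\n" }
        inblock && /^[ \t]*[}]/ {
            if (index(tolower(block), "hardware ethernet " mac ";")) printf "%s", block
            inblock = 0
        }
    ' "${2:-/var/lib/dhcpd/dhcpd.leases}"
}
```

An empty result is not conclusive on its own (reservations can also live in dhcpd.conf or an included file, which you can pass as the second argument), but a matching host block at least shows the reservation exists.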

Yes, that is correct. I am seeing the following error: "unable to set PuppetCA autosign for server name".

ERF12-0104.

I’m going for 1 or 3. It does not appear to be getting the jumpstart file; it skips pretty quickly to the os load.

Ruth

Sadly these two statements are contradictory:

This implies we got as far as the Fedora installer, as the autosign changes aren’t deployed until after the host has successfully PXE-booted and downloaded its Kickstart file. So saying it’s not PXE booting doesn’t stack up. My guess is that when you say skips pretty quickly to the os load, it’s actually in the installer, but tailing the logs on Foreman will prove it (you’ll see a call to /unattended/provision from the host to get the Kickstart as it goes into the installer).

That said, your error code is documented, and usually results from permission errors on the proxy. See:

http://projects.theforeman.org/projects/foreman/wiki/ERF12-0104

for details. Let us know if that helps :slight_smile:
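Since ERF12-0104 is about permissions, one quick check is whether the user the proxy runs as can actually write the autosign file. This sketch assumes the proxy runs as foreman-proxy and that sudo is available; the function name is made up, and it only tests ordinary file permissions, not anything else that could block the write.

```shell
# Hypothetical helper: report whether FILE is writable, optionally as USER
# (via sudo). Default path taken from this thread; the user is an assumption.
can_write() {
    file=${1:-/etc/puppet/autosign.conf}
    user=${2:-}
    if [ -n "$user" ]; then
        set -- sudo -u "$user" test -w "$file"
    else
        set -- test -w "$file"
    fi
    # Run whichever test command was built above.
    if "$@"; then echo writable; else echo not-writable; fi
}
```

For example, can_write /etc/puppet/autosign.conf foreman-proxy, alongside ls -l /etc/puppet/autosign.conf to see the current owner and mode.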

I read the error code info a while back and changed the permissions accordingly. The current autosign.conf file does not appear to be updated. Changing the permissions allowed me to cancel the build, but that’s about all.

The BIOS messages go by so quickly that I cannot catch them. Is there somewhere I might be able to read what they said? I believe that the jumpstart file is not available for whatever reason. The boot never gets to jumpstart.

I actually meant Kickstart.

Could you perhaps be a bit more specific about which Foreman log you want me to look at? I don’t see the message, but there are a number of logs.

Ruth

Ah I see - that would make sense then. The changes aren’t happening to the autosign file because you’re not getting that far, and getting ERF12-0104 would happen on cancel (when it tries to remove the host from the autosign file). At least that’s one issue down :slight_smile:

So, it’d be /var/log/foreman/production.log, and yes, if it’s a busy Foreman instance it can be pretty noisy. You could always do something like tail -f /var/log/foreman/production.log | grep -C1 GET to at least see the requests as they come in.

If you do see a hit for /unattended/provision from this host, then you can note the timestamp and go digging in the logs a bit further. If the logs note it’s a 404 (or anything other than 200 really), then it should log why it’s doing that, and we can hopefully progress…
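To pull out just the unattended requests for one host, something like the sketch below may be less noisy than tail and grep. It assumes the default log path and Rails-style "Started GET ... for <ip>" and "Completed <status>" lines, and it naively pairs each request with the next Completed line, so on a busy server the statuses can get mixed up. The function name is hypothetical.

```shell
# Hypothetical helper: print /unattended/ requests from one client IP plus
# the next "Completed" line (assumed to be that request's status).
unattended_requests() {
    ip=$1
    log=${2:-/var/log/foreman/production.log}
    awk -v ip="$ip" '
        # Match e.g.: Started GET "/unattended/provision" for 192.168.1.10 at ...
        index($0, "Started ") && index($0, "\"/unattended/") && index($0, " for " ip " ") {
            print
            want = 1
            next
        }
        # Naive pairing: the next Completed line after a matching request.
        want && index($0, "Completed ") {
            print
            want = 0
        }
    ' "$log"
}
```

For example, unattended_requests 192.168.1.10 after a boot attempt, using the host’s provisioning IP; a status other than 200 next to a request is the timestamp to go digging from.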

Thanks. It seems that the underlying issue, and part of why the system hangs, is that the part of the Kickstart that notifies Foreman that the build is complete is not triggering.

There’s a whole ton of possible reasons for that - are you able to share the error logs? Feel free to anonymise IPs, names, etc. Having a look at what’s happening would make it much easier to help you, although it may be necessary to enable debug logs to get a true picture of what’s going on.

I am getting another issue that is stopping me from testing at the moment. Something keeps resetting the permissions on the /etc/dhcp folder, which throws an ERF12-0635 error. I have reset the perms on the folder, but I seem to remember that we need to restart a process to get this fixed.

I’ll have to ask “the powers that be” about the logs. I know it’s hard when you’re blind; I’m just as frustrated. I was thinking about debug too. It kind of looks like the Puppet master may not be signing the certs.

You could edit the appropriate template (get there via Hosts > your host > Templates tab) and comment out the call to puppet agent .... That would establish whether that’s really the cause. It would usually be here:

But you’ll need to trace your own templates to be sure. The template preview should be helpful in checking you’ve got it right.