I reinstalled and lost all other patches, applying only the one mentioned. Unfortunately that meant I lost logging, but it seems we still failed to deploy the token to the smart proxies.
This indicates the deployed files are fresh:
fmspx1-ob-159 /var/lib/tftpboot # ls -l `find . -name '*70*10*6f*dd*a0'`
-rw-r--r-- 1 foreman-proxy foreman-proxy 1002 Nov 23 04:47 ./grub2/grub.cfg-01-70-10-6f-a9-dd-a0
-rw-r--r-- 1 foreman-proxy foreman-proxy 1002 Nov 23 04:47 ./grub2/grub.cfg-70:10:6f:a9:dd:a0
-rw-r--r-- 1 foreman-proxy foreman-proxy 564 Nov 23 04:47 ./pxelinux.cfg/01-70-10-6f-a9-dd-a0
# This file was deployed via 'MSE Kickstart PXELinux' template
# token value is --> <-- (note: this shows the result of <%= @host.token %>)
DEFAULT menu
MENU TITLE Booting into OS installer (ESC to stop)
TIMEOUT 100
ONTIMEOUT installer
LABEL installer
MENU LABEL MSE Kickstart PXELinux
KERNEL http://bbrepo.bdns.bloomberg.com/pub/repos/rhel/rhel-server-7.6-x86_64//images/pxeboot/vmlinuz
APPEND initrd=http://bbrepo.bdns.bloomberg.com/pub/repos/rhel/rhel-server-7.6-x86_64//images/pxeboot/initrd.img ks=http://fmspx1.bdns.bloomberg.com:8440/unattended/provision BOOTIF=70:10:6f:a9:dd:a0
IPAPPEND 2
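The empty token marker in the file above matches what ERB produces when the interpolated value is nil; a minimal Ruby sketch (the OpenStruct host is a hypothetical stand-in for the template's @host):

```ruby
require "erb"
require "ostruct"

# Hypothetical stand-in for the template's @host object; in a real render
# the token would come from the host's build token record.
host = OpenStruct.new(token: nil)

template = ERB.new("# token value is --><%= host.token %><--")
line = template.result(binding)
puts line  # nil interpolates as an empty string, leaving "--><--"
```

If a token were present, the same template would render it between the arrows, which is what the server-side preview showed.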
I'm not exactly sure if different logging is required to help with this new strategy.
We have our servers operating behind an HAProxy frontend, and our smart proxies behind a DNS-based load distributor.
We use memcached and an external PostgreSQL database. I imagine our configuration differs significantly from what you might have available.
Our general setup is that we run foreman-installer to do the initial configuration against a foreman-answers file placed onto a new machine via Chef. However, we don't use Puppet; instead we overwrite a few of the configuration files using Chef, such as /etc/foreman/settings.yml and some SSL certs to provide our own PKI.
I have been able to reproduce this in a container environment. Unfortunately it still uses internal Docker containers and repositories, so it couldn't be reproduced externally yet. We'll use this to check solutions prior to attempting them in our production environment, but it just involves:
creating the containers with SSL ports exposed
deploying the foreman-answers files onto the server and smartproxy containers
generating and deploying SSL certificates
running foreman-installer
adding the smart proxy to the server config
There are some manual steps to create an organisation/hostgroup/subnet/all associations, and then I have a script to post some fake facts to the smart proxy to create a discovered host. On 'building' this fake host I see a token appear in the server-side 'review' of the template, but in the actual template on the smart proxy there is no token.
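For reference, my fake-facts script boils down to something like the sketch below; the endpoint path and the minimal fact set are my assumptions about what the discovery plugin expects, not a verified client:

```ruby
require "json"
require "net/http"
require "uri"

# Build the fact payload for a fake discovered host. Which facts are
# strictly required is an assumption here; discovery keys off the MAC.
def discovery_payload(mac, ip)
  { "facts" => {
      "discovery_bootif" => mac,
      "macaddress"       => mac,
      "ipaddress"        => ip } }
end

payload = discovery_payload("00:0c:29:cf:00:45", "10.246.195.5")

# To actually create the discovered host, POST the payload to the server
# (hypothetical URL; SSL verification elided for the container setup):
# uri  = URI("https://server.container.com/api/v2/discovered_hosts/facts")
# http = Net::HTTP.new(uri.host, uri.port)
# http.use_ssl = true
# req  = Net::HTTP::Post.new(uri, "Content-Type" => "application/json")
# req.body = payload.to_json
# http.request(req)
puts payload.to_json
```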
If it would be worthwhile, I'm fairly sure I could now do a port that could be used externally; otherwise I thought you might just be able to ask for some specifics, as this Docker system is significantly simpler than our production system.
I am not sure I will be able to set up Docker; however, why don't you describe to me exactly how you set up your subnet and hostgroup (every single field, down to the last detail)? I can test the same setup on my system. There must be something obvious to me that you don't set up.
Also, to rule out UI issues, could you trigger the provisioning via our CLI?
Create the domain bloomberg.com -> to match the posted discovered facts
Create a subnet for my fake discovered host, IP range 10.246.195.0/24 for host 10.246.195.5. Configure all proxies for this subnet to point at the smart proxy, and add the created domain association to the subnet
Create 'myhostgroup' with an association to the bloomberg.com domain. Configure the arch to be i686, OS 'RedHat 7.6', media 'Debian Mirror', partition table 'Kickstart default'; set the Linux password to 'password' and the IPv4 subnet to the one that was created
Settings -> Discovered -> reboot -> 'No' (as our discovered host doesn't actually exist in this test); this doesn't seem to stop the test from working, but it causes the UI to hang
Provision the discovered host -> select myhostgroup when prompted, otherwise defaults
[root@smartproxy tftpboot]# cat grub2/grub.cfg-01-00-0c-29-cf-00-45
#
# This file was deployed via 'Preseed default PXEGrub2' template
#
# Supported host/hostgroup parameters:
#
# blacklist = module1, module2
# Blacklisted kernel modules
#
# lang = en_US
# System locale
#
set default=0
set timeout=10
menuentry 'Preseed default PXEGrub2' {
linux boot/debian-mirror-stNH1XDWq2I1-vmlinuz interface=auto url=http://server.container.com/unattended/provision ramdisk_size=10800 root=/dev/rd/0 rw auto hostname=mac000c29cf0045.bloomberg.com console-setup/ask_detect=false console-setup/layout=USA console-setup/variant=USA keyboard-configuration/layoutcode=us localechooser/translation/warn-light=true localechooser/translation/warn-severe=true locale=en_US BOOTIF=01-$net_default_mac
initrd boot/debian-mirror-stNH1XDWq2I1-initrd.img
}
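As an aside, the grub.cfg-01-00-0c-29-cf-00-45 filename follows the conventional MAC-based PXE naming (01 is the Ethernet ARP hardware type). A small Ruby sketch of that mapping, for illustration rather than as Foreman's actual code:

```ruby
# Map a MAC address to the per-host PXE config filenames; the "01-" prefix
# is the ARP hardware type for Ethernet. Illustrative convention only.
def pxe_config_names(mac)
  dashed = mac.downcase.tr(":", "-")
  {
    pxelinux: "pxelinux.cfg/01-#{dashed}",
    grub2:    "grub2/grub.cfg-01-#{dashed}"
  }
end

names = pxe_config_names("00:0C:29:CF:00:45")
puts names[:grub2]     # grub2/grub.cfg-01-00-0c-29-cf-00-45
puts names[:pxelinux]  # pxelinux.cfg/01-00-0c-29-cf-00-45
```

(Earlier in the thread the smart proxy also deployed a colon-separated grub.cfg-70:10:6f:a9:dd:a0 alias alongside the dashed form.)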
For your CLI request I repeated this using hammer, which I'm less familiar with:
[root@server /]# hammer --verify-ssl 0 -p password discovery list
---|-----------------|-------------------|------|--------|------------|------------|-----------------------------------|--------------------
ID | NAME | MAC | CPUS | MEMORY | DISK COUNT | DISKS SIZE | SUBNET | LAST REPORT
---|-----------------|-------------------|------|--------|------------|------------|-----------------------------------|--------------------
2 | mac000c29cf0045 | 00:0c:29:cf:00:45 | 0 | 0 | 0 | 0 | 10.246.195.0/24 (10.246.195.0/24) | 2020/12/10 13:45:43
---|-----------------|-------------------|------|--------|------------|------------|-----------------------------------|--------------------
[root@server /]# hammer --verify-ssl 0 -p password discovery provision --name mac000c29cf0045 --hostgroup myhostgroup
Host created
There seems to be a misconfiguration somewhere; I don't understand why the above hammer commands would result in the following, where the template selected for the OS appears to be ignored:
[root@smartproxy /]# cat /var/lib/tftpboot/grub2/grub.cfg-01-00-0c-29-cf-00-45
set default=local
set timeout=20
echo Default PXE local template entry is set to 'local'
insmod part_gpt
insmod fat
insmod chain
menuentry 'Chainload Grub2 EFI from ESP' --id local_chain_hd0 {
echo Chainloading Grub2 EFI from ESP, enabled devices for booting:
ls
echo "Trying /EFI/fedora/shim.efi "
unset chroot
search --file --no-floppy --set=chroot /EFI/fedora/shim.efi
if [ -f ($chroot)/EFI/fedora/shim.efi ]; then
chainloader ($chroot)/EFI/fedora/shim.efi
echo "Found /EFI/fedora/shim.efi at $chroot, attempting to chainboot it..."
sleep 2
boot
fi
echo "Trying /EFI/fedora/grubx64.efi "
unset chroot
search --file --no-floppy --set=chroot /EFI/fedora/grubx64.efi
if [ -f ($chroot)/EFI/fedora/grubx64.efi ]; then
chainloader ($chroot)/EFI/fedora/grubx64.efi
echo "Found /EFI/fedora/grubx64.efi at $chroot, attempting to chainboot it..."
sleep 2
boot
fi
echo "Trying /EFI/redhat/shim.efi "
unset chroot
search --file --no-floppy --set=chroot /EFI/redhat/shim.efi
if [ -f ($chroot)/EFI/redhat/shim.efi ]; then
chainloader ($chroot)/EFI/redhat/shim.efi
echo "Found /EFI/redhat/shim.efi at $chroot, attempting to chainboot it..."
sleep 2
boot
fi
echo "Trying /EFI/redhat/grubx64.efi "
unset chroot
search --file --no-floppy --set=chroot /EFI/redhat/grubx64.efi
if [ -f ($chroot)/EFI/redhat/grubx64.efi ]; then
chainloader ($chroot)/EFI/redhat/grubx64.efi
echo "Found /EFI/redhat/grubx64.efi at $chroot, attempting to chainboot it..."
sleep 2
boot
fi
echo "Trying /EFI/centos/shim.efi "
unset chroot
search --file --no-floppy --set=chroot /EFI/centos/shim.efi
if [ -f ($chroot)/EFI/centos/shim.efi ]; then
chainloader ($chroot)/EFI/centos/shim.efi
echo "Found /EFI/centos/shim.efi at $chroot, attempting to chainboot it..."
sleep 2
boot
fi
echo "Trying /EFI/centos/grubx64.efi "
unset chroot
search --file --no-floppy --set=chroot /EFI/centos/grubx64.efi
if [ -f ($chroot)/EFI/centos/grubx64.efi ]; then
chainloader ($chroot)/EFI/centos/grubx64.efi
echo "Found /EFI/centos/grubx64.efi at $chroot, attempting to chainboot it..."
sleep 2
boot
fi
echo "Trying /EFI/debian/grubx64.efi "
unset chroot
search --file --no-floppy --set=chroot /EFI/debian/grubx64.efi
if [ -f ($chroot)/EFI/debian/grubx64.efi ]; then
chainloader ($chroot)/EFI/debian/grubx64.efi
echo "Found /EFI/debian/grubx64.efi at $chroot, attempting to chainboot it..."
sleep 2
boot
fi
echo "Trying /EFI/ubuntu/grubx64.efi "
unset chroot
search --file --no-floppy --set=chroot /EFI/ubuntu/grubx64.efi
if [ -f ($chroot)/EFI/ubuntu/grubx64.efi ]; then
chainloader ($chroot)/EFI/ubuntu/grubx64.efi
echo "Found /EFI/ubuntu/grubx64.efi at $chroot, attempting to chainboot it..."
sleep 2
boot
fi
echo "Trying /EFI/sles/grubx64.efi "
unset chroot
search --file --no-floppy --set=chroot /EFI/sles/grubx64.efi
if [ -f ($chroot)/EFI/sles/grubx64.efi ]; then
chainloader ($chroot)/EFI/sles/grubx64.efi
echo "Found /EFI/sles/grubx64.efi at $chroot, attempting to chainboot it..."
sleep 2
boot
fi
echo "Trying /EFI/opensuse/grubx64.efi "
unset chroot
search --file --no-floppy --set=chroot /EFI/opensuse/grubx64.efi
if [ -f ($chroot)/EFI/opensuse/grubx64.efi ]; then
chainloader ($chroot)/EFI/opensuse/grubx64.efi
echo "Found /EFI/opensuse/grubx64.efi at $chroot, attempting to chainboot it..."
sleep 2
boot
fi
echo "Trying /EFI/Microsoft/boot/bootmgfw.efi "
unset chroot
search --file --no-floppy --set=chroot /EFI/Microsoft/boot/bootmgfw.efi
if [ -f ($chroot)/EFI/Microsoft/boot/bootmgfw.efi ]; then
chainloader ($chroot)/EFI/Microsoft/boot/bootmgfw.efi
echo "Found /EFI/Microsoft/boot/bootmgfw.efi at $chroot, attempting to chainboot it..."
sleep 2
boot
fi
echo Partition with known EFI file not found, you may want to drop to grub shell
echo and investigate available files updating 'pxegrub2_chainload' template and
echo the list of known filepaths for probing. Contents of \EFI directory:
ls ($chroot)/EFI
echo The system will halt in 2 minutes or press ESC to halt immediately.
sleep -i 120
halt --no-apm
}
menuentry 'Chainload into BIOS bootloader on first disk' --id local_chain_legacy_hd0 {
set root=(hd0,0)
chainloader +1
boot
}
menuentry 'Chainload into BIOS bootloader on second disk' --id local_chain_legacy_hd1 {
set root=(hd1,0)
chainloader +1
boot
}
menuentry 'Foreman Discovery Image httpboot efi' --id discoveryefihttpboot {
linuxefi /httpboot/boot/fdi-image/vmlinuz0 rootflags=loop root=live:/fdi.iso rootfstype=auto ro rd.live.image acpi=force rd.luks=0 rd.md=0 rd.dm=0 rd.lvm=0 rd.bootif=0 rd.neednet=0 nokaslr nomodeset proxy.url=https://server.container.com proxy.type=foreman BOOTIF=01-$net_default_mac
initrdefi /httpboot/boot/fdi-image/initrd0.img
}
menuentry 'Foreman Discovery Image efi' --id discoveryefi {
linuxefi boot/fdi-image/vmlinuz0 rootflags=loop root=live:/fdi.iso rootfstype=auto ro rd.live.image acpi=force rd.luks=0 rd.md=0 rd.dm=0 rd.lvm=0 rd.bootif=0 rd.neednet=0 nokaslr nomodeset proxy.url=https://server.container.com proxy.type=foreman BOOTIF=01-$net_default_mac
initrdefi boot/fdi-image/initrd0.img
}
menuentry 'Foreman Discovery Image httpboot ' --id discoveryhttpboot {
linux /httpboot/boot/fdi-image/vmlinuz0 rootflags=loop root=live:/fdi.iso rootfstype=auto ro rd.live.image acpi=force rd.luks=0 rd.md=0 rd.dm=0 rd.lvm=0 rd.bootif=0 rd.neednet=0 nokaslr nomodeset proxy.url=https://server.container.com proxy.type=foreman BOOTIF=01-$net_default_mac
initrd /httpboot/boot/fdi-image/initrd0.img
}
menuentry 'Foreman Discovery Image ' --id discovery {
linux boot/fdi-image/vmlinuz0 rootflags=loop root=live:/fdi.iso rootfstype=auto ro rd.live.image acpi=force rd.luks=0 rd.md=0 rd.dm=0 rd.lvm=0 rd.bootif=0 rd.neednet=0 nokaslr nomodeset proxy.url=https://server.container.com proxy.type=foreman BOOTIF=01-$net_default_mac
initrd boot/fdi-image/initrd0.img
}
Can you provide me the hammer hostgroup show output of the hostgroup? Feel free to anonymize the output, but only to the degree that I can still read the important parts.
This is probably the only difference; all my systems are x86_64.
Full disclosure: I am testing with Foreman 2.3 and CentOS, but Debian or Red Hat does not matter; both PXE templates use the same foreman_url macro.
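To illustrate why the macro matters here: foreman_url is what embeds the build token into the kickstart/preseed URL. A toy stand-in (a hypothetical stub, not the real macro) shows the difference between a host with and without a token:

```ruby
require "erb"

# Hypothetical stub of the foreman_url template macro: the real one builds
# the unattended URL for the host and appends its build token if present.
def foreman_url(action, token: nil)
  url = "http://server.container.com:8000/unattended/#{action}"
  token ? "#{url}?token=#{token}" : url
end

with_token    = ERB.new("ks=<%= foreman_url('provision', token: 'abc123') %>").result(binding)
without_token = ERB.new("ks=<%= foreman_url('provision') %>").result(binding)
puts with_token     # ks=http://server.container.com:8000/unattended/provision?token=abc123
puts without_token  # ks=http://server.container.com:8000/unattended/provision
```

The deployed ks= line earlier in this thread looked like the second case, with no token parameter at all.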
One thing I do see is plugins; foreman_host_extra_validator, for example, is the one I suspect, though granted that one is fairly small. It is worth trying without these plugins, ideally all of them.
But what's most interesting is the last find: you see the local PXE Grub template, meaning that build mode was not engaged when the template was rendered. That's a difference from PXELinux, where the host was in build mode (but the token was missing). Something is going on and I am unable to understand it.
I think I've managed to conclude that this is a problem with the server-side foreman-discovery gem that we're using, 16.0.1.
I think I know that because I kept the smart proxy completely the same but upgraded the Foreman server to 2.2 (via a migration). I believe the foreman-discovery plugin was kept at the same version, and I managed to reproduce the issue. I definitely confirmed that I was on 2.2 and that the smart proxy was unchanged.
Because that was quite messy, I repeated it with a clean install on the server, but accidentally went to 2.3.1 (latest) rather than stable. I kept the smart proxy build exactly the same (no upgrade at all). When I did this and repeated my configuration steps, it all works fine (I am not able to reproduce the issue).
So IMO either the problem existed up to 2.2 and was fixed by 2.3.1 (unlikely), or the actual problem was indeed tfm-rubygem-foreman_discovery 16.0.1.
Either way, I think we're happy and can make some good decisions from this info.
Thanks for getting back to us. I quickly checked the git log and I don't see any commit relating to tokens; it must have been some incompatibility that slipped through our testing process. Weird.
We only maintain the last two releases (so 2.2/2.3), so I strongly suggest planning your upgrades accordingly. For a longer lifecycle you can get Red Hat Satellite or ATIX.
Agreed; I had a bit of a search through the changes and couldn't spot anything either. I have now managed to repeat this with a 2.2.1 clean install (which works), and to repeat it holding the plugin while migrating from 1.24.3 to 2.2.1, again reproducing the error.
I missed a third possibility: that it's the migration that's carrying over some bad config. Unfortunately I don't think I can easily figure out how to do a clean install and then downgrade, and that seems like a pain for no benefit anyway. As you say, whatever it was, it's long in the past.
Okay; honestly, this thread has been sitting in my stomach for a few weeks now, so I am happy you sorted it out. I saw a similar error caused by a hook, but you showed you don't have any. It could have been a bug; honestly, I expect 2.3 to be better than 2.0 or 2.2.