@ekohl Thanks for the suggestion! I tried it, but it didn’t change anything.
However, I played around with the iPXE intermediate script for a bit and found out the following:
For a known host, the script chains to the iPXE default local boot script
The iPXE default local boot will exit with whatever exit code is in there
exit 0 means that ‘booting’ the image is a success
exit !0 means that ‘booting’ the image fails
Either way, it seems that when the iPXE default local boot script exits, execution continues with the rest of the iPXE intermediate script, which always ends at no_nic, printing the error message and sleeping for 30 seconds.
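The fall-through described above can be modelled in a few lines of Python. This is only a toy model of the control flow as observed in this thread, not an iPXE interpreter: the function name `run_intermediate` and its parameters are made up for illustration, and DHCP is assumed to succeed on every NIC that exists.

```python
def run_intermediate(nic_count, chained_exit_code, max_labels=33):
    """Toy model of the intermediate script's control flow (not real iPXE).

    nic_count: number of NICs for which ${netN/mac} is set (assumption).
    chained_exit_code: exit code of the script fetched by `chain`.
    Returns (trace, exit_code_of_intermediate_script).
    """
    trace = []
    for n in range(max_labels):
        if n >= nic_count:
            # isset ${netN/mac} || goto no_nic
            break
        # dhcp netN || goto net(N+1); DHCP is assumed to succeed here
        trace.append(f"chain via net{n}, chained script exits {chained_exit_code}")
        # Exit code != 0: '|| goto net(N+1)' jumps to the next label.
        # Exit code == 0: execution falls through to the same ':net(N+1)' label.
        # Either way the loop simply moves on to the next NIC.
    trace.append("echo Failed to chainload from any network interface")
    trace.append("sleep 30")
    return trace, 1  # 'exit 1' at the end of no_nic


if __name__ == "__main__":
    for code in (0, 1):
        trace, result = run_intermediate(nic_count=1, chained_exit_code=code)
        print(code, result, trace[-2:])
```

In this model the outcome is the same for both exit codes, which matches the symptom: even a successful local boot template ends in the 30-second sleep.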
Can you do a network dump? Does the iPXE firmware really send this header? Why? How? From what I was able to find, iPXE does not even support HTTP proxies. There must be some kind of transparent proxy in between that you do not know about. This is possible; I have configured several of these deployments myself (basically, the firewall is configured to forward requests to 80/443 via an HTTP proxy).
Attached is a pcap file I made with Wireshark containing the boot process of a known VM. It starts at packet 13 (the DHCP discover). There you can also see that the server that responds is the foreman-proxy process (even though that still could happen when transparently proxied).
Some details:
192.168.255.15 = foreman host (primary, not a smart proxy)
192.168.255.151 = VM that boots up, already installed with CentOS7
You gather the HTTP header somehow, which misleads the controller.
I am still puzzled by this. If we saw X-Forwarded-For set to 192.168.255.15, then I could say the smart proxy somehow adds this, which would be a bug. But you have 192.168.255.152 there; that was a VM which was booting up, wasn’t it?
Right, but in the case of a single Foreman host, isn’t the smart proxy its internal smart proxy? (I remember reading somewhere in either the Foreman or Satellite docs that it comes with its own ‘internal’ capsule/smart proxy.)
Correct, that was also a VM. The IPs are bogus, btw: it’s a local range on my laptop, and as I’ve tested multiple Foreman servers in there (I also have one with 2.3.3 installed), they sometimes differ a bit.
Now the question is: why does your proxy put its own IP address into REMOTE_ADDR instead of your client’s? Can you investigate that? Probably with some debug statements, logs from the proxy (the IP address should be there), etc.
So that Smart Proxy is the key and explains why we see a DNS name instead of an IP:
Here the host is a DNS name instead of an IP, which the reverse proxy middleware can’t deal with. Previously it didn’t validate intermediate values. Now that we do, it can’t handle this invalid data.
Now the question is, should we modify Foreman to also allow DNS names and filter those out as valid reverse proxies or modify Smart Proxy to send an IP?
Now the question I have: do we even need to add the IP ourselves? I think Apache already appends the connecting IP so only setting X-Forwarded-For to REMOTE_ADDR could be sufficient but I’m not really sure.
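To make the "filter DNS names out" option concrete, here is a minimal Python sketch that splits an X-Forwarded-For value into entries that parse as IP addresses and entries that do not. It is not the Foreman or Rack middleware code; the function name `split_forwarded_for` is hypothetical, and the sample value is the header seen in the tcpdump output elsewhere in this thread.

```python
import ipaddress


def split_forwarded_for(header_value):
    """Split an X-Forwarded-For value into (valid_ips, rejected_entries)."""
    valid, rejected = [], []
    for entry in header_value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        try:
            # ip_address() accepts both IPv4 and IPv6 literals
            valid.append(str(ipaddress.ip_address(entry)))
        except ValueError:
            # Anything that is not an IP literal, e.g. a DNS name
            rejected.append(entry)
    return valid, rejected


if __name__ == "__main__":
    print(split_forwarded_for("::1, stable.nuc"))
```

Whether rejected entries should be dropped silently, logged, or treated as an error is exactly the open design question above; the sketch only shows how the two kinds of entries can be told apart.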
foreman-installer --help | grep proxy-template-url
--foreman-proxy-template-url URL a client should use for provisioning templates (current: "http://foreman.lbhr.htm.lan:8000")
to
foreman-installer --help | grep proxy-template-url
--foreman-proxy-template-url URL a client should use for provisioning templates (current: "http://192.168.255.15:8000")
Afterwards I restarted the services, but it did not change anything.
Attached is a new PCAP file, but I don’t think there will be a lot of changes. ipxeboot.pcapng.log (12.6 KB)
You already have a host which matches this IP in the inventory.
The matching host is not in build mode.
Therefore Foreman assumes it is a known host and it should be booted from local drive.
Our host finding code works as follows: it first tries to match the host via UUID, then via the MAC address sent either via a parameter or an HTTP header (Anaconda installer), and finally via the remote IP address, which also works through HTTP proxies.
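The lookup order can be sketched as follows. This is an illustration in Python, not Foreman's actual (Ruby) code: the `Host` class and `find_host` function are hypothetical, and the sketch assumes that only the first identifier supplied with the request is used, so a lookup falls back to the remote IP only when no MAC was sent. That assumption is taken from the test later in this thread, where an unknown MAC is treated as an unknown host even though the forwarded IP matches one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    name: str
    uuid: Optional[str] = None
    mac: Optional[str] = None
    ip: Optional[str] = None


def find_host(hosts, uuid=None, mac=None, remote_ip=None):
    """Return the matching host, or None for an unknown host."""
    # Order from the description above: UUID, then MAC, then remote IP.
    # Assumption: the first identifier that was supplied decides the result.
    for attr, value in (("uuid", uuid), ("mac", mac), ("ip", remote_ip)):
        if value:
            wanted = value.lower()
            for host in hosts:
                if (getattr(host, attr) or "").lower() == wanted:
                    return host
            return None
    return None


if __name__ == "__main__":
    inventory = [Host("known", mac="aa:bb:cc:dd:ee:f1", ip="::1")]
    print(find_host(inventory, mac="AA:BB:CC:DD:EE:F1"))
    print(find_host(inventory, mac="AA:BB:CC:DD:EE:AA", remote_ip="::1"))
```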
Can you now specify the following:
What is the IP address of your Foreman?
What is the IP address of the provisioned host that is failing?
What IP do you see in the HTTP access logs (Foreman, proxy)?
What IP do you see in the X-Forwarded-For header?
This should give us a little bit more insight. I think the problem here is that your Foreman somehow thinks you are booting a known host. I do not understand why.
This is on my instance. First iPXE request is correct:
[root@stable ~]# curl -s http://stable.nuc:8000/unattended/iPXE?bootstrap=1 | head
#!ipxe
# Intermediate iPXE script to report MAC address to Foreman
:net0
isset ${net0/mac} || goto no_nic
dhcp net0 || goto net1
chain http://stable.nuc:8000/unattended/iPXE?mac=${net0/mac} || goto net1
Unknown host is presented with the default menu, which is also correct:
[root@stable ~]# curl -s http://stable.nuc:8000/unattended/iPXE?mac=00:00:00:00:00:00 | head
#!ipxe
echo Opening global default menu in 15 seconds...
sleep 15
set menu-default discovery
set menu-timeout 5000
set port 8448
A known host which is in build mode also renders correctly:
[root@stable ~]# curl -s http://stable.nuc:8000/unattended/iPXE?mac=AA:BB:CC:DD:EE:F1 | head
#!gpxe
echo Trying to ping Gateway: ${netX/gateway}
ping --count 1 ${netX/gateway} || echo Ping to Gateway failed or ping command not available.
echo Trying to ping DNS: ${netX/dns}
ping --count 1 ${netX/dns} || echo Ping to DNS failed or ping command not available.
kernel http://mirror.centos.org/centos-7/7/os/x86_64//images/pxeboot/vmlinuz initrd=initrd.img ks=http://stable.nuc:8000/unattended/provision?token=5125dfd5-3d12-4a2c-a365-3dc7b0904205 network ksdevice=bootif ks.device=bootif BOOTIF=01-aa-bb-cc-dd-ee-f1 kssendmac ks.sendmac inst.ks.sendmac ip=dhcp
initrd http://mirror.centos.org/centos-7/7/os/x86_64//images/pxeboot/initrd.img
And finally a host that is not in build mode renders what you see, but this is correct:
[root@stable ~]# curl -s http://stable.nuc:8000/unattended/iPXE?mac=AA:BB:CC:DD:EE:F1 | head
#!ipxe
# Skips booting from network and continues booting from next device
exit
Now, if I configure the proxy to use HTTP instead of HTTPS and use tcpdump, I see the HTTP header being sent:
[root@stable ~]# tcpdump -i any -s 0 -A 'tcp port 80'
...cut...
GET /unattended/iPXE?mac=AA%3ABB%3ACC%3ADD%3AEE%3AF1&url=http%3A%2F%2Fstable.nuc%3A8000 HTTP/1.1
Accept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3
Accept: */*, application/json,version=2, */*
User-Agent: Ruby
Content-Type: application/json
User_agent: curl/7.29.0
X-Forwarded-For: ::1, stable.nuc
Connection: close
Host: stable.nuc
Now, I edited my existing host’s IPv6 address to be ::1 and here is the result of a new curl call:
[root@stable ~]# curl -s http://stable.nuc:8000/unattended/iPXE?mac=AA:BB:CC:DD:EE:F1 | head
#!ipxe
# Skips booting from network and continues booting from next device
exit
That was the existing host, correctly getting local boot. Now let’s try with a MAC address that is unknown (but remember, the proxy carries over the IPv6 address, which actually matches a host):
[root@stable ~]# curl -s http://stable.nuc:8000/unattended/iPXE?mac=AA:BB:CC:DD:EE:AA | head
#!ipxe
echo Opening global default menu in 15 seconds...
sleep 15
set menu-default discovery
set menu-timeout 5000
set port 8448
This request is treated as an unknown host (a menu appears). All in all, it works for me!
Before going into detail, I’m starting to get the sense we’re barking up the wrong tree.
In a nutshell, the problem is not that unknown hosts do not boot the FDI or that known hosts do not boot at all. Unknown and known hosts do boot, albeit with a delay.
However, with Katello 4.0 my known host prints an error during iPXE boot I mentioned earlier here:
After this message, the system waits for 30 seconds before continuing the boot process. This is new: the Katello 3.18 server did not do this (when I boot a known VM against the 3.18 server, it ‘just’ boots right away, without complaining about failing to chainload from any network interface and sleeping for 30 seconds).
So, in response to your message:
Yes, I have both versions in VMs on my laptop, so I can easily switch
Intermediate:
#!ipxe
# Intermediate iPXE script to report MAC address to Foreman
:net0
isset ${net0/mac} || goto no_nic
dhcp net0 || goto net1
chain http://foreman.lbhr.htm.lan:8000/unattended/iPXE?mac=${net0/mac} || goto net1
# repeat 31 times
:net32
isset ${net32/mac} || goto no_nic
dhcp net32 || goto net33
chain http://foreman.lbhr.htm.lan:8000/unattended/iPXE?mac=${net32/mac} || goto net33
:net33
goto no_nic
exit 0
:no_nic
echo Failed to chainload from any network interface
sleep 30
exit 1
Local boot:
#!ipxe
# Skips booting from network and continues booting from next device
exit
At first yes, but to exclude any weird issues caused by the upgrade, I’m currently running with a fresh install. Good news is, the symptoms are the same (see above).
Unknown hosts should, and do, boot to the FDI for discovery.
192.168.255.15
192.168.255.151 or 192.168.255.152, the screenshot I linked above is this machine. The IP address differs a bit because I keep deleting it from Foreman’s database.
production.log:
2021-05-19T18:27:09 [I|app|fb74dacf] Started GET "/unattended/iPXE?mac=36%3A6A%3A3D%3A1F%3A7E%3ABC&url=http%3A%2F%2Fforeman.lbhr.htm.lan%3A8000" for 192.168.255.15 at 2021-05-19 18:27:09 -0400
2021-05-19T18:27:09 [I|app|fb74dacf] Processing by UnattendedController#host_template as TEXT
2021-05-19T18:27:09 [I|app|fb74dacf] Parameters: {"mac"=>"36:6A:3D:1F:7E:BC", "url"=>"http://foreman.lbhr.htm.lan:8000", "kind"=>"iPXE", "unattended"=>{}}
2021-05-19T18:27:09 [I|app|fb74dacf] Rendering text template
2021-05-19T18:27:09 [I|app|fb74dacf] Rendered text template (Duration: 0.0ms | Allocations: 4)
2021-05-19T18:27:09 [I|app|fb74dacf] Completed 200 OK in 225ms (Views: 1.7ms | ActiveRecord: 85.2ms | Allocations: 66707)
foreman-proxy.log:
2021-05-19T18:27:09 103183d9 [I] Started GET /unattended/iPXE mac=36:6A:3D:1F:7E:BC
2021-05-19T18:27:09 103183d9 [I] Finished GET /unattended/iPXE with 200 (271.67 ms)
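Note that production.log records the request as coming from 192.168.255.15, which is the Foreman host itself rather than the booting VM. For comparing many such lines, a small Python sketch like the one below can pull the remote address and the decoded MAC out of a "Started GET" line. The regular expression is written against the log lines quoted above only; the name `parse_started` is hypothetical and other log formats are not handled.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Matches lines such as: Started GET "/unattended/iPXE?mac=..." for 192.168.255.15 at ...
STARTED = re.compile(r'Started (?P<method>[A-Z]+) "(?P<path>[^"]+)" for (?P<remote>\S+) at ')


def parse_started(line):
    """Return method, remote address and MAC from a 'Started' line, or None."""
    match = STARTED.search(line)
    if match is None:
        return None
    # parse_qs also decodes percent-encoding (%3A becomes ':')
    query = parse_qs(urlsplit(match.group("path")).query)
    return {
        "method": match.group("method"),
        "remote": match.group("remote"),
        "mac": query.get("mac", [None])[0],
    }


if __name__ == "__main__":
    sample = (
        '2021-05-19T18:27:09 [I|app|fb74dacf] Started GET '
        '"/unattended/iPXE?mac=36%3A6A%3A3D%3A1F%3A7E%3ABC'
        '&url=http%3A%2F%2Fforeman.lbhr.htm.lan%3A8000" '
        'for 192.168.255.15 at 2021-05-19 18:27:09 -0400'
    )
    print(parse_started(sample))
```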
The only time I saw any X-Forwarded-For headers was when I rigged /usr/share/foreman/app/controllers/unattended_controller.rb to print its environment. Which showed:
Looking at the results you get from your systems, I’d say the templates render correctly on both our systems and both versions. But for some reason, the Katello 4.0 booted system goes into its sleep 30 function before booting the system from the hard drive. Even when booting the template below is a success:
#!ipxe
# Skips booting from network and continues booting from next device
exit
If it helps, maybe we could do something on Google Meet/Jitsi where I share my screen and show you what I mean. But that depends on your timezone; I’m in CEST (UTC+2).