Foreman 2.4 / Katello 4 - iPXE not working

Thulium-Drake · May 8, 2021, 10:24am

@ekohl Thanks for the suggestion! I tried it, but it didn’t change anything.

However, I did play around with the iPXE intermediate script for a bit and was able to find out the following:

For a known host, the script will redirect to get iPXE default local boot
The iPXE default local boot will exit with whatever exit code is in there
- exit 0 means that ‘booting’ the image is a success
- exit !0 means that ‘booting’ the image fails

Either way, it seems when the iPXE default local boot script exits, it will continue along the lines of the iPXE intermediate script, which will always end with no_nic printing the error message and sleeping for 30 seconds.

But I don’t know why

lzap · May 10, 2021, 12:26pm

Well, I am out of ideas, but I still see the X-Forwarded-For header. Are you able to try without an HTTP proxy?

ekohl · May 10, 2021, 12:37pm

The thing I can’t really place is the hostname in X-Forwarded-For.

Are you using a content proxy? I get the impression you don’t, but want to make sure.

So is it:

client (192.168.255.151) -> server (192.168.255.15)

Or:

client (192.168.255.151) -> proxy (foreman.lbhr.htm.lan) -> server (192.168.255.15)?

Thulium-Drake · May 10, 2021, 4:04pm

Hi @ekohl @lzap

Nope, I made this lab as follows:

Install CentOS8.2 on the system
Run the foreman.operations.installer role to install it with Foreman/Katello
Configure it using the role I linked earlier.

But apart from the ‘usual’ (content, hostgroups etc) I didn’t configure anything in Foreman.

lzap · May 12, 2021, 12:24pm

Can you do a network dump? Does the iPXE firmware really send this header? Why? How? From what I was able to find, iPXE does not even support HTTP proxies. There must be some kind of transparent proxy in between that you do not know about. This is possible; I have configured several such deployments myself (basically, the firewall is configured to forward requests to ports 80/443 via an HTTP proxy).

Thulium-Drake · May 13, 2021, 1:36pm

I configured the DHCP server to send the following URL to all systems booting from PXE:

http://192.168.255.15:8000/unattended/iPXE?bootstrap=1

I also checked the system and the process running on that port is not Apache:

[root@foreman ~]# ss -tulpan | grep 8000
tcp       LISTEN     0          128                            0.0.0.0:8000                                               0.0.0.0:*                              users:(("smart-proxy",pid=60382,fd=17))                                        
tcp       LISTEN     0          128                               [::]:8000                                                  [::]:*                              users:(("smart-proxy",pid=60382,fd=18))

[root@foreman ~]# ps aux |grep 60382
foreman+   60382  0.2  0.4 1018232 50032 ?       Ssl  08:24   0:00 /usr/bin/ruby /usr/share/foreman-proxy/bin/smart-proxy --no-daemonize
root       64925  0.0  0.0 221904  1100 pts/1    S+   08:30   0:00 grep --color=auto 60382

Apart from opening ports, I didn’t tell firewalld to do anything special:

[root@foreman ~]# firewall-cmd --list-all
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: ens18
  sources: 
  services: cockpit dhcpv6-client ssh
  ports: 7/tcp 7/udp 53/tcp 53/udp 67/udp 69/udp 80/tcp 443/tcp 8000/tcp 5000/tcp 5646/tcp 5647/tcp 8140/tcp 8443/tcp 9090/tcp
  protocols: 
  masquerade: no
  forward-ports: 
  source-ports: 
  icmp-blocks: 
  rich rules:

Attached is a pcap file I made with Wireshark containing the boot process of a known VM. It starts at packet 13 (the DHCP discover). There you can also see that the server that responds is the foreman-proxy process (even though that still could happen when transparently proxied).

Some details:

192.168.255.15 = foreman host (primary, not a smart proxy)
192.168.255.151 = VM that boots up, already installed with CentOS7

ipxeboot.pcapng.log (15.1 KB)

Let me know if you need more information!

lzap · May 13, 2021, 2:33pm

It should look like this:

VM → Smart Proxy (8000) → Foreman (80)

Somehow you pick up the HTTP header, which misleads the controller.

I am still puzzled by this. If we saw X-Forwarded-For set to 192.168.255.15 then I could say smart-proxy somehow adds this, which would be a bug. But you have 192.168.255.152 there, that was a VM which was booting up, wasn’t it?

Thulium-Drake · May 13, 2021, 6:52pm

Right, but in the case of a single Foreman host, isn’t the smart proxy its internal smart proxy? (I remember reading somewhere in either the Foreman or Satellite docs that it comes with its own ‘internal’ capsule/smart proxy.)

Correct, that was also a VM. The IPs are bogus btw, it’s a local range on my laptop and as I’ve tested multiple Foreman servers in there (I also have one with 2.3.3 installed), they sometimes differ a bit

lzap · May 14, 2021, 7:59am

I’ve investigated the pcap file and it does NOT have any X-Forwarded-For header.

So I looked up in the proxy codebase and indeed we add it there, that makes sense:

modules/templates/proxy_request.rb
39:      proxy_headers["X-Forwarded-For"] = "#{env['REMOTE_ADDR']}, #{proxy_ip}"

Now the question is: why does your proxy put its own IP address into REMOTE_ADDR instead of your client’s? Can you investigate that? Perhaps add some debug statements, show logs from the proxy (the IP address should be there), etc.

ekohl · May 14, 2021, 10:51am

So that Smart Proxy is the key and explains why we see a DNS name instead of an IP:

https://github.com/theforeman/smart-proxy/blob/6c0ee87d874fd10bb1757538182b8332a1c96833/modules/templates/proxy_request.rb#L30

            Hash[env.select { |k, v| k =~ /^HTTP_/ && k !~ /^HTTP_(VERSION|HOST)$/ }.map { |k, v| [k[5..-1], v] }]
          rescue Exception => e
            logger.warn "Unable to extract request headers: #{e}"
            {}
          end
          
          private
          
          def call_template(method, path, env, params, body = '')
            template_url = Proxy::Templates::Plugin.settings.template_url
            proxy_ip = URI.parse(template_url).host
            opts = params.clone.merge(:url => template_url)
            BLACKLIST_PARAMETERS.each do |blacklisted_parameter|
              opts.delete(blacklisted_parameter)
            end
            # in hostgroup provisioning there are spaces
            path = path.map { |x| CGI.escape(x) }.join('/')
            logger.debug "Template: request for #{path} using #{opts.inspect} at #{uri.host}"
            proxy_headers = extract_request_headers(env)
            proxy_headers["X-Forwarded-For"] = "#{env['REMOTE_ADDR']}, #{proxy_ip}"
            proxy_headers["Content-Type"] = params["Content-Type"] if params["Content-Type"]

Here the host is a DNS name instead of an IP, which the reverse proxy middleware can’t deal with. Previously it didn’t validate any intermediate values. Now that we do, it can’t handle this invalid data.
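A minimal sketch of the problem, assuming the receiving side expects every comma-separated entry of X-Forwarded-For to parse as an IP address. It uses only Ruby's standard `uri` and `ipaddr` libraries, not the actual Foreman or middleware code. The helper names `forwarded_for`, `ip_literal?` and `non_ip_entries` are made up for illustration.

```ruby
require "ipaddr"
require "uri"

# Hypothetical helper mirroring the quoted interpolation: the second entry is
# whatever host appears in template_url, be it a hostname or an IP.
def forwarded_for(remote_addr, template_url)
  "#{remote_addr}, #{URI.parse(template_url).host}"
end

# True when the entry parses as an IPv4 or IPv6 address.
def ip_literal?(entry)
  IPAddr.new(entry)
  true
rescue IPAddr::Error
  false
end

# Entries that a strict, IP-only consumer of the header would reject
# (ASSUMPTION: the consumer validates each entry this way).
def non_ip_entries(header)
  header.split(",").map(&:strip).reject { |entry| ip_literal?(entry) }
end

header = forwarded_for("192.168.255.151", "http://foreman.lbhr.htm.lan:8000")
puts header
puts non_ip_entries(header).inspect
```

With a template URL that contains an IP address instead of a hostname, `non_ip_entries` comes back empty, which is the situation the two options discussed in this post aim for.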

Now the question is, should we modify Foreman to also allow DNS names and filter those out as valid reverse proxies or modify Smart Proxy to send an IP?

The MDN page on X-Forwarded-For only talks about IPs, and so does the Wikipedia article on X-Forwarded-For. That suggests we should modify the Smart Proxy.

Now the question I have: do we even need to add the IP ourselves? I think Apache already appends the connecting IP so only setting X-Forwarded-For to REMOTE_ADDR could be sufficient but I’m not really sure.

For what it’s worth, it was introduced here:

lzap · May 17, 2021, 12:50pm

@Thulium-Drake can you modify templates.yml configuration and set the template_url to an IP address to verify?

I agree let’s just drop this.

ekohl · May 17, 2021, 1:12pm

FYI, if you use the installer the correct use is --foreman-proxy-template-url MY_URL.

Thulium-Drake · May 18, 2021, 12:36pm

So I changed the URL from

foreman-installer --help | grep proxy-template-url
    --foreman-proxy-template-url  URL a client should use for provisioning templates (current: "http://foreman.lbhr.htm.lan:8000")

to

foreman-installer --help | grep proxy-template-url
    --foreman-proxy-template-url  URL a client should use for provisioning templates (current: "http://192.168.255.15:8000")

Afterwards I restarted the services and it did not change anything

Attached is a new PCAP file, but I don’t think there will be a lot of changes.
ipxeboot.pcapng.log (12.6 KB)

ekohl · May 18, 2021, 2:17pm

Can you try to apply Fixes #32607 - drop host from x-forwarded-for by lzap · Pull Request #790 · theforeman/smart-proxy · GitHub instead?

cd /usr/share/foreman-proxy
curl -L https://github.com/theforeman/smart-proxy/pull/790.patch | patch -p1
systemctl restart foreman-proxy

Thulium-Drake · May 18, 2021, 7:24pm

No dice: I applied the change and tested it, but nothing changed.

I also tried it with the URL set back to the default value (as I mentioned in my previous post)

Are you able to reproduce this issue btw? If not, is there anything I can do to help with that?

lzap · May 19, 2021, 12:42pm

Can you show me what the old version of Foreman renders, if you still have it?

You were upgrading from 3.18? This is fishy because the iPXE changes were merged into 1.20. You already had these changes in there:

Frankly, I am using iPXE in one of my workflows but not for unknown hosts. I am booting up my stable instance to check it out.

What is the expectation for unknown hosts then? What do you want the bootstrap template to return?

Here is how it works today:

Smart proxy (correctly) adds x-forwarded-for header.
You already have a host which matches this IP in the inventory.
The matching host is not in build mode.
Therefore Foreman assumes it is a known host and it should be booted from local drive.

Our host finding code works as follows: it first tries to match host via UUID, then via MAC address sent either via parameter or HTTP header (Anaconda installer) and finally it tries it via remote IP address and this also works via HTTP proxies.

Can you now specify the following:

What is the IP address of your Foreman?
What is the IP address of the provisioned host that is failing?
What IP do you see in the HTTP access logs (foreman, proxy)?
What IP do you see in the X-Forwarded-For header?

This should give us a little bit more insight. I think the problem here is that your Foreman somehow thinks you are booting a known host. I do not understand why.

ekohl · May 19, 2021, 12:51pm

Do you still have the correct trusted_proxies configured as suggested in Foreman 2.4 / Katello 4 - iPXE not working - #18 by ekohl? I think both the patch and trusted_proxies should help.

Right now I don’t have a lab to test this myself, but I should fix that.

lzap · May 19, 2021, 1:01pm

This is on my instance. First iPXE request is correct:

[root@stable ~]# curl -s http://stable.nuc:8000/unattended/iPXE?bootstrap=1 | head
#!ipxe
# Intermediate iPXE script to report MAC address to Foreman

:net0
isset ${net0/mac} || goto no_nic
dhcp net0 || goto net1
chain http://stable.nuc:8000/unattended/iPXE?mac=${net0/mac} || goto net1

Unknown host is presented with the default menu, which is also correct:

[root@stable ~]# curl -s http://stable.nuc:8000/unattended/iPXE?mac=00:00:00:00:00:00 | head
#!ipxe

echo Opening global default menu in 15 seconds...
sleep 15

set menu-default discovery
set menu-timeout 5000
set port 8448

A known host which is in build mode also renders correctly:

[root@stable ~]# curl -s http://stable.nuc:8000/unattended/iPXE?mac=AA:BB:CC:DD:EE:F1 | head
#!gpxe


echo Trying to ping Gateway: ${netX/gateway}
ping --count 1 ${netX/gateway} || echo Ping to Gateway failed or ping command not available.
echo Trying to ping DNS: ${netX/dns}
ping --count 1 ${netX/dns} || echo Ping to DNS failed or ping command not available.

kernel http://mirror.centos.org/centos-7/7/os/x86_64//images/pxeboot/vmlinuz initrd=initrd.img ks=http://stable.nuc:8000/unattended/provision?token=5125dfd5-3d12-4a2c-a365-3dc7b0904205  network ksdevice=bootif ks.device=bootif BOOTIF=01-aa-bb-cc-dd-ee-f1 kssendmac ks.sendmac inst.ks.sendmac ip=dhcp
initrd http://mirror.centos.org/centos-7/7/os/x86_64//images/pxeboot/initrd.img

And finally a host that is not in build mode renders what you see, but this is correct:

[root@stable ~]# curl -s http://stable.nuc:8000/unattended/iPXE?mac=AA:BB:CC:DD:EE:F1 | head
#!ipxe

# Skips booting from network and continues booting from next device
exit

Now, if I configure proxy to use HTTP instead of HTTPS and use tcpdump, I see the HTTP header being sent:

[root@stable ~]# tcpdump -i any -s 0 -A 'tcp port 80'
...cut...
GET /unattended/iPXE?mac=AA%3ABB%3ACC%3ADD%3AEE%3AF1&url=http%3A%2F%2Fstable.nuc%3A8000 HTTP/1.1
Accept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3
Accept: */*, application/json,version=2, */*
User-Agent: Ruby
Content-Type: application/json
User_agent: curl/7.29.0
X-Forwarded-For: ::1, stable.nuc
Connection: close
Host: stable.nuc

Now, I edited my existing host’s IPv6 address to be ::1 and here is a result of a new curl call:

[root@stable ~]# curl -s http://stable.nuc:8000/unattended/iPXE?mac=AA:BB:CC:DD:EE:F1 | head
#!ipxe

# Skips booting from network and continues booting from next device
exit

That was the existing host, correctly getting local boot. Now let’s try with a MAC address that is unknown (but remember proxy carries over the IPv6 address which actually matches a host):

[root@stable ~]# curl -s http://stable.nuc:8000/unattended/iPXE?mac=AA:BB:CC:DD:EE:AA | head
#!ipxe

echo Opening global default menu in 15 seconds...
sleep 15

set menu-default discovery
set menu-timeout 5000
set port 8448

This request is treated as an unknown host (a menu appears). All in all, it works for me!
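The lookup order described earlier in the thread (UUID, then MAC, then remote IP), combined with the last curl result above, could be modelled roughly as follows. This is a toy sketch, not Foreman's code: `Host` and `find_host` are hypothetical names, and the rule that the IP is only consulted when no MAC is supplied is an assumption chosen because it would explain why the unknown MAC did not match the host with address ::1.

```ruby
# Toy model only; Host and find_host are hypothetical, not Foreman classes.
Host = Struct.new(:name, :uuid, :mac, :ip, keyword_init: true)

def find_host(hosts, uuid: nil, mac: nil, remote_ip: nil)
  # 1. A matching UUID wins.
  if uuid
    found = hosts.find { |h| h.uuid == uuid }
    return found if found
  end
  # 2. A supplied MAC is compared case-insensitively.
  #    ASSUMPTION: if a MAC is supplied but unknown, the IP is not consulted.
  return hosts.find { |h| h.mac&.casecmp?(mac) } if mac
  # 3. Only a request without a MAC falls back to the remote IP.
  remote_ip && hosts.find { |h| h.ip == remote_ip }
end

hosts = [Host.new(name: "known", uuid: "u-1", mac: "aa:bb:cc:dd:ee:f1", ip: "::1")]

known   = find_host(hosts, mac: "AA:BB:CC:DD:EE:F1", remote_ip: "::1")
unknown = find_host(hosts, mac: "AA:BB:CC:DD:EE:AA", remote_ip: "::1")
by_ip   = find_host(hosts, remote_ip: "::1")
puts [known&.name, unknown&.name, by_ip&.name].inspect
```

In this model a request that carries a MAC is decided by the MAC alone, so the forwarded IP only matters for clients that send neither UUID nor MAC.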

Thulium-Drake · May 19, 2021, 10:39pm

Before going into detail, I’m starting to get the sense we’re barking up the wrong tree

In a nutshell, the problem is not that unknown hosts do not boot the FDI or that known hosts do not boot at all. Unknown and known hosts do boot, albeit with a delay.

However, with Katello 4.0 my known host prints an error during iPXE boot I mentioned earlier here:

After this message, the system waits for 30 seconds before continuing the boot process. Which is new, as the Katello 3.18 server did not have this (when I boot a known VM against the 3.18 server it ‘just’ boots right away, without complaining about failing to chainload any network interface and sleeping for 30 seconds)

So, in response to your message:

Yes, I have both versions in VMs on my laptop, so I can easily switch

Intermediate:

#!ipxe
# Intermediate iPXE script to report MAC address to Foreman

:net0
isset ${net0/mac} || goto no_nic
dhcp net0 || goto net1
chain http://foreman.lbhr.htm.lan:8000/unattended/iPXE?mac=${net0/mac} || goto net1

# repeat 31 times

:net32
isset ${net32/mac} || goto no_nic
dhcp net32 || goto net33
chain http://foreman.lbhr.htm.lan:8000/unattended/iPXE?mac=${net32/mac} || goto net33

:net33
goto no_nic

exit 0

:no_nic
echo Failed to chainload from any network interface
sleep 30
exit 1

Local boot:

#!ipxe

# Skips booting from network and continues booting from next device
exit

At first yes, but to exclude any weird issues caused by the upgrade, I’m currently running with a fresh install. Good news is, the symptoms are the same (see above).

Unknown hosts should, and do, boot to the FDI for discovery.

192.168.255.15

192.168.255.151 or 192.168.255.152, the screenshot I linked above is this machine. The IP address differs a bit because I keep deleting it from Foreman’s database.

production.log:

2021-05-19T18:27:09 [I|app|fb74dacf] Started GET "/unattended/iPXE?mac=36%3A6A%3A3D%3A1F%3A7E%3ABC&url=http%3A%2F%2Fforeman.lbhr.htm.lan%3A8000" for 192.168.255.15 at 2021-05-19 18:27:09 -0400
2021-05-19T18:27:09 [I|app|fb74dacf] Processing by UnattendedController#host_template as TEXT
2021-05-19T18:27:09 [I|app|fb74dacf]   Parameters: {"mac"=>"36:6A:3D:1F:7E:BC", "url"=>"http://foreman.lbhr.htm.lan:8000", "kind"=>"iPXE", "unattended"=>{}}
2021-05-19T18:27:09 [I|app|fb74dacf]   Rendering text template
2021-05-19T18:27:09 [I|app|fb74dacf]   Rendered text template (Duration: 0.0ms | Allocations: 4)
2021-05-19T18:27:09 [I|app|fb74dacf] Completed 200 OK in 225ms (Views: 1.7ms | ActiveRecord: 85.2ms | Allocations: 66707)

foreman-proxy.log:

2021-05-19T18:27:09 103183d9 [I] Started GET /unattended/iPXE mac=36:6A:3D:1F:7E:BC
2021-05-19T18:27:09 103183d9 [I] Finished GET /unattended/iPXE with 200 (271.67 ms)
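For comparing the logged address and MAC across many such production.log lines, a small sketch along these lines may help. It assumes the line layout shown above (`Started VERB "path" for ADDRESS at ...`); the constant `STARTED_RE` and the method `requester_and_mac` are made-up names for illustration.

```ruby
require "uri"

# ASSUMPTION: lines look like the production.log excerpt quoted above.
STARTED_RE = /Started [A-Z]+ "(?<path>[^"]+)" for (?<ip>\S+) at /

# Returns the address logged after "for" and the decoded mac query
# parameter, or nil when the line is not a "Started" line.
def requester_and_mac(line)
  m = STARTED_RE.match(line)
  return nil unless m

  query = m[:path].split("?", 2)[1].to_s
  params = URI.decode_www_form(query).to_h
  { ip: m[:ip], mac: params["mac"] }
end

line = '2021-05-19T18:27:09 [I|app|fb74dacf] Started GET ' \
       '"/unattended/iPXE?mac=36%3A6A%3A3D%3A1F%3A7E%3ABC&url=http%3A%2F%2Fforeman.lbhr.htm.lan%3A8000" ' \
       'for 192.168.255.15 at 2021-05-19 18:27:09 -0400'
p requester_and_mac(line)
```

For the line above, the logged address is 192.168.255.15, the Foreman host itself, while the MAC belongs to the booting VM.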

The only time I saw any X-Forwarded-For headers was when I rigged /usr/share/foreman/app/controllers/unattended_controller.rb to print its environment. Which showed:

Looking at the results you get from your systems, I’d say the templates render correctly on both our systems and both versions. But for some reason, the system booted against Katello 4.0 goes into its sleep 30 function before booting from the hard drive, even though booting the template below is a success:

#!ipxe

# Skips booting from network and continues booting from next device
exit
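One reading of the two scripts that would be consistent with this symptom, sketched as a toy trace in Ruby. It assumes that iPXE hands control back to the calling script once a chained script exits, so a successful `chain` simply continues with the next line. This is not iPXE's interpreter; `trace_intermediate` and `MAX_NIC` are made-up names, and DHCP failures are left out for brevity.

```ruby
MAX_NIC = 32 # the quoted intermediate script covers net0..net32

# Returns the sequence of steps the intermediate script would take for the
# given list of present NICs and the exit code of the chained script.
def trace_intermediate(nics, chained_exit_code)
  trace = []
  (0..MAX_NIC).each do |i|
    nic = "net#{i}"
    break unless nics.include?(nic) # isset ${netN/mac} || goto no_nic
    trace << "chain via #{nic} returned #{chained_exit_code}"
    # Exit 0: chain succeeds, so execution falls through to the next label.
    # Non-zero: "|| goto net(N+1)" jumps to that very same label.
  end
  # Either the next NIC is missing or net33 is reached; both go to no_nic.
  trace << "no_nic: echo, sleep 30, exit 1"
end

puts trace_intermediate(["net0"], 0)
puts trace_intermediate(["net0"], 1)
```

Under that assumption both exit codes end at `no_nic`, and the `exit 0` placed before `:no_nic` is never reached because `:net33` jumps over it.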

If it helps, maybe we could do something on Google Meet/Jitsi where I share my screen and show you what I mean. But that depends on your timezone; I’m in CEST (UTC+2).

Thulium-Drake · May 19, 2021, 10:40pm

I did check this, but the system still waits for 30 seconds before continuing the boot. Also see my previous post with all the details @lzap requested.