Problem:
In 2024 I put some effort into optimising and automating my Foreman lab platform so I could better tinker and look at options and ideas outside of operational platforms and processes. I took the time to do some long-standing updates, such as moving to EL9, Puppet 8, etc., to get to the point where I could re-deploy my lab consistently in minutes. With some fantastic community support, I mostly got there. Because this took some time, I had to renew the wildcard SSL certificate that I use for my lab, and being cheap I changed providers. This caused a few distractions and problems that, again with some solid help, I managed to get resolved. There is one outstanding issue with my new SSL certificate that I am unable to resolve and, being honest, I do not fully understand the behaviour I’m seeing.
When I attempt to use Foreman to connect to a virtual guest running on a remote Libvirt/KVM hypervisor, I get an error from the Foreman web interface: ‘The connection was closed by the browser. please verify that the certificate authority is valid’
Thanks to some patient hand-holding from a few users in the Foreman Matrix channel, I managed to get a basic understanding of how the websockets process works, and this error comes from the browser connecting to a websockets session that Foreman launches dynamically. The problem I face is that this certificate chain and authority are valid, which I can verify in a few ways, and that the websocket process has turned out to be pretty hard to debug. I’ll detail the verification here and put a lot more detail in the additional information.
a.) I’m using the same wildcard certificate, key and CA with the Foreman web interface, which verifies fine. The certs look like this:
server_ssl_chain: "/etc/pki/tls/certs/gs_ca_bundle.crt"
server_ssl_cert: "/etc/pki/tls/certs/star.no-dns.co.uk.crt"
server_ssl_key: "/etc/pki/tls/private/star.no-dns.co.uk.key"
And as you can see the foreman web interface is happy with this
b.) I can verify the certificate chain with openssl
[root@jarvis certs]# openssl verify -CAfile gs_ca_bundle.crt star.no-dns.co.uk.crt
star.no-dns.co.uk.crt: OK
I’m using the same certificate and key files for the websocket handshake:
websockets_ssl_key: "/etc/pki/tls/private/star.no-dns.co.uk.key"
websockets_ssl_cert: "/etc/pki/tls/certs/star.no-dns.co.uk.crt"
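One extra check that could be worth running here, as a minimal sketch rather than a diagnosis: `openssl verify -CAfile` only proves that the chain can be built when the CA bundle is supplied locally. It does not show which certificates the file given as `websockets_ssl_cert` actually contains. Since the websockets settings above only take a certificate and a key, any intermediate certificates would presumably have to live in that same file (an assumption about how websockify loads it, not something confirmed here). The helper name `count_pem_certs` is hypothetical.

```shell
# Hypothetical helper: count the PEM certificates contained in a file.
# A result of 1 suggests the file holds a single certificate;
# 2 or more suggests it also carries chain certificates.
count_pem_certs() {
    grep -c -- '-----BEGIN CERTIFICATE-----' "$1"
}

# Example usage with the paths from this post (run on the Foreman host):
#   count_pem_certs /etc/pki/tls/certs/star.no-dns.co.uk.crt
#   count_pem_certs /etc/pki/tls/certs/gs_ca_bundle.crt
```

A count of 1 for the wildcard file would not prove anything on its own, but it would be one more data point to compare against what the web interface serves.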
Expected outcome:
Foreman opens a virtual console to the remote guest
Foreman and Proxy versions:
foreman-3.12.1-1.el9.noarch
foreman-proxy-3.12.1-1.el9.noarch
Foreman and Proxy plugin versions:
foreman-libvirt-3.12.1-1.el9.noarch
rubygem-smart_proxy_dynflow-0.9.3-1.fm3_12.el9.noarch
rubygem-smart_proxy_ansible-3.5.6-1.fm3_12.el9.noarch
rubygem-smart_proxy_remote_execution_ssh-0.11.4-1.fm3_12.el9.noarch
Distribution and version:
Rocky Linux 9.5
Other relevant data:
While debugging this problem, after verifying the SSL certificate chain and authority, I took the following actions and made the following observations.
The websockets configuration options only allow for an SSL certificate and key.
From reading the websockify documentation, my summary is that unless you specify a CA in the arguments, it defaults to the system CA bundle. While this shouldn’t be needed, to remove doubt I dropped the CA file (used in the openssl verification example) into
/etc/pki/ca-trust/source/anchors/
and updated the CA Trust on the host
update-ca-trust
This should not have been needed, but I was just trying to rule out the possibility.
This had no impact.
To rule things out further, I updated lines 44-48 in
/usr/share/foreman/lib/ws_proxy.rb
to look like
cmd = "websockify --daemon --idle-timeout=#{idle_timeout} --timeout=#{timeout} #{port} #{host}:#{host_port}"
cmd += " --ssl-target" if ssl_target
if Setting[:websockets_encrypt]
cmd += " --cert #{Setting[:websockets_ssl_cert]} --cafile /etc/pki/tls/certs/gs_ca_bundle.crt --verify-client" if Setting[:websockets_ssl_cert]
cmd += " --key #{Setting[:websockets_ssl_key]}" if Setting[:websockets_ssl_key]
end
This puts the CA file in the websockify process launched by Foreman and forces it to use it with the --verify-client argument.
This again had no impact, so I reverted the change.
Having been unable to get any meaningful logs or output from the developer console in the web browser, I moved to launching the websockify process manually to try to get some better output, but stumbled across what seems like odd behaviour.
I used Postman on a local machine to make a basic websocket connection over the wss (secure) protocol.
Launching a virtual console via the Foreman web interface, I see the following process launch on the Foreman host, which is what the browser console attempts to connect to.
foreman 49025 1 15 10:11 ? 00:00:00 /usr/bin/python3 /usr/bin/websockify --daemon --idle-timeout=120 --timeout=120 5930 lcars.no-dns.co.uk:5901 --cert /etc/pki/tls/certs/star.no-dns.co.uk.crt --key /etc/pki/tls/private/star.no-dns.co.uk.key
Postman fails to connect to this, mirroring the browser behaviour.
As you can see from the process, it’s using the same certs; it times out and shuts down after 2 minutes.
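To see what the Foreman-launched listener actually presents during those two minutes, something like the following could be run from another machine. It is a sketch only: `chain_summary` is a hypothetical helper name, `<foreman-host>` is a placeholder for the Foreman server's hostname, and the `grep` pattern assumes the subject/issuer line layout that `openssl s_client -showcerts` normally prints.

```shell
# Hypothetical filter: keep only the subject (s:) and issuer (i:) lines
# from the "Certificate chain" section of `openssl s_client -showcerts` output.
chain_summary() {
    grep -E '^ *[0-9]+ s:|^ +i:'
}

# Example usage while the console session is open (port 5930 as in the process above):
#   openssl s_client -connect <foreman-host>:5930 -showcerts </dev/null 2>/dev/null | chain_summary
```

Running the same pipeline against port 443 of the web interface would give a baseline to compare the two chains against.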
However, if I change /etc/passwd and assign /bin/bash to the foreman user (I know I could have done this with sudo, but I wanted to assume the same full environment as Foreman), I can launch the exact same command as Foreman
[foreman@jarvis ~]$ /usr/bin/python3 /usr/bin/websockify --daemon --idle-timeout=120 --timeout=120 5930 lcars.no-dns.co.uk:5901 --cert /etc/pki/tls/certs/star.no-dns.co.uk.crt --key /etc/pki/tls/private/star.no-dns.co.uk.key
/usr/lib/python3.9/site-packages/websockify/websocket.py:31: UserWarning: no 'numpy' module, HyBi protocol will be slower
warnings.warn("no 'numpy' module, HyBi protocol will be slower")
WebSocket server settings:
- Listen on :5930
- SSL/TLS support
- Backgrounding (daemon)
which launches exactly the same process as when Foreman launched the session
foreman 49126 1 0 10:16 ? 00:00:00 /usr/bin/python3 /usr/bin/websockify --daemon --idle-timeout=120 --timeout=120 5930 lcars.no-dns.co.uk:5901 --cert /etc/pki/tls/certs/star.no-dns.co.uk.crt --key /etc/pki/tls/private/star.no-dns.co.uk.key
The difference this time is that Postman connects to this process correctly,
which is really confusing me, and leaves me with 2 questions that I cannot answer and, at this moment in time, can’t work out how to get an answer to.
a.) Why does Foreman think the SSL CA is invalid when it validates?
b.) Why does a Foreman-launched process fail, while the same process launched manually works?
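A minimal sketch of how the two launches could be compared beyond the command line, since `ps -ef` shows identical arguments but not the environment or SELinux context each process runs with. `dump_environ` is a hypothetical helper name, the PIDs are the ones from the `ps` output above, and whether any of this explains the difference is an open question.

```shell
# Hypothetical helper: print a NUL-separated environment file
# (such as /proc/<pid>/environ) one variable per line, sorted for diffing.
dump_environ() {
    tr '\0' '\n' < "$1" | sort
}

# Example usage as root, while each websockify process is still alive:
#   dump_environ /proc/49025/environ > /tmp/foreman_launched.env   # launched by Foreman
#   dump_environ /proc/49126/environ > /tmp/manual_launched.env    # launched by hand
#   diff /tmp/foreman_launched.env /tmp/manual_launched.env
#
# The SELinux context of each process can be shown with:
#   ps -Z -p 49025
#   ps -Z -p 49126
```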
From the help I’ve already received, this doesn’t appear to be a Foreman problem, but from the debugging I’ve done and listed here it’s hard not to think that Foreman is playing a part in this: outside of Foreman the certificate chain is fine, and outside of Foreman the exact same websockets process can validate a secure connection.