Foreman 3.13 - Websockets Invalid SSL CA

Problem:
In 2024 I put some effort into optimising and automating my Foreman lab platform so I could better tinker and look at options and ideas outside of operational platforms and processes. I took the time to do some long-standing updates, such as moving to EL9, Puppet 8, etc., to get to the point where I could re-deploy my lab consistently in minutes, and with some fantastic community support I mostly got there. Because this took some time I had to renew the wildcard SSL certificate that I use for my lab, and being cheap I changed providers. This caused a few distractions and problems that, again with some solid help, I managed to get resolved. There is one outstanding issue with my new SSL certificate that I am unable to resolve and, being honest, I do not fully understand the behaviour I’m seeing.

When I attempt to use Foreman to connect to a virtual guest running on a remote Libvirt/KVM hypervisor I get an error from the Foreman web interface: ‘The connection was closed by the browser. please verify that the certificate authority is valid’

Thanks to some patient hand holding from a few users in the Foreman Matrix channel I managed to get a basic understanding of how the websockets process works, and this error comes from the browser connecting to a dynamically launched websockets session from Foreman. The problem I face is that this certificate chain and authority is valid, which I can verify in a few ways, and that the websocket process has turned out to be pretty hard to debug. I’ll detail the verification here and put a lot more detail in the additional information.

a.) I’m using the same wildcard certificate, key and CA with the Foreman web interface, which is verifying fine. The certs look like this

  server_ssl_chain: "/etc/pki/tls/certs/gs_ca_bundle.crt"
  server_ssl_cert: "/etc/pki/tls/certs/star.no-dns.co.uk.crt"
  server_ssl_key: "/etc/pki/tls/private/star.no-dns.co.uk.key"

And as you can see the foreman web interface is happy with this

b.) I can verify the certificate chain with openssl

[root@jarvis certs]# openssl verify -CAfile gs_ca_bundle.crt star.no-dns.co.uk.crt 
star.no-dns.co.uk.crt: OK

I’m using the same files for the websocket handshake

  websockets_ssl_key: "/etc/pki/tls/private/star.no-dns.co.uk.key"
  websockets_ssl_cert: "/etc/pki/tls/certs/star.no-dns.co.uk.crt"

Expected outcome:
foreman to open a virtual console to the remote guest

Foreman and Proxy versions:
foreman-3.12.1-1.el9.noarch
foreman-proxy-3.12.1-1.el9.noarch

Foreman and Proxy plugin versions:
foreman-libvirt-3.12.1-1.el9.noarch
rubygem-smart_proxy_dynflow-0.9.3-1.fm3_12.el9.noarch
rubygem-smart_proxy_ansible-3.5.6-1.fm3_12.el9.noarch
rubygem-smart_proxy_remote_execution_ssh-0.11.4-1.fm3_12.el9.noarch

Distribution and version:
Rocky Linux 9.5

Other relevant data:

In debugging this problem I’ve taken the following action after verifying the SSL certificate chain and authority, and made the following observations.

The websockets configuration options only allow for an SSL certificate and key.
From reading the websockify documentation, my understanding is that unless you specify a CA in the arguments it defaults to the system CA bundle. To remove doubt, I dropped the CA file (used in the openssl verification example) into

/etc/pki/ca-trust/source/anchors/

and updated the CA Trust on the host

update-ca-trust

This should not have been needed but just trying to remove doubt/possibility.
This had no impact.

To rule out further possibilities I updated lines 44-48 in

/usr/share/foreman/lib/ws_proxy.rb

to look like

cmd  = "websockify --daemon --idle-timeout=#{idle_timeout} --timeout=#{timeout} #{port} #{host}:#{host_port}"
cmd += " --ssl-target" if ssl_target
if Setting[:websockets_encrypt]
  cmd += " --cert #{Setting[:websockets_ssl_cert]} --cafile /etc/pki/tls/certs/gs_ca_bundle.crt --verify-client" if Setting[:websockets_ssl_cert]
  cmd += " --key #{Setting[:websockets_ssl_key]}" if Setting[:websockets_ssl_key]
end

This passes the CA file to the websockify process launched by Foreman and, as I understand it, forces its use via the --verify-client argument.

This again had no impact, so I reverted the change.

Having been unable to get any meaningful logs or output from the developer console in the web browser, I moved to launching the websockify process manually to try to get some better output, but stumbled across what seems like odd behaviour.

I used Postman on a local machine to make a basic WebSocket connection over the wss (secure) protocol.

launching a virtual console via the foreman web interface I see the following process launch on the foreman host, which is what the browser console attempts to connect to.

foreman    49025       1 15 10:11 ?        00:00:00 /usr/bin/python3 /usr/bin/websockify --daemon --idle-timeout=120 --timeout=120 5930 lcars.no-dns.co.uk:5901 --cert /etc/pki/tls/certs/star.no-dns.co.uk.crt --key /etc/pki/tls/private/star.no-dns.co.uk.key

postman fails to connect to this, mirroring the browser behaviour

As you can see from the process, it’s using the same certs and times out and shuts down after 2 minutes.
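The Postman check can also be scripted so that the exact TLS failure reason is printed rather than a generic connection error. The following is a minimal sketch using only the Python standard library; the helper names `build_client_context` and `probe_tls` are made up for illustration, and the hostname in the usage comment is an assumption based on the shell prompt shown in this post. It only tests the TLS handshake on the websockify port, not the WebSocket upgrade itself.

```python
import socket
import ssl


def build_client_context(cafile=None):
    """Build a verifying TLS client context, roughly what a browser does.

    With cafile=None the system trust store is used; pass a bundle path
    to trust only that file's CAs in addition to the system defaults.
    """
    return ssl.create_default_context(cafile=cafile)


def probe_tls(host, port, cafile=None, timeout=5.0):
    """Attempt a TLS handshake and report the outcome as a short string."""
    context = build_client_context(cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return f"OK: {tls.version()} negotiated, certificate verified for {host}"
    except ssl.SSLCertVerificationError as exc:
        # verify_message carries OpenSSL's reason, e.g. a missing issuer.
        return f"VERIFY FAILED: {exc.verify_message}"
    except ssl.SSLError as exc:
        return f"TLS ERROR: {exc}"
    except OSError as exc:
        return f"CONNECT FAILED: {exc}"


# Hypothetical usage while the Foreman-launched websockify is still alive:
#   print(probe_tls("jarvis.no-dns.co.uk", 5930))
```

Running it once against the Foreman-launched process and once against the manually launched one should show whether the two differ at the TLS layer at all, or only later in the WebSocket exchange.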

However, if I change /etc/passwd and assign /bin/bash to the foreman user (I know I could have done this with sudo, but I wanted to assume the full environment the same as Foreman), I can launch the exact same command as Foreman

[foreman@jarvis ~]$ /usr/bin/python3 /usr/bin/websockify --daemon --idle-timeout=120 --timeout=120 5930 lcars.no-dns.co.uk:5901 --cert /etc/pki/tls/certs/star.no-dns.co.uk.crt --key /etc/pki/tls/private/star.no-dns.co.uk.key
/usr/lib/python3.9/site-packages/websockify/websocket.py:31: UserWarning: no 'numpy' module, HyBi protocol will be slower
  warnings.warn("no 'numpy' module, HyBi protocol will be slower")
WebSocket server settings:
  - Listen on :5930
  - SSL/TLS support
  - Backgrounding (daemon)

which launches exactly the same as when foreman launched the session

foreman    49126       1  0 10:16 ?        00:00:00 /usr/bin/python3 /usr/bin/websockify --daemon --idle-timeout=120 --timeout=120 5930 lcars.no-dns.co.uk:5901 --cert /etc/pki/tls/certs/star.no-dns.co.uk.crt --key /etc/pki/tls/private/star.no-dns.co.uk.key

the difference this time, is that postman will correctly connect to this process

which is really confusing me, and gives me 2 questions which I cannot answer and at this moment in time can’t work out how to get an answer to.

a.) Why does Foreman think the SSL CA is invalid when it validates?
b.) Why does a Foreman-launched process fail, but the same process launched manually work?

From the help I’ve already received this doesn’t appear to be a Foreman problem, but from the debugging I’ve done and listed here it’s hard not to think that Foreman is playing a part in this, because outside of Foreman the certificate chain is fine, and outside of Foreman the exact same websockets process can validate a secure connection.

I’m just bumping this thread as, despite trying additional debugging and doing a clean Foreman deployment, the problem persists. At the moment I think the real question I’d like to answer is what is different about the way Foreman is launching the websockets connection, as I think the SSL verification is a red herring.

Not being knowledgeable on web sockets any debugging approaches would be very welcome and appreciated

going to give this a bump, see if I can get any input, as I’m still out of ideas on how to even progress debugging it at this stage.

This is just a shot in the dark, as I am not very familiar with the whole websocket stack myself, but have you checked for SELinux violations when you try launching the websocket process from Foreman? Maybe there is a wrong/missing context set on your certificates or some other file that prevents Foreman from properly starting the websocket process.
I have no idea how websockify behaves when it cannot access cert files, and I would expect it to crash with an error message, but the main thing that comes to my mind when I hear “it works when I do it manually, but it fails when the service does it” is SELinux contexts. Interactive shell sessions usually run as unconfined_t, while service processes (including Foreman) usually run with their own SELinux type confining what they can and cannot do/access. So personally I would search in that direction.

It’s a great suggestion. Sadly, it’s one of the things that’s already been validated: the first thing I did was move to permissive mode and check the audit log, but no, it’s not SELinux related.

got to bump this as I’m still no further along

I’ve been unable to replicate this outside of my lab as all my ‘non-lab’ instances use different CA/chains and they run on a different domain (as it’s not my little home lab) so can’t just import the certs to test.

I’m trying to prep a rebuild of my lab in line with the shift to openvox so I’d like to try to solve this before then.

I’ve done a little more testing on this, and while I’m no closer to actually understanding it, I can say with a lot more confidence that this is something to do with the way Foreman is launching the process for the websockets.

All the obvious stuff around SELinux, foreman user permissions etc. I’ve managed to discount. What I’m really looking for is a little bit of insight into how I can trigger some debugging of the code that actually launches it, as I’ve now confirmed multiple times that if I su to the foreman user and launch the exact command Foreman uses, it works, so there must be some context/wrap around how that command is being launched by the Foreman application. The actual launch command itself is basic, but the environment inside the Foreman application is probably what I’m failing to replicate when I su to the foreman user.

Does anyone have a basic understanding of the environment wrap that Foreman will create before it launches this process? While I have no evidence for this (because I’m struggling to debug), I can only logically assume that’s the problem, as manually triggering the exact same command works fine.
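One way to test the environment theory directly is to compare the environment of the Foreman-launched websockify process with that of the manually launched one. The sketch below assumes a Linux host, where `/proc/<pid>/environ` holds the NUL-separated environment a process was started with; the function names are invented for this example, and whether the environment is really the cause is only a guess.

```python
def parse_environ(raw: bytes) -> dict:
    """Parse the NUL-separated KEY=VALUE bytes found in /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # skip the empty chunk after the trailing NUL
        key, _, value = entry.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def read_proc_environ(pid: int) -> dict:
    """Read a live process's starting environment (needs root or the same user)."""
    with open(f"/proc/{pid}/environ", "rb") as handle:
        return parse_environ(handle.read())


def diff_environ(first: dict, second: dict) -> dict:
    """Return {key: (value_in_first, value_in_second)} for every difference.

    A value of None means the variable is absent on that side.
    """
    changed = {}
    for key in first.keys() | second.keys():
        a, b = first.get(key), second.get(key)
        if a != b:
            changed[key] = (a, b)
    return changed


# Hypothetical usage, with PIDs taken from `pgrep -f websockify`:
#   for key, (a, b) in sorted(diff_environ(read_proc_environ(49025),
#                                          read_proc_environ(49126)).items()):
#       print(f"{key}: foreman={a!r} manual={b!r}")
```

The Foreman-launched process only lives for the two-minute timeout, so its PID has to be captured quickly after opening the console.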

Not sure if this will solve your issue, but I concatenated my server cert and ca chain file into one file and changed foreman to use that file for websockets, that resolved my issues.

cd to /etc/puppetlabs/puppet/ssl/certs/
cat server.example.pem | sudo tee server-chain.pem
cat ca.pem | sudo tee --append server-chain.pem

then edit /etc/foreman/settings.yaml

:websockets_ssl_cert: /etc/puppetlabs/puppet/ssl/certs/server-chain.pem

finally, restart foreman: sudo systemctl restart foreman
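To confirm that the combined file really contains the server certificate plus the CA chain, the PEM blocks in it can be counted. This is a small sketch using only the standard library; the function names are made up for illustration, and it counts blocks only, without checking order or signatures.

```python
import re

# Matches one complete PEM certificate block, including multi-line base64 body.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def count_pem_certificates(pem_text: str) -> int:
    """Return how many certificate blocks appear in the given PEM text."""
    return len(PEM_CERT.findall(pem_text))


def looks_like_full_chain(pem_text: str) -> bool:
    """Heuristic: a server cert plus at least one CA cert means two or more blocks."""
    return count_pem_certificates(pem_text) >= 2


# Hypothetical usage against the file built above:
#   with open("/etc/puppetlabs/puppet/ssl/certs/server-chain.pem") as handle:
#       print(count_pem_certificates(handle.read()))
```

A count of one would mean only the leaf certificate is being served, which is the situation the concatenation is meant to avoid.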

Hope that helps

I ‘thought’ I’d already tried this, I’m happy to try it again though as it’s possible I’ve missed it.

@ikonia Were you ever able to figure this out? We’re running into the same (or very similar) issue and the suggested fix here didn’t work for us.

no, I’m really struggling with it, I’m doing a few generic fixes to the lab this lives in and I’m going to do a full clean deploy, different hostname etc etc to remove any hang over of this host, but this one really has me stumped.

It would make it so much easier if you posted the exact content of the chain file, in particular whether it contains the root CA certificate or not. Or post at least the output of

$ openssl storeutl -noout -text /etc/pki/tls/certs/gs_ca_bundle.crt | grep '\(Subject\|Issuer\):'

Your openssl verify command checks the certificate against the chain file and also the system trust store. My guess is that

# openssl verify -no-CAstore -no-CApath  -CAfile gs_ca_bundle.crt star.no-dns.co.uk.crt 

will fail because you did not include the root ca.

My first guess would always be that you are using a different truststore and most likely your chain is missing the root ca.

The chain file is not only used to provide the server certificate to the browser, which has the standard root CA trust store, but also to verify internal communications. For internal communications, it uses only the CA file but no trust store, as internally it only wants and needs to connect to a server using that exact chain. That’s why you would see no problems using a browser or a simple openssl verify, while it doesn’t work in other cases where the trust store is not used.

But again, that’s just a guess based on the information you gave.

really useful suggestions and considerations

The cert chain, as you suggested currently looks like this

Issuer: C=GB, O=Sectigo Limited, CN=Sectigo Public Server Authentication Root R46
Subject: C=GB, O=Sectigo Limited, CN=Sectigo Public Server Authentication CA DV R36
Issuer: C=US, ST=New Jersey, L=Jersey City, O=The USERTRUST Network, CN=USERTrust RSA Certification Authority
Subject: C=GB, O=Sectigo Limited, CN=Sectigo Public Server Authentication Root R46
Issuer: C=US, ST=New Jersey, L=Jersey City, O=The USERTRUST Network, CN=USERTrust RSA Certification Authority
Subject: C=US, ST=New Jersey, L=Jersey City, O=The USERTRUST Network, CN=USERTrust RSA Certification Authority
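Reading that listing by eye, each certificate's Issuer should match the Subject of the next one, and the last entry should be self-signed if the root is included. The sketch below does that name comparison on the pairs above; `check_chain` is a made-up helper, and comparing names as strings is only a sanity check, not real signature validation (that is what openssl verify does).

```python
def check_chain(entries):
    """Check issuer/subject linkage for (issuer, subject) pairs in file order.

    Returns (problems, ends_in_root): a list of human-readable mismatches,
    and whether the final certificate is self-signed (issuer == subject).
    """
    if not entries:
        return ["no certificates given"], False
    problems = []
    for index in range(len(entries) - 1):
        issuer = entries[index][0]
        next_subject = entries[index + 1][1]
        if issuer != next_subject:
            problems.append(
                f"certificate {index + 1} is issued by {issuer!r}, "
                f"but certificate {index + 2} is {next_subject!r}"
            )
    last_issuer, last_subject = entries[-1]
    return problems, last_issuer == last_subject


# The three Issuer/Subject pairs from the bundle listing, shortened to the CN.
bundle = [
    ("Sectigo Public Server Authentication Root R46",
     "Sectigo Public Server Authentication CA DV R36"),
    ("USERTrust RSA Certification Authority",
     "Sectigo Public Server Authentication Root R46"),
    ("USERTrust RSA Certification Authority",
     "USERTrust RSA Certification Authority"),
]
problems, ends_in_root = check_chain(bundle)
print(problems, ends_in_root)
```

For this bundle the names link up in order and the final entry is self-signed, which is consistent with the root CA being present in the file.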

I get what you’re saying about not including the trust store in the verify to better validate this, and I half expected this to fail as you suggest, but it worked

[root@lcars certs]# openssl verify -no-CAstore -no-CApath  -CAfile gs_ca_bundle.crt star.no-dns.co.uk.crt
star.no-dns.co.uk.crt: OK

I get the logic of your thinking, but from what I managed to understand from how the websock process works (lot of room for error in my understanding) is that the only SSL verification is from the browser to the websocket session, once it hits the websock session there is no TLS, so I’m not sure where a truststore would be needed beyond the browser, but I accept I’m not 100% confident in the websocket console process.

O.K. Good. It’s better to check. With all the JavaScripts floating around, you’ll never know what is used where exactly.

I am not sure what the websockets-ssl-cert and websockets-ssl-key options do exactly. I only have a katello server but on that both are empty (UNDEF in foreman-installer), while websockets_encrypt is true. And I don’t think I use websockets anyway.

Either way, try resetting the websocket cert and key and run foreman-installer again. Possibly, it generates something from the existing server cert automatically.

Otherwise, as the websockets configuration in foreman has no option to configure the ca, you could put the server certificate including the full chain into a file and use that as websockets_ssl_cert. Maybe that helps.

Did you check the browser’s devtools, in particular the network connections to the websockets? If the browser rejects the connection and passes that to the scripts running, the devtools could give you some more insight.

The suggestion is really appreciated. As you say, with a lot going on it’s easy to miss something or not fully understand the process.

The websockets-ssl-cert and websockets-ssl-key args are what you point the installer at to tell the websockify process what to ‘present’ to your browser when it tries to connect. I set these to a valid certificate chain file/key so that when the browser connects it gets a publicly signed certificate that it trusts, and doesn’t error/present a warning.

If you don’t set anything the browser will give you a self-signed warning when you try to connect to the virtual console (via the websockify process).

if you look at the first post, I hacked the websockify process to pass in the full certificate chain which didn’t impact it.

The bit that’s confusing me is that it’s the same cert chain Foreman is using, so if the browser is ‘happy’ with this certificate chain when I visit https://foreman-url, why is it unhappy with the same certificate chain when my browser hits an app hosting the same certificate chain (and critically, why is it happy with it when I manually launch exactly the same command as Foreman on the same node)?

Debugging websockets is a lot harder than I thought (see the Postman tests I did in the first post); the browser’s dev tools so far have given me very little feedback.

I would expect devtools/network to show you the websockets connection which gets refused, and possibly even a message on the console helping to locate the exact code line in the script…

But I must admit, as I am not using any of that, I cannot really help you much more here. I don’t even have websockify on my katello server…

I think the websockify process is part of the libvirt plugin as other compute providers such as vmware for example use a different approach, that’s maybe why it’s not there in your katello install.

the devtools do give a connect refused, but no meaningful output beyond that, it’s one of the reasons this has been so hard to debug and get an understanding of what’s happening.

as always though, solid suggestions, so your input is appreciated