Katello node checkin issues

Problem:
A small minority of our registered Katello nodes stop checking in, as evidenced by the “Last Checkin” column under Content Hosts in Katello.

Yum/DNF continue working as expected on the affected nodes.

Expected outcome:
All registered nodes continue to check in successfully.

Foreman and Proxy versions:
foreman-3.5.3-1.el8.noarch

Foreman and Proxy plugin versions:
katello-4.7.5-1.el8.noarch

Distribution and version:
Katello server is Oracle Linux 8

Other relevant data:
All affected nodes appear to be CentOS 7. Example RPMs on the nodes:

subscription-manager-rhsm-certificates-1.24.51-1.el7.centos.x86_64
subscription-manager-rhsm-1.24.51-1.el7.centos.x86_64
subscription-manager-1.24.51-1.el7.centos.x86_64
katello-host-tools-3.5.7-5.el7.noarch
katello-host-tools-tracer-3.5.7-5.el7.noarch

As I understand it, the checkins are being done by the rhsmcertd service, whose logs read something like:

Tue Jun 27 05:56:20 2023 [WARN] (Cert Check) Update failed (1), retry will occur on next run.
Tue Jun 27 06:36:22 2023 [WARN] (Auto-attach) Update failed (1), retry will occur on next run.
Tue Jun 27 09:56:22 2023 [WARN] (Cert Check) Update failed (1), retry will occur on next run.
Tue Jun 27 13:56:21 2023 [WARN] (Cert Check) Update failed (1), retry will occur on next run.
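
To see how often each kind of run is failing on a node, the warnings can be tallied per check type. This is only a rough sketch: the function name count_update_failures is made up for illustration, and the default log path is the /var/log/rhsm/rhsmcertd.log location quoted elsewhere in this thread, which may differ on your systems.

```shell
#!/bin/sh
# Hypothetical helper: count "Update failed" warnings per check type
# (e.g. "Cert Check", "Auto-attach") in an rhsmcertd log file.
# The default path is an assumption; pass another file as the first argument.
count_update_failures() {
    log_file="${1:-/var/log/rhsm/rhsmcertd.log}"
    # Lines look like: "... [WARN] (Cert Check) Update failed (1), retry ..."
    # Extract the text inside the parentheses that precede "Update failed".
    grep -F 'Update failed' "$log_file" \
        | sed -n 's/.*(\([^)]*\)) Update failed.*/\1/p' \
        | sort | uniq -c
}
```

Calling count_update_failures with no argument on an affected node prints one line per check type, with the number of failed runs in front.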

One update: after running a “yum upgrade” (with no packages affected) and restarting rhsmcertd, an example node checked in…

# yum upgrade
Loaded plugins: enabled_repos_upload, fastestmirror, package_upload, product-id, search-disabled-repos, subscription-manager, tracer_upload
1 local certificate has been deleted.

Is this pertinent?

I checked with some folks on the team, and we are not exactly sure what happened here. As long as rhsmcertd is enabled and running, you should be fine. Are there any corresponding production logs or tracebacks in rhsm.log that could help debug this?

Not that I can see. rhsm.log contains only entries like these:

2023-06-27 03:10:02,999 [INFO] rhsmd:88171:MainThread @rhsm_d.py:382 - D-Bus API: com.redhat.SubscriptionManager provided by rhsmd is deprecated
2023-06-27 03:10:02,999 [INFO] rhsmd:88171:MainThread @rhsm_d.py:383 - Consider using D-Bus API: com.redhat.RHSM1 provided by rhsm.service
2023-06-27 03:10:03,002 [INFO] rhsmd:88171:MainThread @connection.py:915 - Connection built: host=katello.example.com port=443 handler=/rhsm auth=identity_cert ca_dir=/etc/rhsm/ca/ insecure=False

I have the same issue with a CentOS 7 client and Foreman 3.1.3 with katello-4.3.1.

I also use the same subscription-manager version, subscription-manager-1.24.51-1.

Here is the log entry from /var/log/rhsm/rhsmcertd.log:

Tue Jul 11 09:48:27 2023 [ERROR] unable to get lock, exiting
Tue Jul 11 09:49:26 2023 [INFO] rhsmcertd is shutting down...
Tue Jul 11 09:49:26 2023 [INFO] Starting rhsmcertd...
Tue Jul 11 09:49:26 2023 [INFO] Auto-attach interval: 1440.0 minutes [86400 seconds]
Tue Jul 11 09:49:26 2023 [INFO] Cert check interval: 5.0 minutes [300 seconds]
Tue Jul 11 09:49:26 2023 [INFO] Waiting 2.0 minutes plus 58433 splay seconds [58553 seconds total] before performing first auto-attach.
Tue Jul 11 09:49:26 2023 [INFO] Waiting 2.0 minutes plus 264 splay seconds [384 seconds total] before performing first cert check.
Tue Jul 11 09:55:51 2023 [WARN] (Cert Check) Update failed (1), retry will occur on next run.
Tue Jul 11 10:00:52 2023 [WARN] (Cert Check) Update failed (1), retry will occur on next run.
Tue Jul 11 10:05:52 2023 [WARN] (Cert Check) Update failed (1), retry will occur on next run.
Tue Jul 11 10:10:52 2023 [WARN] (Cert Check) Update failed (1), retry will occur on next run.
Tue Jul 11 10:15:52 2023 [WARN] (Cert Check) Update failed (1), retry will occur on next run.
Tue Jul 11 10:20:52 2023 [WARN] (Cert Check) Update failed (1), retry will occur on next run.
Tue Jul 11 10:25:52 2023 [WARN] (Cert Check) Update failed (1), retry will occur on next run.
Tue Jul 11 10:30:52 2023 [WARN] (Cert Check) Update failed (1), retry will occur on next run.

One of the articles I saw recommended using the latest version of subscription-manager. I couldn’t test this myself, though.

I had the same issue and after a few days of troubleshooting I found a solution that worked for me.

I ran strace on the rhsmcertd process:

strace -e trace=open,write -f -p <pid> -o strace.log

It showed that a required module was missing:

write(2, "    ", 4)               = 4
write(2, "import requests\n", 16) = 16
close(4)                          = 0
munmap(0x7fe7b848f000, 4096)      = 0
write(2, "ImportError", 11)       = 11
write(2, ": ", 2)                 = 2
write(2, "No module named requests", 24) = 24
write(2, "\n", 1)                 = 1
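
Because strace records stderr as many small write(2, ...) calls, the message is easier to read once the pieces are joined back together. The following is a minimal sketch, assuming the trace was saved to a file such as the strace.log used above; the function name stderr_from_strace is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: rebuild the stderr text from a saved strace log
# by concatenating the string argument of every write(2, "...", N) call.
stderr_from_strace() {
    trace_file="$1"
    # Keep only the quoted string from successful write(2, ...) lines.
    sed -n 's/.*write(2, "\(.*\)", [0-9][0-9]*) *= [0-9].*/\1/p' "$trace_file" |
    while IFS= read -r chunk; do
        # printf '%b' decodes common escapes such as \n; other strace
        # escapes (for example octal bytes) may pass through unchanged.
        printf '%b' "$chunk"
    done
}
```

Running stderr_from_strace strace.log on a trace like the one above should print the import line and the ImportError as ordinary text. Long strings may be truncated by strace unless it is run with a larger -s value.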

You can install it with:

yum install python-requests.noarch 

After restarting rhsmcertd.service, everything worked as intended. It seems this problem occurs only on CentOS 7 and not on RHEL 7.
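
For anyone wanting to check a node without attaching strace, a quick import test shows whether the module is missing. This is a sketch under assumptions: the ImportError wording in the trace looks like Python 2, so the default interpreter here is /usr/bin/python, but I have not confirmed which interpreter rhsmcertd actually uses. The function name python_can_import is made up.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the given interpreter can import the
# given Python module, non-zero otherwise.
# Defaults are assumptions: /usr/bin/python (Python 2 on CentOS 7) and
# the "requests" module reported missing in this thread.
python_can_import() {
    interpreter="${1:-/usr/bin/python}"
    module="${2:-requests}"
    # Discard output; only the exit status matters here.
    "$interpreter" -c "import $module" >/dev/null 2>&1
}
```

On an affected node, something like "python_can_import || yum install python-requests" followed by "systemctl restart rhsmcertd" would apply the fix described above only where the module is actually missing.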