Problem:
A small minority of our registered Katello nodes stop checking in, as evidenced by the “Last Checkin” column under Content Hosts in Katello.
Yum/DNF continue working as expected on the affected nodes.
Expected outcome:
All registered nodes continue to check in successfully.
Foreman and Proxy versions:
foreman-3.5.3-1.el8.noarch
Foreman and Proxy plugin versions:
katello-4.7.5-1.el8.noarch
Distribution and version:
Katello server is Oracle Linux 8
Other relevant data:
Affected nodes appear to all be CentOS 7, example RPMs on the nodes:
As I understand it, the checkins are being done by the rhsmcertd service, whose logs read something like:
Tue Jun 27 05:56:20 2023 [WARN] (Cert Check) Update failed (1), retry will occur on next run.
Tue Jun 27 06:36:22 2023 [WARN] (Auto-attach) Update failed (1), retry will occur on next run.
Tue Jun 27 09:56:22 2023 [WARN] (Cert Check) Update failed (1), retry will occur on next run.
Tue Jun 27 13:56:21 2023 [WARN] (Cert Check) Update failed (1), retry will occur on next run.
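To get a quick picture of how often each rhsmcertd task is failing on a node, the log lines above can be tallied with a short script. This is only a rough sketch: it assumes every line follows the format shown above (timestamp, level in square brackets, task name in parentheses, then the message), and the names `parse_line` and `count_failures` are made up for illustration. On the nodes I would expect to feed it the contents of /var/log/rhsm/rhsmcertd.log.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line layout, based on the excerpt above:
#   Tue Jun 27 05:56:20 2023 [WARN] (Cert Check) Update failed (1), ...
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}) "
    r"\[(?P<level>\w+)\] \((?P<task>[^)]+)\) (?P<msg>.*)$"
)


def parse_line(line):
    """Return (timestamp, level, task, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y")
    return ts, m.group("level"), m.group("task"), m.group("msg")


def count_failures(lines):
    """Count WARN 'Update failed' entries per rhsmcertd task name."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed and parsed[1] == "WARN" and "Update failed" in parsed[3]:
            counts[parsed[2]] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Tue Jun 27 05:56:20 2023 [WARN] (Cert Check) Update failed (1), retry will occur on next run.",
        "Tue Jun 27 06:36:22 2023 [WARN] (Auto-attach) Update failed (1), retry will occur on next run.",
    ]
    for task, n in count_failures(sample).items():
        print(task, n)
```

Comparing these counts between an affected and a healthy node should show whether the failures are constant or only intermittent.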
I checked with some folks on the team, and we are not exactly sure what happened here. As long as rhsmcertd is enabled and running, you should be fine. Are there any corresponding production logs or tracebacks in rhsm.log that could help debug this?
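If it helps with pulling those out, here is a minimal sketch for filtering a log down to error entries and tracebacks. It assumes error entries contain the literal text `[ERROR]` and that tracebacks begin with the standard Python `Traceback (most recent call last)` line; the helper name `find_problem_lines` and the `context` parameter are hypothetical. On a client the file would normally be /var/log/rhsm/rhsm.log.

```python
def find_problem_lines(lines, context=5):
    """Return error/traceback lines plus up to `context` lines following each hit."""
    hits = []
    remaining = 0  # how many follow-up lines are still to be kept
    for line in lines:
        text = line.rstrip("\n")
        if "[ERROR]" in text or "Traceback (most recent call last)" in text:
            hits.append(text)
            remaining = context
        elif remaining > 0:
            hits.append(text)
            remaining -= 1
    return hits


if __name__ == "__main__":
    # Assumed default location of the subscription-manager log on the client.
    with open("/var/log/rhsm/rhsm.log", errors="replace") as fh:
        for entry in find_problem_lines(fh):
            print(entry)
```

Entries whose timestamps line up with the "Update failed" warnings in rhsmcertd.log would be the most useful ones to post.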