Errata not showing as applicable

James_Evans · January 11, 2017, 4:14pm

I'm running Foreman 1.12.4 with Katello 3.1.0 on RHEL 7 with about 350
hosts running RHEL 5, 6, and 7 attached.

I've noticed that no hosts have applicable errata, and never have. The
content views show thousands of errata, and I can manually dig in and find
updated packages with attached errata that are not installed on the clients.

Googling around for this problem, I found some old references to problems
with candlepin, and when I looked I found that I'm getting these messages:

2017-01-11 08:42:03,508 [thread=http-bio-8443-exec-9] [req=5f702a90-db72-4aa8-9fe8-32d558db15ca, org=my-org] ERROR org.candlepin.common.exceptions.mappers.CandlepinExceptionMapper - Runtime Error com.fasterxml.jackson.databind.JsonMappingException: Index: 0, Size: 0 (through reference chain: java.util.ArrayList[0]->org.candlepin.model.Entitlement["certificates"]->org.hibernate.collection.internal.PersistentSet[0]->org.candlepin.model.EntitlementCertificate["serial"])
at java.util.ArrayList.rangeCheck:653
…
2017-01-11 08:42:34,038 [thread=Thread-0 (HornetQ-client-global-threads-139306825)] [=, org=] ERROR org.candlepin.audit.AMQPBusPublisher - Unable to send event: Event [id=null, target=COMPLIANCE, type=CREATED, time=Tue Oct 04 08:57:28 CDT 2016, entity=8ab0bd815787024c01578ffc1d2d1445]
…
2017-01-11 08:42:34,038 [thread=Thread-0 (HornetQ-client-global-threads-139306825)] [=, org=] ERROR org.hornetq.core.client - HQ214000: Failed to call onMessage
…

Some of the messages suggested looking at the qpid queue to see if events
are backing up:

qpid-stat --ssl-certificate /etc/pki/katello/qpid_client_striped.crt -b amqps://$(hostname -f):5671 -q | grep katello_event_queue
katello_event_queue  Y  15.9k  15.9k  0  99.8m  99.8m  0  0  2

The number of messages is slowly climbing, and never seems to decrease. In
my test server, I see 1 consumer of the katello_event_queue, but have never
seen one on the production server. The "Listen on candlepin events" task is
in the running state. I've restarted the katello services (and the server)
several times with no difference in results.
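
For anyone who wants to watch the message count and consumer count over
time without eyeballing the raw columns, here is a minimal sketch of a
helper that picks them out of the qpid-stat output. The function name
queue_summary is made up for illustration, and it assumes the last eight
columns of "qpid-stat -q" are msg, msgIn, msgOut, bytes, bytesIn, bytesOut,
cons, and bind; check the header line on your own system before trusting it.

```shell
#!/bin/sh
# queue_summary (hypothetical helper): read `qpid-stat -q` output on stdin
# and print the backlog and consumer count for one queue.
# Assumption: the last eight columns are
#   msg msgIn msgOut bytes bytesIn bytesOut cons bind
# Fields are counted from the end because the autoDel/excl flag columns
# may be blank, which shifts the positions counted from the start.
queue_summary() {
    awk -v q="$1" '
        $1 == q {
            printf "queue=%s msg=%s msgOut=%s consumers=%s\n", \
                $1, $(NF-7), $(NF-5), $(NF-1)
            found = 1
        }
        END { exit (found ? 0 : 1) }   # non-zero exit if the queue is absent
    '
}

# Usage (same qpid-stat invocation as above):
#   qpid-stat --ssl-certificate /etc/pki/katello/qpid_client_striped.crt \
#       -b amqps://$(hostname -f):5671 -q | queue_summary katello_event_queue
```

Fed the output line shown above, this reports msg=15.9k, msgOut=0, and
consumers=0, which matches what I was seeing: nothing is draining the queue.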

Is there any way to kick katello to get the errata mapping working?

james

James_Evans · January 13, 2017, 2:22pm

Does anyone have any ideas on how to debug this problem? I need to perform
audits of our systems for PCI compliance, and was planning on using Foreman
to produce these reports. Right now I'm looking at having to manually
report on each package on all the hosts.

Thanks,

james

James_Evans · January 26, 2017, 2:09pm

Posting my own follow-up: I got this issue resolved. At some point during
the build-out of the Foreman server, the file system where MongoDB is
installed ran out of space due to a typo on my end. Once the FS was
extended everything seems to have been fine, but it looks like there was a
persistent problem with the 'Listen on candlepin events' and 'Monitor Event
Queue' tasks. While they were in the running state, they never seemed to do
anything. After digging into the code and not being able to figure out why
they might get hung in a way that persists across reboots, I crossed my
fingers and destroyed the tasks from the foreman-rake console and restarted
foreman-tasks. Both tasks came right back up and started processing
information! It took more than 24 hours for the candlepin event processing
to catch up with the past 4 months of events, but it eventually did and the
queue size is down to 0. Errata are now showing as applicable, and can be
searched and managed as expected.
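
Roughly, the destroy-and-restart step can be sketched as below. The exact
commands are not recorded above, so treat everything here as an assumption:
the helper name destroy_statements is made up, the two task labels are the
ones I believe Katello 3.x uses for these tasks (confirm them in the tasks
list under Monitor before destroying anything), and piping statements into
foreman-rake console is just one way to run them.

```shell
#!/bin/sh
# destroy_statements (hypothetical helper): print one Ruby statement per
# task label. Each statement deletes every foreman-tasks record with that
# label when run inside `foreman-rake console`.
destroy_statements() {
    for label in "$@"; do
        printf 'ForemanTasks::Task.where(:label => "%s").destroy_all\n' "$label"
    done
}

# Review the generated statements first, then (as root) feed them to the
# console and restart foreman-tasks. The labels below are assumptions:
#   destroy_statements \
#       'Actions::Candlepin::ListenOnCandlepinEvents' \
#       'Actions::Katello::EventQueue::Monitor' | foreman-rake console
#   systemctl restart foreman-tasks
```

Printing the statements before running them is deliberate: destroy_all is
not reversible, so it is worth reading what will be executed first.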

It hasn't been running long enough to tell, but I believe the memory leak I
mentioned in another posting may be fixed as well. I no longer seem to have
a dynflow_executor as my top memory consumer on the server.

So, TL;DR: if the MongoDB file system fills up, it is possible for the
long-running tasks to get into a "stuck" state where they're running but
not doing any useful work, even across reboots.

Leaving this here in case anyone else runs into the same issues.