[Katello 2.2] Clients bound to capsule got goferd process errors when Katello unreachable

Hi,

This may or may not be an actual issue, but here goes. We run Katello 2.2.1
and have a capsule for a remote site.

This Saturday we had an extended network outage at the site where our main
Katello server is located. All the remote clients at the capsule's site
bind to the capsule, and they have the cert installed from the capsule.

While Katello was unreachable, all the clients bound to the capsule timed
out and their goferd processes started to consume more and more memory
(I believe there's a documented memory leak in the proton package for this
scenario?).
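
In case it's useful to anyone else, this is roughly how I kept an eye on the goferd memory growth (a minimal Python sketch; the /proc-based process lookup and the 10-second polling interval are my own choices, not anything shipped with Katello or gofer):

#!/usr/bin/env python
# Rough sketch: poll the resident memory (VmRSS) of the goferd process so
# its growth during an outage can be logged over time.
import os
import time


def find_goferd_pid():
    """Return the PID of the first process whose cmdline mentions goferd."""
    for pid in os.listdir('/proc'):
        if not pid.isdigit():
            continue
        try:
            with open('/proc/%s/cmdline' % pid) as f:
                if 'goferd' in f.read():
                    return int(pid)
        except IOError:
            continue
    return None


def rss_kb(pid):
    """Read VmRSS (in kB) from /proc/<pid>/status."""
    with open('/proc/%d/status' % pid) as f:
        for line in f:
            if line.startswith('VmRSS:'):
                return int(line.split()[1])
    return 0


pid = find_goferd_pid()
while pid is not None:
    try:
        print('%s goferd rss=%d kB' % (time.strftime('%H:%M:%S'), rss_kb(pid)))
    except IOError:
        break          # goferd went away
    time.sleep(10)     # arbitrary polling interval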

These were the types of errors we received:

Sep 5 19:47:37 CLIENT1 goferd:
[ERROR][pulp.agent.85d5dd13-5170-4b2b-aeb6-c9c4d6582be7]
gofer.messaging.adapter.model:43 - maximum recursion depth exceeded while
calling a Python object
Sep 5 19:47:37 CLIENT1 goferd:
[ERROR][pulp.agent.85d5dd13-5170-4b2b-aeb6-c9c4d6582be7]
gofer.messaging.adapter.model:43 - Traceback (most recent call last):
Sep 5 19:47:37 CLIENT1 goferd:
[ERROR][pulp.agent.85d5dd13-5170-4b2b-aeb6-c9c4d6582be7]
gofer.messaging.adapter.model:43 - File
"/usr/lib/python2.6/site-packages/gofer/messaging/adapter/model.py", line
39, in _fn
Sep 5 19:47:37 CLIENT1 goferd:
[ERROR][pulp.agent.85d5dd13-5170-4b2b-aeb6-c9c4d6582be7]
gofer.messaging.adapter.model:43 - return fn(*args, **keywords)
Sep 5 19:47:37 CLIENT1 goferd:
[ERROR][pulp.agent.85d5dd13-5170-4b2b-aeb6-c9c4d6582be7]
gofer.messaging.adapter.model:43 - File
"/usr/lib/python2.6/site-packages/gofer/messaging/adapter/model.py", line
599, in get
Sep 5 19:47:37 CLIENT1 goferd:
[ERROR][pulp.agent.85d5dd13-5170-4b2b-aeb6-c9c4d6582be7]
gofer.messaging.adapter.model:43 - return self._impl.get(timeout)

I can see corresponding messages such as the following in /var/log/httpd/ on the capsule:

katello-reverse-proxy_error_ssl.log-20150906:[Sat Sep 05 17:52:03 2015]
[error] (113)No route to host: proxy: HTTPS: attempt to connect to
IP_OF_KATELLO:443 (HOSTNAME_OF_KATELLO) failed
katello-reverse-proxy_error_ssl.log-20150906:[Sat Sep 05 18:07:59 2015]
[error] (110)Connection timed out: proxy: HTTPS: attempt to connect to
IP_OF_KATELLO:443 (HOSTNAME_OF_KATELLO) failed
katello-reverse-proxy_error_ssl.log-20150906:[Sat Sep 05 18:23:00 2015]
[error] (110)Connection timed out: proxy: HTTPS: attempt to connect to
IP_OF_KATELLO:443 (HOSTNAME_OF_KATELLO) failed
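
For reference, this is roughly how I confirmed from the capsule whether the main Katello server was reachable again on 443 (a rough sketch, not anything from Katello itself; HOSTNAME_OF_KATELLO is the same placeholder as in the log above and the 10-second timeout is arbitrary):

# Quick probe from the capsule, mirroring what the reverse-proxy vhost is
# trying to do: open a TCP connection to the main Katello server on 443.
import socket

KATELLO = 'HOSTNAME_OF_KATELLO'   # placeholder, same as in the log above


def can_reach(host, port=443, timeout=10):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        sock = socket.create_connection((host, port), timeout)
        sock.close()
        return True
    except socket.error as err:
        print('cannot reach %s:%d - %s' % (host, port, err))
        return False


print(can_reach(KATELLO))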

Apart from that, I can't really see much on the capsule server itself that
indicates a service outage on the capsule. Checking netstat on the clients,
they only show established connections to the capsule and not to the main
Katello server.
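
To be concrete, this is the sort of check I ran on a client (another rough sketch; I'm assuming 5647 is the broker port the agent connects to on the capsule, which may differ in other setups, and netstat -p needs root to show the owning process):

# Show the established connections belonging to goferd so it is obvious
# whether the client is talking to the capsule or to the main Katello server.
import subprocess

BROKER_PORT = ':5647'   # assumed agent/broker port; adjust for your setup

out = subprocess.Popen(['netstat', '-tnp'],
                       stdout=subprocess.PIPE).communicate()[0]
for line in out.splitlines():
    if 'ESTABLISHED' in line and ('goferd' in line or BROKER_PORT in line):
        print(line)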

Any help in understanding why the clients that bind to the capsule might
have been disconnected while the main Katello server was unreachable would
be great.

Client packages are:

gofer.noarch 2.5.3-1.el6 @katello-client
katello-agent.noarch 2.2.1-1.el6 @katello-client
katello-client-repos.noarch 2.2.1-1.el6 @/katello-client-repos-latest
pulp-rpm-handlers.noarch 2.6.0-1.el6 @katello-client
python-gofer.noarch 2.5.3-1.el6 @katello-client
python-gofer-proton.noarch 2.5.3-1.el6 @katello-client
python-isodate.noarch 0.5.0-4.pulp.el6 @katello-client
python-pulp-agent-lib.noarch 2.6.0-1.el6 @katello-client
python-pulp-common.noarch 2.6.0-1.el6 @katello-client
python-pulp-rpm-common.noarch 2.6.0-1.el6 @katello-client
python-qpid-proton.x86_64 0.9-2.el6 @katello-client
qpid-proton-c.x86_64 0.9-2.el6 @katello-client

Thanks

Marc