Goferd errors in /var/log/messages

Problem: The goferd agent has a memory problem and throws an error.

Expected outcome:

Foreman and Proxy versions: Foreman 1.21.0, Katello 3.11.0

Foreman and Proxy plugin versions:

bastion 6.1.16
foreman-tasks 0.14.5
foreman_ansible 2.3.2
foreman_docker 4.1.0
foreman_remote_execution 1.70
foreman_snapshot_management 1.5.1
foreman_vmwareannotations 0.0.1
foreman_wreckingball 3.3.0
katello 3.11.0

Other relevant data:
Hello everybody,
I am getting the following error on some (not all) of the machines in our environment. dev10 is an example VM that is registered to katelloproxy01.

/var/log/messages:
Jun 14 12:47:17 dev10 goferd: [INFO][worker-0] gofer.messaging.adapter.proton.connection:131 - closed: proton+amqps://katelloproxy01.int.xyz.loc:5647
Jun 14 12:47:17 dev10 goferd: [INFO][worker-0] gofer.messaging.adapter.connect:28 - connecting: proton+amqps://katelloproxy01.int.xyz.loc:5647
Jun 14 12:47:17 dev10 goferd: [INFO][worker-0] gofer.messaging.adapter.proton.connection:87 - open: URL: amqps://katelloproxy01.int.xyz.loc:5647|SSL: ca: /etc/rhsm/ca/katello-default-ca.pem|key: None|certificate: /etc/pki/consumer/bundle.pem|host-validation: None
Jun 14 12:47:17 dev10 goferd: [INFO][worker-0] gofer.messaging.adapter.proton.connection:92 - opened: proton+amqps://katelloproxy01.int.xyz.loc:5647
Jun 14 12:47:17 dev10 goferd: [INFO][worker-0] gofer.messaging.adapter.connect:30 - connected: proton+amqps://katelloproxy01.int.xyz.loc:5647
Jun 14 12:47:17 dev10 goferd: [ERROR][worker-0] gofer.messaging.adapter.proton.reliability:48 - receiver 31cb6714-57df-4d65-868a-6c71e1ec8ce0 from None closed due to: Condition('qd:no-route-to-dest', 'No route to the destination node')

I made sure that the VM can reach katelloproxy01 on 5647/tcp.
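
For anyone who wants to repeat this kind of reachability check, here is a minimal Python sketch. The function name `port_is_open` and the 3-second timeout are my own assumptions, not part of any Katello tooling. Note that a successful TCP connect only shows that the port is reachable; it says nothing about whether the router behind it can forward messages any further.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

On dev10 this would be called as `port_is_open("katelloproxy01.int.xyz.loc", 5647)`, using the host name and port from the goferd log above.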

The following packages are installed on the dev10 machine:

gofer-2.12.1-1.el7.noarch
katello-agent-3.4.2-2.el7.noarch
katello-ca-consumer-katelloproxy01.int.nbg6.sh.loc-1.0-1.noarch
katello-host-tools-3.4.2-2.el7.noarch
katello-host-tools-fact-plugin-3.4.2-2.el7.noarch
python-gofer-2.12.1-1.el7.noarch
python-gofer-proton-2.12.1-1.el7.noarch
python2-qpid-proton-0.28.0-1.el7.x86_64
qpid-proton-c-0.28.0-1.el7.x86_64

The /var/log/messages file on the katelloproxy01 machine contains the following lines:

Jun 14 12:47:17 katelloproxy01 qdrouterd: 2019-06-14 12:47:17.651416 +0200 SERVER (info) Connection from 10.215.0.23:34160 (to :5647) failed: proton:io Connection reset by peer - disconnected :5672 (SSL Failure: Unknown error)
Jun 14 12:47:17 katelloproxy01 qdrouterd: 2019-06-14 12:47:17.674311 +0200 SERVER (info) Accepted connection to :5647 from 10.215.0.23:34212

In short, what causes these SSL failures and how can I fix this?

Best regards,
Bora AKAYDIN

Hi,

I have investigated these errors quite a bit in the past.
I am no longer sure what exactly caused the error, but in short: the error message is misleading. The connection is terminated for some reason and then re-established. There is no real SSL failure.
In our case, qpid could not handle enough clients at once, and that caused regular session termination, IIRC.
Since this was the only helpful document I could find, here is the official tuning guide for Red Hat Satellite, which should also be applicable to Katello. Tuning our Katello installation according to this guide solved the problem for us.

I hope this helps.
Regards

Hi,
Thank you for your answer. I read the documentation, and I do not think it is related to my issue. The document talks about 10K hosts, but our environment is not that big. There are 110 content hosts distributed across 4 smart proxies, and the most crowded smart proxy has around 40 hosts.

Although I have increased the open file limits of httpd, qpid and qdrouterd, I do not think this is the cause. Additionally, there are “always failing” hosts and “sometimes failing” hosts. IMO, if I were really hitting a limit, every host should fail. Let me show you two examples:

Best regards,
Bora AKAYDIN

[root@katelloproxy01 ~]# cat /var/log/messages | grep 10.215.0.23
Jun 17 11:19:39 katelloproxy01 qdrouterd: 2019-06-17 11:19:39.175551 +0200 SERVER (info) Accepted connection to :5647 from 10.215.0.23:54294
Jun 17 11:19:49 katelloproxy01 qdrouterd: 2019-06-17 11:19:49.216378 +0200 SERVER (info) Connection from 10.215.0.23:54294 (to :5647) failed: proton:io Connection reset by peer - disconnected :5672 (SSL Failure: Unknown error)
Jun 17 11:19:49 katelloproxy01 qdrouterd: 2019-06-17 11:19:49.239727 +0200 SERVER (info) Accepted connection to :5647 from 10.215.0.23:54346
Jun 17 11:19:59 katelloproxy01 qdrouterd: 2019-06-17 11:19:59.270645 +0200 SERVER (info) Connection from 10.215.0.23:54346 (to :5647) failed: proton:io Connection reset by peer - disconnected :5672 (SSL Failure: Unknown error)
Jun 17 11:19:59 katelloproxy01 qdrouterd: 2019-06-17 11:19:59.293290 +0200 SERVER (info) Accepted connection to :5647 from 10.215.0.23:54396
Jun 17 11:20:09 katelloproxy01 qdrouterd: 2019-06-17 11:20:09.318674 +0200 SERVER (info) Connection from 10.215.0.23:54396 (to :5647) failed: proton:io Connection reset by peer - disconnected :5672 (SSL Failure: Unknown error)
Jun 17 11:20:09 katelloproxy01 qdrouterd: 2019-06-17 11:20:09.342281 +0200 SERVER (info) Accepted connection to :5647 from 10.215.0.23:54468
# This continues indefinitely...
[root@katelloproxy01 ~]# cat /var/log/messages | grep 10.210.34.10
...
Jun 17 11:22:40 katelloproxy01 qdrouterd: 2019-06-17 11:22:40.173535 +0200 SERVER (info) Accepted connection to :5647 from 10.210.34.10:35380
Jun 17 11:22:50 katelloproxy01 qdrouterd: 2019-06-17 11:22:50.209939 +0200 SERVER (info) Accepted connection to :5647 from 10.210.34.10:35384
Jun 17 11:23:00 katelloproxy01 qdrouterd: 2019-06-17 11:23:00.236463 +0200 SERVER (info) Connection from 10.210.34.10:35384 (to :5647) failed: proton:io Connection reset by peer - on write to :5672 (SSL Failure: Unknown error)
Jun 17 11:23:00 katelloproxy01 qdrouterd: 2019-06-17 11:23:00.241953 +0200 SERVER (info) Accepted connection to :5647 from 10.210.34.10:35388
Jun 17 11:23:10 katelloproxy01 qdrouterd: 2019-06-17 11:23:10.269423 +0200 SERVER (info) Connection from 10.210.34.10:35388 (to :5647) failed: proton:io Connection reset by peer - on write to :5672 (SSL Failure: Unknown error)
Jun 17 11:23:10 katelloproxy01 qdrouterd: 2019-06-17 11:23:10.275612 +0200 SERVER (info) Accepted connection to :5647 from 10.210.34.10:35400
Jun 17 11:23:20 katelloproxy01 qdrouterd: 2019-06-17 11:23:20.311764 +0200 SERVER (info) Accepted connection to :5647 from 10.210.34.10:35406
Jun 17 11:23:30 katelloproxy01 qdrouterd: 2019-06-17 11:23:30.343997 +0200 SERVER (info) Connection from 10.210.34.10:35406 (to :5647) failed: proton:io Connection reset by peer - on write to :5672 (SSL Failure: Unknown error)
Jun 17 11:23:30 katelloproxy01 qdrouterd: 2019-06-17 11:23:30.350278 +0200 SERVER (info) Accepted connection to :5647 from 10.210.34.10:35424
Jun 17 11:23:40 katelloproxy01 qdrouterd: 2019-06-17 11:23:40.388819 +0200 SERVER (info) Accepted connection to :5647 from 10.210.34.10:35426
Jun 17 11:23:50 katelloproxy01 qdrouterd: 2019-06-17 11:23:50.417610 +0200 SERVER (info) Accepted connection to :5647 from 10.210.34.10:35430
Jun 17 11:24:00 katelloproxy01 qdrouterd: 2019-06-17 11:24:00.454001 +0200 SERVER (info) Accepted connection to :5647 from 10.210.34.10:35436
Jun 17 11:24:10 katelloproxy01 qdrouterd: 2019-06-17 11:24:10.486406 +0200 SERVER (info) Accepted connection to :5647 from 10.210.34.10:35438
Jun 17 11:24:20 katelloproxy01 qdrouterd: 2019-06-17 11:24:20.519796 +0200 SERVER (info) Accepted connection to :5647 from 10.210.34.10:35442
Jun 17 11:24:30 katelloproxy01 qdrouterd: 2019-06-17 11:24:30.553392 +0200 SERVER (info) Accepted connection to :5647 from 10.210.34.10:35446
Jun 17 11:24:40 katelloproxy01 qdrouterd: 2019-06-17 11:24:40.589557 +0200 SERVER (info) Accepted connection to :5647 from 10.210.34.10:35450
...
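
To compare an “always failing” host with a “sometimes failing” one more systematically, the accepted and failed connections can be counted per client IP. The following is a minimal Python sketch, assuming the qdrouterd line format shown above and the client-facing port 5647. The function name `summarize` and the two regular expressions are illustrative and not part of any Katello tooling.

```python
import re
from collections import Counter

# Patterns modelled on the qdrouterd lines in /var/log/messages shown above.
ACCEPTED = re.compile(r"Accepted connection to :5647 from (\d+\.\d+\.\d+\.\d+):\d+")
FAILED = re.compile(r"Connection from (\d+\.\d+\.\d+\.\d+):\d+ \(to :5647\) failed")


def summarize(lines):
    """Return {client_ip: (accepted_count, failed_count)} for the given log lines."""
    accepted, failed = Counter(), Counter()
    for line in lines:
        match = ACCEPTED.search(line)
        if match:
            accepted[match.group(1)] += 1
            continue
        match = FAILED.search(line)
        if match:
            failed[match.group(1)] += 1
    # A Counter returns 0 for missing keys, so hosts with no failures are reported too.
    return {ip: (accepted[ip], failed[ip]) for ip in set(accepted) | set(failed)}
```

On the proxy this could be fed with `open("/var/log/messages")`. A host that fails on every cycle shows roughly equal counts, while a host that fails only sometimes shows far more accepted connections than failures.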

Hi,
the error does indeed look very much like the one we had.
Though I do agree that 110 content hosts should not be a problem with the default settings. I am afraid I’m not able to help you any more than that.

Regards

Hi,
I think I have found the solution. Our infrastructure looks like this:

foreman-server
├── katelloproxy00
├── katelloproxy01
├── katelloproxy02
└── katelloproxy03

There are 4 smart proxies, one for each subnet. I was getting errors from the VMs in those subnets. I noticed that port 5646/tcp on foreman-server was unreachable from the proxies. I fixed that on Friday and enabled goferd on two machines, and I have had no errors since. Today I enabled goferd on about 30 more VMs and they are running fine.
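
In case it helps someone with the same symptom, this is a rough Python sketch of a probe that can be run on a smart proxy against foreman-server on 5646/tcp. It distinguishes a refused connection from a timeout, which can hint at whether nothing is listening or packets are being dropped on the way. The function name `probe`, the result labels and the timeout value are assumptions of mine, and the real FQDN of foreman-server has to be filled in.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt as 'open', 'refused', 'timeout' or 'error'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The target answered with a reset: nothing is listening, or a rule rejects it.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout: often a firewall silently dropping packets.
        return "timeout"
    except OSError:
        # Anything else, for example a host name that does not resolve.
        return "error"
```

Run on each proxy as `probe("foreman-server", 5646)`, and on a client as `probe("katelloproxy01.int.xyz.loc", 5647)`, to cover both hops.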

Best,