Problem:
We have 7 proxies, including two Windows (DHCP) proxies and the Katello server itself. Of these 7, 2 proxies keep failing. Running foreman-maintain service status on them shows an error:
/ All services displayed [FAIL]
Some services are not running (smart_proxy_dynflow_core)
Scenario [Status Services] failed.
The following steps ended up in failing state:
[service-status]
Resolve the failed steps and rerun
the command. In case the failures are false positives,
use --whitelist="service-status"
I recently started replacing some Puppet modules with Ansible roles, and maybe this is related. I scheduled a job that runs every 20 minutes and executes the Ansible roles on 300+ hosts. Some of these hosts are not online, so the tasks keep piling up; it seems that they do not time out. Most of these hosts are handled by the two proxies that keep failing. This morning I have a total of 480 tasks, over 200 of them older than 24 hours.
Expected outcome:
Proxy doesn’t crash
Tasks time out more quickly?
Foreman and Proxy versions:
Katello 3.18.2
Foreman 2.3.3
Proxy 2.3.3
Foreman and Proxy plugin versions:
tfm-rubygem-kafo_wizards-0.0.1-4.el7.noarch
tfm-rubygem-foreman_ansible_core-4.0.0-1.fm2_3.el7.noarch
tfm-rubygem-rack-2.2.3-1.el7.noarch
tfm-rubygem-algebrick-0.7.3-7.el7.noarch
tfm-rubygem-rake-compiler-1.0.7-3.el7.noarch
tfm-rubygem-rkerberos-0.1.5-19.el7.x86_64
tfm-rubygem-netrc-0.11.0-5.el7.noarch
tfm-rubygem-jwt-2.2.1-2.el7.noarch
tfm-rubygem-sinatra-2.0.3-4.el7.noarch
tfm-rubygem-rest-client-2.0.2-3.el7.noarch
tfm-rubygem-multi_json-1.14.1-2.el7.noarch
tfm-rubygem-foreman-tasks-core-0.3.4-1.fm2_1.el7.noarch
tfm-runtime-6.1-4.el7.x86_64
tfm-rubygem-smart_proxy_dynflow-0.3.0-2.fm2_3.el7.noarch
tfm-rubygem-rack-protection-2.0.3-4.el7.noarch
tfm-rubygem-powerbar-2.0.1-2.el7.noarch
tfm-rubygem-tilt-2.0.8-4.el7.noarch
tfm-rubygem-mime-types-3.2.2-4.el7.noarch
tfm-rubygem-statsd-instrument-2.1.4-3.el7.noarch
tfm-rubygem-redfish_client-0.5.2-1.el7.noarch
tfm-rubygem-bundler_ext-0.4.1-5.el7.noarch
tfm-rubygem-hashie-3.6.0-2.el7.noarch
tfm-rubygem-concurrent-ruby-1.1.6-2.el7.noarch
tfm-rubygem-mustermann-1.0.2-4.el7.noarch
tfm-rubygem-unf-0.1.3-8.el7.noarch
tfm-rubygem-mime-types-data-3.2018.0812-4.el7.noarch
tfm-rubygem-rsec-0.4.3-4.el7.noarch
tfm-rubygem-smart_proxy_pulp-2.1.0-3.fm2_2.el7.noarch
tfm-rubygem-sequel-5.7.1-3.el7.noarch
tfm-rubygem-apipie-params-0.0.5-4.el7.noarch
tfm-rubygem-foreman_remote_execution_core-1.4.0-1.el7.noarch
tfm-rubygem-dynflow-1.4.7-1.fm2_3.el7.noarch
tfm-rubygem-smart_proxy_remote_execution_ssh-0.3.1-1.fm2_3.el7.noarch
tfm-rubygem-ansi-1.5.0-2.el7.noarch
tfm-rubygem-rubyipmi-0.10.0-6.el7.noarch
tfm-rubygem-unf_ext-0.0.7.2-3.el7.x86_64
tfm-rubygem-ruby-libvirt-0.7.1-1.el7.x86_64
tfm-rubygem-smart_proxy_discovery-1.0.5-6.fm2_2.el7.noarch
tfm-rubygem-smart_proxy_ansible-3.0.1-6.fm2_2.el7.noarch
tfm-rubygem-kafo-6.1.2-1.el7.noarch
tfm-rubygem-smart_proxy_dynflow_core-0.3.2-1.fm2_3.el7.noarch
tfm-rubygem-rb-inotify-0.9.7-5.el7.noarch
tfm-rubygem-little-plugger-1.1.4-2.el7.noarch
tfm-rubygem-kafo_parsers-1.1.0-3.el7.noarch
tfm-rubygem-http-cookie-1.0.2-4.el7.noarch
tfm-rubygem-concurrent-ruby-edge-0.6.0-2.fm2_1.el7.noarch
tfm-rubygem-net-ssh-4.2.0-2.el7.noarch
tfm-rubygem-sd_notify-0.1.0-1.el7.noarch
tfm-rubygem-server_sent_events-0.1.2-1.el7.noarch
tfm-rubygem-highline-1.7.8-5.el7.noarch
tfm-rubygem-gssapi-1.2.0-7.el7.noarch
tfm-rubygem-xmlrpc-0.3.0-2.el7.noarch
tfm-rubygem-clamp-1.1.2-6.el7.noarch
tfm-rubygem-domain_name-0.5.20160310-4.el7.noarch
tfm-rubygem-sqlite3-1.3.13-6.el7.x86_64
tfm-rubygem-logging-2.3.0-1.el7.noarch
tfm-rubygem-excon-0.76.0-1.el7.noarch
tfm-rubygem-ffi-1.12.2-1.el7.x86_64
Distribution and version:
CentOS Linux release 7.9.2009 (Core)
Other relevant data:
grep ERROR /var/log/messages on the proxy:
May 11 01:28:16 gedapvl05 smart_proxy_dynflow_core: E, [2021-05-11T01:28:16.894921 #876] ERROR -- /client-dispatcher: Could not find an executor for Dynflow::Dispatcher::Envelope[request_id: b25aa1a1-2351-4885-83bb-424f0a3a4b63-737504, sender_id: b25aa1a1-2351-4885-83bb-424f0a3a4b63, receiver_id: Dynflow::Dispatcher::UnknownWorld, message: Dynflow::Dispatcher::Event[execution_plan_id: 5b2f240e-6be6-45aa-a0ca-5d4ab98e09b7, step_id: 2, event: #ForemanTasksCore::Runner::Update:0x00007f8536f834d8, time: ]] (Dynflow::Error)
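To see whether these errors come from a handful of execution plans or from many different ones, the log can be summarised per execution_plan_id. This is only a rough sketch: it assumes the syslog line format shown above, and count_orphaned_plans is a made-up helper name, not part of Foreman or Dynflow.

```shell
#!/bin/sh
# Hypothetical helper (not a Foreman tool): reads syslog lines on stdin and
# prints "<count> <execution_plan_id>" for every Dynflow
# "Could not find an executor" error, most frequent plan first.
count_orphaned_plans() {
  grep 'smart_proxy_dynflow_core' \
    | grep 'Could not find an executor' \
    | grep -o 'execution_plan_id: [0-9a-f-]*' \
    | sed 's/^execution_plan_id: //' \
    | sort | uniq -c | sort -rn \
    | awk '{ print $1 " " $2 }'   # normalise the padding added by uniq -c
}

# Usage, assuming the proxy logs to /var/log/messages as in the grep above:
#   count_orphaned_plans < /var/log/messages
```

The IDs it prints can then be compared with the stuck tasks on the Foreman side, assuming each of those tasks shows the matching Dynflow execution plan.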