Dynflow suddenly stops working

Hi Team,

Since few days I’ve observed production server stops working for 5-10 mins intermittently.
Last time I do have faced problem with connecting to foreman-proxy error while executing a job.

Today during one of the prod activity I have seen issues with accessing jobs, login into the server.

Version we are using is : Foreman 1.19.1 (upgrade is planned but not until August).

Below are the logs I captured from production.log and dyn_flow

Please suggest anything need to be checked in foreman. Let me know if you need any other versions info.


----------------production.log----------------

2020-04-30T13:25:41 [I|app|fc33f]   Rendering /opt/theforeman/tfm/root/usr/share/gems/gems/foreman_remote_execution-1.5.6/app/views/job_invocations/show.html.erb within layouts/application
2020-04-30T13:25:41 [I|app|cc28a]   Rendering /opt/theforeman/tfm/root/usr/share/gems/gems/foreman_remote_execution-1.5.6/app/views/job_invocations/show.js.erb
2020-04-30T13:25:42 [I|app|cc28a]   Rendered /opt/theforeman/tfm/root/usr/share/gems/gems/foreman_remote_execution-1.5.6/app/views/job_invocations/show.js.erb (45.0ms)
2020-04-30T13:25:42 [I|app|cc28a] Completed 200 OK in 282ms (Views: 15.7ms | ActiveRecord: 242.6ms)
2020-04-30T13:25:42 [I|app|fc33f]   Rendered /opt/theforeman/tfm/root/usr/share/gems/gems/foreman_remote_execution-1.5.6/app/views/job_invocations/_card_results.html.erb (0.3ms)
2020-04-30T13:25:42 [I|app|fc33f]   Rendered /opt/theforeman/tfm/root/usr/share/gems/gems/foreman_remote_execution-1.5.6/app/views/job_invocations/_card_schedule.html.erb (1.1ms)
2020-04-30T13:25:42 [I|app|fc33f]   Rendered /opt/theforeman/tfm/root/usr/share/gems/gems/foreman_remote_execution-1.5.6/app/views/job_invocations/_card_target_hosts.html.erb (28.4ms)
2020-04-30T13:25:42 [I|app|fc33f]   Rendered /opt/theforeman/tfm/root/usr/share/gems/gems/foreman_remote_execution-1.5.6/app/views/job_invocations/_card_user_input.html.erb (12.8ms)
2020-04-30T13:25:42 [I|app|fc33f]   Rendered /opt/theforeman/tfm/root/usr/share/gems/gems/foreman_remote_execution-1.5.6/app/views/job_invocations/_host_name_td.html.erb (1.8ms)
2020-04-30T13:25:42 [I|app|fc33f]   Rendered /opt/theforeman/tfm/root/usr/share/gems/gems/foreman_remote_execution-1.5.6/app/views/job_invocations/_host_status_td.html.erb (1.0ms)
2020-04-30T13:25:42 [I|app|fc33f]   Rendered /opt/theforeman/tfm/root/usr/share/gems/gems/foreman_remote_execution-1.5.6/app/views/job_invocations/_host_actions_td.html.erb (3.0ms)
2020-04-30T13:25:42 [I|app|fc33f]   Rendered common/_pagination.html.erb (1.2ms)
2020-04-30T13:25:42 [I|app|fc33f]   Rendered /opt/theforeman/tfm/root/usr/share/gems/gems/foreman_remote_execution-1.5.6/app/views/job_invocations/_tab_hosts.html.erb (55.0ms)
2020-04-30T13:25:42 [I|app|fc33f]   Rendered /opt/theforeman/tfm/root/usr/share/gems/gems/foreman_remote_execution-1.5.6/app/views/job_invocations/_tab_overview.html.erb (161.8ms)
2020-04-30T13:25:42 [I|dyn|] start terminating throttle_limiter...
2020-04-30T13:25:42 [I|dyn|] start terminating client dispatcher...
2020-04-30T13:25:42 [I|dyn|] stop listening for new events...
2020-04-30T13:25:42 [I|dyn|] start terminating clock...
2020-04-30T13:25:42 [I|dyn|] start terminating throttle_limiter...
2020-04-30T13:25:42 [I|dyn|] start terminating client dispatcher...
2020-04-30T13:25:42 [I|dyn|] stop listening for new events...
2020-04-30T13:25:42 [I|dyn|] start terminating clock...
2020-04-30T13:25:42 [I|app|fc33f]   Rendered /opt/theforeman/tfm/root/usr/share/gems/gems/foreman_remote_execution-1.5.6/app/views/job_invocations/_user_input.html.erb (151.1ms)
2020-04-30T13:25:42 [I|app|fc33f]   Rendered /opt/theforeman/tfm/root/usr/share/gems/gems/foreman_remote_execution-1.5.6/app/views/job_invocations/_tab_preview_templates.html.erb (240.5ms)
2020-04-30T13:25:42 [I|app|fc33f]   Rendered /opt/theforeman/tfm/root/usr/share/gems/gems/foreman_remote_execution-1.5.6/app/views/job_invocations/show.html.erb within layouts/application (499.7ms)
2020-04-30T13:25:42 [I|app|fc33f]   Rendered layouts/_application_content.html.erb (0.7ms)
2020-04-30T13:25:42 [I|app|fc33f]   Rendering layouts/base.html.erb
2020-04-30T13:25:42 [I|app|728d5]   Rendering hosts/index.html.erb within layouts/application
2020-04-30T13:25:42 [I|app|fc33f]   Rendered home/_organization_dropdown.html.erb (31.3ms)
2020-04-30T13:25:42 [I|app|fc33f]   Rendered home/_location_dropdown.html.erb (14.9ms)
2020-04-30T13:25:42 [I|app|fc33f]   Rendered home/_org_switcher.html.erb (47.2ms)
2020-04-30T13:25:42 [I|app|fc33f]   Rendered home/_user_dropdown.html.erb (1.8ms)
2020-04-30T13:25:42 [I|app|fc33f]   Rendered home/_topbar.html.erb (50.3ms)
2020-04-30T13:25:42 [I|app|fc33f]   Rendered home/_vertical_menu.html.erb (3.6ms)
2020-04-30T13:25:42 [I|app|fc33f]   Rendered home/_vertical_menu.html.erb (1.8ms)
2020-04-30T13:25:42 [I|app|fc33f]   Rendered home/_vertical_menu.html.erb (3.4ms)
2020-04-30T13:25:42 [I|app|fc33f]   Rendered home/_vertical_menu.html.erb (3.2ms)
2020-04-30T13:25:42 [I|app|fc33f]   Rendered home/_vertical_menu.html.erb (3.6ms)
2020-04-30T13:25:42 [I|app|fc33f]   Rendered home/_vertical_menu.html.erb (5.3ms)
2020-04-30T13:25:42 [I|app|fc33f]   Rendered home/_vertical_taxonomies.html.erb (3.7ms)
2020-04-30T13:25:42 [I|app|fc33f]   Rendered home/_vertical_taxonomies.html.erb (2.8ms)
2020-04-30T13:25:42 [I|app|fc33f]   Rendered home/_vertical_menu.html.erb (1.2ms)
2020-04-30T13:25:42 [I|app|fc33f]   Rendered home/_navbar.html.erb (110.6ms)
2020-04-30T13:25:42 [I|app|fc33f]   Rendered layouts/base.html.erb (182.3ms)
2020-04-30T13:25:42 [I|app|fc33f] Completed 200 OK in 918ms (Views: 351.4ms | ActiveRecord: 544.8ms)



2020-04-30T13:23:05 [E|bac|] A sub task failed (RuntimeError)
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.1.0/lib/dynflow/action/with_sub_plans.rb:230:in `check_for_errors!'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.1.0/lib/dynflow/action/with_sub_plans.rb:136:in `try_to_finish'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.1.0/lib/dynflow/action/with_polling_sub_plans.rb:18:in `poll'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.1.0/lib/dynflow/action/with_polling_sub_plans.rb:10:in `run'
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman_remote_execution-1.5.6/app/lib/actions/remote_execution/run_hosts_job.rb:87:in `run'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.1.0/lib/dynflow/action.rb:538:in `block (3 levels) in execute_run'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.1.0/lib/dynflow/middleware/stack.rb:26:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.1.0/lib/dynflow/middleware.rb:18:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-0.13.4/app/lib/actions/middleware/watch_delegated_proxy_sub_tasks.rb:17:in `run'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.1.0/lib/dynflow/middleware/stack.rb:22:in `call'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.1.0/lib/dynflow/middleware/stack.rb:26:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.1.0/lib/dynflow/middleware.rb:18:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.1.0/lib/dynflow/middleware.rb:31:in `run'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.1.0/lib/dynflow/middleware/stack.rb:22:in `call'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.1.0/lib/dynflow/middleware/stack.rb:26:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.1.0/lib/dynflow/middleware.rb:18:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.1.0/lib/dynflow/middleware.rb:31:in `run'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.1.0/lib/dynflow/middleware/stack.rb:22:in `call'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.1.0/lib/dynflow/middleware/stack.rb:26:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.1.0/lib/dynflow/middleware.rb:18:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-0.13.4/app/lib/actions/middleware/rails_executor_wrap.rb:14:in `block in run'
/opt/theforeman/tfm-ror51/root/usr/share/gems/gems/activesupport-5.1.6/lib/active_support/execution_wrapper.rb:85:in `wrap'
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-0.13.4/app/lib/actions/middleware/rails_executor_wrap.rb:13:in `run'
/opt/theforeman/tfm-ror51/root/usr/share/gems/gems/concurrent-ruby-1.0.5/lib/concurrent/executor/ruby_thread_pool_executor.rb:348:in `run_task'
/opt/theforeman/tfm-ror51/root/usr/share/gems/gems/concurrent-ruby-1.0.5/lib/concurrent/executor/ruby_thread_pool_executor.rb:337:in `block (3 levels) in create_worker'
/opt/theforeman/tfm-ror51/root/usr/share/gems/gems/concurrent-ruby-1.0.5/lib/concurrent/executor/ruby_thread_pool_executor.rb:320:in `loop'
/opt/theforeman/tfm-ror51/root/usr/share/gems/gems/concurrent-ruby-1.0.5/lib/concurrent/executor/ruby_thread_pool_executor.rb:320:in `block (2 levels) in create_worker'
/opt/theforeman/tfm-ror51/root/usr/share/gems/gems/concurrent-ruby-1.0.5/lib/concurrent/executor/ruby_thread_pool_executor.rb:319:in `catch'
/opt/theforeman/tfm-ror51/root/usr/share/gems/gems/concurrent-ruby-1.0.5/lib/concurrent/executor/ruby_thread_pool_executor.rb:319:in `block in create_worker'
/opt/theforeman/tfm/root/usr/share/gems/gems/logging-2.2.2/lib/logging/diagnostic_context.rb:474:in `block in create_with_logging_context'
2020-04-30T13:23:08 [I|app|be50f]   Rendered /opt/theforeman/tfm/root/usr/share/gems/gems/foreman_remote_execution-1.5.6/app/views/job_invocations/_host_status_td.html.erb (173.0ms)
2020-04-30T13:23:10 [I|app|be50f]   Rendered /opt/theforeman/tfm/root/usr/share/gems/gems/foreman_remote_execution-1.5.6/app/views/job_invocations/show.js.erb (21499.6ms)
2020-04-30T13:23:10 [I|app|be50f] Completed 200 OK in 24939ms (Views: 22778.5ms | ActiveRecord: 1313.9ms)

dynflow_executor.output


E, [2020-04-30T12:39:24.051780 #2896] ERROR -- /client-dispatcher: Could not find an executor for Dynflow::Dispatcher::Envelope[request_id: 2555, sender_id: <id here>, receiver_id: Dynflow::Dispatcher::UnknownWorld, message: Dynflow::Dispatcher::Event[execution_plan_id: <plan-id>, step_id: 2, event: Actions::Middleware::WatchDelegatedProxySubTasks::CheckOnProxyActions]] (Dynflow::Error)
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.1.0/lib/dynflow/dispatcher/client_dispatcher.rb:146:in `dispatch_request'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.1.0/lib/dynflow/dispatcher/client_dispatcher.rb:117:in `block (2 levels) in publish_request'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.1.0/lib/dynflow/dispatcher/client_dispatcher.rb:204:in `track_request'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.1.0/lib/dynflow/dispatcher/client_dispatcher.rb:116:in `block in publish_request'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.1.0/lib/dynflow/dispatcher/client_dispatcher.rb:245:in `with_ping_request_caching'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.1.0/lib/dynflow/dispatcher/client_dispatcher.rb:115:in `publish_request'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.1.0/lib/dynflow/actor.rb:6:in `on_message'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.4/lib/concurrent/actor/context.rb:46:in `on_envelope'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.4/lib/concurrent/actor/behaviour/executes_context.rb:7:in `on_envelope'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.4/lib/concurrent/actor/behaviour/abstract.rb:25:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.1.0/lib/dynflow/actor.rb:26:in `on_envelope'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.4/lib/concurrent/actor/behaviour/abstract.rb:25:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.4/lib/concurrent/actor/behaviour/awaits.rb:15:in `on_envelope'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.4/lib/concurrent/actor/behaviour/abstract.rb:25:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.4/lib/concurrent/actor/behaviour/sets_results.rb:14:in `on_envelope'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.4/lib/concurrent/actor/behaviour/abstract.rb:25:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.4/lib/concurrent/actor/behaviour/buffer.rb:38:in `process_envelope'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.4/lib/concurrent/actor/behaviour/buffer.rb:31:in `process_envelopes?'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.4/lib/concurrent/actor/behaviour/buffer.rb:20:in `on_envelope'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.4/lib/concurrent/actor/behaviour/abstract.rb:25:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.4/lib/concurrent/actor/behaviour/termination.rb:55:in `on_envelope'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.4/lib/concurrent/actor/behaviour/abstract.rb:25:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.4/lib/concurrent/actor/behaviour/removes_child.rb:10:in `on_envelope'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.4/lib/concurrent/actor/behaviour/abstract.rb:25:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.4/lib/concurrent/actor/behaviour/sets_results.rb:14:in `on_envelope'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.4/lib/concurrent/actor/core.rb:161:in `process_envelope'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.4/lib/concurrent/actor/core.rb:95:in `block in on_envelope'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.4/lib/concurrent/actor/core.rb:118:in `block (2 levels) in schedule_execution'
/opt/theforeman/tfm-ror51/root/usr/share/gems/gems/concurrent-ruby-1.0.5/lib/concurrent/synchronization/mri_lockable_object.rb:38:in `block in synchronize'
/opt/theforeman/tfm-ror51/root/usr/share/gems/gems/concurrent-ruby-1.0.5/lib/concurrent/synchronization/mri_lockable_object.rb:38:in `synchronize'
/opt/theforeman/tfm-ror51/root/usr/share/gems/gems/concurrent-ruby-1.0.5/lib/concurrent/synchronization/mri_lockable_object.rb:38:in `synchronize'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.4/lib/concurrent/actor/core.rb:115:in `block in schedule_execution'
/opt/theforeman/tfm-ror51/root/usr/share/gems/gems/concurrent-ruby-1.0.5/lib/concurrent/executor/serialized_execution.rb:18:in `call'
/opt/theforeman/tfm-ror51/root/usr/share/gems/gems/concurrent-ruby-1.0.5/lib/concurrent/executor/serialized_execution.rb:96:in `work'
/opt/theforeman/tfm-ror51/root/usr/share/gems/gems/concurrent-ruby-1.0.5/lib/concurrent/executor/serialized_execution.rb:77:in `block in call_job'
/opt/theforeman/tfm-ror51/root/usr/share/gems/gems/concurrent-ruby-1.0.5/lib/concurrent/executor/ruby_thread_pool_executor.rb:348:in `run_task'
/opt/theforeman/tfm-ror51/root/usr/share/gems/gems/concurrent-ruby-1.0.5/lib/concurrent/executor/ruby_thread_pool_executor.rb:337:in `block (3 levels) in create_worker'
/opt/theforeman/tfm-ror51/root/usr/share/gems/gems/concurrent-ruby-1.0.5/lib/concurrent/executor/ruby_thread_pool_executor.rb:320:in `loop'
/opt/theforeman/tfm-ror51/root/usr/share/gems/gems/concurrent-ruby-1.0.5/lib/concurrent/executor/ruby_thread_pool_executor.rb:320:in `block (2 levels) in create_worker'
/opt/theforeman/tfm-ror51/root/usr/share/gems/gems/concurrent-ruby-1.0.5/lib/concurrent/executor/ruby_thread_pool_executor.rb:319:in `catch'
/opt/theforeman/tfm-ror51/root/usr/share/gems/gems/concurrent-ruby-1.0.5/lib/concurrent/executor/ruby_thread_pool_executor.rb:319:in `block in create_worker'
/opt/theforeman/tfm/root/usr/share/gems/gems/logging-2.2.2/lib/logging/diagnostic_context.rb:474:in `block in create_with_logging_context'

This is happening after we started upgrading our agent nodes (Splunk instances) - Splunk version from 7.1.1 to 7.3.4 (not sure this is anywhere related, but wanted to add this info as well.

Maybe @aruzicka have some insights?

The logs state clearly that dynflow executor is trying to shut itself down, however they don’t say why. Is there a chance that someone restarted the dynflowd/foreman-tasks service around that time? Do you by any chance have memory limitting enabled in /etc/sysconfig/dynflowd?

As I was the only person logged in into foreman, none had done any restarts manually.

Below are the values I see…

RUBY_GC_MALLOC_LIMIT=4000100
RUBY_GC_MALLOC_LIMIT_MAX=16000100
RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR=1.1
RUBY_GC_OLDMALLOC_LIMIT=16000100
RUBY_GC_OLDMALLOC_LIMIT_MAX=16000100

#Set the number of executors you want to run
#EXECUTORS_COUNT=1

#Set memory limit for executor process, before it’s restarted automatically
#EXECUTOR_MEMORY_LIMIT=2gb

#Set delay before first memory polling to let executor initialize (in sec)
#EXECUTOR_MEMORY_MONITOR_DELAY=7200 #default: 2 hours

#Set memory polling interval, process memory will be checked every N seconds.
#EXECUTOR_MEMORY_MONITOR_INTERVAL=60