Remote execution is not reaching out to a proxy

Problem:

I recently upgraded to 2.3.1 (from 1.24) and enabled remote execution. My initial remote execution tests worked perfectly with a single remote SSH proxy. However, after I configured multiple SSH proxies on different subnets, job requests stopped reaching any smart proxy, and all jobs now hang in the pending state.

Expected outcome:

The service should reach out to the smart proxy and start the Dynflow tasks.

Foreman and Proxy versions:

2.3.1 foreman and foreman-proxy

Foreman and Proxy plugin versions:
tfm-rubygem-smart_proxy_dynflow_core-0.3.2-1.fm2_3.el7.noarch
tfm-rubygem-dynflow-1.4.7-1.fm2_3.el7.noarch
foreman-dynflow-sidekiq-2.3.1-1.el7.noarch
tfm-rubygem-smart_proxy_dynflow-0.3.0-2.fm2_3.el7.noarch

tfm-rubygem-foreman_remote_execution-4.2.1-1.fm2_3.el7.noarch
tfm-rubygem-smart_proxy_remote_execution_ssh-0.3.1-1.fm2_3.el7.noarch
tfm-rubygem-foreman_remote_execution_core-1.4.0-1.el7.noarch
tfm-rubygem-hammer_cli_foreman_remote_execution-0.2.1-1.fm2_3.el7.noarch

Distribution and version:

CentOS 7.9.2009

Other relevant data:

Server Settings

remote_execution_cleanup_working_dirs                  | true
remote_execution_cockpit_url                           |
remote_execution_connect_by_ip                         | true
remote_execution_ssh_key_passphrase                    | *****
remote_execution_ssh_password                          | *****
remote_execution_effective_user                        | root
remote_execution_effective_user_method                 | sudo
remote_execution_effective_user_password               | *****
remote_execution_global_proxy                          | true
remote_execution_fallback_proxy                        | false
remote_execution_form_job_template                     | Run Command - SSH Default
remote_execution_ssh_port                              | 22
remote_execution_ssh_user                              | root
remote_execution_sync_templates                        | true
remote_execution_workers_pool_size                     | 5
foreman_tasks_proxy_batch_trigger                      | true
foreman_tasks_polling_multiplier                       | 1
foreman_tasks_proxy_action_retry_count                 | 4
foreman_tasks_proxy_action_retry_interval              | 15
foreman_tasks_proxy_batch_size                         | 100
foreman_tasks_sync_task_timeout                        | 120
foreman_tasks_troubleshooting_url                      |
dynflow_enable_console                                 | true
dynflow_console_require_auth                           | true

foreman-proxy settings.yml

---
:settings_directory: /etc/foreman-proxy/settings.d
:ssl_ca_file: /var/lib/puppet/ssl/certs/ca.pem
:ssl_certificate: /var/lib/puppet/ssl/certs/foreman.x.x.x.pem
:ssl_private_key: /var/lib/puppet/ssl/private_keys/foreman.x.x.x.pem
:trusted_hosts:
  - foreman.x.x.x
:foreman_url: https://foreman.x.x.x
:daemon: true
:bind_host: '*'
:https_port: 8443
:log_file: /var/log/foreman-proxy/proxy.log
:log_level: INFO
:log_buffer: 2000
:log_buffer_errors: 1000

foreman-proxy remote_execution_ssh.yml

---
:enabled: https
:ssh_identity_key_file: /var/lib/foreman-proxy/ssh/id_rsa_foreman_proxy
:local_working_dir: /var/tmp
:remote_working_dir: /var/tmp
:kerberos_auth: false

# Whether to run remote execution jobs asynchronously
:async_ssh: false

foreman-proxy dynflow.yml

---
:enabled: https
:database: 
:core_url: https://foreman.x.x.x:8008

# If true, external core will be used even if the core gem is available
# If false, the feature will be disabled if the core gem is unavailable
# If unset, the process will fallback to autodetection, using external core if the core gem is unavailable
:external_core: true
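
The three comments above `:external_core:` describe a small decision table. As a rough sketch of how I read them (this is not the proxy's actual code; the function name and the "embedded" label for running the core in-process are my own assumptions):

```python
from typing import Optional


def resolve_core_mode(external_core: Optional[bool], core_gem_available: bool) -> str:
    """Hypothetical model of the :external_core: comments in dynflow.yml.

    Returns "external", "embedded", or "disabled".
    """
    if external_core is True:
        # Explicit true: use the external core even if the gem is installed.
        return "external"
    if external_core is False:
        # Explicit false: never use an external core; without the gem
        # the feature is disabled. "embedded" is an assumed label.
        return "embedded" if core_gem_available else "disabled"
    # Unset: autodetect, falling back to the external core when the gem is missing.
    return "embedded" if core_gem_available else "external"
```

With the setting shown above (`true`), this sketch always returns "external", so the proxy would depend on the service behind `:core_url:` being up.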

As an additional step, I ran tcpdump on the Foreman server for port 8443 and kicked off some jobs; no packet is ever sent on that port. I'm not sure what else would be relevant here except production.log, which I need to sanitize before posting.
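
Alongside the packet capture, it can help to confirm from the Foreman server that each proxy's port accepts TCP connections at all, which separates "Foreman never tries" from "the network blocks it". A minimal sketch using only the Python standard library (the function name is my own; port 8443 comes from the settings.yml above):

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For example, `can_connect("foreman.x.x.x", 8443)` run once per smart proxy hostname. A True result here with nothing in tcpdump during a job run would suggest the problem is on the Foreman side rather than in the network.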

production.log (22.6 KB)

I think I figured this out. I must have had a stopped or pending job that was blocking everything. I ended up truncating the foreman_tasks*, dynflow_*, and job_invocations tables, and things started working again.

This is quite brutal, but I’m glad to hear you got it moving again. It’s just a pity that we won’t learn what the exact cause was.

I’m pretty sure I know the issue. I was testing a remote proxy that did not have a database file for Dynflow, and I restarted its services multiple times while testing. I couldn’t abort or cancel the stuck jobs, most likely because they no longer existed in the proxy’s Dynflow. I did see Foreman reach out to that proxy when I tried to delete the job, but it got a 404.