Migrating to webhooks/shellhooks: Are we doing things wrong?

Problem:
We have a bunch if additional tasks that are attached to host provisioning and lifecycle actions. Mainly things evolve around storage, NFS Share creation/deletion/resize dependent on Foreman data and resizing partitions and PVs when disksizes of VMs are increased through Foreman.
We had all of this implemented via hook scripts through foreman_hooks and everything was working fine, but sice foreman_hooks is deprecated, we saw the need to migrate everything to the new thing: webhooks with shellhooks.
So, what we did was:

  • Rewrite our old scripts (this was mainly because the hooks had become kind of a legacy codebase noone wanted to maintain anymore)
  • Write a simple webhook template to feed our shellhook script
  • Subscribe webhooks to the host_created, host_deleted and host_updated events and calling our shellhooks endpoint

In our testing environment, everything looked fine, except we realized that host_updated created a lot of noise during Puppet runs, causing the hook to trigger multiple times per Puppet run. This is probably when we should have realized we are doing something wrong, but instead we decided to to just “filter” those triggers on the script level by exiting the script prematurely if the user who issued the update is either foreman_admin or foreman_api_admin.
Last weekend we deployed this setup to our production environment and things pretty much blew up instantly. After minutes, we had over 100 tasks in a bogus state:


We could only stop this by disabling webhooks on host_update events. Since then, the spam of these tasks has stopped, but so has our workflow for what the hooks were supposed to do. After investigating this further, I assume these should have been “Deliver webhook Host Updated” tasks, but instead they became some kind of zombies. We are unable to interact with the bogus tasks except for force-cancelling them or killing them via task_cleanup. Also there is no info on the taskpage itself, the raw tab just lists a lot of blank fields except for IDs:
grafik

Expected outcome:
An answer to the questions:

  • Are we doing things wrong on a fundamental level?
  • Is host_updated an event we are not supposed to subscribe to?
    Also, working hooks without a ton of dead zombie tasks would be nice :wink:
    Plus, a “proper” way to get rid of all the host_updated events triggered through Puppet runs would be nice (and might even solve the issue by just not overloading Foreman with new tasks several times per second)

Foreman and Proxy versions:
3.2.1

Foreman and Proxy plugin versions:

Foreman Plugins:
 - foreman-tasks 6.0.1
 - foreman_expire_hosts 7.0.4
 - foreman_puppet 3.0.7
 - foreman_remote_execution 6.0.0
 - foreman_scc_manager 1.8.20
 - foreman_snapshot_management 2.0.1
 - foreman_templates 9.1.0
 - foreman_webhooks 3.0.3
 - katello 4.4.2.2
 - puppetdb_foreman 5.0.0

Smart Proxy:

# rpm -qa | grep smart_proxy
rubygem-smart_proxy_shellhooks-0.9.2-2.fm2_5.el8.noarch
rubygem-smart_proxy_dynflow-0.7.0-1.fm3_2.el8.noarch
rubygem-smart_proxy_dhcp_infoblox-0.0.16-6.fm3_2.el8.noarch
rubygem-smart_proxy_pulp-3.2.0-2.fm3_2.el8.noarch
rubygem-smart_proxy_remote_execution_ssh-0.5.3-1.fm3_2.el8.noarch

Distribution and version:
RHEL8.7

Other relevant data:
If there are any logs you want me to provide which could help with this issue, I can try and grab some. Since I will most likely have to enable the webhooks again for any meaningful logs, I will only to so if required.

I’m not really a webhooks/shellhooks person so I can’t speak to it, but I can derail this thread a bit :slight_smile: Could you show us what dynflow console shows for those tasks? There should be a button for it somewhere (or just go to $foreman/foreman_tasks/dynflow/$external_id ?

The dynflow consoles shows about as much as the Foreman page, nothing :wink:


grafik
grafik

At least it has a “started at” timestamp attached, which Foreman doesn’t show, but that’s about it.

Fun fact I just noticed: I cannot find anything about that ID in the production.log of that day. While I can grep for $external_id of a working job and get three messages about that job (transmitting to dynflow, execution of the webhook and receiving from dynflow), there is nothing for the ID of the “broken” job.

Huh, could we get debug logs?

What exactly do you want debug logs of?
I assume from Foreman (i.e. production.log)? I can try and get some debug logs, should be easy enough to reproduce since those tasks piled up so fast the last time we enabled the hooks.

Just production.log should be enough in this case

I was able to capture some debug logs and also identified the failed tasks the external ID. Here is the log from that request:

2023-02-08T17:06:15 [I|app|8842882c] Started POST "/api/config_reports" for 10.160.236.27 at 2023-02-08 17:06:15 +0100
2023-02-08T17:06:15 [I|app|8842882c] Processing by Api::V2::ConfigReportsController#create as JSON
2023-02-08T17:06:15 [I|app|8842882c]   Parameters: {"config_report"=>"[FILTERED]", "apiv"=>"v2"}
2023-02-08T17:06:15 [D|app|8842882c] Client sent certificate with subject 'puppetproxy.example.com' and subject alt names '["puppetproxy.example.com"]'
2023-02-08T17:06:15 [D|app|8842882c] CN and SANs were extracted from a client certificate.
2023-02-08T17:06:15 [D|app|8842882c] Verifying request from ["puppetproxy.example.com"] against [<list of smartproxies>]
2023-02-08T17:06:15 [D|app|8842882c] Processing report: {"host"=>"client.example.com", "reported_at"=>"2023-02-08 16:04:00 UTC", "status"=>{"applied"=>0, "restarted"=>0, "failed"=>0, "failed_restarts"=>0, "skipped"=>0, "pending"=>0}, "metrics"=>{"resources"=>{"changed"=>0, "corrective_change"=>0, "failed"=>0, "failed_to_restart"=>0, "out_of_sync"=>0, "restarted"=>0, "scheduled"=>0, "skipped"=>0, "total"=>385}, "time"=>{"anchor"=>0.000490885, "augeas"=>1.2025026129999998, "catalog_application"=>25.269473280990496, "concat_file"=>0.00048681699999999996, "concat_fragment"=>0.002981279999999999, "config_retrieval"=>14.841640418977477, "convert_catalog"=>0.639278615009971, "cron"=>0.000549198, "exec"=>0.20404195400000003, "fact_generation"=>88.66589018201921, "file"=>7.872653324999998, "file_line"=>0.015821999, "filebucket"=>8.6276e-05, "filesystem"=>0.086869895, "group"=>0.001864068, "logical_volume"=>0.526188599, "mount"=>0.006211277, "network_config"=>0.000745433, "network_route"=>0.000483817, "package"=>13.486791775000004, "plugin_sync"=>3.2444249549880624, "selboolean"=>0.029218056, "selinux_fcontext"=>0.000562644, "service"=>0.42916905499999997, "ssh_authorized_key"=>0.002704077, "total"=>134.12264593, "transaction_evaluation"=>25.185514822951518, "user"=>0.015191512000000002, "yum_cron_config"=>0.0063059200000000004}, "changes"=>{"total"=>0}, "events"=>{"failure"=>0, "success"=>0, "total"=>0}}, "logs"=>[{"log"=>{"sources"=>{"source"=>"Puppet"}, "messages"=>{"message"=>"Applied catalog in 25.27 seconds"}, "level"=>"notice"}}]}
2023-02-08T17:06:15 [I|app|8842882c] ForemanWebhooks::EventSubscriber: host_updated.event.foreman event received
2023-02-08T17:06:15 [D|tax|8842882c] Current organization set to none
2023-02-08T17:06:15 [D|tax|8842882c] Current location set to none
2023-02-08T17:06:15 [D|dyn|8842882c] ExecutionPlan 9d547753-5be8-4bbe-a8fb-4ad4cdbcab4a      pending >>  planning

After that it just stops.

A working one in contrast looks like this:

2023-02-08T17:06:14 [I|app|b7d08e91] Started POST "/api/config_reports" for 10.160.236.27 at 2023-02-08 17:06:14 +0100
2023-02-08T17:06:14 [I|app|b7d08e91] Processing by Api::V2::ConfigReportsController#create as JSON
2023-02-08T17:06:14 [I|app|b7d08e91]   Parameters: {"config_report"=>"[FILTERED]", "apiv"=>"v2"}
2023-02-08T17:06:14 [D|app|b7d08e91] Client sent certificate with subject 'puppetproxy.example.com' and subject alt names '["puppetproxy.example.com"]'
2023-02-08T17:06:14 [D|app|b7d08e91] CN and SANs were extracted from a client certificate.
2023-02-08T17:06:14 [D|app|b7d08e91] Verifying request from ["puppetproxy.example.com"] against [<list of smartproxies>]
2023-02-08T17:06:14 [D|app|b7d08e91] Processing report: {"host"=>"client2.example.com", "reported_at"=>"2023-02-08 16:04:55 UTC", "status"=>{"applied"=>0, "restarted"=>0, "failed"=>0, "failed_restarts"=>0, "skipped"=>0, "pending"=>0}, "metrics"=>{"resources"=>{"changed"=>0, "corrective_change"=>0, "failed"=>0, "failed_to_restart"=>0, "out_of_sync"=>0, "restarted"=>0, "scheduled"=>0, "skipped"=>0, "total"=>470}, "time"=>{"anchor"=>0.0006209519999999999, "augeas"=>1.102024248, "catalog_application"=>20.337088896892965, "concat_file"=>0.0005039210000000001, "concat_fragment"=>0.002566903, "config_retrieval"=>9.418634302914143, "convert_catalog"=>0.6509699751622975, "cron"=>0.000336297, "db_directory_structure"=>0.000367473, "exec"=>0.4483568569999998, "fact_generation"=>43.86517295893282, "file"=>4.41000895, "file_line"=>0.009369503999999997, "filebucket"=>5.3675e-05, "filesystem"=>0.10848193199999999, "group"=>0.0027809059999999997, "logical_volume"=>0.652032909, "mount"=>0.009028445999999999, "network_config"=>0.000573699, "network_route"=>0.00043816600000000003, "package"=>11.849279049, "plugin_sync"=>2.670822272077203, "selboolean"=>0.011466918, "selinux_fcontext"=>0.000372647, "service"=>0.34027973599999994, "ssh_authorized_key"=>0.0027460539999999995, "sysctl"=>0.0031915369999999995, "total"=>77.970639304, "transaction_evaluation"=>20.270092835649848, "user"=>0.009942660999999998, "yum_cron_config"=>0.006404968999999999}, "changes"=>{"total"=>0}, "events"=>{"failure"=>0, "success"=>0, "total"=>0}}, "logs"=>[{"log"=>{"sources"=>{"source"=>"Puppet"}, "messages"=>{"message"=>"Applied catalog in 20.34 seconds"}, "level"=>"notice"}}]}
2023-02-08T17:06:14 [I|app|b7d08e91] ForemanWebhooks::EventSubscriber: host_updated.event.foreman event received
2023-02-08T17:06:14 [D|tax|b7d08e91] Current organization set to none
2023-02-08T17:06:14 [D|tax|b7d08e91] Current location set to none
2023-02-08T17:06:14 [D|dyn|b7d08e91] ExecutionPlan 6af23aee-9bb5-4c2c-8a5f-493cb3a10214      pending >>  planning
2023-02-08T17:06:14 [D|tax|b7d08e91] Current organization set to none
2023-02-08T17:06:14 [D|tax|b7d08e91] Current location set to none
2023-02-08T17:06:14 [D|dyn|b7d08e91]          Step 6af23aee-9bb5-4c2c-8a5f-493cb3a10214: 1   pending >>   running in phase     Plan Dynflow::ActiveJob::QueueAdapters::JobWrapper
2023-02-08T17:06:14 [D|dyn|b7d08e91]          Step 6af23aee-9bb5-4c2c-8a5f-493cb3a10214: 1   running >>   success in phase     Plan Dynflow::ActiveJob::QueueAdapters::JobWrapper
2023-02-08T17:06:14 [D|dyn|b7d08e91] ExecutionPlan 6af23aee-9bb5-4c2c-8a5f-493cb3a10214     planning >>   planned
2023-02-08T17:06:14 [D|tax|b7d08e91] Current organization set to none
2023-02-08T17:06:14 [D|tax|b7d08e91] Current location set to none
2023-02-08T17:06:14 [I|app|b7d08e91] Enqueued ForemanWebhooks::DeliverWebhookJob (Job ID: 6af23aee-9bb5-4c2c-8a5f-493cb3a10214) to Dynflow(default) with arguments: {:event_name=>"host_updated.event.foreman", :payload=>"{\"id\":2865,\"fqdn\":\"client2.example.com\",\"os\":{\"id\":18,\"major\":\"7\",\"name\":\"RedHat\",\"minor\":\"9\",\"nameindicator\":null,\"created_at\":\"2020-10-07T09:03:19.287Z\",\"updated_at\":\"2020-10-13T05:24:48.900Z\",\"release_name\":\"\",\"description\":\"RHEL Server 7.9\",\"password_hash\":\"SHA256\",\"title\":\"RHEL Server 7.9\"},\"hostgroup\":{\"id\":94,\"name\":\"Konsolidierung\",\"created_at\":\"2018-01-10T15:46:03.096Z\",\"updated_at\":\"2018-01-10T15:46:03.096Z\",\"operatingsystem_id\":null,\"architecture_id\":null,\"medium_id\":null,\"ptable_id\":92,\"root_pass\":\"redacted\",\"puppet_ca_proxy_id\":null,\"use_image\":null,\"image_file\":\"\",\"ancestry\":\"1/17/76\",\"vm_defaults\":null,\"subnet_id\":23,\"domain_id\":null,\"puppet_proxy_id\":3,\"title\":\"mia-base/db/oracle/Konsolidierung\",\"realm_id\":null,\"compute_profile_id\":null,\"grub_pass\":\"\",\"lookup_value_matcher\":\"hostgroup=mia-base/db/oracle/Konsolidierung\",\"subnet6_id\":null,\"pxe_loader\":\"PXELinux BIOS\",\"description\":null,\"compute_resource_id\":null,\"content_facet_attributes\":{\"id\":73,\"hostgroup_id\":94,\"kickstart_repository_id\":null,\"content_source_id\":6,\"content_view_id\":8,\"lifecycle_environment_id\":3},\"puppet_attributes\":{\"id\":3,\"hostgroup_id\":94,\"environment_id\":14,\"puppet_proxy_id\":null,\"created_at\":null,\"updated_at\":null}},\"event_name\":\"host_updated.event.foreman\",\"webhook_id\":2,\"context\":{\"remote_ip\":\"10.160.236.27\",\"request\":\"b7d08e91-c149-4873-9636-cac6c2eb24a6\",\"session\":\"b7d08e91-c149-4873-9636-cac6c2eb24a6\",\"user_login\":\"foreman_api_admin\",\"user_admin\":true}}", :headers=>nil, :url=>"https://foreman.example.com:9090/shellhook/cli", :webhook_id=>2}
2023-02-08T17:06:14 [I|app|b7d08e91] Scanning report with: Foreman::PuppetReportScanner
2023-02-08T17:06:14 [D|app|b7d08e91] Changes after reporter specific data added: {"host_id"=>[nil, 2865], "reported_at"=>[nil, Wed, 08 Feb 2023 16:04:55 UTC +00:00], "status"=>[nil, 0], "metrics"=>[nil, "---\nresources: !ruby/hash:ActiveSupport::HashWithIndifferentAccess\n  changed: 0\n  corrective_change: 0\n  failed: 0\n  failed_to_restart: 0\n  out_of_sync: 0\n  restarted: 0\n  scheduled: 0\n  skipped: 0\n  total: 470\ntime: !ruby/hash:ActiveSupport::HashWithIndifferentAccess\n  anchor: 0.0006209519999999999\n  augeas: 1.102024248\n  catalog_application: 20.337088896892965\n  concat_file: 0.0005039210000000001\n  concat_fragment: 0.002566903\n  config_retrieval: 9.418634302914143\n  convert_catalog: 0.6509699751622975\n  cron: 0.000336297\n  db_directory_structure: 0.000367473\n  exec: 0.4483568569999998\n  fact_generation: 43.86517295893282\n  file: 4.41000895\n  file_line: 0.009369503999999997\n  filebucket: 5.3675e-05\n  filesystem: 0.10848193199999999\n  group: 0.0027809059999999997\n  logical_volume: 0.652032909\n  mount: 0.009028445999999999\n  network_config: 0.000573699\n  network_route: 0.00043816600000000003\n  package: 11.849279049\n  plugin_sync: 2.670822272077203\n  selboolean: 0.011466918\n  selinux_fcontext: 0.000372647\n  service: 0.34027973599999994\n  ssh_authorized_key: 0.0027460539999999995\n  sysctl: 0.0031915369999999995\n  total: 77.970639304\n  transaction_evaluation: 20.270092835649848\n  user: 0.009942660999999998\n  yum_cron_config: 0.006404968999999999\nchanges: !ruby/hash:ActiveSupport::HashWithIndifferentAccess\n  total: 0\nevents: !ruby/hash:ActiveSupport::HashWithIndifferentAccess\n  failure: 0\n  success: 0\n  total: 0\n"], "origin"=>[nil, "Puppet"]}
2023-02-08T17:06:14 [I|app|b7d08e91] ForemanWebhooks::EventSubscriber: host_updated.event.foreman event received
2023-02-08T17:06:14 [D|tax|b7d08e91] Current organization set to none
2023-02-08T17:06:14 [D|tax|b7d08e91] Current location set to none
2023-02-08T17:06:14 [D|dyn|b7d08e91] ExecutionPlan 5d9e5419-b685-4ed3-900a-a28bb25dfb58      pending >>  planning
2023-02-08T17:06:14 [D|tax|b7d08e91] Current organization set to none
2023-02-08T17:06:14 [D|tax|b7d08e91] Current location set to none
2023-02-08T17:06:14 [D|dyn|b7d08e91]          Step 5d9e5419-b685-4ed3-900a-a28bb25dfb58: 1   pending >>   running in phase     Plan Dynflow::ActiveJob::QueueAdapters::JobWrapper
2023-02-08T17:06:15 [D|dyn|b7d08e91]          Step 5d9e5419-b685-4ed3-900a-a28bb25dfb58: 1   running >>   success in phase     Plan Dynflow::ActiveJob::QueueAdapters::JobWrapper
2023-02-08T17:06:15 [D|dyn|b7d08e91] ExecutionPlan 5d9e5419-b685-4ed3-900a-a28bb25dfb58     planning >>   planned
2023-02-08T17:06:15 [D|tax|b7d08e91] Current organization set to none
2023-02-08T17:06:15 [D|tax|b7d08e91] Current location set to none
2023-02-08T17:06:15 [I|app|b7d08e91] Enqueued ForemanWebhooks::DeliverWebhookJob (Job ID: 5d9e5419-b685-4ed3-900a-a28bb25dfb58) to Dynflow(default) with arguments: {:event_name=>"host_updated.event.foreman", :payload=>"{\"id\":2865,\"fqdn\":\"client2.example.com\",\"os\":{\"id\":18,\"major\":\"7\",\"name\":\"RedHat\",\"minor\":\"9\",\"nameindicator\":null,\"created_at\":\"2020-10-07T09:03:19.287Z\",\"updated_at\":\"2020-10-13T05:24:48.900Z\",\"release_name\":\"\",\"description\":\"RHEL Server 7.9\",\"password_hash\":\"SHA256\",\"title\":\"RHEL Server 7.9\"},\"hostgroup\":{\"id\":94,\"name\":\"Konsolidierung\",\"created_at\":\"2018-01-10T15:46:03.096Z\",\"updated_at\":\"2018-01-10T15:46:03.096Z\",\"operatingsystem_id\":null,\"architecture_id\":null,\"medium_id\":null,\"ptable_id\":92,\"root_pass\":\"$5$nK3JpWJe$kZzjsK5p5/m1eddqPb0I.cKGniajzAth9B81AmxgtqB\",\"puppet_ca_proxy_id\":null,\"use_image\":null,\"image_file\":\"\",\"ancestry\":\"1/17/76\",\"vm_defaults\":null,\"subnet_id\":23,\"domain_id\":null,\"puppet_proxy_id\":3,\"title\":\"mia-base/db/oracle/Konsolidierung\",\"realm_id\":null,\"compute_profile_id\":null,\"grub_pass\":\"\",\"lookup_value_matcher\":\"hostgroup=mia-base/db/oracle/Konsolidierung\",\"subnet6_id\":null,\"pxe_loader\":\"PXELinux BIOS\",\"description\":null,\"compute_resource_id\":null,\"content_facet_attributes\":{\"id\":73,\"hostgroup_id\":94,\"kickstart_repository_id\":null,\"content_source_id\":6,\"content_view_id\":8,\"lifecycle_environment_id\":3},\"puppet_attributes\":{\"id\":3,\"hostgroup_id\":94,\"environment_id\":14,\"puppet_proxy_id\":null,\"created_at\":null,\"updated_at\":null}},\"event_name\":\"host_updated.event.foreman\",\"webhook_id\":2,\"context\":{\"request\":\"b7d08e91-c149-4873-9636-cac6c2eb24a6\",\"user_login\":\"foreman_api_admin\",\"user_admin\":true}}", :headers=>nil, :url=>"https://foreman.example.com:9090/shellhook/cli", :webhook_id=>2}
2023-02-08T17:06:15 [I|app|b7d08e91] ForemanWebhooks::EventSubscriber: host_updated.event.foreman event received
2023-02-08T17:06:15 [I|app|b7d08e91] Performing ForemanWebhooks::DeliverWebhookJob (Job ID: 6af23aee-9bb5-4c2c-8a5f-493cb3a10214) from Dynflow(default) enqueued at 2023-02-08T16:06:14Z with arguments: {:event_name=>"host_updated.event.foreman", :payload=>#<Sequel::SQL::Blob:0x36ee8 bytes=1529 start="{\"id\":2865" end="in\":true}}">, :headers=>nil, :url=>"https://foreman.example.com:9090/shellhook/cli", :webhook_id=>2}
2023-02-08T17:06:15 [I|app|b7d08e91] Performing 'Host Updated' webhook request for event 'host_updated.event.foreman'
2023-02-08T17:06:15 [W|app|b7d08e91] Could not parse HTTP headers JSON, ignoring: no implicit conversion of nil into String
2023-02-08T17:06:15 [D|app|b7d08e91] Headers:
2023-02-08T17:06:15 [D|app|b7d08e91] Webhook POST request: https://foreman.example.com:9090/shellhook/cli
2023-02-08T17:06:15 [D|app|b7d08e91] Headers: {"accept-encoding"=>["gzip;q=1.0,deflate;q=0.6,identity;q=0.3"], "accept"=>["*/*"], "user-agent"=>["Ruby"], "content-type"=>["plain/text"], "x-request-id"=>["b7d08e91-c149-4873-9636-cac6c2eb24a6"], "x-session-id"=>["6a151b18-8de0-4d36-bf21-6a1c5ace888c"]}
2023-02-08T17:06:15 [D|app|b7d08e91] Body: #<Sequel::SQL::Blob:0x36ee8 bytes=1529 start="{\"id\":2865" end="in\":true}}">
2023-02-08T17:06:15 [I|app|b7d08e91] POST response was 200
2023-02-08T17:06:15 [I|app|b7d08e91] Performed ForemanWebhooks::DeliverWebhookJob (Job ID: 6af23aee-9bb5-4c2c-8a5f-493cb3a10214) from Dynflow(default) in 18.07ms
2023-02-08T17:06:15 [D|tax|b7d08e91] Current organization set to none
2023-02-08T17:06:15 [D|tax|b7d08e91] Current location set to none
2023-02-08T17:06:15 [D|dyn|b7d08e91]          Step 6af23aee-9bb5-4c2c-8a5f-493cb3a10214: 2   running >>   success in phase      Run Dynflow::ActiveJob::QueueAdapters::JobWrapper
2023-02-08T17:06:15 [D|tax|b7d08e91] Current organization set to none
2023-02-08T17:06:15 [D|tax|b7d08e91] Current location set to none
2023-02-08T17:06:15 [D|dyn|b7d08e91] ExecutionPlan 332dd2ea-e898-4d6e-a96c-293e418ff39c      pending >>  planning
2023-02-08T17:06:15 [I|app|b7d08e91] Performing ForemanWebhooks::DeliverWebhookJob (Job ID: 5d9e5419-b685-4ed3-900a-a28bb25dfb58) from Dynflow(default) enqueued at 2023-02-08T16:06:14Z with arguments: {:event_name=>"host_updated.event.foreman", :payload=>#<Sequel::SQL::Blob:0x37078 bytes=1452 start="{\"id\":2865" end="in\":true}}">, :headers=>nil, :url=>"https://foreman.example.com:9090/shellhook/cli", :webhook_id=>2}
2023-02-08T17:06:15 [I|app|b7d08e91] Performing 'Host Updated' webhook request for event 'host_updated.event.foreman'
2023-02-08T17:06:15 [W|app|b7d08e91] Could not parse HTTP headers JSON, ignoring: no implicit conversion of nil into String
2023-02-08T17:06:15 [D|app|b7d08e91] Headers:
2023-02-08T17:06:15 [D|app|b7d08e91] Webhook POST request: https://foreman.example.com:9090/shellhook/cli
2023-02-08T17:06:15 [D|app|b7d08e91] Headers: {"accept-encoding"=>["gzip;q=1.0,deflate;q=0.6,identity;q=0.3"], "accept"=>["*/*"], "user-agent"=>["Ruby"], "content-type"=>["plain/text"], "x-request-id"=>["b7d08e91-c149-4873-9636-cac6c2eb24a6"], "x-session-id"=>["9d874e63-1969-4b3f-828b-f440c6bcd428"]}
2023-02-08T17:06:15 [D|app|b7d08e91] Body: #<Sequel::SQL::Blob:0x37078 bytes=1452 start="{\"id\":2865" end="in\":true}}">
2023-02-08T17:06:15 [I|app|b7d08e91] POST response was 200
2023-02-08T17:06:15 [I|app|b7d08e91] Performed ForemanWebhooks::DeliverWebhookJob (Job ID: 5d9e5419-b685-4ed3-900a-a28bb25dfb58) from Dynflow(default) in 10.65ms
2023-02-08T17:06:15 [D|tax|b7d08e91] Current organization set to none
2023-02-08T17:06:15 [D|tax|b7d08e91] Current location set to none
2023-02-08T17:06:15 [D|dyn|b7d08e91]          Step 5d9e5419-b685-4ed3-900a-a28bb25dfb58: 2   running >>   success in phase      Run Dynflow::ActiveJob::QueueAdapters::JobWrapper

If you want me to look for anything else in particular, please tell me what I should look for or how I can drop that file directly to someone to take a look at it. The raw text of about 1 minute of running in debug mode is already 4.5 MB, and I don’t see me anonymizing everything in there so I can post it publicly.

Huh, that’s unexpected. No more logs for this request?

I’ll try reproducing it locally and then we’ll see

Nope, that’s it. I fetched the logs using foreman-rake errors:fetch_log id=xyz, and thats all that came up for that request.

I believe the event is triggered whenever the Host object gets saved. That may happen multiple times during single puppet run. State changes, nic interfaces are updates, last report information is updated etc. I’d like to better understand your usecase. It’s easy to add new event, but we need to know, what exactly such event should be. E.g. should it be “Puppet report received for the Host?” or “Configuration report received for the Host?” or “Host updated by the user through form/API”? The very generic “anytime the Host is saved” may cause problems like this. I assume old foreman_hooks had the same problem but perhaps it was never noticed as it never created tasks?

This is what we assumed what happened.

Personally, we would need the latter, “Host updated by the user through form/API”.
Our use-cases currently are:

  • We run our OracleDB servers with DirectNFS as Storage, so these NFS shares need to be created, updated, or deleted throughout on the hosts lifecycle. We need those shares to exists once the host is installed, deleted after the host is deleted, and (often several times throughout the hosts life) resized.
    • For this, we are extracting Info from Foreman and call ansible playbooks from AWX through a hook script. Those playbooks do the “heavy lifting” of handling the shares, but we need to call them when a share might potentially need to be changed.
  • For our VMWare hosts (which is >99% of all hosts), we want to be able to handle disk resizing in an automatic way. If someone increases a virtual disk, they do so through Foreman. We then need to rescan the SCSI information on that host, resize the apropriate partition and the PV. Puppet will then resize the LVs later as needed.
    • We currently implemented this via a script that connects to the host via SSH, reads all disks via lslbk, compares that to what Foreman knows how the disks should look, and then performs additional actions if needed.

So, these things complete revolve around Users (or API calls) changing hosts explicitly. We personally do not care about hosts getting facts or reports uploaded for these workflows. I would assume, this is the main thing most people utilizing web/shellhooks would expect.

Other things have at least been discussed, but those mainly revolve around host creation or deletion.

I am somewhat certain the old implementation of foreman_hooks handled this differently, but I can’t say for sure. We had our old scripts log on every call, and when analyzing errors with those, I never saw that “trigger happiness” in the logs, but could be I was just blind.

Got it, that makes more sense now. I looked a bit in the code and figure the webhooks are generally hooked to model events (ActiveRecord create, update, delete) or jobs (ActiveJob actions). In this case, we’d need a specific place in controller to emit the event with a particular host instance. Thanks for detailed description of the user cases, that’s always very heplful.

@ofedoren do you know whether that’s possible? I found https://github.com/theforeman/foreman/pull/7232#issuecomment-579764383 which said the trigger_hook could be used, but it wouldn’t appear in the list. I guess it didn’t move further since then?

1 Like

So, if I understand you correctly, there is currently no “better” way how we could implement this?
We are not really able to revert back to our old hooks, since those relied on a discontinued product to create the NFS shares (NetApp Workflow automater), so we would have to reimplement that as well.

@aruzicka @Marek_Hulan
I created Bug #36105: Webhook task creation randomly failes, leaving bogus tasks in state pending/pending - Webhooks - Foreman for the problem with broken tasks and Feature #36104: As a user, I want to be able to trigger webhooks only on explicit host updates - Webhooks - Foreman for a feature request to be able to only subscribe to explicit host changes without reports, fact uploads, etc.
I opened both against the webhooks project. If you need me to fill in any more information on the topics here or in the redmine issues, let me know.

1 Like

As a “workaround” to our current problems, we decided to migrate back to using foreman_hooks for the time being. This looks like the only way we can currently get our workflows back up and running.
Since foreman_hooks is already somewhat deprecated and not developed anymore, we know we need to move to webhooks/shellhooks someday, but for now we don’t see any way of implementing our needed workflows with those.

Thanks for sharing the feedback!