Different response format on identical requests when executed in parralel

Problem:
Whenever I run multiple POST requests to create discovery rules in parralel, I ocassionally get a 201 response with a different payload format.
I’m unable to reproduce this if I execute the requests sequentially. More parralel requests I execute, bigger change of hitting this issue:

{"discovery_rule":{"id":742,"name":"57531206","search":"foo","hostgroup_id":125,"hostname":"foo","max_count":0,"priority":29227,"enabled":true,"created_at":"2020-09-16T10:33:11.220Z","updated_at":"2020-09-16T10:33:11.220Z"}}
{"id":745,"name":"304796992","search":"foo","hostgroup_id":125,"hostname":"foo","max_count":0,"priority":17120,"enabled":true,"created_at":"2020-09-16T10:33:12.128Z","updated_at":"2020-09-16T10:33:12.128Z"}

Expected outcome:
Consistent payload format

Foreman and Proxy versions:
foreman-2.1.2.11-2.el7sat.noarch
foreman-proxy-2.1.2-2.el7sat.noarch

Foreman and Proxy plugin versions:
# rpm -qa foreman-*
foreman-postgresql-2.1.2.11-2.el7sat.noarch
foreman-selinux-2.1.2.1-1.el7sat.noarch
foreman-ec2-2.1.2.11-2.el7sat.noarch
foreman-service-2.1.2.11-2.el7sat.noarch
foreman-debug-2.1.2.11-2.el7sat.noarch
foreman-vmware-2.1.2.11-2.el7sat.noarch
foreman-dynflow-sidekiq-2.1.2.11-2.el7sat.noarch
foreman-bootloaders-redhat-tftpboot-202005201200-1.el7sat.noarch
foreman-proxy-2.1.2-2.el7sat.noarch
foreman-discovery-image-3.6.5-1.el7sat.noarch
foreman-installer-2.1.2.6-1.el7sat.noarch
foreman-installer-katello-2.1.2.6-1.el7sat.noarch
foreman-openstack-2.1.2.11-2.el7sat.noarch
foreman-bootloaders-redhat-202005201200-1.el7sat.noarch
foreman-gce-2.1.2.11-2.el7sat.noarch
foreman-libvirt-2.1.2.11-2.el7sat.noarch
foreman-ovirt-2.1.2.11-2.el7sat.noarch
foreman-cli-2.1.2.11-2.el7sat.noarch

`# rpm -qa katello-*
katello-default-ca-1.0-1.noarch
katello-client-bootstrap-1.7.5-1.el7sat.noarch
katello-debug-3.16.0-1.el7sat.noarch
katello-server-ca-1.0-1.noarch
katello-certs-tools-2.7.1-1.el7sat.noarch
katello-selinux-3.4.0-1.el7sat.noarch
katello-ca-consumer-dhcp-8-29-228.lab.eng.rdu2.redhat.com-1.0-5.noarch
katello-common-3.16.0-1.el7sat.noarch`

Distribution and version:
Red Hat Enterprise Linux Server release 7.8 (Maipo)

Other relevant data:

# grep 6b772cdc /var/log/foreman/production.log
2020-09-16T05:09:45 [I|app|6b772cdc] Started POST "/api/v2/discovery_rules" for 127.0.0.1 at 2020-09-16 05:09:45 -0400
2020-09-16T05:09:45 [I|app|6b772cdc] Processing by Api::V2::DiscoveryRulesController#create as JSON
2020-09-16T05:09:45 [I|app|6b772cdc]   Parameters: {"discovery_rule"=>{"name"=>"<address>mclMyFZCkHBOTmESILmFzbGKdHXRPTMepDJ</address>", "hostname"=>"myhost-<%= rand(99999) %>", "priority"=>546, "hostgroup_id"=>125, "location_ids"=>[259], "organization_ids"=>[255], "search"=>"memory > 500"}, "apiv"=>"v2"}
2020-09-16T05:09:45 [D|app|6b772cdc] Authenticated user admin against INTERNAL authentication source
2020-09-16T05:09:45 [D|app|6b772cdc] Post-login processing for admin
2020-09-16T05:09:45 [I|app|6b772cdc] Authorized user admin(Admin User)
2020-09-16T05:09:45 [D|app|6b772cdc] Post-login processing for admin
2020-09-16T05:09:45 [D|tax|6b772cdc] Current location set to none
2020-09-16T05:09:45 [D|tax|6b772cdc] Current organization set to none
2020-09-16T05:09:45 [D|tax|6b772cdc] Current location set to none
2020-09-16T05:09:45 [D|tax|6b772cdc] Current organization set to none
2020-09-16T05:09:45 [I|aud|6b772cdc] DiscoveryRule (442) create event on name <address>mclMyFZCkHBOTmESILmFzbGKdHXRPTMepDJ</address>
2020-09-16T05:09:45 [I|aud|6b772cdc] DiscoveryRule (442) create event on search memory > 500
2020-09-16T05:09:45 [I|aud|6b772cdc] DiscoveryRule (442) create event on hostgroup_id 125
2020-09-16T05:09:45 [I|aud|6b772cdc] DiscoveryRule (442) create event on hostname myhost-<%= rand(99999) %>
2020-09-16T05:09:45 [I|aud|6b772cdc] DiscoveryRule (442) create event on max_count 0
2020-09-16T05:09:45 [I|aud|6b772cdc] DiscoveryRule (442) create event on priority 546
2020-09-16T05:09:45 [I|aud|6b772cdc] DiscoveryRule (442) create event on enabled true
2020-09-16T05:09:45 [I|aud|6b772cdc] DiscoveryRule (442) create event on organization_ids [255]
2020-09-16T05:09:45 [I|aud|6b772cdc] DiscoveryRule (442) create event on location_ids [259]
2020-09-16T05:09:45 [D|app|6b772cdc] Body: {"discovery_rule":{"id":442,"name":"\u003caddress\u003emclMyFZCkHBOTmESILmFzbGKdHXRPTMepDJ\u003c/address\u003e","search":"memory \u003e 500","hostgroup_id":125,"hostname":"myhost-\u003c%= rand(99999) %\u003e","max_count":0,"priority":546,"enabled":true,"created_at":"2020-09-16T09:09:45.291Z","updated_at":"2020-09-16T09:09:45.291Z"}}
2020-09-16T05:09:45 [I|app|6b772cdc] Completed 201 Created in 244ms (Views: 0.5ms | ActiveRecord: 95.8ms | Allocations: 50630)
 
===========
 
# grep 2605d72c /var/log/foreman/production.log
2020-09-16T05:10:42 [I|app|2605d72c] Started POST "/api/v2/discovery_rules" for 127.0.0.1 at 2020-09-16 05:10:42 -0400
2020-09-16T05:10:42 [I|app|2605d72c] Processing by Api::V2::DiscoveryRulesController#create as JSON
2020-09-16T05:10:42 [I|app|2605d72c]   Parameters: {"discovery_rule"=>{"name"=>"<frameset>qyPzhDnC</frameset>", "hostname"=>"myhost-<%= rand(99999) %>", "priority"=>463, "hostgroup_id"=>122, "location_ids"=>[256], "organization_ids"=>[252], "search"=>"memory > 500"}, "apiv"=>"v2"}
2020-09-16T05:10:42 [D|app|2605d72c] Authenticated user admin against INTERNAL authentication source
2020-09-16T05:10:42 [D|app|2605d72c] Post-login processing for admin
2020-09-16T05:10:42 [I|app|2605d72c] Authorized user admin(Admin User)
2020-09-16T05:10:42 [D|app|2605d72c] Post-login processing for admin
2020-09-16T05:10:42 [D|tax|2605d72c] Current location set to none
2020-09-16T05:10:42 [D|tax|2605d72c] Current organization set to none
2020-09-16T05:10:42 [D|tax|2605d72c] Current location set to none
2020-09-16T05:10:42 [D|tax|2605d72c] Current organization set to none
2020-09-16T05:10:42 [I|aud|2605d72c] DiscoveryRule (462) create event on name <frameset>qyPzhDnC</frameset>
2020-09-16T05:10:42 [I|aud|2605d72c] DiscoveryRule (462) create event on search memory > 500
2020-09-16T05:10:42 [I|aud|2605d72c] DiscoveryRule (462) create event on hostgroup_id 122
2020-09-16T05:10:42 [I|aud|2605d72c] DiscoveryRule (462) create event on hostname myhost-<%= rand(99999) %>
2020-09-16T05:10:42 [I|aud|2605d72c] DiscoveryRule (462) create event on max_count 0
2020-09-16T05:10:42 [I|aud|2605d72c] DiscoveryRule (462) create event on priority 463
2020-09-16T05:10:42 [I|aud|2605d72c] DiscoveryRule (462) create event on enabled true
2020-09-16T05:10:42 [I|aud|2605d72c] DiscoveryRule (462) create event on organization_ids [252]
2020-09-16T05:10:42 [I|aud|2605d72c] DiscoveryRule (462) create event on location_ids [256]
2020-09-16T05:10:42 [D|app|2605d72c] Body: {"id":462,"name":"\u003cframeset\u003eqyPzhDnC\u003c/frameset\u003e","search":"memory \u003e 500","hostgroup_id":122,"hostname":"myhost-\u003c%= rand(99999) %\u003e","max_count":0,"priority":463,"enabled":true,"created_at":"2020-09-16T09:10:42.714Z","updated_at":"2020-09-16T09:10:42.714Z"}
2020-09-16T05:10:42 [I|app|2605d72c] Completed 201 Created in 146ms (Views: 0.4ms | ActiveRecord: 41.1ms | Allocations: 28787)

here’s my reproducer (mind that you might need to create and update other entity ids accordingly to your setup:

for i in {1..10}; do curl -sku admin:changeme -H "content-type: application/json" -d "{\"name\": \"$RANDOM$RANDOM\", \"hostname\": \"myhost-<%= rand(99999) %>\", \"priority\": $RANDOM, \"hostgroup_id\": 125, \"location_ids\": [259], \"organization_ids\": [255], \"search\": \"memory > 500\"}" https://<sat_server>/api/v2/discovery_rules; echo ""; done

I identified some other API resources with the same problem. Looks like this might be a general API issue.
I’m looking further into this and am updating a list of affected resources here:

Also, I filed a BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1879587
(will change the component and the title accordingly as soon as I’ll get more data)

Roman just confirmed this happens on puma based deployment only, with passenger it works. An immediate (but very poor) workaround is to configure puma to use single thread. To do so, use the following installer argument --foreman-foreman-service-puma-threads-max 1. The other option is to switch back to passenger, for that, use --foreman-passenger true

That’s quite odd, the response content should be generated by the foreman application, regardless of which webserver it is running under.
However, I did find one thing in common for all three endpoints identified: They don’t have matching view templates.
I wounder if copying e.g. <$discovery_plugin>/app/views/api/v2/discovery_rules/show.json.rabl to <$discovery_plugin>/app/views/api/v2/discovery_rules/create.json.rabl would lead to any difference?

This article does show that it’s very easy in controllers to make threading mistakes, as yet another example that thread safe code is hard.
https://bearmetal.eu/theden/how-do-i-know-whether-my-rails-app-is-thread-safe-or-not/

Memoization is used a lot in our codebase. Probably not always problematic, but there’s a good chance we’re doing something wrong since we’ve always ran with Passenger and didn’t use threads. That means it’s likely we’re only just uncovering how thread-unsafe Foreman really is.

spot on,
this seems to did the trick and i no longer spot the inconsistent behavior.

How memorization can be a problem in a controller? Each request has its own controller instance, therefore memorization within a controller cannot be a problem. I believe the linked article is wrong about this.

Anyway, some time ago we made a shoking switch in our codebase by enabling threadsafe! in the codebase. This was done without a good understanding of consequences, I was asking for this and IIRC the response was something “it works in dev just fine”. And here we are.

I suggest to configure Puma with only single thread and properly plan some review and testing.

After reading the memorization section I can say it’s very misleading, I should say even completely wrong in the controller context where every request has its own instance. The only valid point for Rails controllers is sharing data via class variables which is a bad in any multi-threaded application (it’s a global state). Also point about context switch in ||= is not applicable in a controller, however that makes sense in a shared instance.

Anyway, catching these kind of bugs can be a challenge. It does not need to be actually something in our code, we carry ton of libraries and I believe they are the biggest problem when it comes to thread safety - it’s almost impossible to tell which are safe.

In this particular case, I’d probably start digging in the RABL codebase because there is some caching going on. A cache = global state. Therefore this code must be thead safe. I think it’s more likely these are bugs in libraries than in our Ruby on Rails code.

Okay that was fast:

I am not saying this hasn’t been fixed but after 10 minutes of searching I could not find much improvements, there was a one patch fixing one particular problem which was not Rails related.

What is the expected behavior? I don’t understand Rails too much, is this some convention over configuration? What exactly is this supposed to pick? Shouldn’t we see some debug messages with more info about what template Rails are picking to render the response?

Ok I tried to create a pure Rails app, when create.html.erb is missing, an error is thrown:

Started GET "/welcome/create" for 127.0.0.1 at 2020-09-18 15:21:15 +0200
Processing by WelcomeController#create as */*
No template found for WelcomeController#create, rendering head :no_content
Completed 204 No Content in 1ms (ActiveRecord: 0.0ms | Allocations: 211)

It looks like there is some fallback mechanism in RABL when template is not found it picks index.json.rabl.

Anyway, I spotted a global variable (hash) in RABL and created a mutex for it. Unfortunately it does not fix the problem. I asked RABL devs for more insights: https://github.com/nesquena/rabl/pull/738