Foreman 1.18 slow API calls

Problem:

Foreman API call after upgrade now takes over 8 seconds which previously was under 1 second, I’m not sure if it’s isolated to the fact_values endpoint, yet.

Expected outcome:

Previous releases (1.16.*) returned the results in under 1 second

Foreman and Proxy versions:

Debian Stretch with:
ii foreman 1.18.1-1 amd64 Systems management web interface
ii foreman-cli 1.18.1-1 all metapackage providing hammer CLI for Foreman

ii foreman-debug 1.18.1-1 all provides support utility foreman-debug.
ii foreman-libvirt 1.18.1-1 all metapackage providing libvirt dependencies for Foreman
ii foreman-postgresql 1.18.1-1 all metapackage providing PostgreSQL dependencies for Foreman
ii foreman-proxy 1.18.1-1 all RESTful proxies for DNS, DHCP, TFTP, and Puppet
ii ruby-foreman-deface 1.2.0-1 all Foreman Deface Plugin Dependency
ii ruby-foreman-dhcp-browser 0.0.8-1 all Foreman DHCP browser Plugin
ii ruby-foreman-discovery 12.0.2-1 all Foreman Discovery Plugin
ii ruby-foreman-hooks 0.3.14-1 all Foreman Hooks Plugin
ii ruby-foreman-templates 6.0.3-1 all Foreman Templates Plugin
ii ruby-hammer-cli-foreman 0.13.0-1 all Foreman commands for Hammer

Foreman and Proxy plugin versions:

Other relevant data:

Example curl command:

time curl -k -u admin:admin -H “Accept: version=2, application/json” -H “Content-Type: application/json” -X GET “https://theforeman.example.net/api/fact_values?per_page=5&search=name=ipmi_macaddress
{
“total”: 432656,
“subtotal”: 5,
“page”: 1,
“per_page”: 5,
“search”: “name=ipmi_macaddress”,
“sort”: {
“by”: null,
“order”: null
},
“results”: {“kr1000.example.net”:{“ipmi_macaddress”:“00:25:90:0b:24:dc”},“kr110e.example.net”:{“ipmi_macaddress”:“00:25:90:3f:4c:2c”},“kr1207.example.net”:{“ipmi_macaddress”:“00:25:90:4d:15:49”},“rb1404.example.net”:{“ipmi_macaddress”:“00:25:90:52:48:67”},“kr1204.example.net”:{“ipmi_macaddress”:“00:25:90:6d:f8:cf”}}
}

real 0m10.244s
user 0m0.028s
sys 0m0.000s

I had some issues getting SQL debug logging working - but when I did, I got this in the logs ( I snipped the list of IDs in the last query as it was many, many lines!):

2018-08-22T10:02:45 [I|app|ddca7] Processing by Api::V2::FactValuesController#index as JSON
2018-08-22T10:02:45 [I|app|ddca7] Parameters: {“per_page”=>“5”, “search”=>“name=ipmi_macaddress”, “apiv”=>“v2”, “fact_value”=>{}}
2018-08-22T10:02:45 [D|sql|ddca7] Setting Load (0.6ms) SELECT “settings”.* FROM “settings” WHERE “settings”.“name” = $1 ORDER BY “settings”.“name” ASC LIMIT $2 [[“name”, “authorize_login_delegation_api”], [“LIMIT”, 1]]
2018-08-22T10:02:45 [D|sql|ddca7] User Load (0.3ms) SELECT “users”.* FROM “users” WHERE “users”.“lower_login” = $1 LIMIT $2 [[“lower_login”, “admin”], [“LIMIT”, 1]]
2018-08-22T10:02:45 [D|sql|ddca7] PersonalAccessToken Load (0.5ms) SELECT “personal_access_tokens”.* FROM “personal_access_tokens” WHERE “personal_access_tokens”.“revoked” = $1 AND (expires_at >= ‘2018-08-22 10:02:45.931186’ OR expires_at IS NULL) AND “personal_access_tokens”.“user_id” = 3 AND “personal_access_tokens”.“token” = $2 LIMIT $3 [[“revoked”, “f”], [“token”, “7ea0bb43659d7fe52511a090115b3048b0fa14fe”], [“LIMIT”, 1]]
2018-08-22T10:02:45 [D|sql|ddca7] AuthSource Load (0.5ms) SELECT “auth_sources”.* FROM “auth_sources” WHERE “auth_sources”.“id” = $1 LIMIT $2 [[“id”, 1], [“LIMIT”, 1]]
2018-08-22T10:02:45 [D|sql|ddca7] (0.2ms) SELECT “users”.“id” FROM “users”
2018-08-22T10:02:45 [D|sql|ddca7] CACHE (0.0ms) SELECT “users”.“id” FROM “users”
2018-08-22T10:02:45 [D|sql|ddca7] CACHE User Load (0.0ms) SELECT “users”.* FROM “users” WHERE “users”.“lower_login” = $1 LIMIT $2 [[“lower_login”, “admin”], [“LIMIT”, 1]]
2018-08-22T10:02:45 [D|sql|ddca7] Authenticated user admin against INTERNAL authentication source
2018-08-22T10:02:45 [D|sql|ddca7] User Load (0.3ms) SELECT “users”.* FROM “users” WHERE “users”.“lower_login” = $1 LIMIT $2 [[“lower_login”, “foreman_admin”], [“LIMIT”, 1]]
2018-08-22T10:02:45 [I|app|ddca7] Current user: foreman_admin (administrator)
2018-08-22T10:02:45 [D|sql|ddca7] (0.2ms) SELECT auth_sources.id FROM “auth_sources” WHERE “auth_sources”.“type” IN (‘AuthSourceHidden’)
2018-08-22T10:02:45 [D|sql|ddca7] User Load (0.3ms) SELECT “users”.* FROM “users” WHERE (“users”.“auth_source_id” NOT IN (2)) AND “users”.“lower_login” = $1 LIMIT $2 [[“lower_login”, “admin”], [“LIMIT”, 1]]
2018-08-22T10:02:45 [I|app|ddca7] Authorized user admin(Admin User)
2018-08-22T10:02:45 [I|app|ddca7] Current user: admin (administrator)
2018-08-22T10:02:45 [D|sql|ddca7] (0.1ms) BEGIN
2018-08-22T10:02:45 [D|sql|ddca7] SQL (0.2ms) DELETE FROM “sessions” WHERE “sessions”.“id” = $1 [[“id”, 21634422]]
2018-08-22T10:02:45 [D|sql|ddca7] (0.3ms) COMMIT
2018-08-22T10:02:45 [D|sql|ddca7] Setting Load (0.5ms) SELECT “settings”.* FROM “settings” WHERE “settings”.“name” = $1 ORDER BY “settings”.“name” ASC LIMIT $2 [[“name”, “idle_timeout”], [“LIMIT”, 1]]
2018-08-22T10:02:46 [D|sql|ddca7] SQL (3.6ms) SELECT “fact_values”.“id” AS t0_r0, “fact_values”.“value” AS t0_r1, “fact_values”.“fact_name_id” AS t0_r2, “fact_values”.“host_id” AS t0_r3, “fact_values”.“updated_at” AS t0_r4, “fact_values”.“created_at” AS t0_r5, “fact_names”.“id” AS t1_r0, “fact_names”.“name” AS t1_r1, “fact_names”.“updated_at” AS t1_r2, “fact_names”.“created_at” AS t1_r3, “fact_names”.“compose” AS t1_r4, “fact_names”.“short_name” AS t1_r5, “fact_names”.“type” AS t1_r6, “fact_names”.“ancestry” AS t1_r7 FROM “fact_values” INNER JOIN “fact_names” ON “fact_names”.“id” = “fact_values”.“fact_name_id” WHERE (fact_names.name <> ‘_timestamp’) AND ((“fact_names”.“name” = ‘ipmi_macaddress’)) ORDER BY “fact_values”.“value” ASC NULLS FIRST LIMIT $1 OFFSET $2 [[“LIMIT”, 5], [“OFFSET”, 0]]
2018-08-22T10:02:46 [D|sql|ddca7] Host::Base Load (0.5ms) SELECT “hosts”.* FROM “hosts” WHERE “hosts”.“id” IN (1218, 517, 612, 687, 586)
2018-08-22T10:02:46 [D|sql|ddca7] Host::Managed Load (0.4ms) SELECT “hosts”.* FROM “hosts” WHERE “hosts”.“type” IN (‘Host::Managed’) AND “hosts”.“id” IN (1218, 517, 612, 687, 586)
2018-08-22T10:02:46 [I|app|ddca7] Rendering api/v2/fact_values/index.json.rabl within api/v2/layouts/index_layout
2018-08-22T10:02:46 [I|app|ddca7] Rendered api/v2/fact_values/index.json.rabl within api/v2/layouts/index_layout (0.8ms)
2018-08-22T10:02:46 [D|sql|ddca7] (135.5ms) SELECT “fact_values”.“id” FROM “fact_values”
2018-08-22T10:02:46 [D|sql|ddca7] CACHE (0.0ms) SELECT “fact_values”.“id” FROM “fact_values”
2018-08-22T10:02:54 [D|sql|ddca7] (2122.9ms) SELECT COUNT(*) FROM “fact_values” WHERE “fact_values”.“id” IN (199210, 199211, 199212, 199213, 199214, 199215, 199216, 199217, 199218, 199219, 199220, 199221, 199222, 199223, 199224, 199225, 199226, 199227, 199228, 199229,.,.,.)
2018-08-22T10:02:54 [I|app|ddca7] Completed 200 OK in 8837ms (Views: 6473.6ms | ActiveRecord: 2280.8ms)

One line that seems strange for me is this one:

2018-08-22T10:02:54 [D|sql|ddca7] (2122.9ms) SELECT COUNT(*) FROM “fact_values” WHERE “fact_values”.“id” IN (199210, 199211, 199212, 199213, 199214, 199215, 199216, 199217, 199218, 199219, 199220, 199221, 199222, 199223, 199224, 199225, 199226, 199227, 199228, 199229,.,.,.)

it takes 2 seconds to run the count query, which should be executed very quickly using the index.
Which database are you using? can you run the following query in your database console?

EXPLAIN SELECT COUNT(*) FROM “fact_values” WHERE “fact_values”.“id” IN (199210, 199211, 199212, 199213, 199214, 199215, 199216, 199217, 199218, 199219, 199220, 199221, 199222, 199223, 199224, 199225, 199226, 199227, 199228, 199229);

Thanks, we’re running postgres 9.6:

ii postgresql-9.6 9.6.10-1.pgdg90+1 amd64 object-relational SQL database, version 9.6 server

foreman=# EXPLAIN SELECT COUNT(*) FROM "fact_values" WHERE "fact_values"."id" IN (199210, 199211, 199212, 199213, 199214, 199215, 199216, 199217, 199218, 199219, 199220, 199221, 199222, 199223, 199224, 199225, 199226, 199227, 199228, 199229);
                                                                                         QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=145.15..145.16 rows=1 width=8)
   ->  Index Only Scan using fact_values_pkey on fact_values  (cost=0.42..145.10 rows=20 width=0)
         Index Cond: (id = ANY ('{199210,199211,199212,199213,199214,199215,199216,199217,199218,199219,199220,199221,199222,199223,199224,199225,199226,199227,199228,199229}'::bigint[]))
(3 rows)

Though tbh, the last line is more concerning - the query being 2 seconds slow is not good) but the Views section taking 6.4 seconds is worse :slight_smile:

Hi @tbrisker, sorry to nudge you on this - I noticed you have the pen/pencil icon next to your name on the ticket - which I assume means you are in the process of writing a response - perhaps in a browser tab which you’ve forgotten about (happens to me all the time) - I would love to hear your thoughts if you do have something for me to try?

My bet is that the SQL query you analyzed is not full, I believe there is like thousand of IDs in that query, thus there is much text to send to postgres and this can be the bottleneck, it’s not PostgreSQL itself.

The second issue is view rendering time which is actually 6 seconds. That’s too much, there are obviously some loops which are generate nothing as the response seem to be just dozen of lines. Or this is some kind of eager load.

Yeah, I snipped the original SQL as it was massive - I can run the analyze again if required with the full load of IDs? I ran the (full) query in isolation at the time and it took 1.8s so pretty close to the 2s seen in the application log.

Do you have any thoughts about how I could help further debug this - the 2s I can live with, the 6s after that I can’t?

Some deeper investigation let me to realize this is caused due to the fix for Bug #21871: [Hammer] PG::Error while non-admin user listing entities in default location - Foreman. Essentially, to calculate the total number of elements, now the api loads all of the facts to memory just to count how many there are (in the scope permitted for the user). Please open a bug report on projects.theforman.org and link to that issue.

1 Like

Ah, good spot - I even looked at that ticket but thought it not relevant as it was fixing something! I’ll open a ticket, is there some work flow for trying to help fix the bug?

PRs are always welcome :wink: - if you’re interested in that, take a look at Foreman :: Contribute. If not, monitor the issue, and if you can test out proposed patches in your environment prior to them being merged that would be very helpful (considering this is most noticeable with large DBs).

https://projects.theforeman.org/issues/24712

2 Likes

Thanks for reporting the issue and finding the culprit! After years of hassle free foreman upgrades I seem to have no luck with 1.18 :slight_smile:

A PR to fix this has been opened:

https://github.com/theforeman/foreman/pull/5993

1 Like

Can you guys test the patch? Be aware that this was not yet fully tested or reviewed, but if you struggle in production it might be good way out for you. Also we need the feedback from real world setups on performance patches. @tbrisker++ for doing this!

Many thanks to all involved with this @tbrisker / @lzap - this certainly fixes the issue we were seeing with slow response times from the fact_values api endpoint

time curl -k -u admin:admin -H “Accept: version=2, application/json” -H “Content-Type: application/json” -X GET “https://theforeman.example.net/api/fact_values?per_page=5&search=name=ipmi_macaddress
{
“total”: 407959,
“subtotal”: 5,
“page”: 1,
“per_page”: 5,
“search”: “name=ipmi_macaddress”,
“sort”: {
“by”: null,
“order”: null
},
“results”: {“kr1000.example.net”:{“ipmi_macaddress”:“00:25:90:0b:24:dc”},“kr110e.example.net”:{“ipmi_macaddress”:“00:25:90:3f:4c:2c”},“kr1207.example.net”:{“ipmi_macaddress”:“00:25:90:4d:15:49”},“rb1404.example.net”:{“ipmi_macaddress”:“00:25:90:52:48:67”},“kr1204.example.net”:{“ipmi_macaddress”:“00:25:90:6d:f8:cf”}}
}

real 0m0.428s
user 0m0.024s
sys 0m0.008s

2 Likes

Looks like the PR is still failing on some tests, i’ll continue working on this and let you know again once i have a version that passes all tests.

Yeah no problem, I can test again whenever you have some changes.

1 Like

Looks like the tests are now passing, so it would be great if you can test the latest version as well.

unless I’m mistaken, the code changes (non testing) in the PR are unchanged? So I can test again, though nothing I’m running will have actually changed?

there was one small change from the previous version, https://github.com/theforeman/foreman/pull/5993/files#diff-b4f92b4e361ffab4fbb91c68eb3f77bbR96 changed from select to pluck.