Foreman DB locks

Hi,

We are seeing some DB locking issues during normal usage. I was hoping to
get some pointers on what to look at to try and solve this/decrease the
number of locks.

We have 8 live Foreman 1.12 servers (each with 6 passenger instances) with
35k+ active nodes. I'm using Foreman as a Puppet 3.8 ENC, with a mix of 30
or 60 minute puppet run intervals and all the logs being stored in Foreman.
We're using postgresql94 for the DB on a 4 core/16GB RAM VM (single DB
instance). All of this is running on RHEL6.6.

What am I seeing? 200-300 locks at peak times, usually when a report is
being run. Outside of that, we seem to hover at 100-ish locks.
What have I tried? We have found that adding more active Foreman servers
(or passenger instances) increases the number of locks. We have found that
reducing the number of Foreman servers (or passenger instances) results in
Puppet agent timeouts.
What is causing the locks?

Running this command
(from: https://wiki.postgresql.org/wiki/Lock_Monitoring ):
myforemandb=> SELECT blocked_locks.pid AS blocked_pid,
myforemandb-> blocked_activity.usename AS blocked_user,
myforemandb-> blocking_locks.pid AS blocking_pid,
myforemandb-> blocking_activity.usename AS blocking_user,
myforemandb-> blocked_activity.query AS blocked_statement,
myforemandb-> blocking_activity.query AS
current_statement_in_blocking_process,
myforemandb-> blocked_activity.application_name AS
blocked_application,
myforemandb-> blocking_activity.application_name AS
blocking_application
myforemandb-> FROM pg_catalog.pg_locks blocked_locks
myforemandb-> JOIN pg_catalog.pg_stat_activity blocked_activity ON
blocked_activity.pid = blocked_locks.pid
myforemandb-> JOIN pg_catalog.pg_locks blocking_locks
myforemandb-> ON blocking_locks.locktype = blocked_locks.locktype
myforemandb-> AND blocking_locks.DATABASE IS NOT DISTINCT FROM
blocked_locks.DATABASE
myforemandb-> AND blocking_locks.relation IS NOT DISTINCT FROM
blocked_locks.relation
myforemandb-> AND blocking_locks.page IS NOT DISTINCT FROM
blocked_locks.page
myforemandb-> AND blocking_locks.tuple IS NOT DISTINCT FROM
blocked_locks.tuple
myforemandb-> AND blocking_locks.virtualxid IS NOT DISTINCT FROM
blocked_locks.virtualxid
myforemandb-> AND blocking_locks.transactionid IS NOT DISTINCT FROM
blocked_locks.transactionid
myforemandb-> AND blocking_locks.classid IS NOT DISTINCT FROM
blocked_locks.classid
myforemandb-> AND blocking_locks.objid IS NOT DISTINCT FROM
blocked_locks.objid
myforemandb-> AND blocking_locks.objsubid IS NOT DISTINCT FROM
blocked_locks.objsubid
myforemandb-> AND blocking_locks.pid != blocked_locks.pid
myforemandb->
myforemandb-> JOIN pg_catalog.pg_stat_activity blocking_activity ON
blocking_activity.pid = blocking_locks.pid
myforemandb-> WHERE NOT blocked_locks.GRANTED;

Gives me a wall of text that looks like:

blocked_pid | blocked_user | blocking_pid | blocking_user |
blocked_statement | current_statement_in_blocking_process |
blocked_application | blocking_application
7500 | myforemanuser | 7546 | myforemanuser |
DELETE FROM "logs" WHERE "logs"."report_id" IN (SELECT "reports"."id" FROM
"reports" WHERE "reports"."type" IN ('ConfigReport') AND
(reports.created_at < '2017-07-17 14:30:28')) | DELETE FROM "logs" WHERE
"logs"."report_id" IN (SELECT "reports"."id" FROM "reports" WHERE
"reports"."type" IN ('ConfigReport') AND (reports.created_at < '2017-07-17
14:32:13')) | |
7503 | myforemanuser | 7543 | myforemanuser |
DELETE FROM "logs" WHERE "logs"."report_id" IN (SELECT "reports"."id" FROM
"reports" WHERE "reports"."type" IN ('ConfigReport') AND
(reports.created_at < '2017-07-17 14:31:57')) | DELETE FROM "logs" WHERE
"logs"."report_id" IN (SELECT "reports"."id" FROM "reports" WHERE
"reports"."type" IN ('ConfigReport') AND (reports.created_at < '2017-07-17
14:31:06')) | |
7503 | myforemanuser | 7532 | myforemanuser |
DELETE FROM "logs" WHERE "logs"."report_id" IN (SELECT "reports"."id" FROM
"reports" WHERE "reports"."type" IN ('ConfigReport') AND
(reports.created_at < '2017-07-17 14:31:57')) | DELETE FROM "logs" WHERE
"logs"."report_id" IN (SELECT "reports"."id" FROM "reports" WHERE
"reports"."type" IN ('ConfigReport') AND (reports.created_at < '2017-07-17
14:30:50')) | |
7503 | myforemanuser | 7530 | myforemanuser |
DELETE FROM "logs" WHERE "logs"."report_id" IN (SELECT "reports"."id" FROM
"reports" WHERE "reports"."type" IN ('ConfigReport') AND
(reports.created_at < '2017-07-17 14:31:57')) | DELETE FROM "logs" WHERE
"logs"."report_id" IN (SELECT "reports"."id" FROM "reports" WHERE
"reports"."type" IN ('ConfigReport') AND (reports.created_at < '2017-07-17
14:31:43')) | |
etc.

tl;dr: How can I reduce the number of DB locks?

I am shooting in the dark, but we have this cleanup cron job:

Expire old reports

30 7 * * * foreman /usr/sbin/foreman-rake reports:expire
>>/var/log/foreman/cron.log 2>&1

It runs daily, so modify it to run weekly perhaps or schedule it into
some quiet period if you have any as an initial workaround.

Now, note the above command runs in a separate process, check logs and
try to monitor that and measure how fast it cleans up old reports.
While the reports is expiring, it blocks all hosts which are checking
in. If this takes 10 minutes, then you will encounter lots of requests
enquen

If you confirm my suspicion, the solution would be perhaps to break up
record deletion into smaller batches (e.g. 1k records at one time),
perhaps with an extra 100ms delay so requests are not blocked (but the
expiration process will last longer). It would be good idea to test
performance and perhaps set it to expire on a weekly basis rather than
daily.

Can you tell how many config reports (and associated log lines,
messages and sources) do you expire every day? This should be in
cron.log file.

··· On Mon, Jul 24, 2017 at 5:47 PM, Chris Baldwin wrote: > Hi, > > We are seeing some DB locking issues during normal usage. I was hoping to > get some pointers on what to look at to try and solve this/decrease the > number of locks. > > We have 8 live Foreman 1.12 servers (each with 6 passenger instances) with > 35k+ active nodes. I'm using Foreman as a Puppet 3.8 ENC, with a mix of 30 > or 60 minute puppet run intervals and all the logs being stored in Foreman. > We're using postgresql94 for the DB on a 4 core/16GB RAM VM (single DB > instance). All of this is running on RHEL6.6. > > What am I seeing? 200-300 locks at peak times, usually when a report is > being run. Outside of that, we seem to hover at 100-ish locks. > What have I tried? We have found that adding more active Foreman servers (or > passenger instances) increases the number of locks. We have found that > reducing the number of Foreman servers (or passenger instances) results in > Puppet agent timeouts. > What is causing the locks? > > Running this command (from: https://wiki.postgresql.org/wiki/Lock_Monitoring > ): > myforemandb=> SELECT blocked_locks.pid AS blocked_pid, > myforemandb-> blocked_activity.usename AS blocked_user, > myforemandb-> blocking_locks.pid AS blocking_pid, > myforemandb-> blocking_activity.usename AS blocking_user, > myforemandb-> blocked_activity.query AS blocked_statement, > myforemandb-> blocking_activity.query AS > current_statement_in_blocking_process, > myforemandb-> blocked_activity.application_name AS > blocked_application, > myforemandb-> blocking_activity.application_name AS > blocking_application > myforemandb-> FROM pg_catalog.pg_locks blocked_locks > myforemandb-> JOIN pg_catalog.pg_stat_activity blocked_activity ON > blocked_activity.pid = blocked_locks.pid > myforemandb-> JOIN pg_catalog.pg_locks blocking_locks > myforemandb-> ON blocking_locks.locktype = blocked_locks.locktype > myforemandb-> AND blocking_locks.DATABASE IS NOT DISTINCT FROM > blocked_locks.DATABASE > myforemandb-> AND blocking_locks.relation IS NOT DISTINCT FROM > blocked_locks.relation > myforemandb-> AND blocking_locks.page IS NOT DISTINCT FROM > blocked_locks.page > myforemandb-> AND blocking_locks.tuple IS NOT DISTINCT FROM > blocked_locks.tuple > myforemandb-> AND blocking_locks.virtualxid IS NOT DISTINCT FROM > blocked_locks.virtualxid > myforemandb-> AND blocking_locks.transactionid IS NOT DISTINCT FROM > blocked_locks.transactionid > myforemandb-> AND blocking_locks.classid IS NOT DISTINCT FROM > blocked_locks.classid > myforemandb-> AND blocking_locks.objid IS NOT DISTINCT FROM > blocked_locks.objid > myforemandb-> AND blocking_locks.objsubid IS NOT DISTINCT FROM > blocked_locks.objsubid > myforemandb-> AND blocking_locks.pid != blocked_locks.pid > myforemandb-> > myforemandb-> JOIN pg_catalog.pg_stat_activity blocking_activity ON > blocking_activity.pid = blocking_locks.pid > myforemandb-> WHERE NOT blocked_locks.GRANTED; > > Gives me a wall of text that looks like: > > blocked_pid | blocked_user | blocking_pid | blocking_user | > blocked_statement | current_statement_in_blocking_process | > blocked_application | blocking_application > 7500 | myforemanuser | 7546 | myforemanuser | > DELETE FROM "logs" WHERE "logs"."report_id" IN (SELECT "reports"."id" FROM > "reports" WHERE "reports"."type" IN ('ConfigReport') AND (reports.created_at > < '2017-07-17 14:30:28')) | DELETE FROM "logs" WHERE "logs"."report_id" IN > (SELECT "reports"."id" FROM "reports" WHERE "reports"."type" IN > ('ConfigReport') AND (reports.created_at < '2017-07-17 14:32:13')) | > > > 7503 | myforemanuser | 7543 | myforemanuser | > DELETE FROM "logs" WHERE "logs"."report_id" IN (SELECT "reports"."id" FROM > "reports" WHERE "reports"."type" IN ('ConfigReport') AND (reports.created_at > < '2017-07-17 14:31:57')) | DELETE FROM "logs" WHERE "logs"."report_id" IN > (SELECT "reports"."id" FROM "reports" WHERE "reports"."type" IN > ('ConfigReport') AND (reports.created_at < '2017-07-17 14:31:06')) | > > > 7503 | myforemanuser | 7532 | myforemanuser | > DELETE FROM "logs" WHERE "logs"."report_id" IN (SELECT "reports"."id" FROM > "reports" WHERE "reports"."type" IN ('ConfigReport') AND (reports.created_at > < '2017-07-17 14:31:57')) | DELETE FROM "logs" WHERE "logs"."report_id" IN > (SELECT "reports"."id" FROM "reports" WHERE "reports"."type" IN > ('ConfigReport') AND (reports.created_at < '2017-07-17 14:30:50')) | > > > 7503 | myforemanuser | 7530 | myforemanuser | > DELETE FROM "logs" WHERE "logs"."report_id" IN (SELECT "reports"."id" FROM > "reports" WHERE "reports"."type" IN ('ConfigReport') AND (reports.created_at > < '2017-07-17 14:31:57')) | DELETE FROM "logs" WHERE "logs"."report_id" IN > (SELECT "reports"."id" FROM "reports" WHERE "reports"."type" IN > ('ConfigReport') AND (reports.created_at < '2017-07-17 14:31:43')) | > > > etc. > > tl;dr: How can I reduce the number of DB locks? > > -- > You received this message because you are subscribed to the Google Groups > "Foreman users" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to foreman-users+unsubscribe@googlegroups.com. > To post to this group, send email to foreman-users@googlegroups.com. > Visit this group at https://groups.google.com/group/foreman-users. > For more options, visit https://groups.google.com/d/optout.


Later,
Lukas @lzap Zapletal