Hi Tomer,
The value is set to 400:
show variables like ‘innodb_lock_wait_timeout’;
±-------------------------±------+
Variable_name | Value |
±-------------------------±------+
innodb_lock_wait_timeout | 400 |
±-------------------------±------+
1 row in set (0.00 sec)
Over the past day I’ve backported the two changes from:
https://github.com/theforeman/foreman/pull/2362
Fixes #12241 - Correct counter_cache deadlock fix by tbrisker · Pull Request #2839 · theforeman/foreman · GitHub
into our 1.7.1 Foreman servers (we’re running two), and I’ve restarted
the MySQL database, just to start fresh. However, I’m still unable to
remove a number of Foreman records. When I go to delete the record through
the Web UI, I still see this error:
Mysql2::Error: Lock wait timeout exceeded; try restarting transaction:
UPDATE models
SET hosts_count
= COALESCE(hosts_count
, 0) - 1
WHERE models
.id
= 46 ORDER BY models.name
When I do a “show processlist” on the database, I can see the query
timing out:
60 | foreman | myhost.example.com:48892 | foreman | Query |
390 | Searching rows for update
UPDATE models
SET hosts_count
= COALESCE(hosts_count
, 0) - 1
WHERE models
.id
= 46 ORDER BY
Where *390 *is the timeout. Once it gets to 400, the process clears.
If I try the delete again, a new process appears and also times out after
400 seconds. I could drop the timeout to 100, but looking through the list
of processes, it doesn’t appear that there are any other long running
processes (other than a process doing a binlog dump to our standby
database, but I don’t think that’s an issue) timing out, which might also
be holding a lock. When I try to delete the bad record, I see the one long
lived process start up and die.
The odd part is, there are other records I can remove. The records I
can’t remove appear to be the same every time. It’s as if a particular
group of records has a problem. What would cause one set of records to be
unable to update the host_count after repeated tries, while others can
immediately?
We run 3 puppet masters with ~8000 hosts, and two Foreman servers that
talk to a single MySQL database. None of the machines look particularly
overloaded, but I’m wondering if the database isn’t happy with two app
servers trying to talk to it simultaneously. Are there any additional
settings you could recommend I set to give me more visibility?
On Wednesday, October 28, 2015 at 2:16:30 AM UTC-7, Tomer Brisker wrote:
Hello Kyle,
You can check the value of your lock time out using:
show variables like ‘innodb_lock_wait_timeout’;
If it is set to a very high value, you can change it using
set innodb_lock_wait_timeout=100
for example to set it to time out after 100 seconds.
You can also force kill the stuck process(es) so you can get the hosts
unstuck:
show processlist;
kill <process_id>;
After getting your hosts unstuck, you can try to check if the two
aforementioned patches fix the deadlock indeed or not.
Or it might be completely unrelated and just a one-time problem that
locked them in the first place.
On Mon, Oct 26, 2015 at 8:12 PM, Kyle Flavin kyle....@gmail.com wrote:
Hi Tomer,
Unfortunately it doesn’t look like we have logging turned on for this
particular server. Error logs are turned on, but nothing appears there
since September.
I was able to login and do a “show engine innodb status\G” and got the
following (sanitized output) - it seems to show two transactions attempting
to update host_count on the architecture table. These transactions have
been there since Friday. I expected to find something even older, as we’ve
had a few “stuck” host records that we cannot delete going back to Oct 1.
LATEST DETECTED DEADLOCK
2015-10-23 00:06:10 7fc5696a0700
*** (1) TRANSACTION:
TRANSACTION 9523120346, ACTIVE 3 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 5 lock struct(s), heap size 1184, 2 row lock(s), undo log
entries 2
MySQL thread id 88248, OS thread handle 0x7fc5692d1700, query id
3037051607 foreman.example.com 1.1.1.1 foreman updating
UPDATE architectures
SET hosts_count
= COALESCE(hosts_count
, 0) -
1 WHERE architectures
.id
= 1
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 163 page no 3 n bits 72 index PRIMARY
of table
foreman
.architectures
trx id 9523120346 lock_mode X locks rec but
not gap waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 8; compact format;
info bits 0
0: len 4; hex 80000001; asc ;;
1: len 6; hex 0002379f2b5f; asc 7 +_;;
2: len 7; hex 340009803725da; asc 4 7% ;;
3: len 6; hex 7838365f3634; asc x86_64;;
4: len 5; hex 9986ee3442; asc 4B;;
5: len 5; hex 9986ee3442; asc 4B;;
6: len 4; hex 80001c80; asc ;;
7: len 4; hex 80000003; asc ;;
*** (2) TRANSACTION:
TRANSACTION 9523116898, ACTIVE 15 sec starting index read
mysql tables in use 1, locked 1
23 lock struct(s), heap size 2936, 11 row lock(s), undo log entries 3
MySQL thread id 87505, OS thread handle 0x7fc5696a0700, query id
3037052724 foreman.example.com 1.1.1.1 foreman updating
UPDATE architectures
SET hosts_count
= COALESCE(hosts_count
, 0) +
1 WHERE architectures
.id
= 1
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 163 page no 3 n bits 72 index PRIMARY
of table
foreman
.architectures
trx id 9523116898 lock mode S locks rec but
not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 8; compact format;
info bits 0
0: len 4; hex 80000001; asc ;;
1: len 6; hex 0002379f2b5f; asc 7 +_;;
2: len 7; hex 340009803725da; asc 4 7% ;;
3: len 6; hex 7838365f3634; asc x86_64;;
4: len 5; hex 9986ee3442; asc 4B;;
5: len 5; hex 9986ee3442; asc 4B;;
6: len 4; hex 80001c80; asc ;;
7: len 4; hex 80000003; asc ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 163 page no 3 n bits 72 index PRIMARY
of table
foreman
.architectures
trx id 9523116898 lock_mode X locks rec but
not gap waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 8; compact format;
info bits 0
0: len 4; hex 80000001; asc ;;
1: len 6; hex 0002379f2b5f; asc 7 +_;;
2: len 7; hex 340009803725da; asc 4 7% ;;
3: len 6; hex 7838365f3634; asc x86_64;;
4: len 5; hex 9986ee3442; asc 4B;;
5: len 5; hex 9986ee3442; asc 4B;;
6: len 4; hex 80001c80; asc ;;
7: len 4; hex 80000003; asc ;;
*** WE ROLL BACK TRANSACTION (1)
On Sunday, October 25, 2015 at 12:43:08 AM UTC-7, Tomer Brisker wrote:
Hello Kyle,
It is possible that this is related indeed to the counter_cache_fix, but
in order to confirm that I would need a bit more context from your logs as
to the error causing this.
The mysql log should contain the details of which two transactions are
conflicting and causing the deadlock.
If this is indeed the case, it should be possible to cherry pick only
these two patches (once the second one gets merged) to 1.7.1 if needed.
On Sat, Oct 24, 2015 at 1:06 AM, Kyle Flavin kyle....@gmail.com wrote:
We’re having trouble removing a host record from Foreman (1.7.1). It
looks like a process is holding a lock open. When we try to delete the
record from the UI, it sits there for a long time (several minutes) and
then fails with this error:
Mysql2::Error: Lock wait timeout exceeded; try
…