Cleaning the foreman database and controlling large tables

Kyle_Flavin · October 30, 2015, 7:54pm

We run a decent sized Foreman environment (2 foreman servers, 1 MySQL
database, 3 puppet masters, ~8000 hosts). Lately, we've been having
problems removing certain host records. Sometimes, the delete times out in
the Web UI, and I get an error that looks like this:

Mysql2::Error: Lock wait timeout exceeded; try restarting transaction:
UPDATE models SET hosts_count = COALESCE(hosts_count, 0) - 1 WHERE
models.id = 46 ORDER BY models.name

This made me think it was a deadlock issue:
https://groups.google.com/forum/#!topic/foreman-users/KHZHi8ktD0s

Sometimes, the delete just seems to run indefinitely. I kicked off one
(tried to delete one "stuck" host record) this morning and it ran for an
hour and a half. I followed it in "show engine innodb status" for an hour
and a half before I killed the process.

I'm starting to think our problems are due to our database growing out of
control in terms of space consumed, and number of records in our tables, as
opposed to deadlock:

cd /usr/mysql/data/foreman

du -s * | sort -r -k1n

…
1321876 sources.MYD
1441796 param_values.ibd
1919892 sources.MYI
38199352 sessions.ibd
45965380 reports.ibd
88887416 logs.ibd

The 'logs' table is taking up 85G of space, reports 45G, and sessions 38G.
Running a "select count(*) from logs" took over 6 minutes to complete, and
returned 550 million records.

My question(s)… Is it safe to just delete older records from these
tables? Is there a standard way to clean or archive these tables that
doesn't involve me going into the production database and running "DELETE
from" queries?

Thanks,
Kyle

ohadlevy · October 30, 2015, 8:43pm

> We run a decent sized Foreman environment (2 foreman servers, 1 MySQL
> database, 3 puppet masters, ~8000 hosts). Lately, we've been having
> problems removing certain host records. Sometimes, the delete times out in
> the Web UI, and I get an error that looks like this:
>
> Mysql2::Error: Lock wait timeout exceeded; try restarting transaction:
> UPDATE models SET hosts_count = COALESCE(hosts_count, 0) - 1 WHERE
> models.id = 46 ORDER BY models.name
>
> This made me think it was a deadlock issue:
> https://groups.google.com/forum/#!topic/foreman-users/KHZHi8ktD0s
>

I think the deadlocks are probably related to the counter cache, and not so
much to the db size, of course it manifest itself only on a large db.

>
>
> Sometimes, the delete just seems to run indefinitely. I kicked off one
> (tried to delete one "stuck" host record) this morning and it ran for an
> hour and a half. I followed it in "show engine innodb status" for an hour
> and a half before I killed the process.
>
> I'm starting to think our problems are due to our database growing out of
> control in terms of space consumed, and number of records in our tables, as
> opposed to deadlock:
>
> # cd /usr/mysql/data/foreman
> # du -s * | sort -r -k1n
> …
> 1321876 sources.MYD
> 1441796 param_values.ibd
> 1919892 sources.MYI
> 38199352 sessions.ibd
> 45965380 reports.ibd
> 88887416 logs.ibd
>
> The 'logs' table is taking up 85G of space, reports 45G, and sessions
> 38G. Running a "select count(*) from logs" took over 6 minutes to
> complete, and returned 550 million records.
>
> My question(s)… Is it safe to just delete older records from these
> tables? Is there a standard way to clean or archive these tables that
> doesn't involve me going into the production database and running "DELETE
> from" queries?
>

Why don't you run the report expiry ? see Expire reports automatically
section under [1]

Ohad

[1] Foreman :: Manual

>
>
Thanks,

···

On Fri, Oct 30, 2015 at 9:54 PM, Kyle Flavin wrote: > Kyle > > -- > You received this message because you are subscribed to the Google Groups > "Foreman users" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to foreman-users+unsubscribe@googlegroups.com. > To post to this group, send email to foreman-users@googlegroups.com. > Visit this group at http://groups.google.com/group/foreman-users. > For more options, visit https://groups.google.com/d/optout. >

Kyle_Flavin · October 30, 2015, 9:18pm

Thanks Ohad. I just came across the report expiry. I'll give that a shot
and report back.

···

On Friday, October 30, 2015 at 1:43:50 PM UTC-7, ohad wrote: > > > > On Fri, Oct 30, 2015 at 9:54 PM, Kyle Flavin > wrote: > >> We run a decent sized Foreman environment (2 foreman servers, 1 MySQL >> database, 3 puppet masters, ~8000 hosts). Lately, we've been having >> problems removing certain host records. Sometimes, the delete times out in >> the Web UI, and I get an error that looks like this: >> >> Mysql2::Error: Lock wait timeout exceeded; try restarting transaction: >> UPDATE `models` SET `hosts_count` = COALESCE(`hosts_count`, 0) - 1 WHERE >> `models`.`id` = 46 ORDER BY models.name >> >> This made me think it was a deadlock issue: >> https://groups.google.com/forum/#!topic/foreman-users/KHZHi8ktD0s >> > > I think the deadlocks are probably related to the counter cache, and not > so much to the db size, of course it manifest itself only on a large db. > >> >> >> Sometimes, the delete just seems to run indefinitely. I kicked off one >> (tried to delete one "stuck" host record) this morning and it ran for an >> hour and a half. I followed it in "show engine innodb status" for an hour >> and a half before I killed the process. >> >> I'm starting to think our problems are due to our database growing out of >> control in terms of space consumed, and number of records in our tables, as >> opposed to deadlock: >> >> # cd /usr/mysql/data/foreman >> # du -s * | sort -r -k1n >> ... >> 1321876 sources.MYD >> 1441796 param_values.ibd >> 1919892 sources.MYI >> 38199352 sessions.ibd >> 45965380 reports.ibd >> 88887416 logs.ibd >> >> The 'logs' table is taking up 85G of space, reports 45G, and sessions >> 38G. Running a "select count(*) from logs" took over 6 minutes to >> complete, and returned 550 million records. >> >> My question(s)... Is it safe to just delete older records from these >> tables? Is there a standard way to clean or archive these tables that >> doesn't involve me going into the production database and running "DELETE >> from" queries? >> > > Why don't you run the report expiry ? see Expire reports automatically > section under [1] > > Ohad > > [1] http://theforeman.org/manuals/1.9/index.html#3.5.4PuppetReports > >> >> > Thanks, >> Kyle >> >> -- >> You received this message because you are subscribed to the Google Groups >> "Foreman users" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to foreman-user...@googlegroups.com . >> To post to this group, send email to forema...@googlegroups.com >> . >> Visit this group at http://groups.google.com/group/foreman-users. >> For more options, visit https://groups.google.com/d/optout. >> > >

Kyle_Flavin · November 3, 2015, 4:51pm

Hi Ohad,
I was able to schedule a maintenance window for this yesterday. We started
by truncating the sessions, reports, and logs tables, due to their large
size. Afterwards, I ran the rake job:

time /usr/sbin/foreman-rake reports:expire

It ran close to 8 hours without completing. Towards the end, the web UI
stopped loading with an error message that said "This website is under
heavy load".

One of our DBA's found that "sources" table was locked by one delete
statement. Our sources table looks to be about 1GB, with an additional 1.9
GB of index data:

du sources.* | sort -r -k1n

12 sources.frm
1324332 sources.MYD
1923352 sources.MYI

Is this table possibly too large? Are there other tables that the above
rake reports expiry job touches that may cause problems as well?

After manually truncating sessions, reports, and logs, sources is now our
largest table. I noticed that it uses MyISAM, while the other tables are
InnoDB.

···

On Friday, October 30, 2015 at 2:18:34 PM UTC-7, Kyle Flavin wrote: > > Thanks Ohad. I just came across the report expiry. I'll give that a shot > and report back. > > > On Friday, October 30, 2015 at 1:43:50 PM UTC-7, ohad wrote: >> >> >> >> On Fri, Oct 30, 2015 at 9:54 PM, Kyle Flavin wrote: >> >>> We run a decent sized Foreman environment (2 foreman servers, 1 MySQL >>> database, 3 puppet masters, ~8000 hosts). Lately, we've been having >>> problems removing certain host records. Sometimes, the delete times out in >>> the Web UI, and I get an error that looks like this: >>> >>> Mysql2::Error: Lock wait timeout exceeded; try restarting transaction: >>> UPDATE `models` SET `hosts_count` = COALESCE(`hosts_count`, 0) - 1 >>> WHERE `models`.`id` = 46 ORDER BY models.name >>> >>> This made me think it was a deadlock issue: >>> https://groups.google.com/forum/#!topic/foreman-users/KHZHi8ktD0s >>> >> >> I think the deadlocks are probably related to the counter cache, and not >> so much to the db size, of course it manifest itself only on a large db. >> >>> >>> >>> Sometimes, the delete just seems to run indefinitely. I kicked off one >>> (tried to delete one "stuck" host record) this morning and it ran for an >>> hour and a half. I followed it in "show engine innodb status" for an hour >>> and a half before I killed the process. >>> >>> I'm starting to think our problems are due to our database growing out >>> of control in terms of space consumed, and number of records in our tables, >>> as opposed to deadlock: >>> >>> # cd /usr/mysql/data/foreman >>> # du -s * | sort -r -k1n >>> ... >>> 1321876 sources.MYD >>> 1441796 param_values.ibd >>> 1919892 sources.MYI >>> 38199352 sessions.ibd >>> 45965380 reports.ibd >>> 88887416 logs.ibd >>> >>> The 'logs' table is taking up 85G of space, reports 45G, and sessions >>> 38G. Running a "select count(*) from logs" took over 6 minutes to >>> complete, and returned 550 million records. >>> >>> My question(s)... Is it safe to just delete older records from these >>> tables? Is there a standard way to clean or archive these tables that >>> doesn't involve me going into the production database and running "DELETE >>> from" queries? >>> >> >> Why don't you run the report expiry ? see Expire reports automatically >> section under [1] >> >> Ohad >> >> [1] http://theforeman.org/manuals/1.9/index.html#3.5.4PuppetReports >> >>> >>> >> Thanks, >>> Kyle >>> >>> -- >>> You received this message because you are subscribed to the Google >>> Groups "Foreman users" group. >>> To unsubscribe from this group and stop receiving emails from it, send >>> an email to foreman-user...@googlegroups.com. >>> To post to this group, send email to forema...@googlegroups.com. >>> Visit this group at http://groups.google.com/group/foreman-users. >>> For more options, visit https://groups.google.com/d/optout. >>> >> >>

Kyle_Flavin · November 5, 2015, 5:05pm

For anyone interested, I was unable to get the foreman-rake job to run.
The database appears to be growing too fast when reports are sent to
Foreman. I tried manually truncating the following tables:

reports
messages
sources
logs
sessions

Unfortunately, "logs" was growing so fast, within two days it had 8 million
records in it, and the rake job wouldn't run against it. Twice it got
stuck running on the following query:

DELETE FROM sources WHERE (id not IN (SELECT source_id FROM logs ))

I'm not sure if there's a way to optimize this query. For the time being
we've had to stop sending reports to Foreman from the puppetmasters. It's
not ideal, but we're stuck.

···

On Tuesday, November 3, 2015 at 8:51:10 AM UTC-8, Kyle Flavin wrote: > > Hi Ohad, > I was able to schedule a maintenance window for this yesterday. We > started by truncating the sessions, reports, and logs tables, due to their > large size. Afterwards, I ran the rake job: > > time /usr/sbin/foreman-rake reports:expire > > It ran close to 8 hours without completing. Towards the end, the web UI > stopped loading with an error message that said "This website is under > heavy load". > > One of our DBA's found that "sources" table was locked by one delete > statement. Our sources table looks to be about 1GB, with an additional 1.9 > GB of index data: > > # du sources.* | sort -r -k1n > 12 sources.frm > 1324332 sources.MYD > 1923352 sources.MYI > > Is this table possibly too large? Are there other tables that the above > rake reports expiry job touches that may cause problems as well? > > After manually truncating sessions, reports, and logs, sources is now our > largest table. I noticed that it uses MyISAM, while the other tables are > InnoDB. > > > > > On Friday, October 30, 2015 at 2:18:34 PM UTC-7, Kyle Flavin wrote: >> >> Thanks Ohad. I just came across the report expiry. I'll give that a >> shot and report back. >> >> >> On Friday, October 30, 2015 at 1:43:50 PM UTC-7, ohad wrote: >>> >>> >>> >>> On Fri, Oct 30, 2015 at 9:54 PM, Kyle Flavin wrote: >>> >>>> We run a decent sized Foreman environment (2 foreman servers, 1 MySQL >>>> database, 3 puppet masters, ~8000 hosts). Lately, we've been having >>>> problems removing certain host records. Sometimes, the delete times out in >>>> the Web UI, and I get an error that looks like this: >>>> >>>> Mysql2::Error: Lock wait timeout exceeded; try restarting transaction: >>>> UPDATE `models` SET `hosts_count` = COALESCE(`hosts_count`, 0) - 1 >>>> WHERE `models`.`id` = 46 ORDER BY models.name >>>> >>>> This made me think it was a deadlock issue: >>>> https://groups.google.com/forum/#!topic/foreman-users/KHZHi8ktD0s >>>> >>> >>> I think the deadlocks are probably related to the counter cache, and not >>> so much to the db size, of course it manifest itself only on a large db. >>> >>>> >>>> >>>> Sometimes, the delete just seems to run indefinitely. I kicked off one >>>> (tried to delete one "stuck" host record) this morning and it ran for an >>>> hour and a half. I followed it in "show engine innodb status" for an hour >>>> and a half before I killed the process. >>>> >>>> I'm starting to think our problems are due to our database growing out >>>> of control in terms of space consumed, and number of records in our tables, >>>> as opposed to deadlock: >>>> >>>> # cd /usr/mysql/data/foreman >>>> # du -s * | sort -r -k1n >>>> ... >>>> 1321876 sources.MYD >>>> 1441796 param_values.ibd >>>> 1919892 sources.MYI >>>> 38199352 sessions.ibd >>>> 45965380 reports.ibd >>>> 88887416 logs.ibd >>>> >>>> The 'logs' table is taking up 85G of space, reports 45G, and sessions >>>> 38G. Running a "select count(*) from logs" took over 6 minutes to >>>> complete, and returned 550 million records. >>>> >>>> My question(s)... Is it safe to just delete older records from these >>>> tables? Is there a standard way to clean or archive these tables that >>>> doesn't involve me going into the production database and running "DELETE >>>> from" queries? >>>> >>> >>> Why don't you run the report expiry ? see Expire reports automatically >>> section under [1] >>> >>> Ohad >>> >>> [1] http://theforeman.org/manuals/1.9/index.html#3.5.4PuppetReports >>> >>>> >>>> >>> Thanks, >>>> Kyle >>>> >>>> -- >>>> You received this message because you are subscribed to the Google >>>> Groups "Foreman users" group. >>>> To unsubscribe from this group and stop receiving emails from it, send >>>> an email to foreman-user...@googlegroups.com. >>>> To post to this group, send email to forema...@googlegroups.com. >>>> Visit this group at http://groups.google.com/group/foreman-users. >>>> For more options, visit https://groups.google.com/d/optout. >>>> >>> >>>

Dominic_Cleal2 · November 6, 2015, 8:14am

1.7.3 had a fix for a performance regression on this report expiry
introduced in 1.7.0: Bug #8565: report::expire is running very slowly - Foreman

If you're using Foreman 1.7.1 as per the other thread, updating to at
least the latest minor release 1.7.5 is recommended.

···

On 05/11/15 17:05, Kyle Flavin wrote: > For anyone interested, I was unable to get the foreman-rake job to run. > The database appears to be growing too fast when reports are sent to > Foreman. I tried manually truncating the following tables: > > reports > messages > sources > logs > sessions > > Unfortunately, "logs" was growing so fast, within two days it had 8 > million records in it, and the rake job wouldn't run against it. Twice > it got stuck running on the following query: > > > > DELETE FROM `sources`WHERE (id notIN (SELECT source_id FROM `logs`)) > > > > I'm not sure if there's a way to optimize this query.

–
Dominic Cleal
dominic@cleal.org

Kyle_Flavin · November 6, 2015, 6:53pm

Great, thanks Dominic, I missed that. Yes, we're running 1.7.1. I'll give
it a try and report back.

···

On Friday, November 6, 2015 at 12:14:22 AM UTC-8, Dominic Cleal wrote: > > On 05/11/15 17:05, Kyle Flavin wrote: > > For anyone interested, I was unable to get the foreman-rake job to run. > > The database appears to be growing too fast when reports are sent to > > Foreman. I tried manually truncating the following tables: > > > > reports > > messages > > sources > > logs > > sessions > > > > Unfortunately, "logs" was growing so fast, within two days it had 8 > > million records in it, and the rake job wouldn't run against it. Twice > > it got stuck running on the following query: > > > > > > > DELETE FROM `sources`WHERE (id notIN (SELECT source_id FROM `logs`)) > > > > > > > I'm not sure if there's a way to optimize this query. > > 1.7.3 had a fix for a performance regression on this report expiry > introduced in 1.7.0: http://projects.theforeman.org/issues/8565 > > If you're using Foreman 1.7.1 as per the other thread, updating to at > least the latest minor release 1.7.5 is recommended. > > -- > Dominic Cleal > dom...@cleal.org >

Kyle_Flavin · November 11, 2015, 5:19pm

The issue turned out to be we were missing an index on source_id in the
logs table. Not exactly sure how that came about, but I compared our
production database to a fresh installation, and it was missing. Once it
was added back in, it made a huge difference.

···

On Friday, November 6, 2015 at 10:53:45 AM UTC-8, Kyle Flavin wrote: > > Great, thanks Dominic, I missed that. Yes, we're running 1.7.1. I'll > give it a try and report back. > > On Friday, November 6, 2015 at 12:14:22 AM UTC-8, Dominic Cleal wrote: >> >> On 05/11/15 17:05, Kyle Flavin wrote: >> > For anyone interested, I was unable to get the foreman-rake job to run. >> > The database appears to be growing too fast when reports are sent to >> > Foreman. I tried manually truncating the following tables: >> > >> > reports >> > messages >> > sources >> > logs >> > sessions >> > >> > Unfortunately, "logs" was growing so fast, within two days it had 8 >> > million records in it, and the rake job wouldn't run against it. Twice >> > it got stuck running on the following query: >> > > >> > >> > DELETE FROM `sources`WHERE (id notIN (SELECT source_id FROM `logs`)) >> > > >> > >> > I'm not sure if there's a way to optimize this query. >> >> 1.7.3 had a fix for a performance regression on this report expiry >> introduced in 1.7.0: http://projects.theforeman.org/issues/8565 >> >> If you're using Foreman 1.7.1 as per the other thread, updating to at >> least the latest minor release 1.7.5 is recommended. >> >> -- >> Dominic Cleal >> dom...@cleal.org >> >