Grouping by time in ActiveRecord

In foreman_abrt [1] it would be useful if the plugin provided a
dashboard graph indicating the number of crash reports detected in some
period of time. This is essentially the same as the "Run distribution
in the last 30 minutes" chart, though I assume this information is
interesting over a longer period, so what I have now is "Crash report
distribution in the last 24 hours" with 24 one-hour buckets instead of
10 three-minute buckets.

The problem is that a naive implementation generates 24 queries of the
form

abrt_reports.where(:reported_at => t..(t+interval)).count

every time the dashboard is loaded. Unfortunately there is minimal
overlap between the datetime functions supported by sqlite [2] and those
of other databases [3], so I don't see a way I could GROUP BY the time
bucket each record belongs in and then count them.

Any idea how to do this portably so that a small number of queries is
used and the result set is of constant size?
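For illustration, the naive loop boils down to building 24 consecutive
one-hour ranges and issuing one count per range (a sketch; the bucketing
helper below is mine, not the plugin's actual code):

```ruby
# Build `hours` consecutive one-hour ranges ending at the next full hour.
# Each range then becomes one COUNT query against the reports table.
def hourly_buckets(now = Time.now, hours = 24)
  bucket_end = Time.at(((now.to_i / 3600) + 1) * 3600) # next full hour
  start = bucket_end - hours * 3600
  hours.times.map { |i| (start + i * 3600)...(start + (i + 1) * 3600) }
end

# Naive version, one query per bucket:
#   counts = hourly_buckets.map { |r| abrt_reports.where(:reported_at => r).count }
```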

Thanks,
Martin

[1] https://github.com/theforeman/foreman_abrt
[2] http://www.sqlite.org/lang_datefunc.html
[3] http://en.wikibooks.org/wiki/SQL_Dialects_Reference/Functions_and_expressions/Date_and_time_functions

Hi,

I'm afraid you'll have to implement the GROUP BY part of the SQL for
every adapter. I'd create a scope :group_by_week which would add :group =>
"$custom_implementation". You can find out the current adapter using

ActiveRecord::Base.connection_config[:adapter]

Then the following should work

abrt_reports.where(:reported_at => t..(t+interval)).group_by_week.count

I think we are OK with mysql, postgresql and sqlite. Ideally, make the
implementation extensible so that anyone else can add their own
implementation for other adapters.
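A minimal sketch of that idea, with all per-adapter GROUP BY expressions
kept in one hash keyed by the adapter name (the exact SQL expressions and
the reported_at column are assumptions, not tested against every adapter
version):

```ruby
# One place for all per-adapter week-bucket expressions.
WEEK_EXPRESSIONS = {
  'postgresql' => "date_trunc('week', reported_at)",
  'mysql2'     => 'YEARWEEK(reported_at)',
  'sqlite3'    => "strftime('%Y-%W', reported_at)",
}.freeze

def group_by_week_expression(adapter)
  WEEK_EXPRESSIONS.fetch(adapter) do
    raise NotImplementedError, "no week bucketing for #{adapter}"
  end
end

# In the model this could back the suggested scope:
#   scope :group_by_week, lambda {
#     group(group_by_week_expression(ActiveRecord::Base.connection_config[:adapter]))
#   }
```

Raising NotImplementedError for unknown adapters keeps the extension
point obvious: supporting a new database means adding one hash entry.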

Hope this helps

--
Marek

On Wednesday 17 of September 2014 12:08:57 Martin Milata wrote:

If this is the only way of doing this (I don't know), please do it in a
generic way and make sure we have those custom queries in one place, so
that adding support for another DB would mean dropping a new file
somewhere (a class or module) and overriding a few methods or a hash or
whatever.

I am not against this solution but I would personally like to see all
the custom queries in one place.

I was looking at this last week and I also don't think this is doable
via ANSI SQL 1999 or 2001 or whatever the latest SQL standard is today
:-)

LZ

On Thu, Sep 18, 2014 at 02:23:47PM +0200, Marek Hulan wrote:


Later,
Lukas #lzap Zapletal

From what I found, EXTRACT(HOUR FROM datetimecolumn) is defined in ANSI
SQL 2011 and both MySQL and Postgres support it. Only sqlite will need
special treatment.
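A sketch of that special-casing (sqlite's strftime returns a string,
hence the cast; the reported_at column name is an assumption):

```ruby
# EXTRACT(HOUR FROM ...) for adapters that follow the standard,
# strftime('%H', ...) as the sqlite fallback.
def hour_bucket_expression(adapter)
  if adapter == 'sqlite3'
    "CAST(strftime('%H', reported_at) AS INTEGER)"
  else
    'EXTRACT(HOUR FROM reported_at)'
  end
end
```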

Martin

On Thu, Sep 18, 2014 at 14:32:36 +0200, Lukas Zapletal wrote:

How's the situation with days?

Also, would it make sense to extract the date on report delivery and
store it as a numeric value in a new column? Then you could do the
calculation easily. Something like "number of days from epoch".

Or maybe store a timestamp - that could do it as well? The calculation
would be more challenging perhaps, but doable?
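The "days from epoch" value would be cheap to compute once at delivery
time, for example (the reported_on_day column name is hypothetical):

```ruby
# Sketch: compute the day bucket when the report arrives and store it
# in a hypothetical integer column such as reported_on_day; GROUP BY
# on an integer column is then portable across all adapters.
def days_from_epoch(time)
  time.to_i / 86_400 # whole UTC days since 1970-01-01
end
```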

Just thinking out loud.

LZ

On Fri, Sep 19, 2014 at 11:20:37AM +0200, Martin Milata wrote:


Later,
Lukas #lzap Zapletal

> How's the situation with days?

As you mentioned, using DATE() seems to work on both postgres and sqlite
for grouping by days. I'm not sure if this is the right time resolution
and am a bit afraid that someone will request to change it or make it
configurable and we'll be back where we were :)
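One caveat with a single GROUP BY DATE(...) query: days with zero
reports simply don't appear in the result hash, so a small client-side
step is needed to get the constant-size series back. A sketch:

```ruby
require 'date'

# Pad a sparse { 'YYYY-MM-DD' => count } hash, as returned by
# .group('DATE(reported_at)').count, with zeroes for missing days.
def fill_missing_days(counts, from, to)
  (from..to).each_with_object({}) do |day, filled|
    filled[day.to_s] = counts.fetch(day.to_s, 0)
  end
end
```

The output of abrt_reports.where(:reported_at => range).group('DATE(reported_at)').count
would feed straight into this.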

> Also would it make sense to extract the date on report delivery and
> store it as an numeric value in a new column? Then you cold do the
> calculation easily. Something like "number of days from epoch".
>
> Or maybe to do a timestamp - that could do it as well? The calculation
> would be more challenging perhaps, but doable?

Both approaches would certainly work; however, they would introduce
redundancy into the database, which in my opinion is worse than having
a different query for sqlite.

> Just thinking loud.

I think I'll just go with the DATE() approach for now; perhaps no one
will complain after all.

Thanks,
Martin

On Fri, Sep 19, 2014 at 11:53:42 +0200, Lukas Zapletal wrote:

Here's the updated code that uses only one query and buckets the reports
by days: https://github.com/theforeman/foreman_abrt/commit/c33a36649eb030beff1f9f3f1f803d8590adc15b

There's a problem though: all crash reports are considered, not only
those from hosts matching the search query. Is there a way to select
only those AbrtReports that belong to Hosts satisfying the given search
query?
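The SQL shape of that filter is a subquery on host_id; a plain-Ruby
sketch of it (table and column names assumed, and the ActiveRecord lines
in the comment are an untested guess at how scoped_search's search_for
would plug in):

```ruby
# Restrict the day-bucketed counts to reports whose host matches a
# search; the search result enters as a subquery on host_id.
def scoped_report_counts_sql(host_subquery)
  'SELECT DATE(reported_at) AS day, COUNT(*) AS n ' \
  'FROM abrt_reports ' \
  "WHERE host_id IN (#{host_subquery}) " \
  'GROUP BY DATE(reported_at)'
end

# With ActiveRecord this would look roughly like:
#   hosts  = Host.search_for(params[:search]).select(:id)
#   counts = AbrtReport.where(:host_id => hosts)
#                      .group('DATE(reported_at)').count
```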

Thanks,
Martin

On Mon, Sep 22, 2014 at 16:26:36 +0200, Martin Milata wrote:

> Both approaches would certainly work, however they would introduce
> redundancy to the database which in my opinion is worse than having
> different query for sqlite.

Redundancy and Normal Forms are not dogma. There's a whole category of
denormalizing techniques called OLAP. And if nothing else, report/audit-like
tables are a good fit for this.

LZ

--
Later,
Lukas #lzap Zapletal

> There's a problem though: all crash reports are considered, not only
> those from hosts matching the search query. Is there a way to select
> only those AbrtReports that belong to Hosts satisfying the given search
> query?

Of course it's possible in SQL; I am not sure about AR. Perhaps adding a
where clause?

--
Later,
Lukas #lzap Zapletal

--
You received this message because you are subscribed to the Google Groups "foreman-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to foreman-dev+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.