ABRT plugin report aggregation - timer events in smart-proxy?

(For the previous discussion regarding the ABRT plugin, see [1].)

Hi!

I'm planning to implement aggregation in the patch that adds ABRT
support to smart-proxy [2]. It is needed to keep the proxies from
DDoSing the Foreman server when many hosts send many reports within a
short time span.

One way to accomplish this is to limit how often the proxy can forward
reports to the Foreman server. Say we allow one request every N seconds
or minutes. When this limit is exceeded, the report is stored and sent
later, within the rate limit, possibly together with other stored
reports.
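A minimal sketch of that rule in Ruby, the smart-proxy's language. `ReportThrottle`, its injectable clock and the sender block are hypothetical names made up for illustration; this is not existing smart-proxy code. Note that nothing in it runs on a timer. A report buffered by `submit` is only sent when `flush` is called again, and that gap is exactly the open question in the first bullet below.

```ruby
# Hypothetical throttle illustrating the proposed rule: at most one request
# to Foreman every `interval` seconds; reports arriving sooner are buffered
# and go out together with the next allowed request.
class ReportThrottle
  def initialize(interval, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }, &sender)
    @interval = interval   # minimum number of seconds between two requests
    @clock = clock         # injectable so the behaviour can be tested
    @sender = sender       # performs the actual request with an array of reports
    @pending = []
    @last_sent = nil
  end

  # Called for every incoming report: buffer it, then try to send.
  def submit(report)
    @pending << report
    flush
  end

  # Sends all buffered reports in one request if the rate limit allows it.
  # Returns true if a request was made. Nothing calls this on a timer.
  def flush
    return false if @pending.empty?
    now = @clock.call
    return false if @last_sent && now - @last_sent < @interval
    @sender.call(@pending.dup)  # if this raises, the reports stay buffered
    @pending.clear
    @last_sent = now
    true
  end

  def pending_count
    @pending.size
  end
end
```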

I'd like to ask for some advice:

  • How can I implement it in the smart-proxy? Sending the stored reports
    has to be hooked into some kind of timer event. For example, the first
    report arrives and is forwarded immediately, then a second report
    arrives shortly afterwards and is stored, and no other report arrives
    after that. How can we make sure the second report is eventually
    forwarded too? Is this somehow possible in Sinatra?

  • What is a reasonable time span between requests to the Foreman
    server?

  • Should the reports be stored on disk or in memory?

Cheers,
Martin

[1] https://groups.google.com/forum/#!topic/foreman-dev/CQn-oHB_jus
[2] https://github.com/mmilata/smart-proxy/tree/abrtproxy-squashed

> - How can I implement it in the smart-proxy? Sending the stored reports
> has to be hooked into some kind of timer event - for example, first
> report arrives and is immediately forwarded, then second report
> arrives shortly which is stored, no other report arrives after that -
> how can we make sure the second report is eventually forwarded too? Is
> this somehow possible in Sinatra?

I guess Sinatra is just a web framework, and we currently run it in
WEBrick, which is not a complete application platform but rather a
simple HTTP server.

I don't think we should be spawning threads and doing other magic. Even
if this works fine under WEBrick, we might want to migrate to a
different stack in the future.

> - What is a reasonable time span between making requests to foreman
> server?
>
> - Should the reports be stored on disk or in memory?

I think the easiest integration is to append reports to a (logrotated)
journal (a log file, if you will). Appending to a file is a fast
operation and will not block the worker thread.
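A minimal sketch of the journal idea. It assumes one JSON object per line and uses a hypothetical spool path. `SPOOL_FILE` and `append_report` are illustrative names, not part of the proxy.

```ruby
require 'json'
require 'time'

# Hypothetical spool location; a real plugin would read this from its settings.
SPOOL_FILE = '/var/spool/foreman-proxy/abrt-reports.log'

# Appends one received report to the journal as a single JSON line.
# Mode 'a' opens the file with O_APPEND, so every write goes to the end of
# the file, and the request handler returns without waiting on Foreman.
def append_report(report, path = SPOOL_FILE)
  line = JSON.generate('received_at' => Time.now.utc.iso8601, 'report' => report)
  File.open(path, 'a') { |f| f.write(line + "\n") }
end
```

One line per report keeps the file easy to inspect with standard tools and easy to read back incrementally.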

Report sending could be done in a different process, or even by a script
called by cron every X minutes. I guess a sane default would be 10
minutes. You only need to remember the last report sent. The script could
support both modes (running as a daemon and being called from cron), and
we can create a Puppet module to deploy one of these options by default.
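A sketch of the sending side. It assumes a journal with one JSON report per line. It also assumes a hypothetical state file that holds the byte offset of the first unsent report, which is one way to "remember the last report sent". `forward_pending` and `run_sender` are illustrative names. The upload itself is left to the block, because the Foreman endpoint is not specified here.

```ruby
require 'json'

# Returns the byte offset of the first report not yet forwarded (0 if none recorded).
def read_offset(state_file)
  File.exist?(state_file) ? File.read(state_file).to_i : 0
end

# Forwards every complete report appended since the last run as one batch.
# The block does the actual upload to Foreman; if it raises, the offset is
# not advanced and the same reports are retried on the next run.
# Returns the number of reports forwarded.
def forward_pending(journal, state_file)
  return 0 unless File.exist?(journal)
  offset = read_offset(state_file)
  offset = 0 if offset > File.size(journal)   # journal was rotated: start over
  data = File.open(journal, 'rb') { |f| f.seek(offset); f.read }
  last_newline = data.rindex("\n")
  return 0 if last_newline.nil?               # no complete line yet
  complete = data[0..last_newline]            # ignore a half-written trailing line
  reports = complete.each_line.map { |line| JSON.parse(line) }
  yield reports
  # Record progress atomically so a crash never leaves a corrupt state file.
  tmp = "#{state_file}.tmp"
  File.write(tmp, (offset + complete.bytesize).to_s)
  File.rename(tmp, state_file)
  reports.size
end

# Run once (e.g. from cron every 10 minutes) or, with daemon: true, loop forever.
def run_sender(journal, state_file, daemon: false, interval: 600, &upload)
  loop do
    forward_pending(journal, state_file, &upload)
    break unless daemon
    sleep interval
  end
end
```

The reset after rotation is only a heuristic. If logrotate replaces the file and the new one grows past the old offset before the next run, reports would be skipped. A real implementation would therefore probably also track the file's inode.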

A long-term goal might be to have a database installed alongside the
smart-proxy. Then it would be easy to reimplement this to queue the data
in an SQL database. Still, I think appending to a file is much faster
than writing anything to a database.

Note that I assume you do not need to keep any stacktrace hashes or
anything like that in memory. I am hoping the smart-proxy does not need
to do any kind of calculations or similarity comparisons.

My 2 cents.

Later,

Lukas “lzap” Zapletal
irc: lzap #theforeman

> > - How can I implement it in the smart-proxy? Sending the stored reports
> > has to be hooked into some kind of timer event - for example, first
> > report arrives and is immediately forwarded, then second report
> > arrives shortly which is stored, no other report arrives after that -
> > how can we make sure the second report is eventually forwarded too? Is
> > this somehow possible in Sinatra?
>
> I guess Sinatra is just a web framework and we run that in Webrick
> currently which is not a complete application platform, but rather a
> simple http server.
>
> I don't think we should be spawning threads and doing other magic. Even
> if this works fine under Webrick, we might want to migrate to different
> stack in the future.

I don't think porting the thread spawning to a different platform, once
the time comes, would be that hard. But there are other advantages to
having the uReports stored on disk and handled by a separate script:
one is able to see immediately what's waiting in the queue, and the
timing can be customized quite easily with cron.

>
> > - What is a reasonable time span between making requests to foreman
> > server?
> >
> > - Should the reports be stored on disk or in memory?
>
> I think the easiest integration is to append reports in a (logrotated)
> journal (log file if you will). Appending to a file is fast operation
> and will not block working thread.
>
> Report sending could be done in a different process, or even a script
> called by cron every X minutes. I guess the sane default could be 10
> minutes. You only need to remember last report sent. The script could
> support both (operation as daemon and from cron) and we can create a
> puppet module to deploy one of these options by default.
>
> Long-term goal is maybe to have a database installed with a smart-proxy.
> Then it will be easy to reimplement this to queue the data in a SQL
> database. But I still think file append is much faster than writing
> anything to a database.

Yeah, file append web-scales for sure :) However, I'm not entirely sure
how "thread-safe" it is to write to a file from one process while reading
it from another process every now and then…

– Ivan

----- Original Message -----

You received this message because you are subscribed to the Google Groups
"foreman-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to foreman-dev+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

> > - How can I implement it in the smart-proxy? Sending the stored reports
> > has to be hooked into some kind of timer event - for example, first
> > report arrives and is immediately forwarded, then second report
> > arrives shortly which is stored, no other report arrives after that -
> > how can we make sure the second report is eventually forwarded too? Is
> > this somehow possible in Sinatra?
>
> I guess Sinatra is just a web framework and we run that in Webrick
> currently which is not a complete application platform, but rather a
> simple http server.
>
> I don't think we should be spawning threads and doing other magic. Even
> if this works fine under Webrick, we might want to migrate to different
> stack in the future.

Fair enough.

> > - What is a reasonable time span between making requests to foreman
> > server?
> >
> > - Should the reports be stored on disk or in memory?
>
> I think the easiest integration is to append reports in a (logrotated)
> journal (log file if you will). Appending to a file is fast operation
> and will not block working thread.
>
> Report sending could be done in a different process, or even a script
> called by cron every X minutes. I guess the sane default could be 10
> minutes. You only need to remember last report sent. The script could
> support both (operation as daemon and from cron) and we can create a
> puppet module to deploy one of these options by default.

That sounds reasonable, I'll look into this possibility.

> Long-term goal is maybe to have a database installed with a smart-proxy.
> Then it will be easy to reimplement this to queue the data in a SQL
> database. But I still think file append is much faster than writing
> anything to a database.

If speed is a concern, is it OK to make requests over the network? I
assume they may take much longer than, e.g., writing to disk.

> Note I assume you do not need to keep any stacktrace hashes or anything
> like that in memory. I am hoping smart-proxy do not need to do any kind
> of calculations or similarity comparisons.

Computing stacktrace hashes can be done in the separate process just
before sending; it should be sufficient for the smart-proxy to just
write the reports.

Thanks,
Martin

On Wed, Mar 12, 2014 at 09:52:12 +0100, Lukas Zapletal wrote:

We might need to use some kind of locking. I wonder if File#flock is
sufficient for this and portable enough?

Thanks,
Martin

On Wed, Mar 12, 2014 at 05:34:31 -0400, Ivan Necas wrote:

There is some file-locking code in the proxy already - feel free to
flesh it out further :)

On 12 March 2014 15:11, Martin Milata wrote:

> We might need to use some kind of locking. I wonder if File#flock is
> sufficient for this and portable enough?

WEBrick is one process, one thread, and O_APPEND should not give any
unexpected results on Linux in this scenario, but yeah, we support
Windows too.

Flock it :)

I'd do LOCK_EX just to be sure.
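A minimal sketch of "flock it" with LOCK_EX on both sides. It assumes a journal of one JSON report per line. `locked_append` and `locked_read_from` are hypothetical helper names, while `File#flock` and `File::LOCK_EX` are standard Ruby. Ruby's documentation notes that flock is not available on all platforms, so the Windows behaviour should be tested rather than assumed.

```ruby
require 'json'

# The proxy side: append one report while holding an exclusive lock, so the
# sender never reads a line that is only half written. Closing the file at
# the end of the block releases the lock.
def locked_append(journal, report)
  File.open(journal, 'a') do |f|
    f.flock(File::LOCK_EX)      # waits while the sender holds the lock
    f.write(JSON.generate(report) + "\n")
    f.flush                     # push the data out before the lock is dropped
  end
end

# The sender side: read everything appended after byte `offset` under the
# same exclusive lock. Returns [reports, new_offset].
def locked_read_from(journal, offset)
  File.open(journal, 'rb') do |f|
    f.flock(File::LOCK_EX)      # waits for any in-progress append
    f.seek(offset)
    data = f.read
    [data.each_line.map { |line| JSON.parse(line) }, offset + data.bytesize]
  end
end
```

Holding the lock only for the duration of one append or one read keeps the request handler's wait short. The slow network upload then happens after the lock is released.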

LZ

Later,

Lukas “lzap” Zapletal
irc: lzap #theforeman

Is this recreating any existing code in any existing products already?

-- bk
On 03/12/2014 11:07 AM, Martin Milata wrote:

> is this recreating any existing code in any existing products already?

Yep, Martin wrote Ruby bindings for a library that comes with ABRT.
The smart-proxy will use that.

Later,

Lukas “lzap” Zapletal
irc: lzap #theforeman

I'm not sure if I understand the question. The ABRT server currently
doesn't have this functionality. And as Ivan said, computing stacktrace
hashes is handled by the same library the server uses.

Martin

On Wed, Mar 12, 2014 at 17:51:59 -0400, Bryan Kearney wrote:
> is this recreating any existing code in any existing products already?

I was curious if the current ABRT server did this. Thanks!

– bk

On 03/14/2014 06:32 AM, Martin Milata wrote: