Notifications for out-of-sync hosts

Bernard_Clark · July 14, 2014, 3:15pm

Hi, I know that Foreman can send an email notification when a host is in an
error state, but can it also send one when a host gets out of sync?

Bernard_Clark · July 15, 2014, 3:32pm

Just to add more information: we currently use a homegrown git merge hook
in combination with Nagios to alert us when a host gets out of sync. What
we would prefer is a native Foreman alert, like the one that sends an
email when a host is in an error state. Thanks in advance for any ideas.


Gwmngilfen · July 16, 2014, 5:23pm

The error state email is generated in response to a report coming from
Puppet - i.e. there is a trigger to act on. An out-of-sync host, by
definition, is one which has not sent a report in a given period of
time (usually 35 minutes), so there is no trigger to act on. So no,
there's no way to do that.

We would have to be running some kind of internal cron every minute to
look for out-of-sync hosts, and that would be very wasteful. Nagios is
far better at monitoring this sort of thing, especially if you're
already using it anyway.
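
To make the polling point concrete, here is a minimal sketch of the check such a periodic job would have to run. It is not Foreman code: the function name, the host names and the in-memory dictionary of last-report times are all hypothetical, and the 35 minute window is simply the usual value mentioned above.

```python
from datetime import datetime, timedelta

# Assumption: the "usual" 35 minute window mentioned in the post.
OUT_OF_SYNC_AFTER = timedelta(minutes=35)


def out_of_sync_hosts(last_reports, now, threshold=OUT_OF_SYNC_AFTER):
    """Return the sorted names of hosts with no report inside the window.

    last_reports maps a host name to the datetime of its last report,
    or to None if the host has never reported (hypothetical data shape).
    """
    stale = []
    for host, last in last_reports.items():
        # No report event fires for a silent host, so the only way to
        # notice it is to compare timestamps on a schedule.
        if last is None or now - last > threshold:
            stale.append(host)
    return sorted(stale)


if __name__ == "__main__":
    now = datetime(2014, 7, 16, 17, 23)
    reports = {
        "web01.example.com": now - timedelta(minutes=10),
        "db01.example.com": now - timedelta(minutes=50),
        "new01.example.com": None,
    }
    print(out_of_sync_hosts(reports, now))
```

Every run has to look at every host, which is where the load concern comes from when the interval is short.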

Greg

Brian_Gupta · July 16, 2014, 10:30pm

Greg,

Your idea of an internal cron reminds me of some old
functionality that Bernard might be able to leverage.

See: Mail Notifications - Foreman

The wiki page mentions a rake task for emailing report summaries.
Perhaps if the rake task could be tweaked, he could run it out of cron
every 30-60 minutes? (Assuming that he doesn't need to know the
minute it gets out of sync?)

Issues:

1) Although the reports:summarize task is still in the code base, I
don't know if it gets tests or contains information on out of sync
hosts.
2) A new task would have to be written that's based on this code
3) I don't know if running it every hour or so would be quick enough
notification. (I can't imagine running it every minute, just from a
load perspective.)

Cheers,
Brian


Gwmngilfen · July 19, 2014, 2:54am

> Greg,
>
> Your idea of an internal cron reminds me of some old
> functionality that Bernard might be able to leverage.
>
> See: Mail Notifications - Foreman
>
> The wiki page mentions a rake task for emailing report summaries.
> Perhaps if the rake task could be tweaked, he could run it out of cron
> every 30-60 minutes? (Assuming that he doesn't need to know the
> minute it gets out of sync?)
>
> Issues:
> 1) Although the reports:summarize task is still in the code base, I
> don't know if it gets tests or contains information on out of sync
> hosts.

Actually, I think that mail does include out-of-sync hosts, IIRC. Your
mileage may vary, etc.

> 2) A new task would have to be written that's based on this code

Maybe not, see (1)

> 3) I don't know if running it every hour or so would be quick enough
> notification. (I can't imagine running it every minute, just from a
> load perspective.)

That's the main issue: it would generate a lot of load to be able to
compete with Nagios' frequency of checks.

In $previous job, I used NRPE to check files in /var/lib/puppet on the
clients to alert for out of sync status - since NRPE was already in
use for other checks, it meant we added very little load for checking
this. That's the background for my earlier comment about Nagios being
the tool for the job, especially as the original post mentions already
having Nagios in place.
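
For anyone wanting to try the NRPE approach, a rough sketch of such a check follows. It is an illustration, not the script described above: the post does not say which files under /var/lib/puppet were checked, so the last_run_summary.yaml path and both thresholds are assumptions to adjust for your own setup. The exit codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
#!/usr/bin/env python
import os
import sys
import time

# Assumption: the agent rewrites this file on every run. The original
# post only says "files in /var/lib/puppet"; adjust for your layout.
SUMMARY_FILE = "/var/lib/puppet/state/last_run_summary.yaml"

# Assumption: illustrative thresholds, in seconds.
WARN_SECONDS = 35 * 60
CRIT_SECONDS = 2 * 60 * 60

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_age(path, now=None, warn=WARN_SECONDS, crit=CRIT_SECONDS):
    """Return (exit_code, message) based on the file's modification time."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError as exc:
        # Missing or unreadable file: report UNKNOWN rather than guess.
        return UNKNOWN, "UNKNOWN - cannot stat %s: %s" % (path, exc)
    minutes = int(age // 60)
    if age >= crit:
        return CRITICAL, "CRITICAL - last Puppet run %d min ago" % minutes
    if age >= warn:
        return WARNING, "WARNING - last Puppet run %d min ago" % minutes
    return OK, "OK - last Puppet run %d min ago" % minutes


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else SUMMARY_FILE
    code, message = check_age(target)
    print(message)
    sys.exit(code)
```

The check is a single stat call on the client, which is why it adds so little load when NRPE is already running there.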

Greg
