Foreman instrumentation for telemetry proposal

lzap · November 20, 2017, 2:46pm

Hey,

in the last demo I presented my proposal for telemetry (it is actually
a separate video). I am looking for a non-intrusive approach with broad
integration possibilities:

https://www.youtube.com/watch?v=gCLSI9-4QpE

This was also shown in our demo last week (same content):
https://www.youtube.com/watch?v=QHzNIFjMpTM

I am starting this thread to gather feedback before I open a PR.
Currently the code lives mostly in a Rails initializer and looks like
this:

```ruby
# get the telemetry singleton instance and set it up
telemetry = Foreman::Telemetry.instance.setup(... some options ...)

# register measurements
telemetry.add_counter(:http_requests, 'A counter of HTTP requests made', [:controller, :action])
telemetry.add_histogram(:http_request_total_duration, 'Total duration', [:controller, :action])
telemetry.add_counter(:activerecord_instances, 'Number of instances of AR models', [:class])

# send measurements from Rails instrumentation or from the code base
telemetry.increment_counter(:http_requests, 1, :controller => controller, :action => action, :status => status)
telemetry.observe_histogram(:http_request_total_duration, duration, :controller => controller, :action => action)
```

The proposed API is a single class (actually a singleton) with three
registering methods and three measuring methods. I don't think such a
simple class needs a strict separation of concerns, but we can discuss
that in the PR. The registration part could be turned into some kind of
DSL; currently it takes a metric name, a description and a list of label
keys, which are declared up front for those frameworks that do not
support an arbitrary number of key-value pairs.
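To make the register/measure split concrete, here is a minimal in-memory sketch of such a singleton. It is not the Foreman::Telemetry implementation: the class name `TelemetrySketch`, the `value` reader, the gauge method names and the choice to silently drop labels that were not declared at registration (such as `:status` in the example above) are all assumptions for illustration.

```ruby
require 'singleton'

# Hypothetical in-memory stand-in for the proposed Foreman::Telemetry singleton.
# Not the real implementation: it only illustrates "register first, measure later".
class TelemetrySketch
  include Singleton

  Metric = Struct.new(:type, :description, :label_keys, :data)

  def initialize
    @metrics = {}
  end

  # Three registering methods: name, description and the label keys declared up front.
  def add_counter(name, description, label_keys = [])
    register(name, :counter, description, label_keys)
  end

  def add_gauge(name, description, label_keys = [])
    register(name, :gauge, description, label_keys)
  end

  def add_histogram(name, description, label_keys = [])
    register(name, :histogram, description, label_keys)
  end

  # Three measuring methods. Only declared label keys are used, so a backend with a
  # fixed label set always sees the same keys (assumption: extra labels are dropped).
  def increment_counter(name, value = 1, labels = {})
    key = label_values(name, :counter, labels)
    @metrics[name].data[key] += value
  end

  def set_gauge(name, value, labels = {})
    key = label_values(name, :gauge, labels)
    @metrics[name].data[key] = value
  end

  def observe_histogram(name, value, labels = {})
    key = label_values(name, :histogram, labels)
    @metrics[name].data[key] << value
  end

  # Read back a stored value (a number, or the list of observations for histograms).
  def value(name, labels = {})
    key = label_values(name, @metrics.fetch(name).type, labels)
    @metrics[name].data[key]
  end

  private

  def register(name, type, description, label_keys)
    store = type == :histogram ? Hash.new { |h, k| h[k] = [] } : Hash.new(0)
    @metrics[name] = Metric.new(type, description, label_keys, store)
  end

  # Map a label hash to the ordered values of the declared label keys.
  def label_values(name, type, labels)
    metric = @metrics.fetch(name) { raise ArgumentError, "unregistered metric: #{name}" }
    raise ArgumentError, "#{name} is not a #{type}" unless metric.type == type
    metric.label_keys.map { |key| labels[key] }
  end
end

telemetry = TelemetrySketch.instance
telemetry.add_counter(:http_requests, 'A counter of HTTP requests made', [:controller, :action])
telemetry.increment_counter(:http_requests, 1, :controller => 'hosts', :action => 'index', :status => 200)
telemetry.increment_counter(:http_requests, 1, :controller => 'hosts', :action => 'index', :status => 500)
puts telemetry.value(:http_requests, :controller => 'hosts', :action => 'index') # prints 2
```

Dropping undeclared labels is only one possible policy; raising an error instead would surface mismatches like the `:status` label earlier, at the cost of failing requests when a label list drifts.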

If there are no objections, I will add settings and better error
handling and file the PR.

Bryan_Kearney · November 20, 2017, 3:31pm

How would folks disable it or opt out of sending this data?

lzap · November 21, 2017, 12:58pm

Thanks for the question. This will be completely opt-in via
/etc/foreman/settings.yaml. You can leave it off (the default when not set), or turn it on with the Prometheus, statsd or logging implementation (the latter is for debugging purposes and sends stats to the Rails log, production.log).
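A minimal sketch of that opt-in rule, assuming a `:telemetry:` section with a `:type:` key (the key the PR description later in this thread uses) and the backend names `prometheus`, `statsd` and `logging`. The helper `telemetry_backend` is hypothetical and is not how Foreman actually reads its settings; anything missing or unknown falls back to off. It needs Ruby 2.6 or later for the `permitted_classes:` keyword of `YAML.safe_load`.

```ruby
require 'yaml'

# Backend names taken from this thread; 'logging' is an assumed spelling.
TELEMETRY_BACKENDS = %w[prometheus statsd logging].freeze

# Hypothetical helper: pick the telemetry backend from settings.yaml content.
# Telemetry is opt-in, so a missing section or an unknown type means :off.
def telemetry_backend(settings_yaml)
  # Foreman-style keys such as ":telemetry:" load as Ruby symbols, so allow Symbol.
  settings = YAML.safe_load(settings_yaml, permitted_classes: [Symbol]) || {}
  type = settings.dig(:telemetry, :type).to_s
  TELEMETRY_BACKENDS.include?(type) ? type.to_sym : :off
end

puts telemetry_backend('') # prints off
puts telemetry_backend(":telemetry:\n  :type: 'statsd'\n  :prefix: 'fm_rails'\n") # prints statsd
```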

iNecas · November 24, 2017, 3:10pm

Glad to see this moving forward. A few questions:

1. What happened to the PCP approach we talked about in the past?

2. How would you integrate this with sosreport/foreman-debug? I'm thinking of storing the statsd data locally, collecting it with foreman-debug, and then being able to import it later into Prometheus and other tools. Is this how it could work? Any other options?

3. Does every host/runtime need its own statsd service, or would there be one shared process? Asking both for the multi-host and container use cases.

The proposal of the telemetry API itself seems reasonable; let's discuss it on an actual PR.

-- Ivan

lzap · November 24, 2017, 5:54pm

> 1. What happened to the PCP approach we talked about in the past?

That's going on in parallel. PCP is just a monitoring framework; you can integrate it with instrumentation data just like any other.

> 2. How would you integrate this with sosreport/foreman-debug? I'm thinking of
> storing the statsd data locally, collecting it with foreman-debug, and then
> being able to import it later into Prometheus and other tools. Is this how it
> could work? Any other options?

My ultimate goal is a working PCP deployment that includes the telemetry data; its archives could then be collected by foreman-debug, since they are pretty small (a few MBs per day).

> 3. Does every host/runtime need its own statsd service, or would there be
> one shared process? Asking both for the multi-host and container use cases.

It is up to you whether you want one statsd service per guest/container, per host or per subnet. The Prometheus endpoint will not require any external daemon once metrics sharing is merged upstream. Until then, statsd serves as a temporary solution, and it will remain an alternative in the future.
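For the statsd route, each measurement travels as a small plain-text UDP datagram in the StatsD line format `<name>:<value>|<type>` (`c` counter, `g` gauge, `ms` timer), which is why one aggregator per container, host or subnet all work: the sender only needs a host and a port. The sketch below is not the Foreman backend; in particular, encoding label values as dot-separated name segments is an assumption, and `statsd_name`, `statsd_line` and `send_statsd` are hypothetical helpers.

```ruby
require 'socket'

# Assumption: label values become dot-separated segments of the metric name,
# e.g. "fm_rails_http_requests.hosts.index".
def statsd_name(prefix, metric, label_values)
  (["#{prefix}_#{metric}"] + label_values.map(&:to_s)).join('.')
end

# One StatsD line: "<name>:<value>|<type>" (c = counter, g = gauge, ms = timer).
def statsd_line(name, value, type)
  "#{name}:#{value}|#{type}"
end

# Fire-and-forget UDP: the app never waits for a reply from the aggregator.
def send_statsd(lines, host: '127.0.0.1', port: 8125)
  socket = UDPSocket.new
  lines.each { |line| socket.send(line, 0, host, port) }
ensure
  socket&.close
end

# Example (sends to 127.0.0.1:8125):
# send_statsd([statsd_line(statsd_name('fm_rails', :http_requests, %w[hosts index]), 1, 'c')])
```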

> The proposal of the telemetry API itself seems reasonable; let's discuss
> it on an actual PR.

Thanks, I hope to finish it this year.

--
Later,
Lukas @lzap Zapletal

thomasmckay · December 14, 2017, 2:26pm

OpenShift uses Prometheus [1], which seems very similar to and compatible
with your ideas. Is that something you've looked at already? If/when
Foreman is containerized and perhaps run under Kubernetes, your work could
be very useful as well.

[1] https://blog.openshift.com/tag/prometheus/

thomasmckay · December 14, 2017, 2:52pm

Oops! I should have watched your video first. Watching it now: "Proposal
to integrate Prometheus and Statsd instrumentation libraries into Foreman
..."

lzap · December 14, 2017, 3:28pm

Exactly, I based my proposal on Prometheus and Statsd. You can choose.

I hope to work on my PR next week, but you can test it today:

https://gist.github.com/lzap/2dfdd4dea29786a837cb1b7feb7862fd

patching.md

Patch the Foreman app (only one file is changed; the rest are new files):

    PATCH=/root/prom-$(date +"%s").patch
    curl -L -k https://github.com/lzap/foreman/commit/prometheus1.patch -o "$PATCH"
    pushd /usr/share/foreman; patch -p1 < "$PATCH"; popd

For your info, to revert a patch use:

    pushd /usr/share/foreman; patch -R -p1 < /root/prom-TIMESTAMP.patch; popd

Thanks for the feedback!

ohadlevy · December 14, 2017, 3:30pm

> Exactly, I based my proposal on Prometheus and Statsd. You can choose.
>
> I am going to work on my PR next week hopefully, but you can test it today:
>
> https://gist.github.com/lzap/2dfdd4dea29786a837cb1b7feb7862fd

Don't you have a Docker container set up somewhere?

thomasmckay · December 14, 2017, 3:45pm

Run the Prometheus container locally:

    $ docker run -d -p 9090:9090 registry.access.redhat.com/openshift3/prometheus

The prometheus binary is set as the entrypoint for the container, so arguments after the image name go straight to it:

    $ docker run --rm prom/prometheus --help

(Images are also available on Docker Hub as prom/prometheus.)

lzap · December 18, 2017, 1:25pm

The PR is finally up for review.

github.com/theforeman/foreman: Fixes #18675 - telemetry foreman API
(theforeman:develop ← lzap:prometheus1, opened by lzap at 01:24PM, 18 Dec 2017 UTC, +441 -8)

Foreman telemetry core API. This PR brings a common telemetry API which provides three generic metric types:

* **counter** (monotonically increasing counter, for example the number of HTTP requests)
* **gauge** (arbitrary float value, for example the number of queued jobs)
* **histogram** (arbitrary number of buckets with observations and float values, for example the duration of a request)

Each individual metric must be predefined in the config/initializers/5_telemetry_metrics.rb file in this format:

```ruby
telemetry.add_counter(:http_requests, 'A counter of HTTP requests made', [:controller, :action])
```

Optionally, buckets can be defined for histograms (a sane default is used if not defined):

```ruby
telemetry.add_histogram(:http_request_total_duration, 'Total duration of controller action', [:controller, :action], [5, 10, 50, 200, 1000])
```

For more about why I picked these, watch my proposal video: https://www.youtube.com/watch?v=gCLSI9-4QpE

There are two implementations for the best possible integration options: a Prometheus endpoint and Statsd packets. To configure the former, add this to your settings:

```yaml
:telemetry:
  :type: 'prometheus'
  :prefix: 'fm_rails'
```

Then visit `/metrics` to see aggregated data. Warning: due to a limitation of the Prometheus Ruby library, this does not work with multi-process web servers (our case) and will give you incorrect data. Therefore, until this is fixed upstream, the recommended way is to use Statsd:

```yaml
:telemetry:
  :type: 'statsd'
  :prefix: 'fm_rails'
  :statsd:
    :host: '127.0.0.1:8125'
    :protocol: 'statsd'
```

Then download and run any statsd aggregator on localhost port 8125. I tested with statsite and with statsd_exporter for Prometheus, which is super easy to use:

```
./statsd_exporter -statsd.listen-udp :8125
```

This PR provides a rake task that creates a useful mapping for statsd_exporter, which makes life much easier and provides label mapping:

```
bundle exec rake telemetry:prometheus_statsd output=/tmp/mapping.yaml
./statsd_exporter -statsd.listen-udp :8125 -statsd.mapping-config /tmp/mapping.yaml
```

The last step is to pull the data into some monitoring framework. Prometheus is very easy to use, as it's a single binary with simple configuration (saved as prometheus.yml, the file it reads by default):

```
scrape_configs:
  - job_name: 'foreman'
    static_configs:
      - targets: ['localhost:9102']
```

```
./prometheus
```

This PR adds a bunch of basic telemetry data; we can add more later via the initializer (config/initializers/5_telemetry_metrics.rb), which could become a DSL instead of direct Ruby code. The PR currently adds:

| Metric name | Labels | Type | Description |
| ----------- | ------ | ---- | ----------- |
| fm_rails_activerecord_instances | class | counter | Number of instances of ActiveRecord models |
| fm_rails_bruteforce_locked_ui_logins | | counter | Number of blocked logins via bruteforce protection |
| fm_rails_failed_ui_logins | | counter | Number of failed logins in total |
| fm_rails_http_request_db_duration | controller,action | histogram | Time spent in database for a request |
| fm_rails_http_request_total_duration | controller,action | histogram | Total duration of controller action |
| fm_rails_http_request_view_duration | controller,action | histogram | Time spent in view for a request |
| fm_rails_http_requests | controller,action | counter | A counter of HTTP requests made |
| fm_rails_proxy_api_duration | method | histogram | Time spent waiting for Proxy (ms) |
| fm_rails_proxy_api_response_code | code | counter | Number of Proxy API responses per HTTP code |
| fm_rails_ruby_gc_allocated_objects | controller,action | counter | Ruby GC statistics per request (total_allocated_objects) |
| fm_rails_ruby_gc_count | controller,action | counter | Ruby GC statistics per request (count) |
| fm_rails_ruby_gc_freed_objects | controller,action | counter | Ruby GC statistics per request (total_freed_objects) |
| fm_rails_ruby_gc_major_count | controller,action | counter | Ruby GC statistics per request (major_gc_count) |
| fm_rails_ruby_gc_minor_count | controller,action | counter | Ruby GC statistics per request (minor_gc_count) |
| fm_rails_successful_ui_logins | | counter | Number of successful logins in total |

A short demo of how to set things up: https://youtu.be/i5iCOLEByZk

TODO:

* [x] Installer
* [x] Documentation
* [x] Packages for new dependencies
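The histogram type is the least obvious of the three. Below is a minimal sketch of Prometheus-style cumulative buckets: each bucket counts observations less than or equal to its upper bound, with an implicit +Inf bucket, a running sum and a count. It uses the example bucket list `[5, 10, 50, 200, 1000]` from the PR description as its default; `HistogramSketch` is a hypothetical name, and the actual default buckets of the Foreman API are not stated in this thread.

```ruby
# Hypothetical Prometheus-style histogram: cumulative buckets plus sum and count.
class HistogramSketch
  attr_reader :sum, :count

  def initialize(buckets = [5, 10, 50, 200, 1000])
    # Upper bounds in ascending order, with a catch-all +Inf bucket at the end.
    @bounds = buckets.sort + [Float::INFINITY]
    @counts = Array.new(@bounds.size, 0)
    @sum = 0.0
    @count = 0
  end

  # An observation is counted in every bucket whose upper bound it does not exceed.
  def observe(value)
    @bounds.each_with_index { |bound, i| @counts[i] += 1 if value <= bound }
    @sum += value
    @count += 1
  end

  # Returns { upper_bound => cumulative_count }.
  def buckets
    @bounds.zip(@counts).to_h
  end
end

h = HistogramSketch.new
[3, 7, 120, 4000].each { |ms| h.observe(ms) }
# buckets: 5 -> 1, 10 -> 2, 50 -> 2, 200 -> 3, 1000 -> 3, +Inf -> 4; sum 4130.0, count 4
```

Statsd timers carry raw values instead, so with the statsd route the bucketing happens in the aggregator (statsd_exporter, for example) rather than in the Rails process.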

I recorded a short demo of how to set it up with Prometheus via statsd:

--
Later,
Lukas @lzap Zapletal

Konstantin_Orekhov · February 23, 2018, 7:31pm

Hi, Lukas! One ask from a user perspective, based on my experience with the graphite plugin. It sends its metrics to a locally running prometheus/graphite_exporter (a server that accepts metrics via the Graphite protocol and exports them as Prometheus metrics for subsequent scraping by Prometheus). This is very similar to your proposal, except that in my setup Prometheus runs on a separate server so it can scrape all of my Foreman instances, and the data is then visualized by Grafana. While collecting all Ruby and Foreman controller metrics is great, it is really hard to translate them into easy-to-understand dashboards. For example, if DB connections spike, that shows up as spikes in some index-generation calls, but it is not clear which Foreman functional areas are really affected. It would be great if more metrics were exposed per major Foreman function, similar to the current dashboards in the UI: number of Puppet runs, hosts built or discovered, tasks, etc.

With these metrics available, it would be much easier to make sense of the underlying Ruby metrics, IMHO. One could look at rolled-up graphs and, in case of abnormalities, correlate them with Ruby calls, DB latency, etc.

Another thing I noticed with the current graphite plugin is that Ruby metrics are generated only when a particular function is executed, resulting in “torn” or spotty graphs:

[Screenshot]

Not sure what could be done about that, though. Hoping you may have some ideas on how to make this look better.

Thanks!

lzap · February 26, 2018, 7:46am

Hey, thanks for trying that out. The initial patch only added the framework and a few metrics; I want to carry on and add more. I was going to start a thread here on the list about what you would like to see measured, but looks like you were faster.

So first of all, the most important metric for now is Rails controller duration. It is exported per individual controller and action, which gives you great detail about what exactly is slow, so you can easily tell which code is causing it. I understand that from a higher-level perspective this can be less useful. There are two options. The first is to create a query that aggregates numbers from various sources (controllers/actions) into one number, let’s say fact import, and present that as one graph. You can do this with all good monitoring applications, including Prometheus and Graphite. We can share dashboards here or in a git repo so the work doesn’t get lost. The second option, which I don’t prefer, is to send extra aggregated data per logical domain, which basically duplicates data. For now I suggest aggregating all durations and watching for spikes there, and the same for object allocations.
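The first option, aggregating in the monitoring stack, boils down to combining several per-controller/action series into one number. The Ruby sketch below only illustrates that arithmetic on in-memory values. In Prometheus the same idea would be something like `sum(rate(<metric>_sum[5m])) / sum(rate(<metric>_count[5m]))`, restricted to the matching label values. The grouping of controller/action pairs into domains (`DOMAINS`), the `Series` struct and all numbers are made-up illustration, not Foreman's actual routes or data.

```ruby
# One histogram series per controller/action. Only the running _sum and
# _count of each histogram are modelled; bucket counts are left out.
Series = Struct.new(:controller, :action, :sum_ms, :requests)

# Hypothetical grouping of controller/action pairs into logical domains.
DOMAINS = {
  'fact_import' => [%w[api/v2/hosts facts]],
  'host_pages'  => [%w[hosts index], %w[hosts show]]
}.freeze

# Mean request duration for a whole domain: total time divided by total
# requests, so busy actions weigh more than rarely used ones.
def domain_mean_ms(series, domain)
  pairs = DOMAINS.fetch(domain)
  matching = series.select { |s| pairs.include?([s.controller, s.action]) }
  requests = matching.sum(&:requests)
  return nil if requests.zero?

  matching.sum(&:sum_ms).fdiv(requests)
end

series = [
  Series.new('hosts', 'index', 300.0, 3),       # 100 ms per request
  Series.new('hosts', 'show', 300.0, 1),        # 300 ms per request
  Series.new('api/v2/hosts', 'facts', 50.0, 5)
]
puts domain_mean_ms(series, 'host_pages')  # 150.0
puts domain_mean_ms(series, 'fact_import') # 10.0
```

Dividing the summed `_sum` by the summed `_count` gives 150 ms for the host pages. A naive average of the two per-action means would give 200 ms and overstate the rarely used action.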

The graph you see looks bad. Do you use statsd_exporter, or do you connect Prometheus directly? The latter won’t work with a multi-process server (Passenger, for example). How does the data look when you scrape it with Prometheus directly? It works for me.
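Why a direct scrape breaks under a multi-process server: each worker process keeps its own in-memory counters, so a scrape that reaches one worker sees only that worker's share. Pushing every increment to an external aggregator (the role statsd_exporter plays) collects the total. The Ruby sketch below simulates this with `fork` and a pipe standing in for the UDP socket. It needs a Unix-like platform, and the worker and request counts are arbitrary.

```ruby
# Simulates N worker processes (as under Passenger). Each worker counts its
# requests in its own memory and also pushes one statsd-style line per
# request into a shared pipe, which stands in for UDP to statsd_exporter.
# Counts must stay <= 255 because each worker reports its local count
# through its exit status (a shortcut for this demo only).
def simulate(workers:, requests_per_worker:)
  reader, writer = IO.pipe
  pids = Array.new(workers) do
    fork do
      reader.close
      writer.sync = true # one write(2) per line, so lines never interleave
      local_count = 0
      requests_per_worker.times do
        local_count += 1
        writer.write("fm_rails_http_requests:1|c\n")
      end
      writer.close
      exit!(local_count) # skip at_exit hooks inherited from the parent
    end
  end
  writer.close

  # The "aggregator" reads until every worker has closed its end of the pipe.
  aggregated = reader.each_line.sum { |line| line[/:(\d+)\|c/, 1].to_i }
  reader.close

  # What a scrape of each worker's own in-process registry would have seen.
  per_process = pids.map { |pid| Process.wait2(pid).last.exitstatus }
  { per_process: per_process, aggregated: aggregated }
end

result = simulate(workers: 4, requests_per_worker: 10)
puts result[:per_process].inspect # [10, 10, 10, 10]
puts result[:aggregated]          # 40
```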

Konstantin_Orekhov · February 26, 2018, 6:55pm

Sorry, I wasn’t clear enough: I haven’t tried your PR yet, I’m just using the graphite plugin with my 1.14 instances. So exporting directly to Prometheus is simply impossible, and I have to use statsd_exporter to accept metrics coming from the local Foreman instance and then scrape that exporter from a remote Prometheus server (to have a central place where dashboards are generated or the collected data is otherwise used).

I agree with you that combining Rails metrics to aggregate events into a single graph would certainly work. The only issue is that creating those graphs would require a very deep understanding of all the underlying pieces at play, which regular users are not necessarily exposed to. Sharing such dashboards would be absolutely great.

Also, are there any plans to add telemetry to Foreman SmartProxy? Would love to have that.

As for the broken graph, the point I was trying to make is that if a particular controller function is not used constantly, data is sent to statsd only when that function is called, resulting in broken graphs because there is simply no data for Grafana to plot. The other, more frequently used functions graph perfectly fine, so I don’t believe this is a statsd issue. For example:

[Screenshot]

This is really the smaller issue, I guess. I just wanted to point it out.
Thanks!

Konstantin_Orekhov · February 26, 2018, 11:19pm

Sorry for the typo: I meant “graphite” and “graphite_exporter” instead of “statsd” and “statsd_exporter” in my post above.

lzap · February 27, 2018, 10:06am

I would not say impossible: there are some experimental patches for the Ruby Prometheus client that get this working, if you want to try them. Hotfix the gem and you can collect the data without statsd. Hopefully we will get there some day.

Aggregating data is the monitoring stack’s job, not the monitored application’s. I don’t think this needs a “deep understanding” of anything. I scheduled a deep dive on this topic where I will show what you can do with this and share the important Prometheus queries you will want to use. Join and discuss, I would love to hear direct feedback: Foreman :: Foreman Events and https://www.youtube.com/watch?v=QoJ-r8YfWEI

Yes, this is my ultimate goal. Documentation and a demo come first, then more metrics, and then this is on my list.

There is nothing wrong with statsd (which is just a protocol); it comes down to the settings of your aggregator. Each statsd aggregator can be configured to “remember” the last values forever, or for some time. AFAIK statsd_exporter remembers them forever and there is no configuration option for that, so I am not sure why you see those gaps. Anyway, these controllers are not interesting (if you visit them once a day). If you need to see a graph, change the rendering to a continuous type or similar, or increase the window over which you calculate means from 5 minutes to hours…
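A rough Ruby sketch of why widening the averaging window closes the gaps. For a rarely visited controller, a mean computed over a short window usually has no samples, which a graph draws as a gap. A window of a few hours almost always contains at least one sample. `window_mean` and `graph_points` are hypothetical helpers and the timestamps and durations are made-up illustration data. Real tools compute this in their own query engines.

```ruby
# observations: [timestamp_s, duration_ms] pairs for one controller/action.
# Mean duration over the window (at - window_s, at]; nil when the window is
# empty, which a graphing tool draws as a gap.
def window_mean(observations, at:, window_s:)
  in_window = observations.select { |t, _| t > at - window_s && t <= at }
  return nil if in_window.empty?

  in_window.sum { |_, d| d }.fdiv(in_window.size)
end

# Evaluate the windowed mean at regular steps, the way a graph panel does.
def graph_points(observations, from:, to:, step_s:, window_s:)
  (from..to).step(step_s).map { |at| window_mean(observations, at: at, window_s: window_s) }
end

# A controller visited only twice in two hours (made-up data).
visits = [[0, 100.0], [3600, 300.0]]
short = graph_points(visits, from: 0, to: 7200, step_s: 600, window_s: 300)
long  = graph_points(visits, from: 0, to: 7200, step_s: 600, window_s: 7200)
puts short.count(nil) # 11 (of 13 points) are gaps with a 5-minute window
puts long.count(nil)  # 0 with a 2-hour window
```

The trade-off is resolution: the 2-hour window never drops out, but a single slow request now influences the plotted mean for two hours.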

Konstantin_Orekhov · February 27, 2018, 7:36pm

I hope you’ll be recording that session; the time of day may prevent me from joining live, but I’d love to see that demo.

lzap · February 28, 2018, 3:33pm

Yes, we are recording that.

lzap · March 13, 2018, 5:45pm

The telemetry deep dive is not happening tomorrow. I’d love to say it’s a calendar conflict or another more important event, but the truth is I completely messed up the planning and I am not working tomorrow. We will schedule a new one, sorry about that!