Improvements in monitoring/telemetry for Foreman 1.24

Foreman (the Rails application) can be configured to expose telemetry data in two popular formats: statsd and Prometheus. Recent changes in Foreman core (develop branch, version 1.24) add support for the “datadog” extensions of the statsd protocol, which leads to better integration with monitoring tools. The recommended configuration is now:

:telemetry:
  :prefix: 'fm_rails'
  :statsd:
    :enabled: true
    :host: '127.0.0.1:8125'
    :protocol: 'datadog'
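
To illustrate what the 'datadog' protocol setting changes on the wire, here is a minimal bash sketch that builds a statsd datagram with DogStatsD-style tags (`name:value|type|#key:value,...`) and sends it over UDP. The helper names `dogstatsd_line` and `send_statsd` and the `STATSD_HOST`/`STATSD_PORT` variables are hypothetical, and the metric and tags used with them are illustrative rather than captured from Foreman.

```shell
#!/usr/bin/env bash
# Build one statsd datagram with DogStatsD-style tags:
#   <name>:<value>|<type>|#<key>:<value>,<key>:<value>
# Arguments: name value type [key:value ...]
dogstatsd_line() {
  local name=$1 value=$2 type=$3
  shift 3
  local line="${name}:${value}|${type}"
  if [ "$#" -gt 0 ]; then
    # With IFS set to a comma, "$*" joins the remaining arguments with commas.
    local IFS=,
    line="${line}|#$*"
  fi
  printf '%s' "$line"
}

# Send one datagram over UDP using bash's /dev/udp redirection.
# Host and port default to the values from the configuration above.
send_statsd() {
  local host=${STATSD_HOST:-127.0.0.1} port=${STATSD_PORT:-8125}
  printf '%s' "$1" > "/dev/udp/${host}/${port}"
}
```

For example, `send_statsd "$(dogstatsd_line fm_rails_http_requests 1 c controller:hosts_controller action:index)"` would emit a single tagged counter increment to the local listener. This can be handy for checking that a statsd agent is receiving data before Foreman itself is configured.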

Integration with Performance Co-Pilot (PCP), the monitoring tool preferred by Red Hat, has greatly improved thanks to the Google Summer of Code 2019 project pmdastatsd, an agent for PCP that natively reads the statsd protocol, including the “datadog” extensions, and passes the metrics into PCP.

Previously, PCP needed a separate daemon called mmvstatsd, which I wrote, to support the statsd protocol. You can read more about that setup in the Satellite 6 Monitoring Guide. With the new pmdastatsd agent, the daemon is no longer required. Installation is now as simple as:

dnf install pcp pcp-pmda-statsd
systemctl start pmcd
cd /var/lib/pcp/pmdas/statsd
./Install

All telemetry data is now available in the PCP metric namespace:

[lzap@box rpm]$ pminfo | grep statsd
statsd.fm_rails_ruby_gc_count
statsd.fm_rails_ruby_gc_allocated_objects
statsd.fm_rails_activerecord_instances
statsd.fm_rails_http_request_total_duration
statsd.fm_rails_ruby_gc_major_count
statsd.fm_rails_http_request_view_duration
statsd.fm_rails_ruby_gc_minor_count
statsd.fm_rails_ruby_gc_freed_objects
statsd.fm_rails_http_request_db_duration
statsd.fm_rails_http_requests
statsd.pmda.settings.duration_aggregation_type
statsd.pmda.settings.parser_type
statsd.pmda.settings.port
statsd.pmda.settings.debug_output_filename
statsd.pmda.settings.verbose
statsd.pmda.settings.max_unprocessed_packets
statsd.pmda.settings.max_udp_packet_size
statsd.pmda.time_spent_aggregating
statsd.pmda.time_spent_parsing
statsd.pmda.metrics_tracked
statsd.pmda.aggregated
statsd.pmda.dropped
statsd.pmda.parsed
statsd.pmda.received

The new design makes the PCP namespace much cleaner: tags sent via the datadog extensions are mapped to PCP instances and labels, so they no longer need to be part of the metric name. To get the 99th percentile of the total HTTP request duration for the index action of the hosts controller, run:

pmval statsd.fm_rails_http_request_total_duration\[/percentile99::action=index::controller=hosts_controller\]
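
The instance name in the command above follows a regular pattern: a statistic, then label pairs joined with `::`. As a rough sketch, the bash helpers below compose such a specification so that it can be quoted instead of backslash-escaped. The function names `pcp_instance` and `pmval_spec` are hypothetical, and the sketch assumes the label pairs are passed in the same order in which PCP reports them (`pminfo -f` on the metric lists the exact instance names).

```shell
#!/usr/bin/env bash
# Compose an instance name from a statistic and label pairs,
# following the pattern used in the pmval command above:
#   /<statistic>::<label>=<value>::<label>=<value>
pcp_instance() {
  local inst="/$1" pair
  shift
  for pair in "$@"; do
    inst="${inst}::${pair}"
  done
  printf '%s' "$inst"
}

# Build a full pmval metric specification: <metric>[<instance>]
# Arguments: metric statistic [label=value ...]
pmval_spec() {
  local metric=$1
  shift
  printf '%s[%s]' "$metric" "$(pcp_instance "$@")"
}
```

With these helpers, `pmval -s 5 "$(pmval_spec statsd.fm_rails_http_request_total_duration percentile99 action=index controller=hosts_controller)"` would request five samples of the same instance as the command above.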

All other tools from the PCP suite can be used to work with the data, including:

  • inspecting data via CLI utilities
  • creating custom alerts on events
  • writing data into archive files
  • using various TUI tools (atop)
  • charting via GUI (pmchart)
  • charting via Vector or Grafana
  • exporting data to 3rd parties

Special thanks to:

  • Miroslav Foltýn for the implementation of the PMDA
  • Nathan Scott for co-mentoring the agent
  • Google for funding the GSoC project