Monitoring Foreman with Prometheus

Excellent, since version 2.1.1 the issue with Prometheus timing out on the metrics scrape has been resolved, and I can now monitor it without issues. I’ve built a simple Grafana dashboard monitoring response times for HTTP requests and facts processed, plus disk, CPU, RAM and the uptime of each Foreman service from node_exporter. I’m writing alerts now and will publish it all when done if someone finds it helpful.

3 Likes

@matemikulic We would find that very helpful, thanks!

2 Likes

For the record, it looks like the Prometheus Ruby client library uses PIDs for temporary filenames. Ruby web servers restart worker processes quite often, so this can leave many files behind after a short period of time (days, weeks), which makes the /metrics endpoint slower and slower.

Until we fix this, I recommend using the statsd protocol; https://github.com/prometheus/statsd_exporter can be used to load the data into Prometheus if needed.

Here is an attempt to fix this: https://github.com/theforeman/foreman/pull/8011

2 Likes

Hi @lzap

It may be a little stupid question, but I need to ask: what port should we put in the Prometheus scrape setting?

By default it’s 5000, but I am not sure if a Katello deployment has something else on that port. Just pick one.
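For reference, a minimal Prometheus scrape config for this would look something like the following (a sketch assuming the Rails server listens on plain HTTP on port 5000; adjust the host, scheme and port to your deployment):

scrape_configs:
  - job_name: foreman
    metrics_path: /metrics
    static_configs:
      - targets: ['foreman.example.com:5000']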

For the record, this will fix the temporary files problem. We are almost there. If you need it right now just backport the patch.

Thanks, @lzap
I’ve tried port 5000, and the status in Prometheus is Down (Bad Request). I think I need to dig a little deeper, or it may be caused by my network rules. I will come back here when there is a result.

Thanks

Hello, is it possible to share the JSON file of the Grafana dashboard that you made? It is based on a Prometheus data source, right? Thank you

Hey,
I’m sorry for the delay, I’ve been really busy. I plan to publish this in a repo on GitHub with the Grafana, Prometheus and Alertmanager configs. I’ve also decided to write a blog post for the community on how I integrated Foreman/Katello, since my infra is managed by Chef, not Puppet. I’ve found time now to publish it on grafana.com, and I will publish the rest once I get some more time.

Here is the Grafana dashboard with node_exporter and Prometheus config instructions for now: https://grafana.com/grafana/dashboards/13469

3 Likes

Thank you

Warning: It looks like both Passenger and Puma, our web app servers, recycle worker processes quite often. This leads to many temporary files being created in a relatively short period of time (hours on some deployments), which unfortunately makes the /metrics endpoint slower and slower, to the point that it kills the whole deployment. Only restarting the app helps, which cleans out the temporary directory completely. The Ruby client library maintainers are aware of the problem and say it will be challenging to implement some kind of “squash” mechanism.

I recommend using statsd instead of the native Prometheus client, with the statsd_exporter bridge to collect the data into Prometheus. Or restart the app regularly to clean out the temporary data.
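If you go the restart route, a scheduled restart is enough. A minimal sketch, assuming the Puma-based foreman.service unit (adjust the unit name and schedule to your deployment):

# /etc/cron.d/foreman-metrics-cleanup: weekly restart to clear the temporary metric files
0 4 * * 0  root  systemctl restart foreman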

1 Like

Thank you for your reply. I am checking the statsd configuration and I am facing issues detecting some metrics. Can you please point me to a link that might help me with the proper configuration? Kindly also advise about the dashboard: will I be able to use the same dashboard I used for the Prometheus metrics? I might need to change the variable names, but I need to make sure that I can still use the Grafana dashboard. Thank you.

In the Red Hat product we don’t use Prometheus, but this should give you some overview of how to configure statsd: https://access.redhat.com/documentation/en-us/red_hat_satellite/6.8/html/monitoring_red_hat_satellite/index

The rest is different: you need to run statsd_exporter instead of mmvstatsd and scrape the data from there into Prometheus. There is also a rake task you can use to generate the statsd_exporter mapping automatically for you:

foreman-rake telemetry:prometheus_statsd
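To sketch how the pieces might fit together (the file paths below are examples, and I’m assuming the rake task prints the mapping to stdout):

# generate the statsd_exporter mapping from Foreman's telemetry definitions
foreman-rake telemetry:prometheus_statsd > /etc/statsd_exporter/statsd_mapping.yaml

# run statsd_exporter with that mapping; Foreman sends statsd metrics to UDP 9125
# and Prometheus scrapes the exporter on port 9102 (both are statsd_exporter defaults)
statsd_exporter --statsd.mapping-config=/etc/statsd_exporter/statsd_mapping.yaml \
  --statsd.listen-udp=:9125 \
  --web.listen-address=:9102

Then point Foreman’s statsd telemetry setting at 127.0.0.1:9125 and add the exporter’s port 9102 as a scrape target in Prometheus.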

Get back to me on whether you got this working or not. If you do, please write a short tutorial and share it. We need to document this at some point because not everybody uses PCP. :slight_smile:

2 Likes

Here is the complete guide: Monitoring Foreman with Prometheus via statsd

Let me know if it worked for you.

1 Like

Thank you for your help and advice. It’s working fine now and I am able to monitor Foreman from Prometheus successfully. Thank you

Hey guys, anyone using the Katello scenario should find this helpful…
I’ve written a one-liner which converts the API JSON output to Prometheus metrics using jq. You can then serve this file and have Prometheus scrape it, showing this info in Grafana as part of the per-host information (see the textfile-collector sketch after the sample output).

echo "# HELP reboot_required_status has values 0,1,2: 0-no process require restarting, 1-process requires restart, 2-reboot required"; echo "# TYPE reboot_required_status gauge"; curl -XGET -u $username:$password -s https://foreman.example.com/api/hosts?per_page=1000 | jq -r '.results[] | "reboot_required_status{name=\"\(.name)\"} \(.traces_status)"'

The output will look like this:

# HELP reboot_required_status has values 0,1,2: 0-no process require restarting, 1-process requires restart, 2-reboot required
# TYPE reboot_required_status gauge
reboot_required_status{name="server-1.domain.local"} 2
reboot_required_status{name="server-2.domain.local"} 0
reboot_required_status{name="server-3.domain.local"} 2
reboot_required_status{name="server-4.domain.local"} 0
reboot_required_status{name="server-5.domain.local"} 2
reboot_required_status{name="server-6.domain.local"} 2
1 Like

I’m playing around with Katello 4.2, using OpenShift 4.8 (crc) to scrape the metrics with a ServiceMonitor, pulling the data in from outside the OpenShift cluster.

A ServiceMonitor is an extension of the Prometheus API that is created in a Kubernetes/OpenShift cluster when Prometheus is deployed using its own Operator.

My main goal was to use the APIs that we already have in OpenShift; no one wants to manage a stand-alone Prometheus instance.

If you have a local Openshift cluster to test, here are the steps that I followed:

Prerequisites:
Both Katello/Foreman and OpenShift need to see each other on the network, at least the /metrics endpoint.
A user with cluster-admin permission in OpenShift, or with granular permission to create a ServiceMonitor object in a namespace.
I’m assuming that the Katello/Foreman node is already running the telemetry plugin.

  1. Enable[1] monitoring for user-defined projects in OpenShift:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true

oc create -f cluster-monitoring-user-config.yaml

This will create a new namespace whose components watch the API for ServiceMonitor objects:

oc get pods -n openshift-user-workload-monitoring

  2. Create the new namespace and the network objects that will map the Katello/Foreman IP into OpenShift.

oc new-project foreman-monitor

To map the external IP into the cluster we will use a ClusterIP Service pointing to an Endpoints object that holds the Foreman/Katello IP.

kind: Service
apiVersion: v1
metadata:
 name: foreman-prometheus
 labels:
   external-service-monitor: "true"
spec:
 type: ClusterIP
 ports:
 - name: web
   port: 80
   targetPort: 80

oc create -f svc.yaml

Note that this Service doesn’t have a selector defined, so it won’t match any pods on its own.

For this Service to be usable, we need to create an Endpoints object that points to the Foreman/Katello IP.

kind: Endpoints
apiVersion: v1
metadata:
 name: foreman-prometheus
subsets:
 - addresses:
     - ip: 192.168.122.119
   ports: 
    - name: web
      port: 80

Note that 192.168.122.119 is the IP from my Katello installation, change to match yours.

oc create -f endpoint.yaml

We can validate that the endpoint is working by describing the Service:

oc describe svc
Name:              foreman-prometheus
Namespace:         foreman-monitor
Labels:            external-service-monitor=true
Annotations:       <none>
Selector:          <none>
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.217.4.200
IPs:               10.217.4.200
Port:              web  80/TCP
TargetPort:        80/TCP
Endpoints:         192.168.122.119:80 <<<<<<<<<<<<<< The Endpoint is working
Session Affinity:  None
Events:            <none>
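Optionally, before wiring up the ServiceMonitor, you can check reachability from inside the cluster with a throwaway pod (a sketch; the curl image is just an example):

oc run curl-test --rm -i --restart=Never --image=curlimages/curl -- \
  curl -s http://foreman-prometheus.foreman-monitor.svc/metrics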
  3. The only thing left is to create the ServiceMonitor object inside the foreman-monitor namespace:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: foreman-prometheus
  namespace: foreman-monitor
spec:
  endpoints:
  - port: web 
    interval: 30s
    scheme: http
    path: /metrics
  selector:
    matchLabels:
      external-service-monitor: "true"

oc create -f servicemonitor.yaml

This ServiceMonitor object will look for Services with the label external-service-monitor set to "true" and will scrape their /metrics endpoint.

After a couple of minutes, the httpd log will start to show entries like this one:

"GET /metrics HTTP/1.1" 200 3700 "-" "Prometheus/2.26.1"

To view the metrics, go to Monitoring > Metrics, add the metric fm_rails_http_requests and press Run Queries.
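For example, a simple request-rate query over that counter (a sketch; adjust the range window to taste):

sum(rate(fm_rails_http_requests[5m]))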

I have yet to get to the Grafana side of Prometheus inside OpenShift.


[1] Enabling monitoring for user-defined projects | Monitoring | OpenShift Container Platform 4.8

1 Like

Watch out: the Prometheus endpoint in Foreman suffers from a serious bug in the Ruby prometheus library. As more and more worker processes get recycled, more and more temporary files are created, which leads to the /metrics endpoint becoming slower and slower, to the point where a scrape takes minutes or even hours.

Use the statsd setting and Prometheus’s statsd_exporter to work around the issue.

Is there any plan to revisit this, or anything being tracked to follow and update this bug? Getting the Foreman endpoint to be a dependable, available endpoint for Prometheus would be a big win.

Just use statsd and everything is stable; the Ruby Prometheus client library hasn’t been fixed. The temporary files are just too much for it to handle.