Monitoring Foreman with Prometheus

Hey guys, anyone using the Katello scenario should find this helpful…
I’ve written a one-liner which converts the API’s JSON output to Prometheus metrics using jq. You can then serve this file and have Prometheus scrape it, surfacing this info in Grafana as part of the per-host information.

echo "# HELP reboot_required_status has values 0,1,2: 0-no process require restarting, 1-process requires restart, 2-reboot required"; echo "# TYPE reboot_required_status gauge"; curl -XGET -u $username:$password -s https://foreman.example.com/api/hosts?per_page=1000 | jq -r '.results[] | "reboot_required_status{name=\"\(.name)\"} \(.traces_status)"'

Output will look like this:

# HELP reboot_required_status has values 0,1,2: 0-no process requires restarting, 1-process requires restart, 2-reboot required
# TYPE reboot_required_status gauge
reboot_required_status{name="server-1.domain.local"} 2
reboot_required_status{name="server-2.domain.local"} 0
reboot_required_status{name="server-3.domain.local"} 2
reboot_required_status{name="server-4.domain.local"} 0
reboot_required_status{name="server-5.domain.local"} 2
reboot_required_status{name="server-6.domain.local"} 2

I’m playing around with Katello 4.2, using OpenShift 4.8 (crc) to scrape the metrics with a ServiceMonitor and get the data outside of the OpenShift cluster.

The ServiceMonitor is an extension of the Prometheus API that becomes available in a Kubernetes/OpenShift cluster when Prometheus is deployed using its own Operator.

My main goal was to use the APIs that we already have in OpenShift; nobody wants to manage a stand-alone Prometheus cluster.

If you have a local OpenShift cluster to test with, here are the steps that I followed:

Prerequisites:
Both Katello/Foreman and OpenShift need to be able to reach each other on the network, at least the /metrics endpoint.
A user with cluster-admin permission in OpenShift, or with granular permissions to create a ServiceMonitor object in a namespace.
I’m assuming that the Katello/Foreman node is already running the telemetry plugin.
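
In case telemetry still needs to be switched on, this is roughly how it can be done through the installer (option names are an assumption on my part; check foreman-installer --full-help for your version):

# Assumption: enable the Prometheus telemetry backend via the installer
foreman-installer --foreman-telemetry-prometheus-enabled true

# Quick check that Foreman now answers on /metrics
curl -s http://foreman.example.com/metrics | head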

  1. Enable [1] monitoring for user-defined projects in OpenShift:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true

oc create -f cluster-monitoring-user-config.yaml

This will create a new namespace with its own Prometheus stack that watches the API for ServiceMonitor objects:

oc get pods -n openshift-user-workload-monitoring

  2. Create the new namespace and the network objects that will map the Katello/Foreman IP into OpenShift.

oc new-project foreman-monitor

To map the external IP into the cluster we will use a ClusterIP Service pointing to an Endpoints object that holds the Foreman/Katello IP.

kind: Service
apiVersion: v1
metadata:
  name: foreman-prometheus
  labels:
    external-service-monitor: "true"
spec:
  type: ClusterIP
  ports:
  - name: web
    port: 80
    targetPort: 80

oc create -f svc.yaml

Note that this Service doesn’t have any selector defined, so Kubernetes will not manage its endpoints automatically.

For this Service to be usable, we need to create an Endpoints object that points to the Foreman/Katello IP.

kind: Endpoints
apiVersion: v1
metadata:
  name: foreman-prometheus
subsets:
- addresses:
  - ip: 192.168.122.119
  ports:
  - name: web
    port: 80

Note that 192.168.122.119 is the IP of my Katello installation; change it to match yours.

oc create -f endpoint.yaml

We can validate that the endpoint is wired up by describing the service:

oc describe svc
Name:              foreman-prometheus
Namespace:         foreman-monitor
Labels:            external-service-monitor=true
Annotations:       <none>
Selector:          <none>
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.217.4.200
IPs:               10.217.4.200
Port:              web  80/TCP
TargetPort:        80/TCP
Endpoints:         192.168.122.119:80 <<<<<<<<<<<<<< The Endpoint is working
Session Affinity:  None
Events:            <none>
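
As an extra sanity check (not in the original steps), you can hit the /metrics path through the Service from a throwaway pod; the pod name and image here are just examples:

oc run curl-test --rm -it --restart=Never --image=curlimages/curl -- curl -s -o /dev/null -w '%{http_code}\n' http://foreman-prometheus.foreman-monitor.svc.cluster.local/metrics

A 200 here means the Service and Endpoints are routing to Foreman correctly.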
  3. The only thing left is to create the ServiceMonitor object inside the foreman-monitor namespace:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: foreman-prometheus
  namespace: foreman-monitor
spec:
  endpoints:
  - port: web 
    interval: 30s
    scheme: http
    path: /metrics
  selector:
    matchLabels:
      external-service-monitor: "true"

oc create -f servicemonitor.yaml

This ServiceMonitor object will look for Services with the label external-service-monitor set to "true" and will scrape their /metrics path.
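
Just to confirm the object was created (a quick check, not in the original steps):

oc get servicemonitor -n foreman-monitor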

After a couple of minutes, the httpd log will start to show entries like this one:

"GET /metrics HTTP/1.1" 200 3700 "-" "Prometheus/2.26.1"

To view the metrics we can go to Monitoring > Metrics, add the metric fm_rails_http_requests, and press Run Queries.
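
From there you can run normal PromQL against it; for example, a per-second request rate (the grouping label is an assumption on my part, check which labels your Foreman version exposes):

sum(rate(fm_rails_http_requests[5m])) by (controller)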

I have yet to reach the Grafana side of Prometheus inside OpenShift.


1 - Enabling monitoring for user-defined projects | Monitoring | OpenShift Container Platform 4.8


Watch out, the Prometheus endpoint in Foreman suffers from a serious bug in the Ruby prometheus library. As more and more worker processes get recycled, more and more temporary files are created, and this leads to the /metrics endpoint becoming slower and slower, with responses taking minutes or even hours.

Use the statsd setting and the statsd_exporter from Prometheus to work around the issue.
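
For reference, a rough sketch of that workaround; the installer option names and the statsd_exporter ports (UDP 9125 in, HTTP 9102 out) are assumptions to verify against your versions:

# Switch Foreman telemetry from the prometheus backend to statsd (option names assumed, see foreman-installer --full-help)
foreman-installer \
  --foreman-telemetry-prometheus-enabled false \
  --foreman-telemetry-statsd-enabled true \
  --foreman-telemetry-statsd-host 127.0.0.1:9125 \
  --foreman-telemetry-statsd-protocol statsd

# Run statsd_exporter next to Foreman; it receives statsd packets on UDP :9125
# and re-exposes them for Prometheus on http://<foreman-host>:9102/metrics
./statsd_exporter --statsd.listen-udp=":9125" --web.listen-address=":9102"

# Then point Prometheus (or the ServiceMonitor above) at port 9102 instead of Foreman's own /metrics.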

Is there any plan to revisit this, or is this bug being tracked anywhere? Getting the Foreman endpoint to be a dependable, available endpoint for Prometheus would be a big win.

Just use statsd and everything is stable; the Ruby prometheus client library hasn’t been fixed. The number of temporary files is simply too much for it to handle.