That's a question more for the library authors; I don't think so. However, we use Apache httpd in front of Passenger/Puma (depending on the version), so you can use any Apache module, e.g. authentication or IP filtering.
Hey, I can confirm that:
# while true; do curl -w '%{time_total}\n' https://localhost/metrics -o /dev/null -k -s --noproxy localhost; done
26,244
27,270
27,133
27,407
25,433
26,221
My Prometheus has the scrape interval set to 10 seconds and the timeout to 10 seconds as well.
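For reference, the relevant part of my scrape config looks roughly like this (the job name, target and TLS settings are only illustrative; insecure_skip_verify matches the -k I pass to curl above):

scrape_configs:
  - job_name: foreman                # illustrative job name
    scrape_interval: 10s
    scrape_timeout: 10s
    metrics_path: /metrics
    scheme: https
    tls_config:
      insecure_skip_verify: true     # same as curl -k above
    static_configs:
      - targets: ['localhost:443']   # placeholder target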
I originally fixed this by increasing the Java heap memory and restarting services, but that worked for only about a day and then the issue was back. I thought this must have affected a lot of users and was expecting a fix in v2.1.
Excellent, as of version 2.1.1 the issue with Prometheus timing out on the metrics scrape has been resolved. I can now monitor it without issues. I've made a simple Grafana dashboard monitoring response times for HTTP requests and facts processed, and, from node exporter, disk, CPU, RAM and the uptime of each Foreman service. I'm writing alerts now and will publish it all when done, in case someone finds it helpful.
For the record, it looks like the Prometheus Ruby client library uses PIDs in its temporary filenames. Ruby web servers restart worker processes quite often, so this can leave many files behind after a short period of time (days, weeks), which makes the /metrics endpoint slower and slower.
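A quick way to see the accumulation is to count the files in the client's store directory; the path below is only a guess, it depends on how the app configures the client:

ls -1 /tmp/prometheus | wc -l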
Thanks, @lzap
I've tried port 5000, and the target status in Prometheus is Down (Bad Request). I think I need to dive a little deeper; it could be caused by my network rules. I will report back here when I have a result.
Hey,
I'm sorry for the delay, I've been really busy. I plan to publish this in a repo on GitHub with the Grafana, Prometheus and Alertmanager configs. I've also decided to write a blog post for the community on how I integrated Foreman/Katello, since my infra is managed by Chef, not Puppet. I've found time to publish the dashboard on grafana.com, and I will publish the rest as soon as I get some time.
Warning: it looks like both Passenger and Puma, our web app servers, recycle worker processes quite often. This leads to many temporary files being created in a relatively short period of time (hours on some deployments), which unfortunately makes the /metrics endpoint slower and slower, to the point that it kills the whole deployment. Only a restart of the app helps, which cleans the temporary directory completely. The Ruby client library maintainers are aware of the problem and said it will be challenging to implement some kind of "squash" mechanism.
I recommend using the statsd output instead of the native Prometheus one and using the statsd_exporter Prometheus bridge to collect the data. Or restart the app regularly to clean out the temporary data.
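If you go the regular-restart route, a nightly cron entry is enough; this is just a sketch, and the exact service to restart depends on your deployment (httpd here assumes Apache with Passenger in front):

# /etc/cron.d/foreman-metrics-restart
0 3 * * * root systemctl restart httpd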
Thank you for your reply. I am checking the statsd configuration and I am facing issues detecting some metrics. Could you please point me to a link that might help me with the proper configuration? Kindly also advise about the dashboard: will I be able to use the same dashboard used for the Prometheus metrics? I might need to change the variable names, but I need to make sure that I will still be able to use the Grafana dashboard. Thank you.
The rest is different: you need to run statsd_exporter instead of mmvstatsd and scrape the data from there into Prometheus. There is also a rake task you can use to generate the statsd_exporter mapping automatically for you:
foreman-rake telemetry:prometheus_statsd
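Roughly, the idea is to save the generated mapping and point statsd_exporter at it; this is only a sketch, it assumes the task writes the mapping to stdout, and the flag name may differ between statsd_exporter versions:

foreman-rake telemetry:prometheus_statsd > /etc/statsd_exporter/statsd_mapping.yml
statsd_exporter --statsd.mapping-config=/etc/statsd_exporter/statsd_mapping.yml

Then enable the statsd telemetry output in Foreman, point it at the exporter's statsd port, and scrape the exporter's web port from Prometheus.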
Get back to me on whether you got this working or not. If you do, please write a short tutorial and share it. We need to document this at some point because not everybody uses PCP.