Urgent help - foreman stops responding

Hello,
We manage about 5k hosts. Today Foreman stopped responding; it dies when Puppet uploads reports. There are no errors in the Foreman or Apache logs.
Here’s the Passenger output:

Version: 5.3.2
Date   : 2018-07-10 22:50:44 -0700

--------- Apache processes ---------
PID   PPID  VMSize    Private  Name
------------------------------------
993   1     317.7 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
1331  993   319.8 MB  0.5 MB   /usr/sbin/httpd -DFOREGROUND
1335  993   319.8 MB  0.5 MB   /usr/sbin/httpd -DFOREGROUND
1336  993   319.8 MB  0.5 MB   /usr/sbin/httpd -DFOREGROUND
1337  993   319.8 MB  0.6 MB   /usr/sbin/httpd -DFOREGROUND
1338  993   319.8 MB  0.5 MB   /usr/sbin/httpd -DFOREGROUND
1339  993   319.8 MB  0.5 MB   /usr/sbin/httpd -DFOREGROUND
1340  993   319.8 MB  0.5 MB   /usr/sbin/httpd -DFOREGROUND
1341  993   319.8 MB  0.5 MB   /usr/sbin/httpd -DFOREGROUND
1582  993   319.8 MB  0.5 MB   /usr/sbin/httpd -DFOREGROUND
1586  993   319.8 MB  0.5 MB   /usr/sbin/httpd -DFOREGROUND
1587  993   319.8 MB  0.5 MB   /usr/sbin/httpd -DFOREGROUND
1588  993   319.8 MB  0.6 MB   /usr/sbin/httpd -DFOREGROUND
1855  993   319.8 MB  0.5 MB   /usr/sbin/httpd -DFOREGROUND
1889  993   319.8 MB  0.6 MB   /usr/sbin/httpd -DFOREGROUND
1890  993   319.8 MB  0.5 MB   /usr/sbin/httpd -DFOREGROUND
1973  993   319.8 MB  0.5 MB   /usr/sbin/httpd -DFOREGROUND
2007  993   319.8 MB  0.6 MB   /usr/sbin/httpd -DFOREGROUND
2015  993   319.8 MB  0.5 MB   /usr/sbin/httpd -DFOREGROUND
2104  993   319.8 MB  0.5 MB   /usr/sbin/httpd -DFOREGROUND
2105  993   319.8 MB  0.5 MB   /usr/sbin/httpd -DFOREGROUND
2123  993   319.8 MB  0.5 MB   /usr/sbin/httpd -DFOREGROUND
2186  993   319.8 MB  0.5 MB   /usr/sbin/httpd -DFOREGROUND
2220  993   319.8 MB  0.5 MB   /usr/sbin/httpd -DFOREGROUND
2221  993   319.8 MB  0.5 MB   /usr/sbin/httpd -DFOREGROUND
2296  993   319.8 MB  0.5 MB   /usr/sbin/httpd -DFOREGROUND
2316  993   319.8 MB  0.5 MB   /usr/sbin/httpd -DFOREGROUND
2317  993   319.8 MB  0.5 MB   /usr/sbin/httpd -DFOREGROUND
2318  993   319.8 MB  0.6 MB   /usr/sbin/httpd -DFOREGROUND
2319  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2320  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2321  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2322  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2329  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2332  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2333  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2334  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2343  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2352  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2355  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2356  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2365  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2366  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2367  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2379  993   319.8 MB  0.5 MB   /usr/sbin/httpd -DFOREGROUND
2388  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2389  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2390  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2402  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2410  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2419  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2421  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2422  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2434  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2444  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2445  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2446  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2455  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2494  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2495  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2496  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2507  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2521  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2522  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2523  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2540  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2557  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2558  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2559  993   319.8 MB  0.4 MB   /usr/sbin/httpd -DFOREGROUND
2585  993   319.8 MB  0.3 MB   /usr/sbin/httpd -DFOREGROUND
2607  993   319.8 MB  0.3 MB   /usr/sbin/httpd -DFOREGROUND
2608  993   319.8 MB  0.3 MB   /usr/sbin/httpd -DFOREGROUND
2609  993   319.8 MB  0.3 MB   /usr/sbin/httpd -DFOREGROUND
2620  993   319.8 MB  0.3 MB   /usr/sbin/httpd -DFOREGROUND
### Processes: 74
### Total private dirty RSS: 33.15 MB

-------- Nginx processes --------

### Processes: 0
### Total private dirty RSS: 0.00 MB

----- Passenger processes ------

PID   VMSize     Private   Name
--------------------------------
1288  354.9 MB   2.2 MB    Passenger watchdog
1299  3157.7 MB  10.2 MB   Passenger core
1491  1852.4 MB  105.8 MB  Passenger AppPreloader: /usr/share/foreman
1604  1401.7 MB  416.7 MB  Passenger AppPreloader: /usr/share/foreman (forking...)
1632  1401.7 MB  395.5 MB  Passenger AppPreloader: /usr/share/foreman (forking...)
1662  1401.7 MB  395.5 MB  Passenger AppPreloader: /usr/share/foreman (forking...)
1692  1401.7 MB  395.2 MB  Passenger AppPreloader: /usr/share/foreman (forking...)
1722  1196.8 MB  171.7 MB  Passenger AppPreloader: /usr/share/foreman (forking...)
1757  1401.7 MB  394.4 MB  Passenger AppPreloader: /usr/share/foreman (forking...)
1792  1401.7 MB  394.2 MB  Passenger AppPreloader: /usr/share/foreman (forking...)
1829  1401.7 MB  394.2 MB  Passenger AppPreloader: /usr/share/foreman (forking...)
1859  1401.7 MB  389.0 MB  Passenger AppPreloader: /usr/share/foreman (forking...)
1916  1196.8 MB  170.0 MB  Passenger AppPreloader: /usr/share/foreman (forking...)
1947  1401.8 MB  391.2 MB  Passenger AppPreloader: /usr/share/foreman (forking...)
1981  1466.8 MB  383.8 MB  Passenger AppPreloader: /usr/share/foreman (forking...)
2016  1339.0 MB  169.3 MB  Passenger AppPreloader: /usr/share/foreman (forking...)
2045  1405.0 MB  169.3 MB  Passenger AppPreloader: /usr/share/foreman (forking...)
2076  1661.9 MB  387.2 MB  Passenger AppPreloader: /usr/share/foreman (forking...)
2124  1726.9 MB  385.4 MB  Passenger AppPreloader: /usr/share/foreman (forking...)
2160  1791.9 MB  383.4 MB  Passenger AppPreloader: /usr/share/foreman (forking...)
2192  1669.1 MB  169.2 MB  Passenger AppPreloader: /usr/share/foreman (forking...)
2237  1921.9 MB  386.6 MB  Passenger AppPreloader: /usr/share/foreman (forking...)
2268  1987.0 MB  367.3 MB  Passenger AppPreloader: /usr/share/foreman (forking...)
### Processes: 23
### Total private dirty RSS: 6827.22 MB


# passenger-status
Version : 5.3.2
Date    : 2018-07-10 22:50:49 -0700
Instance: YnolO8od (Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips Phusion_Passenger/5.3.2)

----------- General information -----------
Max pool size : 20
App groups : 1
Processes : 20
Requests in top-level queue : 0

----------- Application groups -----------

/usr/share/foreman (production):
  App root: /usr/share/foreman
  Requests in queue: 48
  * PID: 1604    Sessions: 1       Processed: 4       Uptime: 8m 44s
    CPU: 0%      Memory  : 416M    Last used: 8m 40s ago
  * PID: 1632    Sessions: 1       Processed: 1       Uptime: 8m 44s
    CPU: 0%      Memory  : 395M    Last used: 8m 40s ago
  * PID: 1662    Sessions: 1       Processed: 3       Uptime: 8m 44s
    CPU: 0%      Memory  : 395M    Last used: 8m 39s ago
  * PID: 1692    Sessions: 1       Processed: 1       Uptime: 8m 43s
    CPU: 0%      Memory  : 395M    Last used: 8m 37s ago
  * PID: 1722    Sessions: 1       Processed: 0       Uptime: 8m 43s
    CPU: 0%      Memory  : 171M    Last used: 8m 36s ago
  * PID: 1757    Sessions: 1       Processed: 1       Uptime: 8m 35s
    CPU: 0%      Memory  : 394M    Last used: 8m 32s ago
  * PID: 1792    Sessions: 1       Processed: 1       Uptime: 8m 34s
    CPU: 0%      Memory  : 394M    Last used: 8m 30s ago
  * PID: 1829    Sessions: 1       Processed: 1       Uptime: 8m 28s
    CPU: 0%      Memory  : 394M    Last used: 8m 25s ago
  * PID: 1859    Sessions: 1       Processed: 21      Uptime: 8m 28s
    CPU: 0%      Memory  : 388M    Last used: 7m 40s ago
  * PID: 1916    Sessions: 1       Processed: 0       Uptime: 7m 39s
    CPU: 0%      Memory  : 170M    Last used: 7m 39s ago
  * PID: 1947    Sessions: 1       Processed: 1       Uptime: 7m 38s
    CPU: 0%      Memory  : 391M    Last used: 7m 35s ago
  * PID: 1981    Sessions: 1       Processed: 2       Uptime: 7m 36s
    CPU: 0%      Memory  : 383M    Last used: 7m 32s ago
  * PID: 2016    Sessions: 1       Processed: 0       Uptime: 7m 34s
    CPU: 0%      Memory  : 169M    Last used: 7m 34s ago
  * PID: 2045    Sessions: 1       Processed: 0       Uptime: 7m 33s
    CPU: 0%      Memory  : 169M    Last used: 7m 33s ago
  * PID: 2076    Sessions: 1       Processed: 3       Uptime: 7m 33s
    CPU: 0%      Memory  : 387M    Last used: 7m 27s ago
  * PID: 2124    Sessions: 1       Processed: 2       Uptime: 7m 25s
    CPU: 0%      Memory  : 385M    Last used: 7m 19s ago
  * PID: 2160    Sessions: 1       Processed: 6       Uptime: 7m 23s
    CPU: 0%      Memory  : 383M    Last used: 7m 10s ago
  * PID: 2192    Sessions: 1       Processed: 0       Uptime: 7m 22s
    CPU: 0%      Memory  : 169M    Last used: 7m 22s ago
  * PID: 2237    Sessions: 1       Processed: 12      Uptime: 7m 9s
    CPU: 0%      Memory  : 386M    Last used: 6m 39s ago
  * PID: 2268    Sessions: 1       Processed: 5       Uptime: 7m 8s
    CPU: 0%      Memory  : 367M    Last used: 6m 29s ago

Foreman 1.16.2

Thank you
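For anyone who wants to read the passenger-status output above programmatically, here is a minimal Python sketch. The regexes are written only against the text format pasted in this post (Passenger 5.3.2). The names `parse_workers` and `looks_stuck`, and the idea of treating "holds a session but uses 0% CPU" as stuck, are assumptions for illustration and not part of Passenger.

```python
import re

# Matches the two lines describing one worker in the passenger-status
# text pasted above (format assumed from that output only).
WORKER_RE = re.compile(
    r"\* PID: (?P<pid>\d+)\s+Sessions: (?P<sessions>\d+)\s+"
    r"Processed: (?P<processed>\d+)\s+Uptime: (?P<uptime>.+?)\s*\n"
    r"\s*CPU: (?P<cpu>\d+)%\s+Memory\s*: (?P<memory>\d+)M\s+"
    r"Last used: (?P<last_used>.+?) ago"
)
QUEUE_RE = re.compile(r"Requests in queue: (\d+)")


def parse_workers(text):
    """Return (queued_requests, list of worker dicts) from passenger-status text."""
    queue_match = QUEUE_RE.search(text)
    queued = int(queue_match.group(1)) if queue_match else 0
    workers = []
    for m in WORKER_RE.finditer(text):
        workers.append({
            "pid": int(m.group("pid")),
            "sessions": int(m.group("sessions")),
            "processed": int(m.group("processed")),
            "cpu": int(m.group("cpu")),
            "memory_mb": int(m.group("memory")),
            "last_used": m.group("last_used"),
        })
    return queued, workers


def looks_stuck(worker):
    """Heuristic (an assumption): busy with a session but using no CPU."""
    return worker["sessions"] > 0 and worker["cpu"] == 0


# Two workers copied from the output above.
SAMPLE = """\
  Requests in queue: 48
  * PID: 1604    Sessions: 1       Processed: 4       Uptime: 8m 44s
    CPU: 0%      Memory  : 416M    Last used: 8m 40s ago
  * PID: 1722    Sessions: 1       Processed: 0       Uptime: 8m 43s
    CPU: 0%      Memory  : 171M    Last used: 8m 36s ago
"""

queued, workers = parse_workers(SAMPLE)
stuck = [w["pid"] for w in workers if looks_stuck(w)]
print(queued, stuck)
```

In the pasted output every worker has one session, 0% CPU, and was last used more than six minutes ago while 48 requests wait in the queue, which is the pattern this heuristic is meant to flag.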

What about the logs from /var/log/foreman?

It looks like Passenger is trying to spawn two dozen processes. Maybe your server was overloaded by that many Rails processes booting at once? They all appear to have started at the same time.

Define "dies". There must be a backtrace or some kind of info. Did the OOM killer kill it (check the system journal)? Was it httpd (check error_log)? Any errors in production.log?

Should we limit the number of processes Passenger can spawn? I think it’s better to reject a couple of calls than to render the whole server unresponsive.

It could be a workaround at least.

The server is totally fine: 6 GB of free RAM, CPU 80% idle.
I think I might need to do some Apache configuration tweaking.

No errors in any of the logs.

I did some tuning on the Foreman masters (Passenger and Apache):

apache::mod::passenger::passenger_max_pool_size: 30
apache::mod::passenger::passenger_min_instances: 30
apache::mod::prefork::maxclients: 512
apache::mod::prefork::startservers: 25
apache::mod::prefork::minspareservers: 25
apache::mod::prefork::maxspareservers: 50
apache::mod::prefork::serverlimit: 1024
apache::mod::prefork::maxclients: 1024
apache::mod::prefork::maxrequestsperchild: 5000

Both VMs have 16 GB of RAM and 8 cores.
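A rough back-of-the-envelope check of that pool size, as a Python sketch. It assumes each Foreman worker settles at roughly 395 MB of private memory (the typical value in the passenger-status output earlier in the thread; the ~170 MB workers there had processed no requests yet) and that the whole 16 GB is available to the VM. The function name `pool_memory_mb` is made up for illustration.

```python
def pool_memory_mb(pool_size, per_worker_mb):
    """Estimated private memory used by a full Passenger pool, in MB."""
    return pool_size * per_worker_mb


vm_ram_mb = 16 * 1024   # 16 GB VM, from the post
per_worker_mb = 395     # assumed typical worker size, from passenger-status
pool_size = 30          # passenger_max_pool_size from the Hiera data above

used = pool_memory_mb(pool_size, per_worker_mb)
headroom = vm_ram_mb - used
print(f"pool: {used} MB, headroom: {headroom} MB")
```

Under those assumptions a full pool of 30 workers leaves a bit over 4 GB for Apache, the OS, and anything else on the VM.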

It seems to be working fine now; I will do some load testing today.

The slowness was related to /etc/puppetlabs/puppet/node.rb --push-facts-parallel running on the Puppet masters every 10 minutes.

I adjusted it to export the facts from the last 6 minutes every 5 minutes. This should spread the load.

  • Going to enable round-robin reports/ENC servers.
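The scheduling idea above (run every 5 minutes, each run exporting the previous 6 minutes of facts) can be checked with a small Python sketch. It models time as whole minutes and only illustrates why the look-back window should be at least as long as the interval; it does not call node.rb, and the function name `covered_minutes` is hypothetical.

```python
def covered_minutes(run_times, lookback):
    """Minutes covered by runs that each look back `lookback` minutes."""
    covered = set()
    for run in run_times:
        covered.update(range(run - lookback, run))
    return covered


# Runs every 5 minutes over one hour, each exporting the previous 6 minutes.
runs = range(5, 65, 5)  # 5, 10, ..., 60
covered = covered_minutes(runs, lookback=6)

# Any minute of the hour that no run would pick up.
gaps = [m for m in range(0, 60) if m not in covered]
print(gaps)
```

With a 6-minute look-back on a 5-minute schedule, consecutive runs overlap by one minute, so a run that starts slightly late still picks up everything. A look-back shorter than the interval would leave minutes that no run exports.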

Thanks everyone

We already do this: 12 processes maximum, 6 per application. So by default it is 6 processes for Foreman. You can override this via the installer.

We desperately need a tuning guide for Foreman. Would you like to at least write a blog post about how you tuned this? :smile:

Well,
my tuning didn’t help :smiley:
Passenger just hangs.
No luck so far.
The OS itself is OK, no issues there.

Looks like I’m facing this issue:

Was it fixed in 1.17?

I checked the PR. It looks like I’m on 1.16.2, where this problem was already fixed :confused:

I’m going to disable all reports from Puppet.

Not sure what to do with performance.

And it’s related to reports. The external node classifier works just fine.

More interesting.

Puppet shows timeout errors from foreman.

I disabled the primary puppet4 node and export reports from puppet5, which has ~150 clients. :confused:

2018-07-12 10:42:21,357 ERROR [qtp2105375706-449] [puppetserver] Puppet Report processor failed: Could not send report to Foreman at https://foreman-ha.example.com/api/config_reports: Timeout::Error
2018-07-12 10:43:32,747 ERROR [qtp2105375706-450] [puppetserver] Puppet Report processor failed: Could not send report to Foreman at https://foreman-ha.example.com/api/config_reports: Timeout::Error
2018-07-12 10:43:36,517 ERROR [qtp2105375706-451] [puppetserver] Puppet Report processor failed: Could not send report to Foreman at https://foreman-ha.example.com/api/config_reports: Timeout::Error
2018-07-12 10:43:43,862 ERROR [qtp2105375706-442] [puppetserver] Puppet Report processor failed: Could not send report to Foreman at https://foreman-ha.example.com/api/config_reports: Timeout::Error
2018-07-12 10:44:36,855 ERROR [qtp2105375706-441] [puppetserver] Puppet Report processor failed: Could not send report to Foreman at https://foreman-ha.example.com/api/config_reports: Timeout::Error
2018-07-12 10:44:39,179 ERROR [qtp2105375706-460] [puppetserver] Puppet Report processor failed: Could not send report to Foreman at https://foreman-ha.example.com/api/config_reports: Timeout::Error
2018-07-12 10:44:44,186 ERROR [qtp2105375706-457] [puppetserver] Puppet Report processor failed: Could not send report to Foreman at https://foreman-ha.example.com/api/config_reports: Timeout::Error
2018-07-12 10:45:37,256 ERROR [qtp2105375706-447] [puppetserver] Puppet Report processor failed: Could not send report to Foreman at https://foreman-ha.example.com/api/config_reports: Timeout::Error
2018-07-12 10:45:45,045 ERROR [qtp2105375706-441] [puppetserver] Puppet Report processor failed: Could not send report to Foreman at https://foreman-ha.example.com/api/config_reports: Timeout::Error
2018-07-12 10:46:17,727 ERROR [qtp2105375706-464] [puppetserver] Puppet Report processor failed: Could not send report to Foreman at https://foreman-ha.example.com/api/config_reports: Timeout::Error
2018-07-12 10:47:03,742 ERROR [qtp2105375706-460] [puppetserver] Puppet Report processor failed: Could not send report to Foreman at https://foreman-ha.example.com/api/config_reports: Timeout::Error
2018-07-12 10:47:26,032 ERROR [qtp2105375706-462] [puppetserver] Puppet Report processor failed: Could not send report to Foreman at https://foreman-ha.example.com/api/config_reports: Timeout::Error
2018-07-12 10:47:59,374 ERROR [qtp2105375706-465] [puppetserver] Puppet Report processor failed: Could not send report to Foreman at https://foreman-ha.example.com/api/config_reports: Timeout::Error
2018-07-12 10:48:18,980 ERROR [qtp2105375706-463] [puppetserver] Puppet Report processor failed: Could not send report to Foreman at https://foreman-ha.example.com/api/config_reports: Timeout::Error
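To see how often these timeouts occur, the puppetserver log lines above can be bucketed per minute. This is a minimal Python sketch; the regex is written only against the line format pasted above, and the function name `timeouts_per_minute` is made up for illustration.

```python
import re
from collections import Counter

# Captures the timestamp up to the minute and the Foreman URL from a
# "Could not send report ... Timeout::Error" line as pasted above.
LINE_RE = re.compile(
    r"^(?P<minute>\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2},\d{3} ERROR .*"
    r"Could not send report to Foreman at (?P<url>\S+): Timeout::Error"
)


def timeouts_per_minute(lines):
    """Count report-upload timeouts per minute."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            counts[m.group("minute")] += 1
    return counts


# Four lines copied from the log above.
SAMPLE = [
    "2018-07-12 10:43:32,747 ERROR [qtp2105375706-450] [puppetserver] Puppet Report processor failed: Could not send report to Foreman at https://foreman-ha.example.com/api/config_reports: Timeout::Error",
    "2018-07-12 10:43:36,517 ERROR [qtp2105375706-451] [puppetserver] Puppet Report processor failed: Could not send report to Foreman at https://foreman-ha.example.com/api/config_reports: Timeout::Error",
    "2018-07-12 10:43:43,862 ERROR [qtp2105375706-442] [puppetserver] Puppet Report processor failed: Could not send report to Foreman at https://foreman-ha.example.com/api/config_reports: Timeout::Error",
    "2018-07-12 10:44:36,855 ERROR [qtp2105375706-441] [puppetserver] Puppet Report processor failed: Could not send report to Foreman at https://foreman-ha.example.com/api/config_reports: Timeout::Error",
]

counts = timeouts_per_minute(SAMPLE)
for minute, n in sorted(counts.items()):
    print(minute, n)
```

A steady one to three timeouts per minute from a node with only ~150 clients points away from client volume and toward something slow behind Foreman.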

Lol… Looks like we had an issue with our DB (MySQL). The cluster had the default limit of 5k open files.
Bumping it seems to have solved the problem.
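One way to see the open-files limit a running process actually has on Linux is to read /proc/<pid>/limits and compare it with the number of entries in /proc/<pid>/fd. The Python sketch below does that. The thread does not say which setting was raised (an OS limit or a MySQL setting), so this only shows a generic check. The function names and the sample text are made up for illustration.

```python
import os


def _to_int(value):
    """Convert a limits field to int; 'unlimited' becomes None."""
    return int(value) if value.isdigit() else None


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the text of /proc/<pid>/limits, or None."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # fields: ['Max', 'open', 'files', soft, hard, 'files']
            return _to_int(fields[3]), _to_int(fields[4])
    return None


def count_open_fds(pid):
    """Number of file descriptors a process holds (Linux only, needs permission)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


# Hypothetical sample in the layout of /proc/<pid>/limits.
SAMPLE = """\
Limit                     Soft Limit           Hard Limit           Units
Max processes             63455                63455                processes
Max open files            5000                 5000                 files
"""

soft, hard = parse_max_open_files(SAMPLE)
print(soft, hard)
```

On a real host, `parse_max_open_files(open(f"/proc/{pid}/limits").read())` together with `count_open_fds(pid)` shows how close the database process is to its limit.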

Are you sure this is the root cause of the problem? :wink: Hopefully you are good now.

Well, we’ve been running Foreman for 4 months now and haven’t had a single issue.
So far everything is fine.
Keeping my fingers crossed.