Slow performance in the GUI, clients timing out, etc. We have over 300 hosts communicating with this server; since the upgrade, approximately 75% of them time out when retrieving the manifest.
Other relevant data:
I can provide the debug data archive on request. I'm not sure what information it contains, so I haven't included it by default.
It's very hard to understand the issue from your post. It would be really useful if you could name specific UI / API actions that are slow and add the relevant logs from when clients time out, etc.
@lzap, is there a guide on troubleshooting performance using the new telemetry framework (or any other suggestions)?
Hey, telemetry was introduced in 1.17, but the basic investigation is to find the slow endpoints. You can do this easily by grepping production.log. Sure, uploading the logs with foreman-debug -u would help. Or you can simply tell us which pages in particular are slow. I don't believe that everything is slow.
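If a plain grep gets noisy, a small script can rank endpoints by response time. This is a rough sketch, assuming production.log uses the standard Rails "Processing by Controller#action" and "Completed 200 OK in 1234ms" lines; the function name slowest_endpoints is made up for illustration. It pairs each "Completed" line with the most recent "Processing by" line, so if several worker processes write interleaved lines to the same log, some timings may be attributed to the wrong endpoint (keying on a per-request ID, if your log lines carry one, would be more robust).

```python
import re
from collections import defaultdict

# Assumed Rails-style log lines (any prefix before them is ignored), e.g.
#   Processing by HostsController#index as HTML
#   Completed 200 OK in 1234ms (Views: 100.0ms | ActiveRecord: 900.0ms)
PROCESSING_RE = re.compile(r"Processing by (\S+#\S+)")
COMPLETED_RE = re.compile(r"Completed \d{3} .*? in (\d+(?:\.\d+)?)ms")


def slowest_endpoints(lines, top=10):
    """Return (endpoint, request_count, max_ms, avg_ms) tuples, slowest first.

    Hypothetical helper: naively pairs each "Completed" line with the
    last "Processing by" line seen before it.
    """
    durations = defaultdict(list)
    current = None  # endpoint from the most recent "Processing by" line
    for line in lines:
        m = PROCESSING_RE.search(line)
        if m:
            current = m.group(1)
            continue
        m = COMPLETED_RE.search(line)
        if m and current is not None:
            durations[current].append(float(m.group(1)))
            current = None
    stats = [(ep, len(ms), max(ms), sum(ms) / len(ms)) for ep, ms in durations.items()]
    stats.sort(key=lambda s: s[2], reverse=True)  # worst single request first
    return stats[:top]
```

Usage, assuming the log is at the usual /var/log/foreman/production.log: open the file and pass the file object straight in, e.g. `print(slowest_endpoints(open("/var/log/foreman/production.log", errors="replace")))`. Sorting by the worst single request highlights endpoints that occasionally stall (like manifest retrieval timing out), while the average shows endpoints that are consistently slow.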
Also, do a rough analysis of the host: how much memory is consumed, is it swapping, and what are the I/O and CPU loads?
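For a quick first look on a Linux server, the kernel's /proc/meminfo and /proc/loadavg files already hold the basic numbers. The sketch below is only an illustration (memory_summary and load_per_cpu are hypothetical helper names). It reports memory in use and swap occupied, and normalizes the load average per CPU. Note that swap occupied is not the same as active swapping; tools like vmstat (si/so columns) and iostat show the actual rates. On Linux the load average also counts processes blocked on I/O, so a high load with idle CPUs points toward I/O wait.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def memory_summary(text):
    """Hypothetical helper: percentage of RAM in use and swap occupied (kB)."""
    info = parse_meminfo(text)
    total = info["MemTotal"]
    # MemAvailable is absent on very old kernels; fall back to MemFree then.
    available = info.get("MemAvailable", info["MemFree"])
    swap_used = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    return {
        "mem_used_pct": round(100.0 * (total - available) / total, 1),
        "swap_used_kb": swap_used,
    }


def load_per_cpu(loadavg_text, cpus):
    """Hypothetical helper: 1/5/15-minute load averages divided by CPU count."""
    one, five, fifteen = (float(x) for x in loadavg_text.split()[:3])
    return tuple(round(v / cpus, 2) for v in (one, five, fifteen))
```

Usage on the server itself: `memory_summary(open("/proc/meminfo").read())` and `load_per_cpu(open("/proc/loadavg").read(), os.cpu_count())`. A per-CPU load that stays well above 1.0 means work is queuing up.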
Isn't 1.16 the version where we introduced the power icon in the host list? It can generate quite a lot of I/O wait, though it's configurable. Just guessing.
The way we are expiring reports is terrible and very transaction-unfriendly. I started an attempt to fix this in
but I don’t have time to finish this. The idea is to break the deletion into small transactions (batches).
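To illustrate the batching idea in general terms (this is not Foreman's code or the unfinished fix), here is a minimal sketch against SQLite so it runs on its own. The reports table with id and reported_at columns is a hypothetical schema. Each loop iteration deletes at most batch_size old rows and commits immediately, so no single transaction grows large or holds locks for long. The DELETE ... WHERE id IN (SELECT ... LIMIT ?) form works in SQLite and PostgreSQL but not in MySQL. A real schema with dependent tables would also need the matching child rows removed in the same batch.

```python
import sqlite3


def expire_in_batches(conn, cutoff, batch_size=1000):
    """Delete rows older than `cutoff` from a hypothetical `reports` table.

    Each batch runs in its own short transaction (the `with conn:` block
    commits on success and rolls back on error). Returns the total number
    of rows deleted.
    """
    total = 0
    while True:
        with conn:
            cur = conn.execute(
                "DELETE FROM reports WHERE id IN ("
                " SELECT id FROM reports WHERE reported_at < ?"
                " ORDER BY id LIMIT ?)",
                (cutoff, batch_size),
            )
        if cur.rowcount == 0:
            break  # nothing older than the cutoff remains
        total += cur.rowcount
    return total
```

The trade-off is more round trips in exchange for short transactions: a failure midway leaves earlier batches committed, which is acceptable for expiring old data because rerunning simply continues where it stopped.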
BUT I don't think this is the root cause of the Ruby processes burning CPU. Report expiration tends to cause transaction errors rather than loops doing heavy work.