I’m currently working on improvements to the 500 page. The main problem is the visibility of the stacktrace. For several users of Foreman this can be a security issue, so we started a very nice discussion on GitHub about how to fix it properly. The best solution is to show only the error code, the request ID and a command that invokes a rake task. For now, this rake task just invokes grep with the request ID on the log file. This solution opens the door to a sort of “error report”. Once we have a rake task to generate this kind of report, we can attach much more debug info, like versions of libraries, computer resources… But what should we add, and what should we leave out?
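To make the “grep with the request ID” idea concrete, here is a minimal sketch in Python (not the actual rake task, which would be Ruby). The function name, the sample log lines and the log layout are assumptions made up for illustration; the only behaviour taken from the description above is keeping the lines that mention a given request ID.

```python
from typing import Iterable, List


def lines_for_request(lines: Iterable[str], request_id: str) -> List[str]:
    """Return the log lines that mention request_id, like `grep <id> <logfile>`."""
    if not request_id:
        # An empty ID would match every line, which is never what we want.
        raise ValueError("request_id must not be empty")
    return [line.rstrip("\n") for line in lines if request_id in line]


# Hypothetical log excerpt; the real production.log layout may differ.
SAMPLE_LOG = [
    "2019-03-01T10:00:00 [I|app|1a2b3c4d] Started GET /hosts\n",
    "2019-03-01T10:00:00 [I|app|9f8e7d6c] Started GET /dashboard\n",
    "2019-03-01T10:00:01 [E|app|1a2b3c4d] RuntimeError: something broke\n",
]

if __name__ == "__main__":
    for line in lines_for_request(SAMPLE_LOG, "1a2b3c4d"):
        print(line)
```

With a real file you would pass the open file object instead of `SAMPLE_LOG`, so the log is streamed line by line rather than loaded into memory.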
We use a plugin that pushes all stack traces to Sentry. This plugin also overrides the 500 page to display the error code. This is super useful to just tell users: “If you open a support request, please tell us the request ID”.
It’d be great to have an API to display a custom error code both on the 500 error page and the API error pages (see app/views folder in the plugin).
Foreman already ships with the foreman-debug script, which collects all relevant data, including production.log and system data like OS, memory, SELinux, audit etc. It also filters out passwords, tokens etc. The problem of collecting data seems to be more difficult than one would think.
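As a rough illustration of why the filtering part is harder than it looks, here is a small Python sketch of a redaction pass. This is not how foreman-debug does it; the pattern, the key names and the `[FILTERED]` placeholder are all assumptions chosen for the example.

```python
import re

# Hypothetical list of sensitive key names; a real tool would need a longer one.
SECRET_PATTERN = re.compile(
    r"(password|passwd|token|secret)(\s*[=:]\s*)(\S+)",
    re.IGNORECASE,
)


def redact(line: str) -> str:
    """Replace the value after a sensitive key with a placeholder."""
    return SECRET_PATTERN.sub(r"\1\2[FILTERED]", line)


if __name__ == "__main__":
    print(redact("db_password: hunter2"))
    print(redact("oauth_token=abc123 user=admin"))
    # Quoted JSON keys are NOT caught by this simple pattern.
    print(redact('{"password":"x"}'))
```

The last call shows the catch: a quoted JSON key slips through unchanged, so every new log or config format needs its own rule.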
Recently @pmoravec rewrote our dirty shell script into a sosreport plugin. SOS is the tool of choice Red Hat Support uses to troubleshoot customer issues, and since it’s also present on Debian systems, we have decided to go full throttle with it. It has many more capabilities than our shell script. We still ship the script for backward compatibility and because sosreport in Debian stable is an old version. In the future, we will switch completely.
Fedora also has a nice tool called ABRT that automatically collects crashes from programs and sends them over to Fedora servers. We have a plugin called Foreman ABRT which can serve as the server to collect errors from managed hosts. Unfortunately, it has been deprecated.
All in all, I’d simply show the error on the page, request ID and that’s about it. Don’t reinvent the wheel please, your time is precious.
This is, I believe, not reinventing the wheel. The concern was that we’re losing an easy way to get the backtrace for people who want to report an issue. That is usually the most useful information. sos-report/foreman-debug are nice tools to get a lot of data, which is needed in more complicated cases. The motivation here was to find out whether there’s some other critical piece of information that we want to see in every report. We tend to ask which plugins users have installed, and which versions. That’s all; we don’t want to gather everything that could possibly be useful here.
I did not understand it correctly then. Well, if we are asking a user/customer to call a rake task, how is that different from asking them to call foreman-debug/sos? I mean, Red Hat Support will ask for sosreport in all cases. This feels like breaking something into two phases for no big added value.
If we ever want to build a super fast way of extracting basic information about the system, let’s put it on the Administer - About page and show it only to super administrators. We already have a list of plugins there; I’d probably add more info and make it copy-paste friendly (perhaps via a “Generate Status Report” button). That would make a difference - a user would not need to leave the UI to report a bug.
We have had the foreman-debug script available for a long time, however most of the reported issues don’t contain its output. We’re trying to have something that is easy to use, is easily available when the error page appears, and gives us enough debugging information while we hide the stacktrace on the 500 page. We could add a button later that runs this rake task and gathers the info in the UI, if that helps. But for the purpose of hiding the stacktrace from the error page and reporting an issue, this is already an improvement. The task output contains the list of installed plugins and their versions.
I am fine with that, I just think it should be sosreport doing the job. You can literally disable all the sos plugins so it exits very quickly. We could have a shell script called “foreman-debug-quick” or just “foreman-summary” gathering all the required data with just a few commands. Ideally it would call sosreport with only one sos plugin enabled. Both foreman-debug and sosreport already contain the code which reports the list of installed Foreman plugins.
Rake tasks are very slow to launch if you have a Katello deployment and a few more plugins. But if you insist, I’d probably shrug and move on. Remember to check this on a production install: SELinux will block the Foreman process from reading log files (it is only allowed to list and write to them), and I am not sure if rake tasks are subject to SELinux enforcement.
I think the main difference here from foreman-debug/sos report is that this will only give the backtrace and full log of one specific request.
My experience is that in at least 95% of cases, this is enough information to make a good bug report that can be resolved, without having to dig through hundreds of files and possibly millions of lines of logs to find the specific error.
The goal here is this: when a user hits an error page, give them a quick command they can use to open a useful bug report on Redmine or post a useful question to the forum, without having to share the full sos report (which takes a long while to generate) right away if it isn’t really needed. Currently, most of this information is shown to the user on the error page, and some security policies apparently consider this a problem. The added benefit of this approach is that we will get the full request and not just the stacktrace, as well as the Foreman and plugin versions, which are often missing from initial reports.
This is a good point, we should make sure this does not break on SELinux.
I get the idea, I just think this should not be a rake task. Having this directly in the UI would be the best; we could even have a link on the error page that would only work (or render) if the user had the right permissions. However, I am pretty sure the Rails application is prevented from reading logs by our SELinux policy.
Let me step back a bit - we want to improve the user experience of reporting errors. What if we changed our error handling code to actually create a short report stored as a Report in our database? It could contain all the details and be in a copy-paste friendly form. Administrators could also track errors more easily and return to them after some time. And we would work around the SELinux log file access problem, since no grep would be required.
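To sketch what “store a short report at error time” could look like, here is a small self-contained Python example using an in-memory SQLite database. It is only an illustration under assumptions: the table name, columns and function names are hypothetical, and real Foreman code would be a Rails model rather than anything like this.

```python
import sqlite3
import traceback
from typing import Optional


def create_store() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical error_reports table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE error_reports ("
        "request_id TEXT PRIMARY KEY, error_class TEXT, message TEXT, backtrace TEXT)"
    )
    return conn


def record_error(conn: sqlite3.Connection, request_id: str, exc: BaseException) -> None:
    """Save the exception details under the request ID shown on the error page."""
    backtrace = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )
    conn.execute(
        "INSERT INTO error_reports VALUES (?, ?, ?, ?)",
        (request_id, type(exc).__name__, str(exc), backtrace),
    )
    conn.commit()


def render_report(conn: sqlite3.Connection, request_id: str) -> Optional[str]:
    """Return a copy-paste friendly report, or None if the ID is unknown."""
    row = conn.execute(
        "SELECT error_class, message, backtrace FROM error_reports WHERE request_id = ?",
        (request_id,),
    ).fetchone()
    if row is None:
        return None
    error_class, message, backtrace = row
    return f"Request ID: {request_id}\nError: {error_class}: {message}\n\n{backtrace}"


if __name__ == "__main__":
    store = create_store()
    try:
        raise RuntimeError("something broke")
    except RuntimeError as error:
        record_error(store, "1a2b3c4d", error)
    print(render_report(store, "1a2b3c4d"))
```

The point of the design is that the error page only needs to show the request ID, while the full details are looked up later by someone with the right permissions, without touching the log files.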
I simply think that we are not solving the problem. You are right that users don’t bother running foreman-debug/sos when they encounter an error, so what makes you think they would bother running a rake task in a similar manner? That’s why I think we should stop and rethink the whole thing. Maybe it should be a completely different user experience.
As I said, I understand the idea of gathering a small amount of info. I know sosreport is overwhelming and also too big to attach.