RFC: Installer Output and Logging Format

ehelms · August 13, 2020, 1:47pm

I have been looking into changing the standard progress output of the installer along with enhancements to the log output. There are a few ways we can go with this as well as some smaller changes to increase the debuggability of the log files.

There are a number of items to unpack and weigh in on here. I would really appreciate if folks gave in-depth thoughts or even just simple replies with which aspects they like to help guide the changes in this area.

Progress Output

When we think about running the installer in it’s standard way (no verbose mode or modification to log levels) there are two ways I think we could approach it. The current method of using a progress bar but breaking out the steps to show progress step by step or a log-style approach.

Option 1: Progress Bar Style

The first option is continuing with the progress bar theme but breaking this out by stage of the installer, showing progress for each stage and elapsed time and then moving on to the next step without erasing the previous one from the screen. This gives more of a sequence feel to the process and could include things like total number of steps and what step the process is currently on.

Added this based on feedback to show a progress style output but using step counts rather than bars:

Option 2: Log Based

The second option is to unify the log file output with the progress output from running the installer to be the same output in both places. This would keep the output the same no matter where a user looked and be affected by verbosity levels where a user wanted to run installation with more information, or run it in noop mode but with more details. This method is not as nice to look at but is straight forward.

Log Formatting Changes

The current installer log output and formatting prints the following information:

date and time
log level
logger name
message

For this proposal, I’d like to modify both what gets printed and the formatting of that data as it appears in the log file or on the screen when using verbose mode.

Log Data

The only proposed change here is to drop the name of the logger (e.g. verbose, main) since that has no contextual value to a user debugging. Instead, replace this with the stage the installer is at to better understand where an error might be coming from. For example, this would denote if the installer was running a boot hook, pre hook, puppet apply or post install hook. Further, actions within a hook could denote themselves as a stage and provide further context. For example, package unlocking could identify itself as a stage, e.g.

2020-08-13 09:14:44 INFO [init] Warming up
2020-08-13 09:14:45 INFO [boot] Executing boot hooks
2020-08-13 09:14:46 INFO [package_unlock] Package versions are being unlocked
2020-08-13 09:14:46 INFO [pre] Executing pre configure hooks

Log Format

This topic area relates to how the data is arranged within the log at a line by line level. This aims to take into account readability, the types of data we want to display and some bit of reading and thinking of formats that may enhance visual parsing.

Current Format:

[INFO  2020-08-13T09:10:51 init] Warming up
[INFO  2020-08-13T09:10:52 boot] Executing boot hooks
[INFO  2020-08-13T09:10:53 package_unlock] Package versions are being unlocked
[INFO  2020-08-13T09:10:53 pre] Executing pre configure hooks
[WARN 2020-08-12T20:45:50 configure]  Stage[main]/Foreman_proxy::Module::Logs/Foreman_proxy::Module[logs]/Foreman_proxy::Feature[Lo
gs]/Datacat_fragment[foreman_proxy::enabled_features::Logs]: Skipping because of failed dependencies
[ERROR 2020-08-13T09:10:54 configure] Puppet apply error

Here I have presented 3 options from my reading to think about and spark conversation. Feel free to either reply and say you like one, to propose a new formatting all together or propose some tweaks to one.

Also, in the below option examples I have chosen to spit the date to be separated by a space rather than the T character that is part of the ISO 8601 format. I did this to show a different possibility from our current format and because I personally find it easier to read when debugging. Please do share regarding this as well.

Option 1

Move the date to the left, and only wrap the stage in brackets to emphasize.

2020-08-13 09:13:38 INFO  [init] Warming up
2020-08-13 09:13:39 INFO  [boot] Executing boot hooks
2020-08-13 09:13:40 INFO  [package_unlock] Package versions are being unlocked
2020-08-13 09:13:40 INFO  [pre] Executing pre configure hooks
2020-08-13 09:13:40 WARN  [configure] Stage[main]/Foreman_proxy::Module::Logs/Foreman_proxy::Module[logs]/Foreman_proxy::Feature[Lo
gs]/Datacat_fragment[foreman_proxy::enabled_features::Logs]: Skipping because of failed dependencies
2020-08-13 09:13:40 ERROR [configure] Puppet apply error

Option 2

Space separators between each data point

2020-08-13 | 09:13:38 | INFO  | init           | Warming up
2020-08-13 | 09:13:39 | INFO  | boot           | Executing boot hooks
2020-08-13 | 09:13:40 | INFO  | package_unlock | Package versions are being unlocked
2020-08-13 | 09:13:40 | INFO  | pre            | Executing pre configure hooks
2020-08-13 | 09:13:40 | WARN  | configure      | Stage[main]/Foreman_proxy::Module::Logs/Foreman_proxy::Module[logs]/Foreman_proxy::Feature[Lo
gs]/Datacat_fragment[foreman_proxy::enabled_features::Logs]: Skipping because of failed dependencies
2020-08-13 | 09:13:40 | ERROR | configure      | Puppet apply error

Option 3

Current format, drop brackets to only wrap stage

INFO  2020-08-13 09:13:38 [init] Warming up
INFO  2020-08-13 09:13:39 [boot] Executing boot hooks
INFO  2020-08-13 09:13:40 [package_unlock] Package versions are being unlocked
INFO  2020-08-13 09:13:40 [pre] Executing pre configure hooks
WARN  2020-08-13 09:13:40 [configure] Stage[main]/Foreman_proxy::Module::Logs/Foreman_proxy::Module[logs]/Foreman_proxy::Feature[Lo
gs]/Datacat_fragment[foreman_proxy::enabled_features::Logs]: Skipping because of failed dependencies
ERROR 2020-08-13 09:13:40 [configure] Puppet apply error

aruzicka · August 13, 2020, 5:45pm

Option 1: Progress Bar Style

Can we even show reliable progress bars for each stage? If so, what are they based on? Puppet knowing the remaining number of “steps” to perform in each stage? I’d very much like to avoid windows style progress bars (xkcd 612), if possible

Even if we could display reliable progress bars, I’d probably still favor the log based output. I think it is not a bad idea to have logs and the live output from the command visually similar. (here I assume that even with progress bars the logs would be log based.)

Completely agree with the log data paragraph.

As far as log format goes, I wouldn’t do anything too crazy and would stick with 1 or 3, possibly in this order.

On a semi-serious note, my concern with 2 is that there would be an insanely long stage name and when you looked at the logs, you would see the timestamp, log level, stage and nothing, unless you have one of those ultra-wide monitors, making pre-processing with scripts almost necessary. Even the package unlocks pad things quite a lot and if you’d grep -v them away, you would waste a lot of screen space.

ekohl · August 13, 2020, 6:05pm

In vanilla Foreman most hooks have trivial runtime. The only exception is the PostgreSQL upgrade on EL7. However, that would be on huge step so a progress bar isn’t all that useful.

The majority of the time is spent in gathering defaults (which does a Puppet run in noop mode) and the application step (which runs Puppet without noop). So I don’t think the additional progress bars don’t have that much value.

ehelms · August 13, 2020, 7:33pm

It’s a good point and as @ekohl also notes most hooks are short, minus the occasional upgrade related hook that generally ends up unpredictable time wise. I do think there is an argument to be made for showing users that something is happening (i.e. progress) to prevent a feeling of “is this thing stuck?” and gauging somewhat where the process is at. I added a screenshot of another “progress” style where we can show number of steps left in a stage and still show runtime.

A suggestion from @Marek_Hulan has been to indicate a number for each stage and how many stages are thus left. Something akin to the left side here:

[1/7] Warming up: (1/1) 00:00:00

[2/7] Executing boot hooks: (5/5) 00:00:00

[3/7] Package versions are being unlocked: (1.0/1) 00:00:01

[4/7] Executing pre hooks: (3/3) 00:00:01

[5/7] Configuring system: (700/700) 00:00:19

[6/7] Executing post install hooks: (5/5) 00:00:19

[7/7] Package versions are being locked: (1.0/1) 00:00:200

* Foreman is running at https://foreman.example.com

* Login credentials -- username: admin, password: changeme

The full log is at /var/log/foreman-installer/foreman.log

Assuming we can calculate that completely at boot time.

Dirk · August 14, 2020, 10:54am

While I would not have complaint with the current solution, I am happy with some improvement and every of the options looks fine.

For the logs I would like to add the requirement that it should be easily parseable for tools like Elastic Stack or Graylog. Because of this I do not like Option 2 very much, as the timestamp is split, the number of whitespaces differs and the pipe as divider is just overhead. It is not unparseable but at least makes it more difficult and the message size is increased.
The others are better, still not sure if I like the Loglevel column being the same size because INFO and WARN add an additional whitespace to have same length as ERROR. If it helps readability I am fine, as it is still parseable.

ehelms · August 27, 2020, 6:55pm

This one is tricky as it leads to shortening (e.g. ERRO) but does not address other log levels such as DEBUG (which shortened would be DEBU).

Is it the message portion of the log file that makes the parsing more difficult? Does detecting the ] prior to the message portion help? Would a more clear delineator before the message portion help?

Dirk · August 28, 2020, 9:18am

It is totally fine with having different length like ERROR, DEBUG and INFO, adding then a different amount of whitespaces makes it a very likely cause of parser error because you have to realize that the format is not only one whitespace as separator but a flexible number of whitespaces.

ehelms · September 11, 2020, 12:22pm

Following up here, I appreciate the feedback so far. I chose to go with Option 1 and have opened some implementation PRs:

Adding known stages to logs - https://github.com/theforeman/kafo/pull/274
Modifying the log format - https://github.com/theforeman/kafo/pull/276

Duncan_Innes · September 14, 2020, 8:38am

Log Format

Option 4?

Log directly to journalctl. With additional tagging for stage.

widhalmt · September 16, 2020, 10:41am

Hi, @Dirk pulled me into the circle with this one.

He started to create Logstash rules for the new format and asked me about feedback. Running it through Logstash should be representative about how well the new format is parseable in general. Based on the short example in Modify log format by ehelms · Pull Request #276 · theforeman/kafo · GitHub I can say, it works quite well with parsing the new logs with logmanagement tools.

Here’s some example code if you want to try it yourself. Filebeat is the agent to collect logs from a host, Logstash is a central processor who parses and forwards the logs. Please note that I just used example paths and placed all files into /tmp. Also the output goes into a file in JSON-format just for debugging. In a real life example you would put it into Elasticsearch and search and query it with Kibana.

Filebeat.yml

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /tmp/foreman.log
  multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
  multiline.negate: true
  multiline.match: after
output.logstash:
  hosts: ["localhost:5044"]

And a matching Logstash pipeline:

input {
  beats {
    port => 5044
  }
}
filter {
  grok {
    match => ["message","%{DATESTAMP:[@metadata][foremandate]} %{LOGLEVEL:[log][level]}%{SPACE}\[%{DATA:[foreman][stage]}\] %{GREEDYDATA:message}"]
    id => "foreman_log"
    tag_on_failure => ["_grokparsefailure","foreman_grok_failed"]
    overwrite => "message"
  }
  date {
    match => ["[@metadata][foremandate]", "yyyy-MM-dd HH:mm:ss"]
    id => "foreman_date"
    tag_on_failure => ["_dateparsefailure","foreman_date_failed"]
  }
}
output {
  file {
    path => "/tmp/foreman-output"
    codec => "json_lines"
  }
}

Dirk and me talked about providing this code as pre-configured Logstash pipeline you could just integrate into your Elastic Stack. This would make it quite easy to build upon it and e.g. use similar rules for other logs or parse the logs even more to get more detailed info from it. If that’s something you’d like to see, just let Dirk and me know. I’m afraid, I could miss the info because I’m not so often in this board (by now).

ehelms · April 28, 2021, 1:19am

Marking this as solved as we have for at least two releases switched to log based output and modified the format to the following:

2021-04-28 01:10:27 [NOTICE] [root] Loading installer configuration. This will take some time.
2021-04-28 01:10:30 [NOTICE] [root] Running installer with log based terminal output at level NOTICE.
2021-04-28 01:10:30 [NOTICE] [root] Use -l to set the terminal output log level to ERROR, WARN, NOTICE, INFO, or DEBUG. See --full-help for definitions.
Insufficient memory for tuning size
Tuning profile 'default' requires at least 8 GB of memory and 1 CPU cores
[root@certs vagrant]# foreman-installer --disable-system-checks
2021-04-28 01:10:58 [NOTICE] [root] Loading installer configuration. This will take some time.
2021-04-28 01:11:01 [NOTICE] [root] Running installer with log based terminal output at level NOTICE.
2021-04-28 01:11:01 [NOTICE] [root] Use -l to set the terminal output log level to ERROR, WARN, NOTICE, INFO, or DEBUG. See --full-help for definitions.
2021-04-28 01:11:06 [WARN  ] [pre] Skipping system checks.
2021-04-28 01:11:06 [WARN  ] [pre] Skipping system checks.
2021-04-28 01:11:09 [NOTICE] [configure] Starting system configuration.
2021-04-28 01:11:17 [NOTICE] [configure] 250 configuration steps out of 1947 steps complete.
2021-04-28 01:11:27 [NOTICE] [configure] 500 configuration steps out of 1948 steps complete.
2021-04-28 01:11:27 [NOTICE] [configure] 750 configuration steps out of 1951 steps complete.
2021-04-28 01:11:28 [NOTICE] [configure] 1000 configuration steps out of 1954 steps complete.
2021-04-28 01:11:29 [NOTICE] [configure] 1250 configuration steps out of 1958 steps complete.
2021-04-28 01:16:27 [NOTICE] [configure] 1500 configuration steps out of 1959 steps complete.
2021-04-28 01:18:24 [NOTICE] [configure] 1750 configuration steps out of 1959 steps complete.
2021-04-28 01:18:31 [NOTICE] [configure] System configuration has finished.

Marek_Hulan · April 29, 2021, 1:18pm

Is that the final version? I though it was a midterm solution, I think we can and should do better.

ehelms · April 29, 2021, 1:50pm

That was the goal and intent of this RFC. If you feel there are additional changes or another design please do re-open this RFC with your ideas, or open a Redmine issue or a new RFC.

Marek_Hulan · April 29, 2021, 6:38pm

And there was this suggestion RFC: Installer Output and Logging Format - #4 by ehelms

I don’t think we have to show the amount of resources as it’s impossible to know before the actual run, but the key is to split the installer work to stages. So one knows how far the installer got and which stage failed.

ehelms · April 29, 2021, 7:46pm

We have stages in the current solution. There are some actions without stages that get marked as root. In a standard run, such as my ‘solution’ post, you can see: root, pre, configure.

Is it a different format for the output you’re after? To see more or different stages? To have the ones labeled as root with some stage name?

Marek_Hulan · April 30, 2021, 9:37am

Yes, I want to see the counter of stages. When the installer is still in progress, e.g. like this:

[root@certs vagrant]# foreman-installer --disable-system-checks
2021-04-28 01:10:58 [NOTICE] [root] Loading installer configuration. This will take some time.
2021-04-28 01:11:01 [NOTICE] [root] Running installer with log based terminal output at level NOTICE.
2021-04-28 01:11:01 [NOTICE] [root] Use -l to set the terminal output log level to ERROR, WARN, NOTICE, INFO, or DEBUG. See --full-help for definitions.
2021-04-28 01:11:06 [WARN  ] [pre] Skipping system checks.
2021-04-28 01:11:06 [WARN  ] [pre] Skipping system checks.
2021-04-28 01:11:09 [NOTICE] [configure] Starting system configuration.
2021-04-28 01:11:17 [NOTICE] [configure] 250 configuration steps out of 1947 steps complete.
2021-04-28 01:11:27 [NOTICE] [configure] 500 configuration steps out of 1948 steps complete.
2021-04-28 01:11:27 [NOTICE] [configure] 750 configuration steps out of 1951 steps complete.

I have no idea how many more stages will follow. Also pre and configure don’t tell me much, the stage should have better name that explains what it’s doing. I’d even say I’m not that interested in stage details, most of the line is repeated several times. I’d live each stage be represented with one line that’s updated until it’s done. Things like log level, full date IMHO belong to the installer log, not installer STDOUT.

EDIT: I dared to unmark the previous post as solution