RFC: Tracking provisioning progress of hosts
Problem
After host creation, users don’t have an easy way to track provisioning progress from the UI. They have to wait until the callback home happens (TODO Link). Sometimes it never does, leaving users uncertain about what went wrong and where.
Proposal
Introduce a new API endpoint that lets the machine being provisioned call back to Foreman and report its provisioning steps (PS).
On the Host UI, we can then show the provisioning progress.
Implementation - design
I’ve been thinking about two variants of the reporting: simple and detailed.
Simple
We report only the step’s name, label, date/time, and priority. This tells us that “something happened,” but not how it turned out.
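A minimal sketch of what a simple report could carry and how the UI could pick the latest step from it. The names `StepReport` and `latest_step`, the `install_pkgs` step, and the timestamps are assumptions for illustration only; only the field list comes from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class StepReport:
    """One 'something happened' event; no result is recorded."""
    name: str              # human-readable step name
    label: str             # machine-friendly identifier
    priority: int          # position of the step in the provisioning flow
    reported_at: datetime  # when the report was received


def latest_step(reports):
    """Return the most recently reported step, or None if nothing was reported."""
    return max(reports, key=lambda r: r.reported_at, default=None)


reports = [
    StepReport("Kickstart download", "get_ks", 0,
               datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)),
    # Hypothetical second step, not defined anywhere in this RFC.
    StepReport("Package installation", "install_pkgs", 1,
               datetime(2024, 1, 1, 10, 5, tzinfo=timezone.utc)),
]
print(latest_step(reports).label)
```

With this shape the UI can only say "the host got as far as this step"; it cannot tell whether the step succeeded.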
Detailed
We report all of the above, plus we can introduce a status. With this, we could report when the step started, ended, or failed.
This would benefit users, but it also means reporting each step twice (start and end), which generates quite a lot of requests and log noise. The extra load is the main drawback of this variant.
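To illustrate the trade-off, here is a rough sketch of how start and end events could be folded into one state per step. The function name `fold_events` and the `install_pkgs` label are hypothetical; the status values are the ones listed under the step parameters in this RFC.

```python
def fold_events(events):
    """Collapse (label, status) events into the latest status per step.

    Events are assumed to arrive in chronological order.
    """
    state = {}
    for label, status in events:
        state[label] = status  # a later event overrides an earlier one
    return state


events = [
    ("get_ks", "init"),
    ("get_ks", "done"),
    ("install_pkgs", "init"),    # hypothetical step label
    ("install_pkgs", "failed"),
]
state = fold_events(events)
print(state)
print(len(events), "requests for", len(state), "steps")
```

Each step costs two requests here, which is the doubling mentioned above.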
Implementation - code
Action items
- Create an endpoint (or customize an existing one) to track the provisioning steps.
- Authentication logic will be the same as that of existing unattended endpoints.
- Update templates: Add callbacks from the kickstart template and others.
- Create a template helper to generate the code.
- Smart Proxy is doing a lot of stuff; we could introduce code helpers for reporting to the Foreman.
- Implement FirstBoot systemd service; see RFC - Systemd first boot service for host provisioning.
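As a rough idea of what a template helper might generate, the sketch below renders a shell command that reports one step. It is written in Python purely for illustration. The `/unattended/step` path, the `token` query parameter, and the function name `report_step_snippet` are placeholders and not an existing Foreman route or helper.

```python
import json
import shlex


def report_step_snippet(base_url, token, label, name):
    """Render a shell command that reports one provisioning step.

    The '/unattended/step' path and the 'token' query parameter are
    placeholders for illustration, not an existing Foreman route.
    """
    payload = json.dumps({"label": label, "name": name})
    url = f"{base_url}/unattended/step?token={token}"
    # '|| true' keeps a failed report from aborting the provisioning script.
    return (
        "curl --silent --max-time 10 "
        "-H 'Content-Type: application/json' "
        f"-d {shlex.quote(payload)} {shlex.quote(url)} || true"
    )


print(report_step_snippet("https://foreman.example.com", "abc123",
                          "get_ks", "Kickstart download"))
```

Quoting the payload and URL with `shlex.quote` keeps the generated line safe to paste into a kickstart `%post` section or similar script, and the short timeout limits how long a report can delay provisioning.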
Defining steps
Strict
Foreman and its plugins define all provisioning steps. Reporting an unknown PS results in an error, but that error does not interrupt the provisioning process.
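A minimal sketch of the strict check, assuming a hypothetical registry `KNOWN_STEPS` that Foreman and plugins would fill in. The point is that an unknown step produces an error response without raising, so the host can ignore it and carry on.

```python
# Hypothetical registry: step label -> priority.
KNOWN_STEPS = {"get_ks": 0, "install_pkgs": 1}


def accept_report(label):
    """Return (ok, message); never raises, so the caller can always reply."""
    if label not in KNOWN_STEPS:
        return False, f"unknown provisioning step: {label}"
    return True, "accepted"


print(accept_report("get_ks"))
print(accept_report("make_coffee"))
```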
Loose
Alternatively, we could drop the fixed step definitions and keep the reporting “free”.
We allow users to report whatever they want, without strict definitions of statuses and priorities.
With this, we can list all reported steps by date on the Host page and group them by the label. Less code for us, and more freedom for the users.
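The listing described above could look roughly like this. The function name `group_by_label` and the sample labels and timestamps are made up for the example.

```python
from collections import defaultdict
from datetime import datetime


def group_by_label(reports):
    """Group (label, timestamp) pairs by label, each group sorted by time."""
    groups = defaultdict(list)
    for label, reported_at in sorted(reports, key=lambda r: r[1]):
        groups[label].append(reported_at)
    return dict(groups)


reports = [
    ("network", datetime(2024, 1, 1, 10, 7)),
    ("disk", datetime(2024, 1, 1, 10, 2)),
    ("network", datetime(2024, 1, 1, 10, 1)),
]
for label, times in group_by_label(reports).items():
    print(label, [t.isoformat() for t in times])
```

No registry is needed here; whatever label the user reports becomes a group.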
For both cases above, Foreman will allow reporting status multiple times per host. This can be helpful in case of an error that is causing an endless reboot loop. When users see several steps reported multiple times, they can intervene faster and fix the problem.
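One way to surface such loops is to count how often each step has been reported for a host. This is only a sketch; the function name `repeated_steps` and the threshold are assumptions, not part of the proposal.

```python
from collections import Counter


def repeated_steps(labels, threshold=2):
    """Return labels reported at least `threshold` times, with their counts."""
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n >= threshold}


# A host stuck in a reboot loop keeps reporting the early steps again.
labels = ["get_ks", "install_pkgs", "get_ks", "install_pkgs", "get_ks"]
print(repeated_steps(labels, threshold=3))  # {'get_ks': 3}
```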
Provisioning Step parameters
{
  "name": "Kickstart download",
  "label": "get_ks",
  "priority": 0,
  "status": ["init", "done", "failed"],
  "logs": "..."
}
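A sketch of how an incoming report could be checked against these parameters. I am assuming here that `status` in a real report is a single value out of the list above, and that `status` and `logs` are optional so the simple variant still validates; neither is decided in this RFC. The function name `validate_step` is hypothetical.

```python
import json

ALLOWED_STATUSES = ("init", "done", "failed")
REQUIRED_FIELDS = ("name", "label", "priority")  # assumption: the rest is optional


def validate_step(raw):
    """Parse a JSON report and return a list of problems (empty if valid)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["payload must be a JSON object"]
    problems = [f"missing field: {field}"
                for field in REQUIRED_FIELDS if field not in data]
    status = data.get("status")
    if status is not None and status not in ALLOWED_STATUSES:
        problems.append(f"unknown status: {status}")
    return problems


good = '{"name": "Kickstart download", "label": "get_ks", "priority": 0, "status": "done"}'
bad = '{"label": "get_ks", "status": "rebooted"}'
print(validate_step(good))
print(validate_step(bad))
```

Returning a list of problems instead of raising makes it easy to send all of them back in one response.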
Looking for inputs
- Which implementation design do you prefer?
- Do you have any comments on the code implementation?
- Any other ideas for improvements?
- What are the user or customer expectations, and what might they need?