Foreman Discovery is not ideal for either interactive or non-interactive workflows. It works best when a host is discovered and provisioned as-is: with this "vanilla" configuration, a hostgroup is assigned and no network configuration is changed. The interactive workflow via the Edit Host form does not work well, and if you think auto-provisioning is the answer, well, there is no way to change NICs via that workflow either. It is not great.
The Discovery Image also has a complicated design. Puppet Facter is the tool that uploads facts; the idea behind it was to leverage Foreman's Puppet parsing capabilities to store discovered hosts as STI records. Discovered nodes also run a smart-proxy, only to provide its API with a plugin that can perform reboot and kexec. And there's the TUI for PXE-less workflows, which has a completely different codebase.
I took a step back over the weekend and, with pen and paper, figured out what I think is a much superior user experience for the future. It is a massive redesign; however, I want to build on components we already have. Here are the requirements for the new solution:
- The new solution must provide a great experience for both interactive provisioning (pick your node, assign a hostgroup, or create a brand new host from scratch) and automatic provisioning (the node is provisioned as soon as it gets discovered, or via a command).
- Users must be able to completely reconfigure hosts, including NIC changes (changing subnets, setting up VLANs, bonds, bridges).
- Reported facts must be easy to modify or add (this is currently pretty tough through Facter’s custom facts).
- Communication between nodes and smart-proxies must be simple and secure by default.
The solution I am proposing is all-new because I strongly believe things must change from the ground up. Discovered hosts must no longer be STI "unmanaged" hosts, provisioning must not be an act of "editing a host", and I think the workflow I am about to show you can also be useful for registering existing hosts. Therefore, my solution is no longer called discovery: enter the world of host pre-registration. Oh, and it's not a plugin anymore - you will see why in a bit.
There’s a new model class called preregistration which represents some hardware or resource that can be created later. Although these could be other things than just discovered servers (hardware switches or even virtual resources like subnets or domains), let’s start small. In its simplest form, preregistration has a timestamp and list of facts.
Lesson learned from the past: Discovered hosts do have names, by default in the form `macAABBCCDDEEFF`, but all our users eventually rename them. Therefore I am intentionally keeping `name` out of the `preregistrations` table. It's useless; the MAC address is a fact. In the UI, users should be able to choose which columns to see on the index page, and since it will be extremely easy to create their own facts, a column can even be their very own fact like `servertype` with values like `big-9843294` or `tiny-3243234`.
A registration has a 1:1 association to the hosts table to track whether it was utilized or not. Registrations which are still in the wait queue to be provisioned have the association set to `nil`, obviously. This way, hosts keep their registration history (when, what facts), and registrations can also serve as a "baseline" fact source for hosts that have no other fact source available (no rhsm, no puppet, no ansible).
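To make the data model concrete, here is a minimal sketch of what a preregistration record carries: just a timestamp, a fact dictionary, and an optional link to a host that stays empty until the record is turned into a host. Field names here are my assumptions about the eventual schema, not a final design.

```python
# Minimal sketch of the proposed preregistration record.
# Field names are illustrative assumptions, not the final schema.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Preregistration:
    facts: dict                       # everything the node reported, keyed by fact name
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
    host_id: Optional[int] = None     # nil while waiting in the queue; set once utilized

# A fresh record waiting to be processed:
pending = Preregistration(facts={"mac_eth0": "aa:bb:cc:dd:ee:ff"})
```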
From the UI perspective, there is just an index page of registrations, a detail page showing the list of facts, and searching. Each registration also has a button called "Process"; more about it later. In the first version, there is no interactive provisioning - everything must be done with auto-provisioning (more about that later, too). The plan is that once we get to the New Host form rewrite, it will have an option to create a brand new host based on a registration - users will be able to pick some values from facts (e.g. MAC address or subnet). I would like to focus on seamless auto-provisioning first. But first, let's talk about the discovery process.
After a LOT of thinking, I have landed on the following solution for discovery: pull-based HTTPS polling via a Python script. Polling is not only very simple to implement, it can also utilize one of Foreman's strongest features - the templating system. We can actually build everything around our unattended endpoint. Although polling is not usually preferred for scalability reasons, for discovery it is actually fine - there's usually not a whole lot of nodes, and intervals can be made longer when necessary, since initiation of provisioning does not have to be instant. Finally, it works great with the idea that fact gathering should be a single Python script that can be edited - the script itself can be a Foreman template!
Now, we are Rubyists, so why Python? The reason is practical. On Red Hat systems, Python is always present - there is what's called a platform Python which you cannot even uninstall easily. Second, the discovery process can also be integrated with plain Anaconda via a simple `%pre` script, where Bash and Python are the only two options. It could be shell, but battling JSON is not what you want to do in Bash.
So, here is how it works. The Foreman Discovery Image is built the same way as today, but there is no Ruby, no Facter, no Smart Proxy, no TUI. Just the OS and a systemd service that starts up. Before we download a single bit, let's talk security. The protocol of choice is HTTPS, so discovery must have the CA certificate available in order to verify the server. This can be done by attaching a USB stick with a properly formatted filesystem (FS label) and a well-known filename; the second option is to put an X.509 fingerprint on the kernel command line - the certificate would be downloaded from the server first and the fingerprint verified prior to any other communication.
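The fingerprint-pinning option could look something like the sketch below: read a pinned SHA-256 fingerprint from the kernel command line, fetch the server certificate without verification, and refuse to talk unless the fingerprints match. The command-line option name `fdi.ca_fingerprint` is an illustrative assumption, not an existing flag.

```python
# Sketch: verify the server against a fingerprint pinned on the kernel
# command line before any real communication happens.
# "fdi.ca_fingerprint" is an assumed option name for illustration.
import hashlib
import ssl

def read_pinned_fingerprint(cmdline_path="/proc/cmdline"):
    """Extract the pinned SHA-256 fingerprint from the kernel command line."""
    with open(cmdline_path) as f:
        for token in f.read().split():
            if token.startswith("fdi.ca_fingerprint="):
                # normalize "AB:CD:..." to lowercase hex without colons
                return token.split("=", 1)[1].replace(":", "").lower()
    return None

def verify_server(host, port, pinned):
    """Download the server certificate (unverified at this point) and check
    its SHA-256 fingerprint against the pinned value."""
    pem = ssl.get_server_certificate((host, port))
    der = ssl.PEM_cert_to_DER_cert(pem)
    actual = hashlib.sha256(der).hexdigest()
    return actual == pinned.replace(":", "").lower()
```

Only after `verify_server` succeeds would the downloaded certificate be installed as the trust anchor for all subsequent HTTPS requests.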
Then the first request would be `GET https://smartproxy/unattended/preregister_script?mac=MAC1,MAC2,MAC3`. The goal of this request is to render the global "Preregister Script" template. Foreman would ship a simple Python script that prints facts as JSON to standard output. Only a few are really needed; users can define their own when needed. The IP and MAC addresses (`REMOTE_IP`) are available in the rendering context, so users can actually serve different scripts to different hosts (e.g. on different subnets).
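A minimal body for such a template could look like the sketch below: gather a handful of facts and print them as JSON on stdout. The fact names and file paths are illustrative assumptions, not a fixed schema - the whole point is that users can edit this template.

```python
#!/usr/bin/env python3
# Sketch of a "Preregister Script" template body: collect a few basic
# facts and print them as JSON. Fact names and paths are illustrative.
import json
import os

def gather_facts():
    facts = {}
    # MAC address of every non-loopback interface
    try:
        for iface in os.listdir("/sys/class/net"):
            if iface == "lo":
                continue
            with open(f"/sys/class/net/{iface}/address") as f:
                facts[f"mac_{iface}"] = f.read().strip()
    except OSError:
        pass  # non-Linux or restricted environment
    # Total memory, read once (no point re-reading it on every poll)
    try:
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith("MemTotal:"):
                    facts["memory_kb"] = int(line.split()[1])
                    break
    except OSError:
        pass
    # Custom facts are a one-liner away, e.g.:
    # facts["servertype"] = "big-9843294"
    return facts

if __name__ == "__main__":
    print(json.dumps(gather_facts(), indent=2))
```

Adding a site-specific fact means adding one line to this template, instead of writing a Facter custom fact in Ruby.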
Lesson learned from the past: Discovered nodes gather facts over and over again, every time they report back to the server - every 15 minutes by default. This is not needed: facts should be evaluated once, cached, and sent unchanged. There is no point in sending the amount of free memory or free disk space at a given moment. These are really only needed once; in the case of hot-swap, users should reboot.
After the script is executed and facts are gathered, another request is made, this time a `POST` to `https://smartproxy/unattended/preregister` with the facts in the request body. At this point, a preregistration record is created and saved, a new webhook called `new_preregistration` is fired (more about webhooks later), and HTTP 202 (Accepted) is returned.
The sending script then waits in a loop, performing the same request every few minutes until HTTP 200 (OK) is returned. This is the polling I was talking about - it looks like a dumb design, but it is a great fit for discovery.
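The whole client-side loop can be sketched in a few lines: POST the cached facts, treat 202 as "stored, keep waiting", and stop on 200. The default interval and the parameter names are my assumptions; the status-code semantics follow the protocol described above.

```python
# Sketch of the client-side polling loop. Defaults are assumptions;
# the 202-vs-200 semantics follow the protocol described in the text.
import json
import time
import urllib.error
import urllib.request

def poll_preregister(url, facts, interval=300, attempts=None):
    """POST facts repeatedly: HTTP 202 means 'recorded, keep waiting',
    HTTP 200 means 'provisioning initiated, stop polling'."""
    body = json.dumps(facts).encode()
    while attempts is None or attempts > 0:
        req = urllib.request.Request(
            url, data=body,
            headers={"Content-Type": "application/json"}, method="POST")
        try:
            with urllib.request.urlopen(req) as resp:
                if resp.status == 200:
                    return True  # server decided to provision this node
                # HTTP 202: preregistration stored, poll again later
        except urllib.error.URLError:
            pass  # network hiccup or server down: retry on the next tick
        if attempts is not None:
            attempts -= 1
        time.sleep(interval)
    return False
```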
At this point, the registration appears in the UI/CLI; users can list them, show them, and if they want to actually initiate provisioning, click the "Process" button. The only purpose of this button is to support workflows that could be called "semi-automated auto-provisioning". That's the "auto-provision" button we have today - the host may proceed with discovery according to Discovery Rules. What it does in the new design is simple: it just fires another webhook called `process_preregistration`.
The last bit of the puzzle is the Foreman Webhooks plugin, and I think it is pretty obvious at this point. My requirement is to give our users the much-needed flexibility of creating hosts based on discovered facts. If only we had a good and stable API they could use. Or maybe a CLI? Wait! Foreman has an API and a CLI.
Users can create webhooks written either in their stack of choice or via our Shellhooks plugin (shell scripts) that do the host creation. The input parameters for those webhooks are the registration objects: all the facts that were gathered. The rest is something our users can do pretty easily, and there are no limitations - they can build pretty much any logic into these webhooks.
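As one example of what such a webhook could do, here is a sketch in Python: take the facts from the payload, pick a hostgroup with some toy logic, and create the host through Foreman's REST API. The decision logic, fact names, and payload shape are assumptions; `POST /api/hosts` is Foreman's standard host-creation call.

```python
# Sketch of a webhook handler that turns a preregistration payload into a
# host via Foreman's REST API. Fact names, the mapping table, and the
# payload shape are illustrative assumptions.
import json
import urllib.request

def choose_hostgroup(facts):
    """Toy rule: map a custom 'servertype' fact to a hostgroup id."""
    mapping = {"web": 2, "db": 3}   # assumed ids, site-specific in reality
    return mapping.get(facts.get("servertype", ""), 1)  # 1 = default group

def create_host(foreman_url, auth_header, facts):
    """Create a managed host in build mode from gathered facts."""
    host = {
        "name": "host-" + facts["mac_eth0"].replace(":", ""),
        "mac": facts["mac_eth0"],
        "hostgroup_id": choose_hostgroup(facts),
        "build": True,  # provision on next boot
    }
    req = urllib.request.Request(
        foreman_url + "/api/hosts",
        data=json.dumps({"host": host}).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": auth_header},
        method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A Shellhooks variant would do the same with a few `hammer host create` arguments instead of raw HTTP.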
Lesson learned from the past: Taxonomy and discovery are hard - every shop has different requirements. That's why designing the discovery process to be as open and flexible as possible is key. Webhooks can be built so that, depending on the input (e.g. subnet), different sub-workflows in different departments are called. We would ship some examples for Shellhooks.
In the current design, Discovery Rules can be used to assign hostgroups based on fact search conditions. The same, and much more, can be done via Webhooks. Some might argue that users could previously create their own rules in the UI/API/CLI, while in the new design they would need to edit shell scripts or build their own ruleset in their own stack. While I think the flexibility pays off over a UI in the long term, we could build a similar Preregistration Rules table that, instead of associating hostgroups, would fire particular Webhooks. Alternatively, users could deploy some kind of UI themselves if needed.
As I said, interactive provisioning would be delivered with the New Host redesign, which is coming soon (this year, perhaps). But Discovery has always been strongest in the non-interactive mode, and that will be the focus of the first version.
For the PXE-less mode, I would like to drop the whole TUI and replace it with just a few questions asked by the Python script when no network connection is found. Again, letting users customize the script the way they want is the way forward. Some might actually offer a few options: provision as a database, web server, or load-balancer? Hit enter to create a host.
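Replacing the TUI with plain prompts could be as small as the sketch below - a numbered menu with a default, living in the same editable template as the fact gathering. The roles and wording are just the examples from the text.

```python
# Sketch of the PXE-less fallback: ask a few plain questions instead of a
# TUI when no server can be reached. Roles and prompts are illustrative.
def ask_role(input_fn=input, print_fn=print):
    """Present a numbered menu of roles; empty input picks the default."""
    options = {"1": "database", "2": "web server", "3": "load-balancer"}
    print_fn("No network connection to Foreman. Choose a role:")
    for key, name in sorted(options.items()):
        print_fn(f"  [{key}] {name}")
    choice = input_fn("Hit enter for the default (web server): ").strip()
    return options.get(choice, "web server")
```

Because it is just a function in the template, a site that wants three questions instead of one, or none at all, edits a few lines rather than patching a TUI codebase.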
There you have it - this feels really flexible and easy to understand and work with. Tell me what you think!