Greetings!
Over the past couple of days I've been working on putting together a broad design document for improving the remote execution capabilities of Foreman. Currently we support running Puppet through mcollective, but nothing beyond that. The goal of this discussion is to figure out a plan for allowing management of machines in Foreman using a tool like mcollective.
Although I mention mcollective throughout the document, that doesn't mean it's the only tool we should consider. It does have the benefits of a reasonable PKI scheme, strong extensibility, a solid communication model, and wide acceptance in the Puppet community.
Here's the document in markdown for comment, but it's also on gist [1], which is much easier to read:
# Implementation

## Architectural overview
Mcollective is the most generic and flexible solution for running controlled, selective commands and jobs against groups of hosts. It uses a plugin-based architecture that allows it to run virtually any task remotely without giving users the kind of unfettered access to hosts that tools like Func, polysh, or Rundeck allow. Users that want to let administrators run free-form commands against large swaths of hosts, like running `yum upgrade` through a shell across all app servers, can do so, but more controlled jobs, like enabling or disabling the puppet agent across many hosts solely through the puppet-agent mcollective plugin, are also possible. Remote execution capabilities will be handled through the proxy, since users are likely to run the mcollective master and ActiveMQ on each puppet master.
## Command execution and storage
- Actions will be stored in the database by namespace and arguments (see the sketch after this list).
  - The mcollective plugin name will provide the namespace.
  - The arguments will be the parameters passed to that plugin.
- Remote actions and metadata about the hosts they've been run across should be stored:
  - The metadata (i.e. hostgroup) of the hosts the command ran across.
  - All the hosts that matched that metadata at the time the command ran.
- Actions that have been run in the past should be able to be replayed against hosts with different attributes.
  - If an action was taken for all the machines in the `app` hostgroup, it should be possible to run that same command on all the machines in the `db-master` hostgroup.
- Fact-based matching for target systems.
- All the commands that have been run on a single system, a hostgroup, single/multiple facts, or globally, should be available along with that object.
  - If a command was run based on a fact, that fact's page will have the command listed.
- Commands that have already been run can be scheduled based on their namespace and arguments (this relies on having a queue that supports pushing future-dated tasks onto it)
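To make the storage model concrete, here's a rough sketch; every class and field name below is hypothetical, not an existing Foreman schema:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class RemoteAction:
    namespace: str       # the mcollective plugin name, e.g. "puppet"
    arguments: dict      # the arguments passed to the plugin, stored verbatim
    target_filter: dict  # metadata the hosts were matched on,
                         # e.g. {"hostgroup": "app"} or a fact query
    matched_hosts: list[str] = field(default_factory=list)
                         # snapshot of the hosts that matched the filter
                         # at the time the command ran
    run_at: datetime = field(default_factory=datetime.utcnow)

# Replaying a past action against different attributes just swaps the filter;
# the namespace and arguments are reused unchanged.
def replay(action: RemoteAction, new_filter: dict) -> RemoteAction:
    return RemoteAction(action.namespace, action.arguments, new_filter)
```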
## User interface
- The user will have ACLs based on the normal permissions and ancestry hierarchy
- The output of commands will be parsed and presented back to the user
- Long polling will be used to get the status of execution on host(s) (a client-side sketch follows)
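To make the long-polling idea concrete, here's a minimal client-side sketch; the endpoint URL and payload shape are purely illustrative, since the actual API hasn't been designed yet:

```python
import time

import requests

# Hypothetical status endpoint; the real API is still to be designed.
STATUS_URL = "https://foreman.example.com/api/tasks/{id}/status"

def poll_task(task_id, interval=2, timeout=300):
    """Poll until the task reaches a terminal state, then return per-host results."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        resp = requests.get(STATUS_URL.format(id=task_id), timeout=30)
        resp.raise_for_status()
        status = resp.json()
        if status["state"] in ("finished", "failed"):
            # e.g. {"web01": {"status": "ok", "output": "..."}, ...}
            return status["hosts"]
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish within {timeout}s")
```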
### Single host
- Hosts assigned to a puppet master with remote execution capabilities will have an additional icon in the section with "Build" and "Delete"
Remote management process:
- User clicks the remote management button on the host page
- User is shown all the plugins (namespaces) they have access to use
- User chooses which management namespace (plugin) they will use
- The user enters arguments or reloads arguments from history
### Multiple hosts from the listing page
- Hosts can be filtered on the host listing page and then be selected to have remote management tasks run on them
- Remote execution tasks need to be orchestrated across different puppet masters, since the hosts remaining after the filter may not all talk to the same master
- How do we handle hosts whose master doesn't have mcollective configured? Just ignore them? Or report back to the user that the command wasn't run on them? (One option is sketched below.)
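Here's a rough sketch of how that orchestration could look, grouping filtered hosts by proxy and surfacing the ones whose master has no mcollective rather than silently dropping them. Everything here (attribute and method names included) is hypothetical:

```python
from collections import defaultdict

def dispatch(hosts, namespace, arguments):
    """Group hosts by their master's proxy and dispatch one job per proxy."""
    by_proxy = defaultdict(list)
    skipped = []
    for host in hosts:
        proxy = host.puppet_proxy  # hypothetical attribute
        if proxy is not None and proxy.has_feature("mcollective"):  # hypothetical check
            by_proxy[proxy].append(host)
        else:
            # Hosts on a master without mcollective: collect and report back
            # to the user instead of silently ignoring them.
            skipped.append(host)
    for proxy, proxy_hosts in by_proxy.items():
        proxy.run_mcollective(namespace, arguments, proxy_hosts)  # hypothetical call
    return skipped  # surfaced in the UI/CLI so the user knows who was left out
```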
"Remote execution command center"
Standalone page
- This page will be accessible through the dropdown (like Puppet CAs) that's next to each proxy on the proxy listing page
- It'll load plugins that are available on that proxy (see the sketch after this list)
  - Plugins will rely on locations & organizations for filtering based on ACL
  - Roles can be used to control access to plugins globally
  - This needs more thought
- (not-MVP) Install plugins from GitHub or a tarball onto a proxy
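For the plugin-loading piece, one option is a new endpoint on the smart proxy that lists its installed mcollective plugins. This is a sketch of an endpoint that doesn't exist yet:

```python
import requests

# Hypothetical proxy endpoint; the smart proxy does not expose this today.
def available_plugins(proxy_url):
    resp = requests.get(f"{proxy_url}/mcollective/plugins", timeout=30)
    resp.raise_for_status()
    # e.g. [{"name": "puppet", "actions": ["enable", "disable", "runonce"]}, ...]
    return resp.json()
```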
# API & CLI

## General API
- Retrieve the status of all running remote execution jobs
- Retrieve available namespaces
- Retrieve all future-dated tasks
  - Potentially based on date range
- Retrieve a history of tasks
  - Date range
  - Task namespace
  - Filter that caused hosts to be selected
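To make the general API concrete, here's what those retrievals might look like against a hypothetical set of routes (none of these endpoints exist yet):

```python
import requests

BASE = "https://foreman.example.com/api/remote_execution"  # hypothetical base path

# Status of all running remote execution jobs
running = requests.get(f"{BASE}/tasks", params={"state": "running"}).json()

# Available namespaces (i.e. mcollective plugin names)
namespaces = requests.get(f"{BASE}/namespaces").json()

# Future-dated tasks, optionally restricted to a date range
scheduled = requests.get(f"{BASE}/tasks", params={
    "scheduled_after": "2013-06-01",
    "scheduled_before": "2013-06-30",
}).json()

# Task history, filtered by namespace and the filter that selected the hosts
history = requests.get(f"{BASE}/tasks", params={
    "namespace": "puppet",
    "filter": "hostgroup=app",
}).json()
```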
## Task API
- Check for task status based on ID (long-polling for the CLI)
  - Returns:
    - Current progress
    - Successfully completed hosts
    - Failed hosts
- Submit new tasks (this might be best handled through mcollective directly)
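One possible shape for the task status response, with entirely hypothetical field names:

```python
from typing import TypedDict

class HostResult(TypedDict):
    status: str   # e.g. "ok" | "failed" | "pending"
    output: str   # parsed plugin output for this host

class TaskStatus(TypedDict):
    id: str
    progress: float               # current progress, 0.0 to 1.0
    succeeded: list[str]          # successfully completed hosts
    failed: list[str]             # failed hosts
    hosts: dict[str, HostResult]  # per-host detail
```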
## CLI
- All API methods should be exposed
- Long-polling for task completion
- Auto-completion for available hostgroups and facts
- Search through past commands and re-run them
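And a sketch of the search-and-re-run flow from the CLI's perspective, reusing the hypothetical routes from the API sketch above:

```python
import requests

BASE = "https://foreman.example.com/api/remote_execution"  # hypothetical, as above

def rerun_last(namespace, new_filter):
    """Find the most recent task in a namespace and resubmit it with a new filter."""
    history = requests.get(f"{BASE}/tasks", params={"namespace": namespace}).json()
    if not history:
        raise SystemExit(f"no past tasks for namespace {namespace!r}")
    last = history[0]  # assuming the API returns newest-first
    resp = requests.post(f"{BASE}/tasks", json={
        "namespace": last["namespace"],
        "arguments": last["arguments"],
        "filter": new_filter,  # e.g. "hostgroup=db-master"
    })
    resp.raise_for_status()
    return resp.json()["id"]
```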
Let me know if you've got any questions. I'm looking forward to your feedback!
-Sam