The Road to Making Puppet Optional

tbrisker · March 29, 2020, 7:18pm

Following up on the announcement by @Marek_Hulan, responses to surveys and discussions with various people, this RFC outlines the technical aspects of making Puppet an optional part of Foreman and the proposed changes.

This document tries to capture the outcomes of all the discussions we have had so far on the matter, but we might have missed some things, or made wrong assumptions and choices on the planned approach - so input from more people is greatly appreciated!
I would like to thank everyone who has been involved in the discussion so far for their valuable feedback.

The two guiding principles while creating this proposal were:

- Users of Puppet should not see a deterioration in their workflows - ideally, some workflows would even see an improvement.
- Users who don’t use Puppet will have a simpler, easier to use experience with Foreman.

With Puppet 5 reaching EOL in November 2020, we are also planning on taking advantage of this change to clean up code used for supporting older versions of Puppet. We have already started on this effort with the removal of Smart Variables in 2.0, but there is much more that can still be improved.

Since this is a very large change, it will take several releases to achieve, and the outcome of this work will result in Foreman 3.0. All significant breaking changes will only be done as part of the major release change, so users will have enough time to prepare and plan accordingly.

We have identified several main workflows or integration points with Puppet. This is how we plan on addressing each one:

Facts and Reports
This is the most common use case for all configuration management solutions, and a core use case of Foreman. Being able to import and parse reports and facts from multiple sources should be a part of Foreman core.
Puppet fact and report importing and parsing will remain in Foreman core.
Some other common sources will be migrated to core: initially, Ansible and Subscription Manager seem to be the most common ones according to preliminary data from the community survey. Others may be added as well if there is enough demand.
Fact parsing of facts from legacy Facter versions (2 or older) will be dropped, and the interface between fact importers and parsers will be better defined. Perhaps some of the tasks could be moved to background processing using dynflow.
The plugin interface will continue to be maintained, to allow for adding additional sources from plugins.
All fact and report import actions should be done via a smart proxy. The proxy will handle the authentication to Foreman as well as receiving the input from the external services providing them. We will also look into the possibility of receiving facts and reports directly from the Puppet server instead of using the existing scripts, which require additional setup for external Puppet servers.
There is some ongoing discussion regarding which facts should be parsed and how. I expect the outcomes of that discussion to also affect the changes done in this regard.
ENC (External Node Classifier) and related workflows
This refers to a set of associations between hosts and Puppet entities - namely, Puppet environments, Puppet classes and Puppet class parameters. While these are commonly used, they are used less often than facts and reports, and add a lot of complexity for users who do not use Puppet or use Puppet only as an information source for Foreman while managing its ENC functionality elsewhere (e.g. using hiera).
This functionality will be extracted to a new plugin, which will be installable without losing existing data where it exists.
As part of the extraction, we will try to incorporate some improvements as well - for example, importing parameter types for smart class parameters when possible. This will also be a good chance to consider how we can improve the workflows around this area, as well as potentially adding integrations with other tools in the Puppet ecosystem where appropriate.
Puppet module content management with Katello
This workflow is not very common, as the Puppet ecosystem has other common tools for managing modules (such as r10k).
Since Pulp 3 does not support the Puppet content type, support for the Puppet module content type has been officially deprecated in Katello 3.15 and will be removed in a future release (likely, Katello 4.0).
Puppet CA certificate autosigning for new hosts
There are currently two approaches for handling autosigning of newly provisioned hosts by the Puppet CA.
The hostname whitelisting approach, which is less secure and requires modifying Puppet configuration files directly, will no longer be supported.
The token-based whitelisting approach will continue to be supported as part of Foreman core, but the cert-based implementation that is used on Puppet 5 or older will also be removed in favor of the API-based implementation that is in use for Puppet 6.
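To illustrate the general idea behind token-based autosigning, here is a minimal sketch. It is not Foreman's or the smart proxy's implementation: the `AutosignTokens` class, its methods and the TTL are all hypothetical, and a real setup would also have to deliver the token to the host and embed it in the certificate request.

```ruby
require 'securerandom'

# Hypothetical in-memory store of one-time autosign tokens.
# Names are illustrative and do not mirror Foreman or smart-proxy code.
class AutosignTokens
  def initialize(ttl_seconds: 3600)
    @ttl = ttl_seconds
    @tokens = {} # token => { hostname:, expires_at: }
  end

  # Issue a token for a host that is about to be provisioned.
  def issue(hostname, now: Time.now)
    token = SecureRandom.hex(16)
    @tokens[token] = { hostname: hostname, expires_at: now + @ttl }
    token
  end

  # True only for a known, unexpired token bound to this hostname.
  # A token presented by the right host is consumed, so it cannot be replayed.
  def redeem(token, hostname, now: Time.now)
    entry = @tokens[token]
    return false if entry.nil? || entry[:hostname] != hostname

    @tokens.delete(token)
    now <= entry[:expires_at]
  end
end

tokens = AutosignTokens.new(ttl_seconds: 60)
issued = tokens.issue('web01.example.com')
tokens.redeem(issued, 'web01.example.com') # first use is accepted
tokens.redeem(issued, 'web01.example.com') # replay is rejected
```

Compared to hostname whitelisting, nothing in a configuration file has to be edited per host, and a leaked token is only useful once and only for a limited time.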
Puppet Run
There are currently multiple methods of executing one-time Puppet agent runs, which are difficult to set up properly and are not very widely used. This was already discussed over a year ago.
The approach outlined in that discussion is the one we plan to pursue - remove the various Puppet run providers in favor of the existing Remote Execution template as the recommended method of triggering a Puppet agent run on a host.
Installer setting up Puppet Server and CA
While it is currently possible to optionally disable automatic setup of Puppet Server and CA by the installer, in a Foreman installation without Katello we depend on the Puppet CA to provide and manage some certificates used by Foreman and the smart proxies.
In the shorter term, there is ongoing discussion regarding how to better handle certificates in general, including dropping this dependency. Once that work is done, Foreman will no longer require a Puppet CA be set up for installation, and making a Puppet-less installation scenario will be easier.
In the longer term, since Foreman is usually installed in a brown-field environment, we may consider no longer managing external services (such as Puppet Server and CA) using the Foreman Installer and rather only providing the installer with credentials required for communicating with such. That, however, is a separate discussion that will not be addressed as part of the current effort.
Dashboard widgets
There are multiple widgets on the dashboard related to configuration management status and reports. When using more than one solution, some widgets are multiplied while providing little value.
The status related widgets will be modified to provide information regarding all host statuses, not just configuration status.
The configuration report related widgets will be modified to display consolidated information regarding all sources, and will only be displayed if there are reports viewable by the current user - leading to a cleaner dashboard for users who don’t import configuration reports to Foreman or lack permissions to see such.
Host detail page
The host view today consists mostly of two large graphs showing Puppet run statistics. These are not very useful, take a large amount of “real estate” on the screen, rely on an old graphing library we want to remove and are blank for hosts that don’t use Puppet.
This page will be redesigned in a modular way that will display more complete information (such as details currently only viewable in the edit page) and will allow easy extensibility for plugins that wish to provide additional information. As a side effect, the Content Host view from Katello might be incorporated into this page, bringing us one step closer to the full unification of Hosts and Content Hosts.
Settings
There are currently multiple settings under the “Puppet” category that are not Puppet-specific (such as fact-related ones), and there are settings in other categories that refer to Puppet even though they may (or may not) affect other configuration management options.
The settings will be re-arranged and better described, and in some cases settings that aren’t useful may be removed.
Statistics and Trends
Most of these currently rely on Puppet related information - either fact data or ENC configuration. Additionally, they are not heavily used, as there are much better solutions for monitoring real-time statistics and trends of hosts out there.
These pages will be extracted to a new plugin, where it will be easier to improve them, and perhaps integrate them with other monitoring solutions for a better overview. This will also give us a good opportunity to understand what is required to extract core functionality into a plugin, before attempting the much more complicated Puppet ENC functionality extraction.

What now?

Please share any feedback you have regarding the workflows and direction outlined above. Did we miss something? Is there some opportunity for improvements? Are we making a huge mistake? Do you like where this is going?

The whole effort will be tracked on Redmine, where it will be broken down into smaller chunks and tasks. We will gradually start implementing these changes over the next several Foreman releases. We’ll try to update here as we go along, and perhaps open additional RFCs if needed on specific parts.

Want to help?

There’s plenty of work to do! While there are a few core developers that are already planning to work on this, any help is appreciated. Please reach out to me if you want to help this effort at any stage. There will also be tasks that do not require coding skills, such as testing changes and updating documentation, so everyone is welcome to take part.

TimoGoebel · March 30, 2020, 9:03am

Where are the current culprits? In my opinion, the interface is quite functional right now.

What do you want to gain with this? Don’t they have quite a happy life in their respective plugins?

Finally, I’ve been complaining for years that these graphs need to change. Big

tbrisker · March 30, 2020, 11:38am

IIRC this suggestion was raised by @Marek_Hulan, so I’ll let him elaborate, but the general gist was that the importer should just dump a JSON blob into the DB and the parser will handle everything regarding extracting information from it.
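A rough sketch of that split, under the assumption that the importer only persists the raw payload and the parser is the only code that reads it. `RawFactStore`, `ExampleParser` and the fact keys are hypothetical; a Hash stands in for the database column.

```ruby
require 'json'

# Hypothetical importer side: store the payload untouched.
class RawFactStore
  def initialize
    @rows = {} # hostname => JSON string, stands in for a DB column
  end

  def import(hostname, facts_hash)
    @rows[hostname] = JSON.generate(facts_hash)
  end

  def raw(hostname)
    @rows.fetch(hostname)
  end
end

# Hypothetical parser side: all knowledge of the fact layout lives here.
class ExampleParser
  def initialize(raw_json)
    @facts = JSON.parse(raw_json)
  end

  # Fact keys below are illustrative only.
  def operatingsystem
    @facts.dig('os', 'name')
  end

  def ip
    @facts.dig('networking', 'ip')
  end
end

store = RawFactStore.new
store.import('web01.example.com',
             { 'os' => { 'name' => 'CentOS' }, 'networking' => { 'ip' => '192.0.2.10' } })
parser = ExampleParser.new(store.raw('web01.example.com'))
parser.operatingsystem
```

With this shape the import step stays cheap, and parsing can be rerun later (for example in a background task) without asking the source to upload again.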

The idea here is to allow Foreman to gather information from multiple sources out of the box, without requiring additional plugins that add complexity which might not be useful for everyone - for example, puppet class/ansible role assignment to hosts and hostgroups, variable handling etc.
Additionally, if we have multiple parsers in core, there might be a chance to combine logic that is duplicated between the different parsers.

lzap · March 30, 2020, 12:26pm

Tomer, I am interested in being involved in fact and/or report parsing.

lzap · March 30, 2020, 12:32pm

This is being discussed in another thread: RFC: Drop parsing NIC interfaces

Don’t mind the thread name, it started as an idea to remove parsing of NIC interfaces and it evolved into ideas on how to improve fact parsing in general. In short: make our parsing code more robust, find a common fact model across various sources, and extend facter with external facts providing useful NIC relationships.

For reports, there is another thread: RFC: Optimize storing reports

In short, refactoring how we store reports in a more update-friendly way.

These are the two items which are on my mind as the top performance issues we face on large scale deployments, that’s why I am involved.

TimoGoebel · March 30, 2020, 1:02pm

Isn’t this already the case?
I’m asking because we might want to gather facts or reports from Compute Resources, e.g. for a VM import and it would be good to do the heavy lifting in core as the proxy doesn’t know anything about compute resources.

tbrisker · March 30, 2020, 2:10pm

Good point. Gathering facts from CRs is definitely a great idea! The intention is that the heavy lifting (i.e. processing) is still done in core (or perhaps in a background task); the proxy is there to handle communications between Foreman and external services. I know right now CRs all communicate directly with Foreman, but perhaps we should ask ourselves if that is correct.
One way I can see this is a “Facts and Reports” module in a proxy that defines what sources it accepts and passes them along to the Foreman server, so you don’t have to have your external service authenticated directly with Foreman for it to work.
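A minimal sketch of what such a module could look like, purely as an illustration. `FactsAndReportsModule`, its keyword arguments and the envelope layout are hypothetical, and the forwarder is any callable that would deliver the data to Foreman.

```ruby
# Hypothetical proxy-side module: names and structure are assumptions.
class FactsAndReportsModule
  Rejected = Class.new(StandardError)

  # accepted_sources: source names this proxy is configured to take
  # forwarder: any callable that delivers the envelope to Foreman
  def initialize(accepted_sources:, forwarder:)
    @accepted = accepted_sources.map(&:to_s)
    @forwarder = forwarder
  end

  # Validate the source and kind, then pass the payload along unmodified.
  def receive(source:, kind:, payload:)
    raise Rejected, "source #{source} not accepted" unless @accepted.include?(source.to_s)
    raise Rejected, "unknown kind #{kind}" unless %i[facts report].include?(kind)

    @forwarder.call({ source: source.to_s, kind: kind, payload: payload })
  end
end

sent = []
mod = FactsAndReportsModule.new(accepted_sources: %w[puppet ansible],
                                forwarder: ->(envelope) { sent << envelope })
mod.receive(source: 'puppet', kind: :facts, payload: { 'hostname' => 'web01' })
```

The external service only needs to be trusted by the proxy; the proxy's existing credentials are what Foreman sees.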

ekohl · April 2, 2020, 1:25pm

Currently the Puppet integration scripts (ENC, reports) call directly to Foreman. My idea was to provide a Proxy module where Puppet can call. So currently we have:

Foreman -> Proxy -> Puppet
Puppet -> Foreman

The Proxy already has a setting where the Foreman server is located. It has the credentials to call back. Then we only need to ensure Puppet can talk to the Proxy. Another argument is that I thought it’d be easier to use separate certificates between Proxy <-> Puppet. Perhaps set up a minimal Proxy instance using the Puppet certificates that knows how to securely call back to Foreman.

I haven’t thought it through too much so I’d like some feedback on this.

TimoGoebel · April 2, 2020, 1:33pm

I think this actually makes sense. The enc/report scripts currently hijack the credentials from the smart-proxy. It would be cleaner if they sent their data to smart-proxy.
How would you handle the case where the smart-proxy is up and running and is receiving data, but it cannot reach Foreman? Would you make this a synchronous action? Would Foreman poll the smart-proxy for data?

ekohl · April 2, 2020, 3:52pm

That was why I was thinking about a very small service that can actually be multi process/threaded. A blocking API is much easier to deal with. If it runs as the foreman-proxy user it should still have the credentials and config.
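To make the blocking variant concrete, here is a small sketch under assumptions: the service answers the sender only after Foreman has answered, so an unreachable Foreman shows up as an error on the sender's side instead of data piling up on the proxy. The function name and the status codes are illustrative, and the upstream is modelled as a callable that returns an HTTP-like status.

```ruby
# Hypothetical blocking hand-off from the small service to Foreman.
def forward_blocking(payload, upstream)
  status = upstream.call(payload)
  status.between?(200, 299) ? 201 : 502
rescue IOError, SystemCallError
  # Foreman unreachable: report it to the sender rather than queueing silently.
  503
end

foreman_up   = ->(_payload) { 201 }
foreman_down = ->(_payload) { raise Errno::ECONNREFUSED }

forward_blocking({ 'host' => 'web01' }, foreman_up)
forward_blocking({ 'host' => 'web01' }, foreman_down)
```

The trade-off is that the sender has to retry or accept losing that upload, but the service itself stays stateless.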

ehelms · April 2, 2020, 8:50pm

Do I read this concept correctly that client puppet agents would talk to the smart-proxy instead of Foreman? If yes, is this effectively using the smart-proxy as a reverse proxy?

ekohl · April 2, 2020, 9:09pm

In a way yes. However, I’m not entirely sure how to do the auth. In the Katello setup you would have Foreman and Foreman Proxy both using the Katello default CA. Then if you have a Puppetserver, you’d ideally use the Puppet CA certificates because it allows using the built in HTTP report processor instead of our custom one.

One possible way is to let Foreman Proxy bind on an additional port and serve the Puppet CA. Another is to use Server Name Indication and use separate DNS names. A third, which I’m not sure will work, is to make sure the Puppet CA itself is signed by the Katello root so it’s the same hierarchy.

TimoGoebel · April 3, 2020, 7:26am

Why is that? Isn’t the Katello CA part of the trusted roots of a host? The proxy would then only need to trust the PuppetCA (client auth) and could forward the data to Foreman with whatever CA.

I believe that depends on your point of view. Currently the smart-proxy has to be installed on the same host as the puppetserver so the scripts can use the credentials of the smart-proxy to send data to Foreman. If we send the data via smart-proxy we could decouple this.
If we don’t need to do any processing or normalization, it’s a reverse proxy. Correct. Would we want to do any post-processing on the proxy?

Marek_Hulan · April 6, 2020, 6:55am

The current issue is that the interface defined in FactParser doesn’t clearly define what values are expected and how they create objects. E.g. operating systems - it’s not clear what values parsers should derive from facts; in the past, some facters reported RHEL as RedHat, some as RHEL. The same applies to value formats, e.g. RAM. We have a comment saying we expect the value in MB, but I think it would be good to have proper documentation for the full parser interface.
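As an illustration of what a more explicit contract could look like, here is a sketch. It is not the existing FactParser class: `DocumentedParser`, `ExampleSourceParser`, the alias table and the fact keys are all hypothetical. The point is only that return types, units and canonical values are written down in one place, and source-specific parsers supply the raw lookups.

```ruby
# Hypothetical documented contract for fact parsers.
class DocumentedParser
  # Example alias data, not an official mapping.
  OS_ALIASES = { 'redhat' => 'RedHat', 'rhel' => 'RedHat' }.freeze

  def initialize(facts)
    @facts = facts
  end

  # @return [String] canonical operating system name
  def operatingsystem_name
    raw = raw_os_name.to_s
    OS_ALIASES.fetch(raw.downcase, raw)
  end

  # @return [Integer] memory in MB
  def ram
    Integer(raw_ram_mb)
  end

  private

  # Subclasses must implement these source-specific lookups.
  def raw_os_name
    raise NotImplementedError
  end

  def raw_ram_mb
    raise NotImplementedError
  end
end

class ExampleSourceParser < DocumentedParser
  private

  def raw_os_name
    @facts['distro']
  end

  def raw_ram_mb
    @facts['memory_mb']
  end
end

ExampleSourceParser.new({ 'distro' => 'RHEL', 'memory_mb' => '2048' }).operatingsystem_name
```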

lzap · April 8, 2020, 1:24pm

This is what I would like to tackle in the fact parser rewrite - to find a common model. Basically to follow facter3 with some changes (facter does some things incorrectly - e.g. reporting boot uptime instead of boot time, which does not change on every single fact upload). The CFM would not only define which facts should be reported but also the contents of these facts. I’d definitely kill all those “123 MiB” crazy formats, which are maybe human readable but totally wrong. It should be reported in bytes, machine readable; it’s our UI’s job to format them for human beings!
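For example, a normalization step along these lines could turn human-readable sizes into plain bytes before they reach the model. This is only a sketch: `to_bytes` and `UNIT_BYTES` are hypothetical names, and it only handles binary (KiB/MiB/GiB/TiB) suffixes.

```ruby
# Illustrative helper that turns human-readable sizes into integer bytes.
UNIT_BYTES = {
  'B' => 1, 'KiB' => 1024, 'MiB' => 1024**2, 'GiB' => 1024**3, 'TiB' => 1024**4
}.freeze

def to_bytes(value)
  # Already machine readable: pass through unchanged.
  return value if value.is_a?(Integer)

  match = /\A\s*(\d+(?:\.\d+)?)\s*([KMGT]iB|B)\s*\z/.match(value.to_s)
  raise ArgumentError, "unrecognized size: #{value.inspect}" unless match

  (match[1].to_f * UNIT_BYTES.fetch(match[2])).round
end

to_bytes('123 MiB') # => 128974848
```

Formatting back to “123 MiB” then becomes a purely presentational concern in the UI.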

ehelms · April 8, 2020, 3:24pm

I wanted to see if I understood the outcome of this correctly. I’ll phrase things as a series of questions:

If I have Foreman without the Puppet plugin, I can “register” hosts to Foreman via Puppet still?
- If yes, does this assume I have a smart proxy present with the Puppet CA feature?
- If yes, is the same true for other host sources (or does it require their plugins)?
  - Ansible
  - subscription-manager
  - Chef
  - Salt

ekohl · April 8, 2020, 3:39pm

This has never been true. You only need the Puppet CA feature if you want to provision and deprovision.

Registration happens via fact upload. Currently we use the Puppet feature as authentication and authorization. The Puppetmaster does a POST to the facts endpoint using its certificate. In the certificate is the common name of the smart-proxy. Foreman checks whether this smart proxy exists and has the Puppet feature and then allows uploading facts. Reports happen using the same authentication.
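The check described above can be modelled roughly as follows. This is a simplified sketch and does not reproduce the code in smart_proxy_auth.rb: `Proxy` and `authorized_proxy` are hypothetical names, and certificate validation itself is assumed to have happened already.

```ruby
# Hypothetical model of "known smart proxy with the required feature".
Proxy = Struct.new(:hostname, :features)

# Returns the matching proxy when the certificate's common name belongs to a
# known proxy that has at least one of the required features, otherwise nil.
def authorized_proxy(cert_common_name, known_proxies, required_features)
  proxy = known_proxies.find { |candidate| candidate.hostname == cert_common_name }
  return nil if proxy.nil?

  (proxy.features & required_features).empty? ? nil : proxy
end

proxies = [
  Proxy.new('proxy1.example.com', %w[Puppet DNS]),
  Proxy.new('proxy2.example.com', %w[DHCP])
]

authorized_proxy('proxy1.example.com', proxies, ['Puppet']) # allowed
authorized_proxy('proxy2.example.com', proxies, ['Puppet']) # nil: feature missing
```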

I believe foreman_ansible uploads facts and reports in the same way and abuses the Puppet feature being present to authenticate.

In theory you don’t even need the Puppet feature for that since you can use username/password with sufficient credentials instead.

github.com

theforeman/foreman/blob/develop/app/controllers/concerns/foreman/controller/smart_proxy_auth.rb

require 'resolv'
require 'uri'

module Foreman::Controller::SmartProxyAuth
  extend ActiveSupport::Concern

  module ClassMethods
    def add_smart_proxy_filters(actions, options = {})
      skip_before_action :require_login, :check_user_enabled, :only => actions, :raise => false
      skip_before_action :authorize, :only => actions
      skip_before_action :verify_authenticity_token, :only => actions
      skip_before_action :set_taxonomy, :only => actions, :raise => false
      skip_before_action :session_expiry, :update_activity_time, :only => actions
      before_action(:only => actions) { require_smart_proxy_or_login(options[:features]) }
      attr_reader :detected_proxy
    end
  end

  private

This file has been truncated.

TimoGoebel · April 8, 2020, 4:11pm

No abuse is needed here. We already have the concept of FactImporters. They can register a smart proxy feature that is allowed to send facts.
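As a sketch of that concept: a registry where each fact importer declares which smart proxy feature may upload its facts. The class and method names below are hypothetical and are not claimed to match Foreman's actual FactImporter API; the importers are stubbed with symbols.

```ruby
# Hypothetical registry of fact importers and their permitted proxy feature.
class ImporterRegistry
  def initialize
    @entries = {}
  end

  def register(fact_type, importer, feature:)
    @entries[fact_type.to_sym] = { importer: importer, feature: feature }
  end

  def importer_for(fact_type)
    @entries.fetch(fact_type.to_sym)[:importer]
  end

  # Features whose proxies may upload facts.
  def upload_features
    @entries.values.map { |entry| entry[:feature] }.uniq
  end
end

registry = ImporterRegistry.new
registry.register(:puppet, :PuppetImporterStub, feature: 'Puppet')
registry.register(:ansible, :AnsibleImporterStub, feature: 'Ansible')
registry.upload_features
```

The list returned by `upload_features` is what an authentication check would compare a proxy's features against, so an Ansible-only proxy would not need the Puppet feature.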

ehelms · April 8, 2020, 4:12pm

Does this still work when the Puppet master and smart-proxy are on separate hosts?

Do I have these “registration” workflows correct?

So for Puppet (requires smart-proxy with Puppet feature):

Puppet agent → Puppet master (impersonating smart proxy) → Foreman

For Ansible (requires foreman_ansible + smart-proxy with Puppet feature):

Ansible on host → Smart Proxy → Foreman

For subscription-manager (requires Katello):

sub-man on host → Foreman

ekohl · April 8, 2020, 5:19pm

This depends on how you configure it. The installer sets it up for REX:

github.com

theforeman/puppet-foreman_proxy/blob/b2279e386fb5a53254a1ced6967442ade92affb2/manifests/plugin/ansible.pp#L50-L60


      
          file {"${foreman_proxy::config_dir}/ansible.cfg":
            ensure  => file,
            content => template('foreman_proxy/plugin/ansible.cfg.erb'),
            owner   => 'root',
            group   => $::foreman_proxy::user,
            mode    => '0640',
          }
          ~> file { "${::foreman_proxy::dir}/.ansible.cfg":
            ensure => link,
            target => "${foreman_proxy::config_dir}/ansible.cfg",
          }

github.com

theforeman/puppet-foreman_proxy/blob/b2279e386fb5a53254a1ced6967442ade92affb2/templates/plugin/ansible.cfg.erb#L8-L12


      
          [callback_foreman]
          url = <%= @foreman_url %>
          ssl_cert = <%= @foreman_ssl_cert %>
          ssl_key = <%= @foreman_ssl_key %>
          verify_certs = <%= @foreman_ssl_ca %>

I don’t know how many other users have set up the callback on non-smart proxies.