Effectively Integrating Hiera, Git, Foreman, and Puppet?

Hello!

I’m trying to green field a Puppet setup and compartmentalize some things so that we can implement a change management process through git. I’m running into a couple of roadblocks, and was hoping that the Foreman community could help with that, since Foreman is currently managing our bare metal provisioning and is hooked into our Puppet infrastructure.

A side note before the wall of text - I’m happy to be pointed towards a consultant or group that could do consulting on a complex problem such as this.

Goals

  • Use each tool for where it is most effective
  • Define a hierarchy of attributes such that each level inherits config from the previous level

Impetus

In my exploration of Hiera and Puppet, I have been able to break out my separation of concerns into the following:

  • Puppet Modules define how a system gets to a certain state
  • Hiera Data defines what state the system should be in
  • Foreman provisions systems and pre-configures them for Puppet usage

With this model, if we want to change the configuration for a particular purpose, we would simply branch the hiera data repo, modify it, make a pull request, and merge it back in. It also means that the given state of a system is defined only through the combined Hiera data.

Example of Intent

Assume a hierarchy of configuration:

infrastructure_server
└── openstack
    ├── compute
    │   └── dhcp_agent
    └── controller

A server that needs to be an OpenStack controller node would inherit all of the configuration from the “OpenStack” configuration as well as the configuration from “infrastructure_server” (and potentially any additional common.yaml file that might exist).
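
For anyone who wants to see the intended inheritance rule in isolation, here is a minimal Python sketch. It is not Hiera itself, only a first-match lookup that walks from the most specific level up to the root. The keys and values in `LEVELS` are invented for illustration and do not come from any real configuration.

```python
# Hypothetical data per level; every key and value here is made up.
LEVELS = {
    "infrastructure_server": {"ntp_server": "ntp.example.com", "ssh_port": 22},
    "infrastructure_server/openstack": {"release": "example-release"},
    "infrastructure_server/openstack/controller": {"ssh_port": 2222},
}


def lookup(key, hostgroup, levels=LEVELS):
    """Return the value from the most specific level that defines `key`."""
    parts = hostgroup.split("/")
    # Walk from the full path up to the root: a/b/c, then a/b, then a.
    for depth in range(len(parts), 0, -1):
        level = "/".join(parts[:depth])
        data = levels.get(level, {})
        if key in data:
            return data[key]
    raise KeyError(key)


controller = "infrastructure_server/openstack/controller"
print(lookup("ssh_port", controller))    # overridden at the controller level
print(lookup("ntp_server", controller))  # inherited from infrastructure_server
```

A level with no data of its own (for example a compute node in this sketch) simply falls through to whatever its parents define.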

Design

It seems like Foreman has the perfect solution for that - use Host Groups. However, we don't want to use the Class and Smart Parameter features of Foreman as I can't seem to find any information on auditing changes to those and I do not see a mechanism for storing that configuration in Git.

As such, it seems like using Foreman to define Host Groups and manage what systems are part of which group, but then use Hiera for the data for what configuration those Host Groups have is appropriate.

However, Hiera seems to have no way to define an arbitrary hierarchy, only ones built from existing values, and to further complicate matters, if you do build it programmatically, you’re expected to define facts that construct the hierarchy for you. Since Foreman provides the host group as a single value with '/'s in it, there doesn’t seem to be a way to mesh the Hiera config with the Host Group hierarchy.

Possible Solutions

Flat folder structure

Since Foreman is providing the host group, we could generate files based on the host group name, which would leave us with a Hiera config that looks like this:
hierarchy:
  - name: "Host Group Data"
    path: "nodes/host_group/%{hostgroup}.yaml"

And then the inside of the host_group folder would contain:

infrastructure_server.yaml
infrastructure_server_openstack.yaml
infrastructure_server_openstack_compute.yaml
infrastructure_server_openstack_compute_dhcp_agent.yaml
infrastructure_server_openstack_controller.yaml

However, this removes one of the main benefits of the host group hierarchy. We would need automation in place to copy changes from parents in the hierarchy to their children, and the deepest children of the host group tree would end up with long and unwieldy configs.

Custom Facts

Build an automation intermediary that, whenever host group information is changed, sets custom facts like "group_1", "group_2", and "group_3" so that the Hiera config looks like the following:
hierarchy:
  - "%{::group_1}/%{::group_2}/%{::group_3}/%{::group_4}.yaml"
  - "%{::group_1}/%{::group_2}/%{::group_3}.yaml"
  - "%{::group_1}/%{::group_2}.yaml"
  - "%{::group_1}.yaml"
  - "common.yaml"

which would lead to a folder structure like:

infrastructure_server/openstack/compute/dhcp_agent.yaml
infrastructure_server/openstack/compute.yaml
infrastructure_server/openstack.yaml
infrastructure_server.yaml

While this seems like it might work, it also feels extremely prone to breakage.
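
To illustrate one way this could break, here is a small Python sketch that splits a host group into `group_N` facts and renders the hierarchy above. The `group_facts` and `render` helpers are hypothetical stand-ins, and the sketch assumes an unset fact renders as an empty string, which is a simplification rather than a statement about Hiera's exact behaviour.

```python
import re

HIERARCHY = [
    "%{::group_1}/%{::group_2}/%{::group_3}/%{::group_4}.yaml",
    "%{::group_1}/%{::group_2}/%{::group_3}.yaml",
    "%{::group_1}/%{::group_2}.yaml",
    "%{::group_1}.yaml",
    "common.yaml",
]


def group_facts(hostgroup):
    """Split 'a/b/c' into {'group_1': 'a', 'group_2': 'b', 'group_3': 'c'}."""
    return {f"group_{i}": part
            for i, part in enumerate(hostgroup.split("/"), start=1)}


def render(template, facts):
    """Substitute %{::name} tokens; unset facts become '' in this sketch."""
    return re.sub(r"%\{::(\w+)\}", lambda m: facts.get(m.group(1), ""), template)


# A two-level host group leaves group_3 and group_4 unset.
facts = group_facts("infrastructure_server/openstack")
for entry in HIERARCHY:
    print(render(entry, facts))
```

With a host group shallower than the deepest hierarchy entry, the first entries render to odd paths with empty segments, and a host group deeper than four levels would need the hierarchy itself to be edited.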

Conclusion

After searching the Internet, I can't seem to find anyone who has tried something like this. I'd be happy to know if there's a simple solution to this that handles the goals as laid out above (someone had mentioned profiles and roles still, but that doesn't really seem to effectively encompass the host group hierarchy problem).

Any help or suggestions would be appreciated!

I just saw that this post was originally in the Communities section. I think it fits best in the Support section, so I have moved it over here, and hopefully you’ll get some answers to your queries.


What most of my customers use (I am one of the consultants you mentioned :wink:) is facts that define the hierarchy in Hiera: mostly custom facts derived from parameters in Foreman, or the parameters directly. In most environments this type of information does not fit the inherited structure of the Hostgroups.

For example a virtual and physical server with the same function need different settings, but not enough to replicate Hostgroups. Same goes for production and testing or similar scenarios.

I also see an increasing trend to use the lookup and include functions to get classes from Hiera instead of using an ENC or node definition. I personally dislike this because it blurs the separation between Puppet and Hiera, and I prefer the Roles-Profiles pattern with only a role assigned to a system, via Foreman as ENC or a Puppet node definition.

So this perhaps does not help you solve your problem the way you want to, but it may explain why you find no one going in the same direction.


Hello again!

After discussion with numerous people, I was able to devise a solution to this that is true to my original intent, doesn't add unnecessary automation, and should be scalable and modifiable if necessary. I want to thank Nick Maludy for filling in a couple of gaps in my Puppet/Hiera knowledge, and a coworker who wishes to remain nameless for writing the array munging code in the site.pp file.

TL;DR: Mapped Paths

The Missing Link

What I didn't realize and what Nick helped me understand was that if a '/' is in a variable that is used for Hiera, it is treated as a path specifier, meaning that "alpha/beta/gamma" will look in a folder "alpha" and then in a folder "beta" for gamma.yaml (with the paths mentioned in the parent post). For Foreman, this means that for the hostgroup path, you could always obtain the file at the end of the hostgroup, so "infrastructure_server/openstack" as a hostgroup would resolve to "nodes/host_group/infrastructure_server/openstack.yaml". This solves part of the problem, as we can now give definitions for the leaves of the tree, but doesn't inherit properly, which leads to the other missing piece.
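
As a rough illustration of that behaviour, the Python sketch below substitutes a hostgroup value containing '/' into the path template from the parent post. The `interpolate` helper is a hypothetical, much simplified stand-in for Hiera's interpolation and only handles plain `%{name}` tokens.

```python
import re
from pathlib import PurePosixPath


def interpolate(template, variables):
    """Replace each %{name} token with its value ('' if the name is unknown)."""
    return re.sub(r"%\{([^}]+)\}",
                  lambda m: variables.get(m.group(1), ""),
                  template)


path = interpolate("nodes/host_group/%{hostgroup}.yaml",
                   {"hostgroup": "infrastructure_server/openstack"})
print(path)
# The '/' inside the value becomes a directory separator in the result.
print(PurePosixPath(path).parent)
```

Only the single file at the end of the path is consulted here, which is why the leaf gets its data but nothing is inherited from the levels above it.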

Mapped Paths

Mapped Paths provide the other necessary component. As a refresher if you haven't seen them, they look like this in hiera.yaml:
- name: "Some Mapped Path Group"
  mapped_paths: [array_of_strings, tmp, "folder/%{tmp}.yaml"]
Hiera takes three strings. The first is the name of a variable that should resolve to an array. The second is a temporary variable name that is interpolated into the third string, which is the path template. In the above case, for every entry in the array, a yaml file in "folder" would be searched for configuration info. So for the array ["alpha", "beta", "gamma"], we would expect these paths:
folder/alpha.yaml
folder/beta.yaml
folder/gamma.yaml
This sort-of solves the problem with the first possible solution I posted, as you don't have to have the massive filenames. But, you still end up with the other problem as described, that is, you have to have a bunch of files that are all on the same level and aren't sorted in a meaningful way.
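
The expansion described above can be mimicked in a few lines of Python. This is only a sketch of the idea, not how Hiera implements it, and `expand_mapped_paths` is a hypothetical name.

```python
def expand_mapped_paths(values, var_name, template):
    """Return one path per array entry by substituting %{var_name}."""
    token = "%{" + var_name + "}"
    return [template.replace(token, value) for value in values]


paths = expand_mapped_paths(["alpha", "beta", "gamma"],
                            "tmp",
                            "folder/%{tmp}.yaml")
for p in paths:
    print(p)
```

The order of the array is preserved in the resulting list, so whatever ordering the array has is the ordering in which the files would be considered.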

Putting it all together...

After some misunderstandings about the right character to use in Hiera (%, not $) and scoping (top versus node level in site.pp), we came to a solution. By setting a variable before the 'node "default"' block of the site.pp file, we're able to set a top-scoped array that is used in the variable interpolation.
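# Build the list of host group paths, most specific first,
# e.g. 'a/b/c' becomes ['a/b/c', 'a/b', 'a'].
if $hostgroup {
        $tmp_groups = split($hostgroup, '/')
        # For each level, take the first ($index + 1) segments.
        $tmp_path_array = $tmp_groups.map |Integer $index, String $group| {
                $tmp_groups[0, $index + 1]
        }
        # Join each list of segments back into a single path string.
        $tmp_final_path_array = $tmp_path_array.map |Array $paths| {
                join($paths, '/')
        }
        $sorted_path_array = reverse($tmp_final_path_array)
} else {
        # No host group assigned: avoid calling reverse() on an undefined value.
        $sorted_path_array = []
}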

This code generates an array of each level of the hierarchy. So to continue the earlier example of “infrastructure_server/openstack/controller”, we generate an array with the following values:

- "infrastructure_server/openstack/controller"
- "infrastructure_server/openstack"
- "infrastructure_server"

We reverse the array to ensure that it’s most to least specific to correspond with the rest of the Hiera hierarchy.
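
For readers less familiar with the Puppet language, the same computation is sketched below in Python, together with the files a mapped path would then consult. The template "nodes/host_group/%{tmp}.yaml" is an assumption based on the path used in the parent post, since the final hiera.yaml entry is not shown here, and both function names are hypothetical.

```python
def hostgroup_paths(hostgroup):
    """Return every ancestor path of a host group, most specific first."""
    if not hostgroup:
        return []
    parts = hostgroup.split("/")
    # Prefixes of increasing length: a, a/b, a/b/c
    prefixes = ["/".join(parts[: i + 1]) for i in range(len(parts))]
    return list(reversed(prefixes))


def candidate_files(hostgroup, template="nodes/host_group/%{tmp}.yaml"):
    """Expand the (assumed) mapped path template for each ancestor path."""
    return [template.replace("%{tmp}", p) for p in hostgroup_paths(hostgroup)]


for f in candidate_files("infrastructure_server/openstack/controller"):
    print(f)
```

Because the list runs from most to least specific, a value set for the controller level is found before the same key at the openstack or infrastructure_server level.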

Final Thoughts

I understand that all of the above might be seen as too complex, and is likely to have some issues that I haven't foreseen (Dirk makes some good points in a response). However, based on the goals that we have, I think this solves the problem neatly and utilizes the best parts of each of the various components (Foreman, Puppet, Hiera, Git). I will get notifications on this thread, so I would be happy to receive feedback on any step, or any other recommendations. Thank you to the Foreman project for their tireless work on an incredibly complex system.