RFC: Bare Metal Provisioning with M2 in Foreman


#1

Introduction
Hello everyone, my name is Ian and I am working at Red Hat this summer as part of a collaboration with BU and Mass Open Cloud. I will be working to integrate one of Mass Open Cloud’s projects, M2, with Foreman in the form of a plugin. I’d like to use comments from the Foreman community to help better define my project’s designs.

What is M2?
M2 (Malleable Metal as a Service), previously known as BMI (Bare-Metal Imaging), is a system for quickly provisioning nodes in a multitenant environment. Fast provisioning speeds are attained by network-booting nodes from a Ceph image’s iSCSI target.

The project is currently based here: https://github.com/CCI-MOC/ims but will soon move to https://github.com/CCI-MOC/M2.

More information on M2 is available in this publication: https://arxiv.org/pdf/1801.00540.pdf

Plan: Create M2 plugin available for Foreman
M2 would become another option for bare metal provisioning. While stand-alone M2 sets up DHCP and TFTP and manages the iPXE chain-boot process, the Foreman M2 plugin would let Foreman orchestrate those steps. The M2 plugin would be responsible for managing Ceph images and exposing the iSCSI target to Foreman. The workflow would ideally be as close as possible to normal Foreman bare metal provisioning.

High-level strategy
A user would interact with M2 options in Foreman through menu items introduced by the M2 Foreman plugin. The Foreman plugin would then talk to an M2 smart proxy, which in turn would make requests to the M2 API server.

Potential interface: A new media option for the M2 storage device (just Ceph for now) would be available on the “Operating System” tab of the “Create Host” page. The user would choose an M2 image from a drop-down menu on the same tab.

M2 would likely introduce a new iPXE template for iSCSI booting. The iSCSI target would be filled into the template after Foreman receives it from M2. An example of the iPXE commands that M2 uses is as follows:

#!ipxe
sleep 23
set keep-san 1
ifconf --configurator=dhcp net0
sanboot --keep iscsi:10.10.10.111:tcp:3260:1:${target_name}
boot
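As a rough sketch of how Foreman might fill the received target into such a template, here is a minimal ERB rendering example. The class name, template variable, and target string are illustrative assumptions, not M2's or Foreman's actual implementation:

```ruby
require 'erb'

# Hypothetical iPXE template with an ERB placeholder for the iSCSI target
# that Foreman would receive from M2; mirrors the commands M2 uses today.
TEMPLATE = <<~IPXE
  #!ipxe
  set keep-san 1
  ifconf --configurator=dhcp net0
  sanboot --keep iscsi:<%= @iscsi_target %>
  boot
IPXE

class IpxeContext
  def initialize(iscsi_target)
    @iscsi_target = iscsi_target
  end

  # Render the template with this context's instance variables in scope.
  def render
    ERB.new(TEMPLATE).result(binding)
  end
end

script = IpxeContext.new('10.10.10.111:tcp:3260:1:rbd-image-01').render
puts script
```

Foreman's provisioning templates already work this way (ERB with host-specific variables), so an M2 iPXE template would slot into the existing template machinery.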

Conclusion
I appreciate any feedback and I’m happy to answer any questions the community might have.


#2

Hello Ian and welcome to your community.

This is indeed an interesting project; I am printing the paper as I write and will read it this evening, so I hope my conclusions so far are correct. I am aware of Ironic, and I have a pet project myself which aims to solve the MaaS problem with a similar approach. The idea is the same - transfer a raw image over the network - but the transfer mechanism is different: udpcast, a small utility which I recently added to EPEL7 and which can be used for multicast file transfer - one sender, multiple receivers.

I currently have a PR in the discovery node repo for a small RHEL7 image which runs from memory and can be PXE booted, iPXE booted, or booted from a removable device. The idea is simple - navigate a TUI to a screen which asks a few questions and spawns the udp-receiver utility. On the other end, someone initiates udp-sender with an image, either manually or via a cron job. This is currently an ad-hoc solution with no automation, but we plan to add remote execution via SSH for discovered nodes, and this could be easily automated.

Now, this is a nice pet project, but enterprise networks are very locked down and UDP multicast might not be an option. This is where M2 can help: iSCSI is a more reliable protocol which is enterprise-grade and battle tested. We can work together on sharing some code - the most interesting piece would be an image store which could be reused in multiple places.

I would like this to be broken into several plugins from the very beginning. In particular, the image store component is something that is worth having as a separate plugin. If there is a common API, we could have the M2 implementation from the very beginning, with the possibility of using Rails ActiveStorage as a poor man's solution, or even an implementation backed by a Pulp image store when that happens. Managing images is very relevant; we have Compute Resources for various virtualization and cloud projects and products.

A Smart Proxy module/plugin is a must; we don't want Foreman core talking directly to an external service for a new feature. That's a good design. Can you elaborate more on the API endpoints? What kind of HTTP requests would be exposed? Can we also have a common API for multiple implementations?

I need to read more about M2, but from what I understand it uses either a "pull" approach, where a client SAN-boots Linux which (I believe) fetches the target image and writes it to the hard drive, or a "push" mode, where a node exposes its HDD via iSCSI and M2 pushes the content onto it. It would be great if the common API offered both.

I haven't read the paper yet - is the "storage device" basically the image which is about to be deployed to a host? We already have a model called Image:

  create_table "images", force: :cascade do |t|
    t.integer "operatingsystem_id"
    t.integer "compute_resource_id"
    t.integer "architecture_id"
    t.string "uuid", limit: 255
    t.string "username", limit: 255
    t.string "name", limit: 255
    t.datetime "created_at"
    t.datetime "updated_at"
    t.string "iam_role", limit: 255
    t.boolean "user_data", default: false
    t.string "password", limit: 255
    t.index ["name", "compute_resource_id", "operatingsystem_id"], name: "image_name_index", unique: true
    t.index ["uuid", "compute_resource_id"], name: "image_uuid_index", unique: true
  end

As you can see, it is associated with OS, Compute Resource and Architecture. We could build on top of that to deliver a common UX for users. If we do this right, Foreman could use the "image store" to fetch images and upload them onto hypervisors at the end of the day. If you design with all of this in mind, we could even have the image store as a Foreman core feature so other plugins could build on top of it.

Building on top of our current image model means that we could take our current orchestration bits and apply them to M2-provisioned hosts. Seeding via cloud-init or Ignition, or SSHing onto a host and executing a "finish" script, are both good options.

I suggest starting with a model change proposal. Going forward, I usually prototype with only API endpoints so I don't need to do the UI part (which is challenging for me), and once I have the feature working end-to-end, I focus on the UI. But that's just me.

Since booting will be under Foreman's control, we don't need to limit users to iPXE only. Is this the only option in M2? We support PXELinux, SYSLINUX, Grub and Grub2; I'm not sure if any of these can do SAN booting at all. We can also build on top of Foreman Discovery - it's a RHEL7 liveCD which probably contains all the required utilities, including an iSCSI client.

Just a side note: iPXE is known to have problems on specific hardware platforms, since it does not contain drivers for all possible network cards. It would be great to offer an alternative if that's technically possible.



#3

Just a couple of things from me based on what I’ve learned so far about the M2 project:

When M2 provisions a host, it tells Ceph to clone the image (it's super fast, since it's a shallow copy-on-write clone) and then M2 exposes that clone as an iSCSI target. The host never writes anything to local disk. Provisioning is fast: basically as fast as the system's boot, plus a couple of seconds for Ceph to make the clone.

The project's origins are in supporting "elastic" bare metal, where the size of your bare metal deployment expands or contracts based on available resources, which is why the host never writes the image to disk. You might only get a bare metal host from midnight to 6am every day. At 6am the host turns over to another workload, but the next night at midnight you get the metal back and can boot off the image you had before. This is the "cattle" use case, and maybe part of the eventual M2 integration with Ironic.

Of course, Foreman is primarily for long-running "pets", but the iSCSI provisioning case is still useful in some scenarios.

My concern is that the compute resource concept is heavily coupled with virtual machines. I'm not sure how much work it'd be to decouple that, but it's certainly a possibility. @lzap, what would you envision the M2 process being: does a user select "Bare Metal" from the "Deploy On" menu, or would M2 be a compute resource of its own, with a list of images like you would get from a VMware or oVirt resource?

+1, M2 uses iPXE today, which is probably a good place to start for a PoC, unless we could easily swap in Grub2 or something. I think @iballou looked into Grub2 and it didn't have any obvious sanboot functionality.


#4

Yeah, after reading the paper it's all clear. I made the wrong assumption that M2 is another imaging tool. It's a diskless approach via iPXE-iSCSI-Ceph, and it's good to have an open-source solution with a common API for that. Every time I've worked with a diskless setup, it was custom made with many quirks and workarounds. Mostly workstations, as the paper says.

We recently added support for Anaconda image-based provisioning for oVirt/RHEV. It's just a template, but I was not aware that Anaconda can do that (grab an image, write it to disk, run post scripts, done). Long term, it would make a lot of sense to have a common imaging API.

It's indeed coupled to Compute Resource, but not much. An image has an installed OS and an architecture; both associations make sense. Username, password and user data are all relevant too - these allow Foreman to customize an image. What's not relevant, IMHO: uuid, iam_role and the compute resource association - though that could easily be extracted into an M:N association.

However, after understanding what M2 really is, I think it makes more sense to simply have another Compute Resource. M2 is a good fit for a regular CR: there are images which can be associated, servers (not VMs) can be cloned, started and stopped, and images have credentials. User data is not relevant, although technically possible; Foreman can use finish scripts to inform itself about the end of the provisioning process.

I think we can use Compute Resources just fine, but the name of the CR should indicate that this is also bare metal. I'd suggest something like "M2 Bare Metal", so this name shows up in the "Deploy on" drop-down next to OpenStack or Bare Metal. By the way, we should replace that term with something more correct, like "PXE Host", because you also select "Bare Metal" when you want to PXE boot a VM via Foreman. That's not correct; let's discuss separately: RFC: Rename Bare Metal in Deploy on menu

Scratch my ideas around generic imaging, that’s not on the table now.

Going forward, our plan is to focus on UEFI HTTPS Boot capability as an alternative to iPXE, which Foreman users currently rely on to avoid TFTP in PXE. This could enable more hardware to use iPXE in UEFI mode.


#5

Hi @lzap, thank you for your in-depth response. The number of potential MaaS solutions out there does amaze me, which is why I think it’s a really exciting field to work in.

The specific API endpoints to be exposed are still being decided. However, some will definitely need to exist. First, image-related calls, such as: viewing available images and snapshots, making snapshots, importing and exporting images to/from M2 and Ceph, and uploading and downloading images from the user's computer (an M2 feature in development). Then there are the provisioning-related calls. The M2 "pro" (provision) call will get split up, since its iPXE-related functionality won't be needed for Foreman. The result might be an API call that takes an image name and returns an iSCSI endpoint. For reference, here is the current list of commands that the M2 CLI knows:

  cp        Copy an existing image not clones
  db        DB Related Commands
  download  Download Image from BMI
  dpro      Deprovision a node
  export    Export a BMI image to ceph
  import    Import an Image or Snapshot into BMI
  iscsi     ISCSI Related Commands
  ls        List Images Stored
  mv        Move Image From Project to Another
  node      Node Related Commands
  pro       Provision a Node
  project   Project Related Commands
  rm        Remove an Image
  showpro   Lists Provisioned Nodes
  snap      Snapshot Related Commands
  upload    Upload Image to BMI
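As a rough sketch of what that split-out provision call could look like from the Foreman side, here is a minimal client that takes an image name and returns an iSCSI endpoint. The endpoint path, response key, and class name are all assumptions (the real API is still being decided), and the transport is stubbed so the example is self-contained:

```ruby
require 'json'

# Hypothetical shape of the split-out provisioning call: given a node and
# an image name, the M2 smart proxy would return an iSCSI target string.
class M2ProxyClient
  def initialize(http)
    @http = http # injectable transport, so the sketch runs without a live proxy
  end

  def provision(node, image)
    body = @http.call('/m2/provision', node: node, image: image)
    JSON.parse(body).fetch('iscsi_target')
  end
end

# Stub transport standing in for the M2 smart proxy.
fake_http = lambda do |_path, node:, image:|
  { 'iscsi_target' => "10.10.10.111:tcp:3260:1:#{image}" }.to_json
end

target = M2ProxyClient.new(fake_http).provision('node-42', 'centos7-img')
puts target
```

The returned target string is exactly what Foreman would interpolate into the iSCSI iPXE template during orchestration.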

I am still learning about the Foreman workflow, but my first assumption was that M2 could be a compute resource. M2 is a server that keeps track of its images and provisions hosts, which seems to fit the compute resource model well. I also definitely agree on having the name indicate bare metal.

I will talk with the M2 team about using an alternative to iPXE. I think UEFI HTTPS booting would be a good alternative to start investigating, especially if Foreman will be our first big user. On the topic of new features to investigate, we also plan on supporting storage services other than Ceph in the future, such as ZFS or perhaps NetApp.

We are deciding at the moment if Foreman could fit all of our M2 use cases. One big challenge is the reprovisioning case in an elastic multitenant environment, where a physical server might belong to a Foreman instance at one time, but hours later could belong to someone else. This is where HIL comes in, since it is responsible for managing the network channels that the nodes connect to. HIL would power off the node, change the switch configuration to put the node on another user’s network, turn it back on, and provide an interface for viewing the newly acquired node. More information on HIL is available here: https://github.com/CCI-MOC/hil/

Elastic bare metal nodes would be new to the Foreman work flow (to my knowledge). We discussed the idea of having each tenant own their own Foreman, so if their server isn’t available to them, it would just appear to be off, and when it’s returned to them, it would appear as if the server was never gone. This is just a glimpse into what we are planning. For now, the plan is to focus on creating solid M2 plugins that give users the spectrum of M2 functionality.


#6

Our current Compute Resources usually implement start, stop, list images, and list projects/tenants/flavors/zones/networks. I suggest taking a look at OpenStack or a similar cloud provider.

As you can see, our CRs are tightly coupled with Foreman core and also with the Fog Ruby library, which provides the connection. I don't like this coupling very much, and it looks like you would prefer creating a smart proxy module with an HTTP API to access M2. I like this; we have several RFEs to support connecting via smart proxy, because a direct connection is not always available.

I think it is worth exploring the idea of a generic Compute Resource that would only connect to a well-designed smart proxy API. The same Compute Resource code could be reused in the future for other/new smart proxy "connectors" implemented remotely via an HTTP API. We already have a decent module/plugin/provider system in the smart proxy which lets us write loosely coupled components, and we could build on top of that. One advantage of this approach is that smart proxies have an "auto-discovery" feature: once they are registered in Foreman, they pass a list of "available modules" (we actually call them "features"), so new Compute Resources would automatically appear.
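A minimal sketch of that auto-discovery flow. The "M2" feature name and the registry shape are illustrative assumptions, not the real smart proxy implementation:

```ruby
require 'set'

# Sketch: a registered smart proxy reports its enabled modules
# ("features"), and Foreman could surface a generic compute resource
# for each proxy that reports a connector feature.
class ProxyRegistry
  def initialize
    @proxies = {} # proxy name => Set of feature names
  end

  # Called when a smart proxy registers and reports its features.
  def register(name, features)
    @proxies[name] = Set.new(features)
  end

  # Proxies reporting the hypothetical "M2" feature; each of these could
  # automatically back an "M2 Bare Metal" compute resource in Foreman.
  def m2_capable
    @proxies.select { |_, feats| feats.include?('M2') }.keys
  end
end

registry = ProxyRegistry.new
registry.register('proxy01.example.com', %w[DHCP TFTP M2])
registry.register('proxy02.example.com', %w[DNS])
puts registry.m2_capable.inspect # only proxy01 reports the M2 feature
```

The point of the design is that no Foreman-side code change is needed when a new connector module appears on a proxy; it just shows up in the feature list.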

I'd recommend specifying the minimum set of integration calls required for successful provisioning, but including multi-tenancy from the very beginning. This is an important aspect downstream and for the enterprise. The Foreman-Smart Proxy connection does not carry user credentials and has global authorization AFAIR (client HTTPS certificate), so the API would need to define this. A simple mapping between a Foreman Organization and a User/Group would do the job.

On another thought, you could have M2-managed VMs as well, so maybe just "M2" would do as a name. I agree that it makes the most sense with bare metal.

We can do a community call where I can provide more details and answer questions if you want.

Foreman today does not manage the network at a lower level (ports, VLANs), not even in cloud/virt providers, but we have plans to redesign the network model in our DB in a more flexible way so we can start building towards something like that. There is demand for this in our community, and an RFC is currently being discussed.

We have really deep multi-tenancy in Foreman; it's possible to have multiple Foremans, but rather than that, I'd like to work towards a fully multi-tenant instance. If M2 can manage isolated networks, then it's just a matter of provisioning a smart proxy on each DHCP subnet and connecting it to Foreman under the correct Organization.


#7

Extending this plugin might also be of interest, since it’s all about imaging… /cc @x9c4


#8

I'm hoping that I can have my M2 compute resource talk directly to the M2 smart proxy - for example, to grab the images available to M2. I've created a basic ProxyAPI in my M2 plugin that I was originally planning on having the M2 compute resource use. Is Fog unavoidable unless changes are made to Foreman core?

Including multi-tenancy from the beginning would, for us, also mean integrating the HIL project that I mentioned before. M2 alone cannot manage isolated networks, which is why the two projects are so closely related. We are currently working on decoupling their dependencies, such as the fact that HIL handles M2's authentication. We did not originally plan to include HIL in the first PoC of the integration, but I think it could be interesting to have a HIL plugin for Foreman as well. HIL's role in this integration is definitely a key point that will need to be decided soon. I'll need more input before I can say for sure how HIL will fit in.

Thanks, I will check with other members of Mass Open Cloud to see if they would like that.


#9

This sounds good.

Good question, to which I don't have an answer. Changes might be required, but if this turns out to be a big task, you can still write a "fog-m2" provider which will provide what's needed. No proxy would be involved in that case; maybe someday we will do a fog-fog plugin which remotely connects to a fog "proxy" to achieve this requested feature.


#10

If I do hit a wall avoiding Fog, I'll consider making a Fog provider for M2. Alternatively, I might try to change the core compute resource to make the VM coupling optional. I should know in a week or so whether I need to rethink my smart proxy strategy.