RFE : don't break libvirt + Ceph image provisioning

Hi,

(disclaimer: sort of a complete Ruby noob ahead :crazy_face:)

I’ve spent a few hours getting Foreman (1.18) to spin up a VM using image-based provisioning on libvirt + Ceph.

I know Ceph isn’t fully supported in Foreman, probably because of fog, but as you know, small workarounds exist to allow creating volumes inside such pools.

To summarize, to get image-based provisioning to work I had to deal with a few issues, some of which are out of Foreman’s scope, I presume:

  • 1st: it appears that when Foreman (fog) creates the user data ISO file, it is uploaded to the 1st pool available in libvirt, at least in 1.18. It looks like “1st” means 1st in alphabetical order, and since for me that was a Ceph pool, libvirt’s “volUpload” failed because it is not supported on rbd pools, even though I chose an NFS pool for the VM and the backing image…

=> workaround: either hack fog to force the pool where the ISO is uploaded (which I did for now), or, probably better, just make sure the “1st” available pool is an NFS or POSIX pool (which I’ll try later).
=> I fear that’s somewhat out of scope for Foreman, but while poking around the Foreman code I also saw that some fog behaviour is bypassed, so maybe you’ll know how to work around it…

  • 2nd issue, and this one is Foreman related: for image-based provisioning (using qcow2), the Foreman code just calls “vol.save” on the volume with a backing_volume attribute pointing to the base image, see here: vol.save line

The issue here is that this also “works” on rbd pools, but it just creates an empty volume unrelated to the backing volume. It took me quite a few retries to find out that for rbd it is clone_volume that works, but I also found out that clone_volume completely duplicates volumes on NFS pools… So I ended up with this code, which uses clone_volume only for rbd and leaves the existing behaviour for other pool types:

      if vol.backing_volume.present?
        # find the fog pool object holding the backing volume
        pool = storage_pools.detect { |p| p.name == vol.backing_volume.pool_name }
        # fog could not give me the pool XML, so ask the raw libvirt connection
        pool_xml = client.client.lookup_storage_pool_by_name(pool.name).xml_desc
        if pool_xml.match(/<pool type='rbd'>/)
          # rbd ignores backing_volume on vol.save, so clone the base image instead
          vol.backing_volume.clone_volume(vol.name)
          vol.path = vol.backing_volume.path.gsub(vol.backing_volume.name, vol.name)
        else
          # other pool types: keep the existing behaviour (qcow2 overlay)
          vol.save
        end
      end

I tried to use fog’s .xml or .xml_desc pool attributes, but they were either empty or unavailable, so I resorted to the raw libvirt connection instead.
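
If that regex ever turns out to be too fragile, the same check could be done by actually parsing the pool XML, still through the raw connection. A minimal sketch, reusing the pool and client variables from the patch above:

    # Sketch only: the same pool-type check as in my patch above, but parsing
    # the XML with REXML (stdlib) instead of matching the literal
    # "<pool type='rbd'>" string; pool and client are the same objects as above.
    require 'rexml/document'

    pool_xml  = client.client.lookup_storage_pool_by_name(pool.name).xml_desc
    pool_type = REXML::Document.new(pool_xml).root.attributes['type']
    rbd_pool  = (pool_type == 'rbd')   # use this instead of the regex match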

Would you be kind enough to include this fix/enhancement/RFE in the libvirt compute model?

Thanks

Hello,

Can you start by describing the target workflow? You say image-based provisioning on Ceph, then you mention ISO files (like an OS installer ISO?). I am confused.

This sounds like a bug we need to sort out either in Foreman or in Fog; not sure yet where it belongs.

We probably need to extend the model and add a “backing” boolean flag. Or maybe a flag called “clone volume” would do it.
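
Roughly what I have in mind, purely as an illustration (the clone_backing_volume attribute does not exist, the name is made up):

    # Hypothetical flag on the volume model, exposed in the UI; nothing like
    # this exists yet, it only illustrates the idea.
    if vol.backing_volume.present? && vol.clone_backing_volume
      # explicit clone, needed e.g. for rbd pools
      vol.backing_volume.clone_volume(vol.name)
    else
      # current behaviour: qcow2 overlay via backing_volume
      vol.save
    end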

We are very open to improvements, but this needs to be polished up and filed as a regular pull request so I can review and test it.

Hi Lukáš,

The workflow really is “image-based provisioning”, but I forgot to say this is in a no-cloud environment, since I’m using libvirt as the compute resource in Foreman.

Basically: upload a CentOS “cloud” image to my NFS pool or my Ceph pool, declare that image in the libvirt compute resource in Foreman, create a host in Foreman on libvirt, pick “image based” as the provisioning method, select the base image, and finish the host creation.

Foreman creates the VM image with its backing image automatically: when using qcow2, the new VM image is initially almost empty (just a copy-on-write overlay on top of the base image), and that already happens auto-magically in Foreman/fog with NFS storage pools (or local pools).
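
To make that concrete, this is roughly what it boils down to at the fog-libvirt level, as far as I understand it; the image/pool names are placeholders and I may be slightly off on the exact attribute list:

    # Rough sketch of what Foreman/fog does on a dir/NFS pool (names are
    # placeholders): the new qcow2 volume only references the base image,
    # so it starts out almost empty.
    base = service.volumes.all.detect { |v| v.name == 'centos7-cloud.qcow2' }
    disk = service.volumes.create(
      :name           => 'myhost-disk1.qcow2',
      :format_type    => 'qcow2',
      :capacity       => '20G',
      :pool_name      => base.pool_name,
      :backing_volume => base
    )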

The clone/backing feature is already there… so no need to add flags?

When Ceph is the pool where the VM is created, it does not seem possible to use qcow2, and by default the created images are empty even if a backing image is chosen. It is still possible to get copy-on-write on raw images, which is what I’m achieving with this change, but the edits were limited and not very clean because the fog-libvirt gem seems quite old and did not let me get the pool XML directly.

For libvirt “image provisioning”, (Foreman +) fog automatically generates an ISO file containing the user-data and meta-data files, using the “user data template” specified in Foreman. That ISO is automatically uploaded to libvirt (1st pool found :tired_face:) and attached to the VM so it can boot using cloud-init (cloud-init will use the NoCloud datasource). cloud-init then uses that ISO image for the VM customization at 1st boot.
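
Not fog’s actual code, but to illustrate what that generated ISO amounts to (the file names and the “cidata” label are the standard NoCloud convention; user_data and vm_name stand in for whatever fog has at hand at that point):

    # Sketch of the NoCloud ISO that fog generates: an ISO labelled 'cidata'
    # containing user-data (the rendered "user data template") and meta-data.
    # cloud-init finds it as a CD-ROM at first boot and applies it.
    require 'tmpdir'

    Dir.mktmpdir do |dir|
      File.write("#{dir}/user-data", user_data)                  # rendered template
      File.write("#{dir}/meta-data", "instance-id: #{vm_name}\n")
      system('genisoimage', '-output', "#{dir}/cloud-init.iso",
             '-volid', 'cidata', '-joliet', '-rock',
             "#{dir}/user-data", "#{dir}/meta-data")
      # fog then creates a libvirt volume sized to the ISO and volUpload-s it,
      # which is exactly the step that fails when the target pool is rbd.
    end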

Concerning the fog “bug”: it is actually possible to specify the pool; I hacked it in here:

/opt/theforeman/tfm/root/usr/share/gems/gems/fog-libvirt-0.4.1/lib/fog/libvirt/models/compute/server.rb
=>
I hardcoded/added the pool_name param for testing purposes like this on line 247:

 vol = service.volumes.create(:name => cloud_init_volume_name, :capacity => "#{File.size(iso)}b", :allocation => "0G", :pool_name => 'my_pool_for_cloudinit') 
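
A slightly less hacky variant of that test patch could read the pool name from an environment variable (the variable name is made up) and fall back to fog’s current behaviour when it is unset:

    # Same spot in fog-libvirt's server.rb; cloud_init_volume_name, iso and
    # service are already in scope there. LIBVIRT_CLOUDINIT_POOL is a made-up
    # variable name, only for testing.
    opts = { :name       => cloud_init_volume_name,
             :capacity   => "#{File.size(iso)}b",
             :allocation => "0G" }
    opts[:pool_name] = ENV['LIBVIRT_CLOUDINIT_POOL'] if ENV['LIBVIRT_CLOUDINIT_POOL']
    vol = service.volumes.create(opts)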

Finally: I’m OK with creating a PR on GitHub, but my Ruby skills are limited and my Rails ones are unfortunately non-existent, so I fear I wouldn’t be able to do much more than patch things: I couldn’t enhance the model since I don’t even know what a model is in Rails terms :pensive:

Hopefully I can still help get this working in Foreman anyway :smile:

Best regards
Frederic

Update: further testing…

I added some raise statements in fog to try to understand where it would create the (in)famous ISO images…
I thought it was the 1st pool in alphabetical order: that assumption was wrong…

From what I can see, the fog call:

 vol = service.volumes.create(...)

uses the pool of the 1st volume in service.volumes… and it appears that this list is only “consistent” until libvirt is restarted:

  • volumes are listed by pools
  • always in the same order (by pool)

BUT when the libvirt daemon is restarted:

  • pools are not in the same order (internally? some hash somewhere?),
  • therefore service.volumes are not listed in the same order,
  • so the 1st volume might NOT be in the same pool as before,
  • and then the ISO upload might fail, depending on whether the pool supports volume upload (it fails on rbd at least).

This actually makes libvirt image provisioning quite unreliable, unless you (hard)code the pool name as I did, or modify the model to let you select the pool where the ISO will be uploaded. I don’t know how to do the latter, and even if I did, I don’t think it could work without modifying the (old) fog library as I did…
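
For what it’s worth, this is how I checked which pool fog would pick on a given day, and how the choice could be pinned using only what fog already exposes (the pool name is just an example):

    # Which pool would the ISO land in right now? Per the behaviour above,
    # it's simply the pool of the first listed volume:
    puts service.volumes.first.pool_name

    # The deterministic fix is to pick the pool explicitly (name is an example)
    # and pass it as :pool_name when creating the ISO volume, as in the fog
    # patch earlier in this thread, instead of relying on enumeration order.
    target_pool = service.pools.detect { |p| p.name == 'nfs_cloudinit' }
    raise "cloud-init pool not found" unless target_pool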

:disappointed_relieved:

Unfortunately, libvirt is mostly meant for non-production use by our community users; if you need rock-solid virtualization on KVM, try oVirt/RHEV. Our integration for that compute resource is more advanced.

Upgrading fog is always an issue; create a ticket for us if you think it would help. We don’t upgrade often, as it affects all CRs and it’s a bunch of work to redo all our monkey patches etc.