Kickstart: disk device selection (vda vs. sda, or more than one disk)

((Version info: I am using Foreman 2.0, Katello 3.15, CentOS 7))

sda vs. vda

Situation: The default kickstart templates for CentOS/RHEL want to install to /dev/sda.

But when installing virtual machines on a libvirt (qemu/KVM) hypervisor, my default disk type is virtio, and inside the guest those disks are presented as /dev/vda.

Kickstart will then fail with a message like “not enough space available”, and one has to look several lines back to notice that the real problem is that it can’t find “sda” at all.
(This problem is not Foreman/Katello specific, but I describe how I configure Foreman to handle it.)

Since the majority of things I am installing will be VMs, I modified the default to be vda:

Hosts => Partition tables => “Kickstart custom” => right side, “Clone”, to make a new, modifiable one named “Kickstart custom clemens”. In the code snippet, around line 13, I changed

dev = host_param('part_device') || 'sda'
to
dev = host_param('part_device') || 'vda'
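For context, the `dev` value set there feeds the disk directives further down in the same partition table template, roughly like this (an illustrative sketch, not the exact template content; the `part` sizes here are made up):

```erb
<%
  dev = host_param('part_device') || 'vda'
%>
zerombr
clearpart --all --initlabel --drives=<%= dev %>
part /boot --fstype ext4 --size=500 --ondisk=<%= dev %>
part pv.01 --size=1 --grow --ondisk=<%= dev %>
```

So every directive that names a disk picks up the host parameter, or falls back to the default you hardcode.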

For the few hosts where I would want to use sda, I create a host specific parameter.

Hosts => Create host => after you have filled “Host” and “Operating system” and “Interfaces”, under Parameters create a “Host Parameter”

Name: part_device
Type: String
Value: sda

Actually, for my physical hosts I don’t use exactly that, because for them I need to do a special trick, because… read on.

More than one disk device - which to use?

My playground hypervisors have two LSI RAID devices: a faster 136 GB RAID1 and a somewhat slower RAID5 of 600 GB or 1.2 TB. Naturally I want to install the OS to the smaller, faster disk.
But which will appear as sda and which as sdb seems to be random (I have three such servers).

One trick I have seen elsewhere is that the disk device argument (what is normally “sda”) can also be something like /dev/disk/by-uuid/… etc. For example:

[root@fuji3 ~]# ll -d /dev/disk/by-*
drwxr-xr-x 2 root root 480 May 21 10:31 /dev/disk/by-id
drwxr-xr-x 2 root root  80 May 21 10:31 /dev/disk/by-label
drwxr-xr-x 2 root root  80 May 21 10:31 /dev/disk/by-partuuid
drwxr-xr-x 2 root root 200 May 21 10:31 /dev/disk/by-path
drwxr-xr-x 2 root root 100 May 21 10:31 /dev/disk/by-uuid
[root@fuji3 ~]# ll -d /dev/disk/by-path
drwxr-xr-x 2 root root 200 May 21 10:31 /dev/disk/by-path
[root@fuji3 ~]# ll /dev/disk/by-path
total 0
lrwxrwxrwx 1 root root  9 May 21 10:31 pci-0000:00:1f.5-ata-2.0 -> ../../sr0
lrwxrwxrwx 1 root root  9 May 21 10:31 pci-0000:02:00.0-scsi-0:1:0:0 -> ../../sdb
lrwxrwxrwx 1 root root 10 May 21 10:31 pci-0000:02:00.0-scsi-0:1:0:0-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 May 21 10:31 pci-0000:02:00.0-scsi-0:1:0:0-part2 -> ../../sdb2
lrwxrwxrwx 1 root root 10 May 21 10:31 pci-0000:02:00.0-scsi-0:1:0:0-part3 -> ../../sdb3
lrwxrwxrwx 1 root root  9 May 21 10:31 pci-0000:02:00.0-scsi-0:1:2:0 -> ../../sda
lrwxrwxrwx 1 root root 10 May 26 18:55 pci-0000:02:00.0-scsi-0:1:2:0-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 May 26 18:55 pci-0000:02:00.0-scsi-0:1:2:0-part2 -> ../../sda2
[root@fuji3 ~]# 
[root@fuji3 ~]# lsblk /dev/sda
NAME          MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda             8:0    0   557G  0 disk 
├─sda1          8:1    0   300G  0 part 
│ └─data-data 253:1    0   300G  0 lvm  /data
└─sda2          8:2    0 256.9G  0 part 
[root@fuji3 ~]# lsblk /dev/sdb
NAME           MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sdb              8:16   0   136G  0 disk 
├─sdb1           8:17   0   500M  0 part /boot
├─sdb2           8:18   0    35G  0 part 
│ └─fuji3-root 253:0    0    25G  0 lvm  /
└─sdb3           8:19   0 100.5G  0 part 
[root@fuji3 ~]# 

So in this case sdb is the smaller disk, thus:

lrwxrwxrwx 1 root root 9 May 21 10:31 pci-0000:02:00.0-scsi-0:1:0:0 -> ../../sdb

and I can specify it in kickstart, i.e. in Foreman/Katello, as a host parameter:

Name: part_device
Type: string
Value: /dev/disk/by-path/pci-0000:02:00.0-scsi-0:1:0:0

And indeed, installing that bare-metal host with PXE and kickstart this way, the OS nicely landed on the small disk.
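Before putting such a path into the parameter, it is worth double-checking what the by-path symlink currently resolves to; `readlink -f` does the job. On a real host you would run it against the /dev/disk/by-path entry directly; below, a throwaway symlink in a temp directory stands in for the real /dev entry so the snippet is runnable anywhere:

```shell
#!/bin/sh
# On a real host, simply:
#   readlink -f /dev/disk/by-path/pci-0000:02:00.0-scsi-0:1:0:0
# which prints e.g. /dev/sdb.

# Self-contained demonstration with a stand-in symlink:
tmp=$(mktemp -d)
touch "$tmp/sdb"                                      # stand-in for /dev/sdb
ln -s "$tmp/sdb" "$tmp/pci-0000:02:00.0-scsi-0:1:0:0" # stand-in by-path entry
readlink -f "$tmp/pci-0000:02:00.0-scsi-0:1:0:0"      # resolves to .../sdb
rm -rf "$tmp"
```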

Well, the idea here was that you create a parameter named part_device for your host or hostgroup and set it accordingly. You know, the thing is that in pure kickstart the installer cannot detect the primary device. Even if it could, which one would you want to use for the OS? Is that vda, vdb, or something else? Thus we default to a sane value.

There is a trick to run a shell in the %pre section and do the partitioning manually - then you can use bash and do whatever you want. Here is a WIP patch that implements that: https://github.com/theforeman/community-templates/pull/672/files
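That %pre approach can be sketched roughly like this (my own hedged sketch, not the code from the linked PR; it assumes `lsblk` is available in the installer environment, as it is on CentOS/RHEL 7). The idea: pick the smallest whole disk in %pre, write the disk directives to a file, and pull that file into the main kickstart body with `%include /tmp/diskpart.ks`:

```shell
#!/bin/sh
# Sketch of a kickstart %pre snippet: pick the smallest whole disk and
# emit partitioning directives for the main kickstart section.

# Given "NAME SIZE TYPE" lines (size in bytes), print the name of the
# smallest device of type "disk" (filters out CD-ROMs etc.).
pick_smallest_disk() {
  awk '$3 == "disk" { print $2, $1 }' | sort -n | awk 'NR == 1 { print $2 }'
}

# In a real %pre section the input would come from:
#   lsblk -dnb -o NAME,SIZE,TYPE | pick_smallest_disk
# Here, sample data matching my fuji3 host stands in so the snippet
# runs anywhere:
target=$(printf '%s\n' \
  'sda 598167846912 disk' \
  'sdb 146028888064 disk' \
  'sr0 1073741824 rom' | pick_smallest_disk)

# In %pre you would redirect these into /tmp/diskpart.ks instead:
echo "ignoredisk --only-use=${target}"
echo "clearpart --all --initlabel --drives=${target}"
```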

Yeah nice trick, create a hostgroup with this parameter and you can nest these as well to apply it to other hostgroups.


Well, the idea here was that you create a parameter named part_device for your host or hostgroup and set it accordingly.

Exactly, that’s what I did - in a way.

In my case, I changed “the overall default” to vda in the clone of the template, because that’s what most of my systems will be. I am not using the hostgroups thing properly yet (for a reason…).

Yes, I have seen the %pre section trick to “dynamically decide” somewhere. So that would check which disks are there and pick the right one. Felt like a bit too much overkill for my three hosts. Fiddling with that is quite time consuming… changing the template/snippet and making a test run (and physical machine boots are very slow, as we know).

create a hostgroup with this parameter

In my case I make this a host-specific parameter, because the SCSI IDs might differ from host to host - depending on the order in which the RAIDs were created.

[clemens@f31clemens ~]$ ssh fuji1
Last login: Fri Jun  5 12:52:02 2020 from 192.168.1.148
[root@fuji1 ~]# ll /dev/disk/by-path | grep -E 'sd[ab]$'
lrwxrwxrwx. 1 root root  9 Jun  2 20:50 pci-0000:02:00.0-scsi-0:1:0:0 -> ../../sdb
lrwxrwxrwx. 1 root root  9 Jun  2 20:50 pci-0000:02:00.0-scsi-0:1:4:0 -> ../../sda
[root@fuji1 ~]# 
[root@fuji1 ~]# logout
Connection to fuji1 closed.
[clemens@f31clemens ~]$ ssh fuji2-new
clemens@fuji2-new's password: 

[clemens@f31clemens ~]$ ssh root@fuji2-new
Last login: Fri Jun  5 09:52:20 2020 from 192.168.1.148
[root@fuji2-new ~]# ll /dev/disk/by-path | grep -E 'sd[ab]$'
lrwxrwxrwx. 1 root root  9 Jun  3 12:02 pci-0000:02:00.0-scsi-0:1:0:0 -> ../../sdb
lrwxrwxrwx. 1 root root  9 Jun  3 12:02 pci-0000:02:00.0-scsi-0:1:2:0 -> ../../sda
[root@fuji2-new ~]# 
[root@fuji2-new ~]# logout
Connection to fuji2-new closed.
[clemens@f31clemens ~]$ ssh root@fuji3
Last login: Fri Jun  5 11:17:03 2020 from 192.168.1.148
[root@fuji3 ~]# 
[root@fuji3 ~]# ll /dev/disk/by-path | grep -E 'sd[ab]$'
lrwxrwxrwx 1 root root  9 Jun  5 11:05 pci-0000:02:00.0-scsi-0:1:0:0 -> ../../sdb
lrwxrwxrwx 1 root root  9 Jun  5 11:05 pci-0000:02:00.0-scsi-0:1:2:0 -> ../../sda
[root@fuji3 ~]# 

So two are the same, the third is different. If disks break and the whole RAID needs to be rebuilt, the SCSI IDs might again be different.
(In my other server, the RAID1 got totally broken. Due to a power failure it had dropped one of the two disks from the RAID1 and was running for some months on only one disk, and I had no monitoring to inform me. Eventually I got read errors from that RAID, and it turned out both disks were already reporting “Failure predicted” (the S.M.A.R.T. thingy). (2 x cheap iNoris 600 GB SAS that I had ordered from Amazon around 2014.))

All VMs that were in that datastore were somewhat corrupted :-/

Since then I have installed Zabbix, but it’s not monitoring much yet :-)
So that’s one more reason to take Katello more into use - configuring in each server which DNS to use, which local mail server to use, configuring the Zabbix agent, local users (me, my wife), …, disabling IPv6, renaming the NIC to net0 (udev rules). ((I just have some habits for how servers should look, and I’d rather make my VMs match that than experiment with different “new ways”.))

But anyway - yes. The way one can configure/specify which disk to use, with a global default, per hostgroup, or per host, is “just fine” - once one knows about it.

So that’s what I meant: Katello is done quite well, and “support for libvirt” in this context is good (one of you said “our support for libvirt is not that great”). In the scope I use it, it does the job perfectly.

For an even more professional use case, like a company, one should indeed probably go for oVirt or VMware.

All in all: well done, folks. I think you are doing a great job, and how helpful and responsive you were here for my “complaints” / questions is great!


But yes, reading this all again - as advice for anybody who stumbles over this in the future:

One should rather do the hostgroup thing instead of hacking/hardcoding the template fragment. (This “vda as default” is just my personal use case, not a recommendation to imitate.)

Cool. Are there any things you would explain differently in our docs? Feel free to edit those:

https://docs.theforeman.org/web/


Are there any things you would explain differently in our docs?

Don’t know yet, I would have to read them all first :-)

If I had read all of it, perhaps I wouldn’t have run into some of the problems I did.

But that’s a generic problem with documentation: one wants a quick start (nobody reads for hours and hours before even trying anything), but going with some quick start (like I did, with some YouTube videos), one might initially do things the wrong way ;-/

Just finished reading “Installing Foreman on RHEL/CentOS”. Looks quite good, quite comprehensive. Some of it I probably understand better now, afterwards, than I would have if I had read it all up-front…

So, no, for now I haven’t noticed things that could/should be explained differently. I’ll keep it in mind for future readings.
Probably I should go through the provisioning guide and do the “how to make Foreman talk with my existing DNS and DHCP server” part. (I don’t want those to run on the Foreman host, because DNS and DHCP have to run all the time, but I will probably shut Foreman down when I don’t use it for longer periods. Those blade servers slurp quite some electricity, so I have only one hypervisor with the essential VMs always on (an old IBM System x), and the three Fujitsu blades run only when creating something new, or perhaps, as a next thing, a small Kubernetes cluster, etc. Once I find a job again, I won’t be able to spend much time on this, so it would be totally wasted to have them run all the time. That was basically also the reason why the Fujitsu blades were configured for “Minimum Watt usage” instead of “Maximum performance”.)

Thanks for the feedback!

This is something we need to improve in NG docs, @mcorr @spetrosi! We have a quite good Quick Start section in our docs: just do these commands to get you started. As much as I would like our users to read all of our Memory Requirements or Deployment Strategy Guide, the reality is different.


In the Foreman docs or the downstream docs? The Satellite QSG isn’t so great IMO.

We have wanted to build a realistic QSG for some time, and if we had a high level set of tasks, we could add it to the list of things to do.


Here. Foreman :: Manual

Good starting content would be similarly simplified instructions on how to get Foreman with Katello installed, and then a bunch of links to various starting-point chapters (deployment guide, provisioning guide, puppet guide, installation guide).
