Problem:
I used HashiCorp Packer and a ks.cfg to build a gold image from the CentOS 8/9 DVD ISO. It has cloud-init baked in, and I followed the instructions from the doc (11.7. Using VMware cloud-init and userdata templates for provisioning) as well.
Once the VM had been converted to a VM template in vSphere, I went to Compute resource → Image → and linked the VM template. Everything was fine and worked great during image-based provisioning.
However, I noticed that the VM reboots about 30 seconds after it boots, while the cloud-init and userdata templates are still being applied. I think the reboot breaks the configuration process, so the VM is not fully configured with all of the snippet config applied.
I have no idea why it happened.
I checked /var/log/messages and could not find any hints. My cloud-init template, in summary:
- set hostname
- update time
- add authorized keys
- subscribe to Foreman
- install some packages from the repo
- configure sssd and realm to join the domain/AD authentication
- install puppet-agent, update the config file, and then run puppet agent -t
But it always reboots in the last step: the VM has puppet-agent installed, but sometimes it cannot update the puppet.conf file, which causes the VM to fail to get the Puppet classes.
I checked my snippet and there is no trigger for the reboot.
The last error before the reboot is 'Failed unmounting /var', but I think that is a consequence of the reboot itself, since we have a separate partition on /var.
I have no idea why it reboots.
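Beyond /var/log/messages, the other places I know to check are cloud-init's own logs and the previous boot's journal (standard paths and tools, though your layout may differ):

# cloud-init's own logs usually show exactly which step was interrupted
less /var/log/cloud-init.log
less /var/log/cloud-init-output.log

# after the next boot, inspect the previous boot's journal for the shutdown trigger
journalctl -b -1 | grep -iE 'shutdown|reboot'

# reboot/shutdown history from wtmp
last -x reboot shutdown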
Expected outcome:
I expected no reboot after the VM is provisioned via Foreman with the image-based option.
Foreman and Proxy versions:
Foreman 3.10 and Katello 4.12
Info:
Here is an excerpt of the cloud-init template that is supposed to be applied to the VM:
else
  echo 'The remote_execution_ssh_user does not exist and remote_execution_create_user is not set to true. remote_execution_ssh_keys snippet will not install keys'
fi
if [ $? -eq 0 ]; then
  if ! grep -q 'case_sensitive = false' /etc/sssd/sssd.conf 2> /dev/null; then
    # append the option directly below the [domain/corp.abc.ca] section header;
    # the brackets are escaped and | is used as the address delimiter so the
    # slash in the section name does not terminate the pattern
    sed -i '\|^\[domain/corp.abc.ca\]|a case_sensitive = false' /etc/sssd/sssd.conf 2> /dev/null
  fi
fi
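As a quick sanity check that the sed actually landed (just an illustrative check, not part of the snippet itself; sssctl needs the sssd-tools package):

# the option should appear right below the domain section header
grep -A1 '^\[domain/corp.abc.ca\]' /etc/sssd/sssd.conf

# validate sssd.conf syntax (requires sssd-tools)
sssctl config-check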
I'm not in front of my instance, but maybe I can help anyway.
The UserData template uses VMware Tools and vSphere guest customization to set the hostname, IP, etc.
It reboots after this step.
After the VM boots again, it contacts Foreman via the cloud-init datasource and applies the CloudInit template.
Can you figure out when exactly the VM reboots?
It should only reboot after the UserData template.
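If you have console or SSH access right after the first boot, the guest customization engine writes a log of what it did and when it scheduled the reboot. On my hosts it lands here (path used by open-vm-tools; it may vary with the tools version):

# guest customization log written by open-vm-tools
less /var/log/vmware-imc/toolsDeployPkg.log

# look for the scheduled reboot
grep -i reboot /var/log/vmware-imc/toolsDeployPkg.log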
Yep, the UserData template uses VMware Tools, and the VM does get the IP, DNS, and hostname set up correctly.
The VM contacts Foreman, gets the cloud-init template, and applies it itself.
The reboot is triggered while the cloud-init steps' output is on the screen (close to the end: it installs the puppet agent successfully and then all of a sudden it reboots).
I assumed it would write puppet.conf as well, but that does not happen every time. It looks like if it reboots earlier, cloud-init does not have sufficient time to apply the config. And I have no idea why it reboots itself. Is it related to cloud-init? To Foreman? Or to the provisioning template?
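To see how far cloud-init actually gets before the reset, I can run these on the VM after it comes back up (both are standard cloud-init subcommands):

# did cloud-init finish, or was it interrupted?
cloud-init status --long

# per-stage timings; the last recorded stage is where it was cut off
cloud-init analyze show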
The CloudInit template is applied before the reboot, so there's the culprit.
Seems like I ran into the same issue as you.
I had many issues setting this up as well.
I even created a new CloudInit template with only the following:
#cloud-config
hostname: <%= @host.name %>
fqdn: <%= @host %>
manage_etc_hosts: true
users: {}
# use runcmd to grow the VG, LV and root filesystem, as cloud-init
# doesn't handle LVM resizing natively
runcmd:
  - cloud-init-per once grow_part growpart /dev/sda 3
  - cloud-init-per once grow_lv lvextend -r -l +100%FREE /dev/mapper/vg_system-lv_root
  - ssh-keygen -A
phone_home:
  url: <%= foreman_url('built') %>
  post: []
  tries: 10
After that it contacts Katello and applies all Ansible roles attached to the host group (after a delay).
Ansible does the registration and configuration.
Did you also encounter the ‘out of nowhere’ reboot issue?
I am not sure if the Foreman developers are aware of this issue and have a fix for it.
If it is a common issue, is it possible to extend the time delay before it actually reboots…
It is so frustrating, as it is the last part of my config in the cloud-init template, since every snippet can be reused and all configuration management can be done via Puppet.
Yep. I checked out all the links you provided, and this is the workaround for my situation; it worked well.
So basically I just added vmware-toolbox-cmd config set deployPkg wait-cloudinit-timeout 90 to my Packer ks.cfg, and now everything works perfectly.
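For anyone else who hits this, here is roughly where the setting sits in my kickstart. A minimal sketch, assuming a standard %post section; 90 seconds is simply the value that worked for me, and the default wait is around 30 seconds, which lines up with the 30-second reboot I was seeing:

%post
# give open-vm-tools more time to wait for cloud-init to finish before the
# guest customization engine reboots the VM
# (persists the setting in /etc/vmware-tools/tools.conf, equivalent to:
#   [deployPkg]
#   wait-cloudinit-timeout = 90
# )
vmware-toolbox-cmd config set deployPkg wait-cloudinit-timeout 90
%end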