Two weeks ago, I updated from Foreman 1.15.4 to Foreman 1.16. A few days later, a colleague noticed that in freshly built CentOS VMware VMs, the reproducible interface names had changed, the first one now being ens160 instead of the previous ens192.
The following investigation showed that the VM was equipped with an emulated LSILogic SCSI adapter instead of the VMware paravirtual adapter that is specified in the Compute Profile and explicitly asked for in the API call to Foreman. This is reproducible now.
Since we don’t deploy that many VMs, it is possible that this behavior changed with the Foreman update. While we have some evidence that it’s actually VMware being at fault (since VMs built by the same Foreman instance on a lab vCenter get the correct SCSI adapter), I’d like to know whether other users might have the same issue in Foreman 1.16.
Any ideas how to debug this? I think it would be good to see exactly what Foreman (via fog (via fog-vsphere)) is actually telling the vCenter to provision, but the communication is encrypted and cannot be intercepted with tcpdump. Any idea how to obtain this information?
Did you notice the new capability to add multiple SCSI controllers and select the SCSI type for each controller? Maybe change it to something else (like Bus Logic), save it, and then change it back to Paravirtual.
First things first: VMware is known to have problems on RHEL (and Linux in general) due to buggy driver implementations. A workaround was pushed with RHEL 7.3. I will give you a link, and this is the end of my rant.
Now, while I have no idea what is going on there, Compute Profiles are sometimes buggy. I suggest ruling them out: create a host without a host group or profile, and click through everything manually to make sure all is set.
To debug VMware communication, we have a hack. Try it to see the communication:
Thanks for helping. I really appreciate the time you’re taking.
The site in question is running about 2000 VMs with Scientific Linux 6 and CentOS 7 without any driver issues at all.
I am aware of this issue and we solved it locally by backporting the newer systemd version to CentOS 7.2. It has been solved since then, and the current issue is experienced with CentOS 7.4.
We are actually building via the Foreman API, and I can confirm that our client application explicitly sends "scsi_controller_type"=>"ParaVirtualSCSIController" to the API. In one of the customer’s four vCenter installations (a 6.5 instance), the VMs get configured correctly; in the other three (two 6.0, one 6.5), they do not.
I put the file in question into the respective directory, put a deliberate syntax error in, and saw Foreman crash on start. Being sure that the file is actually read, I fixed the syntax error and started Foreman again. The build error persists, but I have been unable to locate any extra log entries generated by Foreman. Do those go into the production log? What are these logs supposed to look like? Where do I find the debug information that the patch from the FAQ is supposed to create?
After fixing the deliberate syntax error, there was no error any more and no stacktrace. The building error that I referred to in the quoted part is the VM having the “wrong” SCSI controller in the “wrong” virtual PCIe slot. I just didn’t find the logs that the 00_rest_client.rb initializer was supposed to generate anywhere (and still don’t know where they’re supposed to be written).
In parallel, I have obtained the desired dump of information from a tcpdump running between two socat instances through which I funneled the SSL-encrypted control flow between Foreman and the vCenter. It turned out that it was actually Foreman asking for the LSILogic SCSI controller instead of the ParaVirtualSCSIController that was explicitly mentioned in the compute attributes.
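For anyone who wants to repeat the interception, here is a rough reconstruction of such a socat chain, not the exact commands used in this thread. The small Python helper only assembles the three command lines; the host name, ports, and the `proxy.pem` certificate file are placeholders I made up, and Foreman's compute resource would have to be pointed at the listening port for the traffic to flow through it.

```python
import shlex
from typing import List


def socat_mitm_commands(
    vcenter_host: str,
    listen_port: int = 8443,      # hypothetical TLS port that Foreman connects to
    plain_port: int = 8080,       # hypothetical loopback port carrying cleartext
    cert_pem: str = "proxy.pem",  # hypothetical PEM file with certificate and key
) -> List[List[str]]:
    """Build the command lines for a two-legged socat TLS interception."""
    # Leg 1: terminate TLS coming from Foreman, forward cleartext to loopback.
    tls_in = [
        "socat",
        f"OPENSSL-LISTEN:{listen_port},cert={cert_pem},verify=0,reuseaddr,fork",
        f"TCP:127.0.0.1:{plain_port}",
    ]
    # Leg 2: accept the cleartext and re-encrypt it towards the real vCenter.
    tls_out = [
        "socat",
        f"TCP-LISTEN:{plain_port},bind=127.0.0.1,reuseaddr,fork",
        f"OPENSSL:{vcenter_host}:443,verify=0",
    ]
    # Sniffer: dump the cleartext that passes between the two legs.
    sniff = ["tcpdump", "-i", "lo", "-A", "-s", "0", f"tcp port {plain_port}"]
    return [tls_in, tls_out, sniff]


if __name__ == "__main__":
    for cmd in socat_mitm_commands("vcenter.example.com"):
        print(shlex.join(cmd))
```

Note that `verify=0` disables certificate checking on both legs, so this is only suitable for a short debugging session.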
The final culprit was an omission in the site-local host generation script, which uses the Foreman API v2: while the input data had (correct) values for both compute resource and compute profile, the script completely ignored the compute profile data while explicitly asking for the compute resource and a ParaVirtualSCSIController. In the absence of an explicitly set compute profile, Foreman takes the one set in the host group (or inherited from the parent host group). If that compute profile is undefined for the compute resource asked for in the API input, Foreman uses a rather strange set of default values (including an LSILogic controller).
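To make the fallback chain easier to follow, here is a toy model of the behavior as I observed it. This is not Foreman's code: the class, the function names, and the `FALLBACK_DEFAULTS` dictionary are invented for illustration, and the sketch deliberately leaves out how explicitly passed compute attributes are merged.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Placeholder for the built-in defaults; only the LSILogic controller
# is taken from the observations in this thread.
FALLBACK_DEFAULTS: Dict[str, str] = {"scsi_controller_type": "LSILogic"}


@dataclass
class HostGroup:
    name: str
    compute_profile: Optional[str] = None
    parent: Optional["HostGroup"] = None


def effective_profile(explicit: Optional[str], hostgroup: Optional[HostGroup]) -> Optional[str]:
    """An explicit profile wins; otherwise walk up the host group chain."""
    if explicit is not None:
        return explicit
    group = hostgroup
    while group is not None:
        if group.compute_profile is not None:
            return group.compute_profile
        group = group.parent
    return None


def effective_vm_attributes(
    explicit_profile: Optional[str],
    hostgroup: Optional[HostGroup],
    compute_resource: str,
    profile_attrs: Dict[Tuple[str, str], Dict[str, str]],
) -> Dict[str, str]:
    """profile_attrs maps (profile, compute resource) to VM attributes."""
    profile = effective_profile(explicit_profile, hostgroup)
    attrs = profile_attrs.get((profile, compute_resource)) if profile else None
    # Profile not defined for this compute resource: fall back to defaults.
    return dict(attrs) if attrs is not None else dict(FALLBACK_DEFAULTS)
```

In our case the host group's profile had no definition for the compute resource named in the API call, which corresponds to the last branch.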
After the site-local host generation script was fixed to actually pass the compute profile on to Foreman as well, the misbehavior stopped. Foreman is not to blame here.
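A minimal sketch of what the corrected request body might look like, assuming the host is created with a POST to `/api/v2/hosts`. The helper name and the example IDs are hypothetical, and the field names (`compute_resource_id`, `compute_profile_id`, `compute_attributes`) should be checked against the API documentation of your own Foreman version.

```python
import json
from typing import Any, Dict, Optional


def build_host_payload(
    name: str,
    hostgroup_id: int,
    compute_resource_id: int,
    compute_profile_id: Optional[int],
    scsi_controller_type: str = "ParaVirtualSCSIController",
) -> Dict[str, Any]:
    """Assemble a host creation body that always names the compute profile."""
    if compute_profile_id is None:
        # This was the bug: without a profile, Foreman silently falls back
        # to the host group's profile or to its own defaults.
        raise ValueError("compute_profile_id must be passed explicitly")
    return {
        "host": {
            "name": name,
            "hostgroup_id": hostgroup_id,
            "compute_resource_id": compute_resource_id,
            "compute_profile_id": compute_profile_id,
            "compute_attributes": {"scsi_controller_type": scsi_controller_type},
        }
    }


if __name__ == "__main__":
    # Example IDs are made up.
    print(json.dumps(build_host_payload("vm01.example.com", 3, 1, 2), indent=2))
```

Refusing to build the payload without a profile is a design choice for our script: it turns a silent fallback into an immediate error.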
Can the set of default values to be used for VM generation in case of a compute profile not being defined for a compute resource be configured via the front-end, or is that hard-coded?
Currently, the issue seems to be solved for us, with nothing to be done for either Foreman or VMware. Thanks for listening.