Cluster installation CentOS 7.4 network problems

Hi there,

after using Foreman successfully on our clusters for more than a year, I'd like to reinstall a 90-node cluster with CentOS 7.4. It's currently running CentOS 7.3. I can't simply update to 7.4 because of zfsonlinux dependencies, and besides, some nodes died and had to be bare-metal reinstalled anyway.

I was able to install these nodes successfully by PXE-booting and using a regular CentOS mirror. After the final reboot, however, the nodes got no network connection at all, so of course Puppet couldn't pull its configuration. After logging in locally and restarting NetworkManager, the connection came up, sometimes on the first try, sometimes on the second. I never saw this behavior with CentOS 7.3 or 7.2.
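For reference, this is roughly what I run on the local console to bring a node back and to see what NetworkManager logged:

# restart NetworkManager and check which devices came up
systemctl restart NetworkManager
nmcli device status
# inspect what NetworkManager logged since boot
journalctl -b -u NetworkManager --no-pager | tail -n 50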

Network properties:

DHCP, MTU 9000

DHCP server: not Foreman-managed, on a different network

TFTP server: Foreman-managed, on a different network
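For completeness, a minimal ifcfg along these lines is roughly what the nodes end up with (eth0 and the exact contents are illustrative, not copied from a node):

# /etc/sysconfig/network-scripts/ifcfg-eth0 (illustrative)
TYPE=Ethernet
DEVICE=eth0
BOOTPROTO=dhcp
ONBOOT=yes
MTU=9000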

I've read a thread on Stack Exchange that describes a similar problem with a kickstart installation and DHCP network configuration on CentOS 7.4.



Has anyone of you run into similar problems?

This is what my provisioning template / kickstart template looks like:

install
url --url http://mirror.centos.org/centos/7.4.1708/os/x86_64 --proxy=http://proxy.uni-leipzig.de:3128
lang en_US.UTF-8
selinux --enforcing
keyboard de
skipx

network --bootproto dhcp --hostname galaxy110.sc.uni-leipzig.de --device=somemacaddress
rootpw --iscrypted foo
firewall --service=ssh
authconfig --useshadow --passalgo=SHA256 --kickstart
timezone --utc Europe/Berlin
services --disabled gpm,sendmail,cups,pcmcia,isdn,rawdevices,hpoj,bluetooth,openibd,avahi-daemon,avahi-dnsconfd,hidd,hplip,pcscd

bootloader --location=mbr --append="nofb quiet splash=quiet"

zerombr
clearpart --initlabel --all
ignoredisk --only-use=sda
part biosboot --size 1 --fstype=biosboot --asprimary
part / --fstype=xfs --size=20480 --asprimary --ondisk=sda
part swap --size=131072 --ondisk=sda
part /var/log --fstype=xfs --size=10240 --ondisk=sda
part /home --fstype=xfs --size=10240 --grow --ondisk=sda

text
reboot

%packages
yum
dhclient
ntp
wget
@Core
redhat-lsb-core
%end

%post --nochroot
exec < /dev/tty3 > /dev/tty3
# changing to VT 3 so that we can see what's going on...
/usr/bin/chvt 3
(
cp -va /etc/resolv.conf /mnt/sysimage/etc/resolv.conf
/usr/bin/chvt 1
) 2>&1 | tee /mnt/sysimage/root/install.postnochroot.log
%end
%post
logger "Starting anaconda galaxy110.sc.uni-leipzig.de postinstall" exec < /dev/tty3 > /dev/tty3
# changing to VT 3 so that we can see what's going on...
/usr/bin/chvt 3
(

#update local time
echo "updating system time"
/usr/sbin/ntpdate -sub 139.18.1.2
/usr/sbin/hwclock --systohc

# Yum proxy
echo 'proxy = http://proxy.uni-leipzig.de:3128' >> /etc/yum.conf

rpm -Uvh --httpproxy proxy.uni-leipzig.de --httpport 3128 https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

# update all the base packages from the updates repository
if [ -f /usr/bin/dnf ]; then
   dnf -y update
else
   yum -t -y update
fi

# SSH keys setup snippet for Remote Execution plugin


#
# Parameters:
#
# remote_execution_ssh_keys: public keys to be put in ~/.ssh/authorized_keys
#
# remote_execution_ssh_user: user for which remote_execution_ssh_keys will
# be authorized
#
# remote_execution_create_user: create the user if it does not already exist
#
# remote_execution_effective_user_method: method to switch from the ssh user
# to the effective user
#
# This template sets up SSH keys on any host so that, as long as your public
# SSH key is in remote_execution_ssh_keys, you can SSH into the host. This
# only works in combination with the Remote Execution plugin.

# The Remote Execution plugin queries smart proxies to build the
# remote_execution_ssh_keys array, which is then made available to this
# template via the host's parameters. There is currently no way of supplying
# this parameter manually.
# See http://projects.theforeman.org/issues/16107 for details.

rpm -Uvh --httpproxy proxy.uni-leipzig.de --httpport 3128 https://yum.puppetlabs.com/puppetlabs-release-pc1-el-7.noarch.rpm

if [ -f /usr/bin/dnf ]; then
   dnf -y install puppet-agent
else
   yum -t -y install puppet-agent
fi

cat > /etc/puppetlabs/puppet/puppet.conf << EOF

[main]
vardir = /opt/puppetlabs/puppet/cache
logdir = /var/log/puppetlabs/puppet
rundir = /var/run/puppetlabs
ssldir = /etc/puppetlabs/puppet/ssl

[agent]
pluginsync = true
report = true
ignoreschedules = true
ca_server = urzlxdeploy.rz.uni-leipzig.de
certname = galaxy110.sc.uni-leipzig.de
environment = production
server = urzlxdeploy.rz.uni-leipzig.de

EOF

puppet_unit=puppet
/usr/bin/systemctl list-unit-files | grep -q puppetagent && puppet_unit=puppetagent
/usr/bin/systemctl enable ${puppet_unit}
/sbin/chkconfig --level 345 puppet on

# export a custom fact called 'is_installer' to allow detection of the
# installer environment in Puppet modules
export FACTER_is_installer=true
# passing a non-existent tag like "no_such_tag" to the puppet agent only
# initializes the node
/opt/puppetlabs/bin/puppet agent --config /etc/puppetlabs/puppet/puppet.conf --onetime --tags no_such_tag --server urzlxdeploy.rz.uni-leipzig.de --no-daemonize

sync

# Inform the build system that we are done.
echo "Informing Foreman that we are built"
wget -q -O /dev/null --no-check-certificate http://urzlxdeploy.rz.uni-leipzig.de/unattended/built
) 2>&1 | tee /root/install.post.log
exit 0

%end

Thanks in advance for your suggestions.

Cheers,

Vadim

--
Vadim Bulst

Universität Leipzig / URZ
04109 Leipzig, Augustusplatz 10

phone: +49-341-97-33380
mail: vadim.bulst@uni-leipzig.de
Hello,

I haven't heard of this, and I can confirm we haven't changed much in this area in the last release. As you can see, our kickstart simply uses the default network configuration, thus NetworkManager. I think you need to reach out to the CentOS groups, or better, test on RHEL and file a Bugzilla against NetworkManager.

Double-check your DHCP; with slow DHCP responses things can go bad.
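If slow responses turn out to be the culprit, one thing worth trying (just a sketch, not something our templates do) is raising NetworkManager's DHCP timeout on the affected connection:

# list connections to find the right name ("System eth0" here is an example)
nmcli connection show
# give the DHCP server up to 90 seconds instead of the default 45
nmcli connection modify "System eth0" ipv4.dhcp-timeout 90
nmcli connection up "System eth0"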

--
Later,
  Lukas @lzap Zapletal