foreman-nightly-rpm-pipeline 611 failed

Foreman RPM nightly pipeline failed:

https://ci.theforeman.org/job/foreman-nightly-rpm-pipeline/611/

foreman-nightly-centos8-test (passed)
foreman-nightly-centos7-test (passed)
foreman-nightly-centos7-upgrade-test (failed)

I did some investigating into the failing upgrades and here is what I found from my testing.

The failure is that, when upgrading to our current staged nightlies, puma loses the file descriptor associated with the socket on restart and enters a restart loop. Restarting foreman.socket does seem to fix it, but we shouldn't have to restart the socket except under extraordinary circumstances.

That brings me to trying to track down what got us into this state. From my testing and tracking, the failure seems to trace back to some combination of the two most recent changes to foreman-selinux:

These might be the kind of changes that warrant having to restart the socket on upgrade but I am not familiar enough with the effects of the changes to have any intuition in this area.

I will also add for context that @ekohl had a heck of a time figuring out file descriptor persistence with the SCLs, and it is unclear whether that is also playing a role here. I did experiment with switching to calling puma directly rather than going through rails server, with some success.
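For reference, the "call puma directly" experiment could be expressed as a systemd drop-in along these lines. This is only a sketch of the approach; the drop-in path, the puma binary location, and the config file path are all my assumptions, not the actual change that was tested:

```ini
# /etc/systemd/system/foreman.service.d/puma.conf  (hypothetical drop-in)
[Service]
# Clear the packaged ExecStart, then start puma directly instead of
# going through `rails server`. Paths below are assumptions.
ExecStart=
ExecStart=/usr/bin/puma -C /usr/share/foreman/config/puma.rb
```

A `systemctl daemon-reload` followed by restarting foreman.service would be needed for a drop-in like this to take effect.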

@lzap do you have any idea on this from the SELinux side?


Do you see any denials?

Does it work when you switch to permissive?

Can you retest with semodule -DB, which turns off "dontaudit" rules, just in case there is a hidden denial somewhere? Make sure you revert afterwards with semodule -B so your audit.log does not get cluttered with endless denials.

I can reproduce this myself, but I am unsure what you mean by "staged nightly upgrade". Is this some kind of upgrade test? 2.0 to 2.1?

@ehelms is that why the latest foreman-selinux was untagged in koji? And puma?

Can you tell me how to reproduce this? What exactly do I want to upgrade? An RPM?

I used Forklift mimicking our pipelines:

cd forklift
ansible-playbook pipelines/upgrade_pipeline.yml -e pipeline_version=nightly -e pipeline_type=foreman -e pipeline_os=centos7

I did not see any denials when investigating originally.

I am getting https://gist.github.com/lzap/df410f3c6c8a454fa59d30008f5b9017 ā€” is this it?

So, running with dontaudit rules disabled, this is what I see:

type=SYSCALL msg=audit(1592399790.499:3899): arch=c000003e syscall=59 success=yes exit=0 a0=55f799f2d9c0 a1=55f799fa66d0 a2=55f799f08450 a3=5 items=0 ppid=1 pid=10433 auid=4294967295 uid=996 gid=993 euid=996 suid=996 fsuid=996 egid=993 sgid=993 fsgid=993 tty=(none) ses=4294967295 comm="rails" exe="/usr/bin/bash" subj=system_u:system_r:foreman_rails_t:s0 key=(null)
type=AVC msg=audit(1592399790.499:3899): avc:  denied  { read write } for  pid=10433 comm="rails" path="socket:[96947]" dev="sockfs" ino=96947 scontext=system_u:system_r:foreman_rails_t:s0 tcontext=system_u:system_r:unconfined_service_t:s0 tclass=tcp_socket
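For anyone less used to reading AVC records, the three fields that determine what policy rule is needed can be pulled out mechanically. A throwaway sketch using the denial above verbatim:

```shell
# The AVC record from the audit log above, as one string.
avc='type=AVC msg=audit(1592399790.499:3899): avc:  denied  { read write } for  pid=10433 comm="rails" path="socket:[96947]" dev="sockfs" ino=96947 scontext=system_u:system_r:foreman_rails_t:s0 tcontext=system_u:system_r:unconfined_service_t:s0 tclass=tcp_socket'

# Source context (who acted), target context (what was acted on), class.
scontext=$(echo "$avc" | grep -o 'scontext=[^ ]*')
tcontext=$(echo "$avc" | grep -o 'tcontext=[^ ]*')
tclass=$(echo "$avc" | grep -o 'tclass=[^ ]*')

echo "$scontext $tcontext $tclass"
```

Read together: a process running as foreman_rails_t was denied read/write on a tcp_socket labeled unconfined_service_t.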

This looks like ā€œrailsā€ script is trying to read and write from unnamed socket which was created by systemd:

# lsof -P | grep 96947
systemd       1                root   40u     IPv4              96947       0t0        TCP localhost:3000 (LISTEN)
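The inode-to-listener mapping that lsof did above can also be done by hand from /proc/net/tcp, where local addresses are stored as little-endian hex. A sketch decoding a sample line in that format (the line itself is constructed to match the listener above, not copied from a live system):

```shell
# Sample /proc/net/tcp-style line: 0100007F:0BB8 is 127.0.0.1:3000,
# state 0A is LISTEN, and the 10th field is the socket inode.
line="0: 0100007F:0BB8 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 96947"

laddr=$(echo "$line" | awk '{print $2}')   # local address, hex
inode=$(echo "$line" | awk '{print $10}')  # socket inode

# Decode the hex port and the little-endian hex IPv4 address.
port=$((16#${laddr#*:}))
iphex=${laddr%:*}
ip="$((16#${iphex:6:2})).$((16#${iphex:4:2})).$((16#${iphex:2:2})).$((16#${iphex:0:2}))"

echo "listener ${ip}:${port} inode ${inode}"
```

Matching that inode (96947) against the one in the AVC denial's path="socket:[96947]" is what ties the denied socket back to systemd's listener on port 3000.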

This rule solves it:

allow foreman_rails_t unconfined_service_t:tcp_socket { connected_socket_perms };
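A rule like that would normally be shipped in foreman-selinux itself, but for local verification it can be wrapped in a minimal custom module. A sketch; the module name is arbitrary, and I have spelled out the { read write } permissions from the denial rather than using the refpolicy macro, since a raw .te file compiled with checkmodule does not expand macros (audit2allow would generate something very similar from the audit log):

```shell
# Minimal local policy module for testing; the module name is made up.
cat > foreman-puma-socket.te <<'EOF'
module foreman-puma-socket 1.0;

require {
    type foreman_rails_t;
    type unconfined_service_t;
    class tcp_socket { read write };
}

# Let the rails/puma process use the systemd-created TCP socket.
allow foreman_rails_t unconfined_service_t:tcp_socket { read write };
EOF

# Build and load (needs checkmodule, semodule_package, semodule):
#   checkmodule -M -m -o foreman-puma-socket.mod foreman-puma-socket.te
#   semodule_package -o foreman-puma-socket.pp -m foreman-puma-socket.mod
#   semodule -i foreman-puma-socket.pp
grep 'allow foreman_rails_t' foreman-puma-socket.te
```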

Looks like unconfined services have something to do with this:

# ps -efZ | grep unconfined_service
system_u:system_r:unconfined_service_t:s0 puppet 1824 1  1 12:29 ?     00:01:01 /usr/bin/java -Xms2G -Xmx2G -Djruby.logger.class=com.puppetlabs.jruby_utils.jruby.Slf4jLogger -XX:OnOutOfMemoryError=kill -9 %p -cp /opt/puppetlabs/server/apps/puppetserver/puppet-server-release.jar:/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/facter.jar:/opt/puppetlabs/server/data/puppetserver/jars/* clojure.main -m puppetlabs.trapperkeeper.main --config /etc/puppetlabs/puppetserver/conf.d --bootstrap-config /etc/puppetlabs/puppetserver/services.d/,/opt/puppetlabs/server/apps/puppetserver/config/services.d/ --restart-file /opt/puppetlabs/server/data/puppetserver/restartcounter
system_u:system_r:unconfined_service_t:s0 foreman+ 5074 1  0 12:34 ?   00:00:00 ruby /usr/share/foreman-proxy/bin/smart-proxy --no-daemonize
unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 root 22358 11002  0 13:54 pts/0 00:00:00 grep --color=auto unconfined_service
system_u:system_r:unconfined_service_t:s0 root 31181 1  0 12:22 ?      00:00:03 /opt/puppetlabs/puppet/bin/ruby /opt/puppetlabs/puppet/bin/puppet agent --no-daemonize

For the record, we were still missing one rule:

And the BZ for SELinux team is at:

https://bugzilla.redhat.com/show_bug.cgi?id=1848291

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.