Ruby193-scl support for kafo?

Many other users and I are getting bitten pretty badly by this
installer bug that affects 1.4 and blocks the installation:

http://projects.theforeman.org/issues/4244

Based on some digging by jmontleon and others, it turns out the core dump
occurs while some virtual resources are being used:

https://bugzilla.redhat.com/show_bug.cgi?id=1064340#c10

"""
Going further, I managed to find that if I commented out the virtual
resource in compute.pp

and rewrite each file in the compute directory from a realize statement
(i.e realize Package['foreman-compute'] ) as a straight package install
(i.e. package { foreman-compute: ensure => installed } ) the seg faults
went away.

Interestingly just uncommenting the virtual resource is enough to start
getting seg faults again - I don't even need to revert the package
installs to realize statements…
"""

To work around it I hacked kafo to use the ruby193 wrapper:

#!/usr/bin/ruby193-ruby

require 'rubygems'
require 'highline/import'
require 'yaml'
require 'kafo'

# where to find answer file
CONFIG_FILE = "/etc/foreman/foreman-installer.yaml"

and ran through the installer on a system that was seg-faulting during
installation. The install finished fine. This led me to think that
perhaps we should drop Ruby 1.8.7 support for kafo on CentOS/EL6
variants and move to Ruby 1.9.3. We can modify the installer to work
around the issue, but we would still be running against a really old
version of Ruby, and it might be safer to move to the SCL.
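
When swapping the shebang like this, a quick check of which interpreter the script actually runs under can help. The following is only a sketch: the helper name `legacy_ruby?` and the 1.9.3 threshold are assumptions made for illustration and are not part of kafo.

```ruby
# Hypothetical helper (not part of kafo): returns true when the given Ruby
# version string is older than 1.9.3, i.e. the stock EL6 interpreter rather
# than the ruby193 SCL one.
def legacy_ruby?(version = RUBY_VERSION)
  # Compare the numeric components as an array, e.g. [1, 8, 7] vs [1, 9, 3].
  parts = version.split('.').map { |p| p.to_i }
  (parts <=> [1, 9, 3]) < 0
end

if legacy_ruby?
  warn "Running under Ruby #{RUBY_VERSION}, older than the ruby193 SCL interpreter"
else
  puts "Running under Ruby #{RUBY_VERSION}"
end
```

Dropping something like this near the top of the script while testing would show whether the ruby193 wrapper is really being picked up.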

thoughts?
Mike

--
Mike McCune
mmccune AT redhat.com
Red Hat Engineering | Portland, OR
Systems Management | 650-254-4248

> Myself, and many other users are getting bit pretty bad by this
> installer bug that effects 1.4 and blocks the installation:
>
> Bug #4244: Core dump during foreman-installer - Installer - Foreman
>
> based on some digging by jmontleon and others it turns out the core dump
> occurs during the execution of some virtual resource useage:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1064340#c10
>
> """
> Going further, I managed to find that if I commented out the virtual
> resource in compute.pp
>
> and rewrite each file in the compute directory from a realize statement
> (i.e realize Package['foreman-compute'] ) as a straight package install
> (i.e. package { foreman-compute: ensure => installed } ) the seg faults
> went away.
>
> Interestingly just uncommenting the virtual resource is enough to start
> getting seg faults again - I don't even need to revert the package
> installs to realize statements…
> """

This might be the case for Katello, but the bug was filed against
foreman-installer 1.4.0 which didn't contain compute resource support,
so no virtuals in this area.

> To work around it I hacked kafo to use the ruby193 wrapper:
>
> #!/usr/bin/ruby193-ruby
>
> require 'rubygems'
> require 'highline/import'
> require 'yaml'
> require 'kafo'
>
> # where to find answer file
> CONFIG_FILE = "/etc/foreman/foreman-installer.yaml"
>
> and ran through the installer on a system that was seg-faulting during
> installation. The install finished fine. This lead me to the thought
> that perhaps we should ditch ruby-187 support for kafo on Centos/EL6
> variants and move to ruby-193. We can mod the installer to work around
> the issue but we are still running against a really old version of Ruby
> and might be safer moving to the SCL.

This requires that we ship a ruby193 version of Puppet too, which we
haven't done for a long while and I'd really rather not go back to as it
caused a lot of confusion.

Ruby 1.8.7 on EL6 is still supported, the underlying bug should be found
and fixed.

On 25/04/14 23:13, Mike McCune wrote:


Dominic Cleal
Red Hat Engineering

>
> > Myself, and many other users are getting bit pretty bad by this
> > installer bug that effects 1.4 and blocks the installation:
> >
> > Bug #4244: Core dump during foreman-installer - Installer - Foreman
> >
> > based on some digging by jmontleon and others it turns out the core dump
> > occurs during the execution of some virtual resource useage:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=1064340#c10
> >
> > """
> > Going further, I managed to find that if I commented out the virtual
> > resource in compute.pp
> >
> > and rewrite each file in the compute directory from a realize statement
> > (i.e realize Package['foreman-compute'] ) as a straight package install
> > (i.e. package { foreman-compute: ensure => installed } ) the seg faults
> > went away.
> >
> > Interestingly just uncommenting the virtual resource is enough to start
> > getting seg faults again - I don't even need to revert the package
> > installs to realize statements…
> > """
>
> This might be the case for Katello, but the bug was filed against
> foreman-installer 1.4.0 which didn't contain compute resource support,
> so no virtuals in this area.
>
>
Yes, it's not that virtual resources are necessarily the problem. In our
case, they just seem particularly good at exposing the problem, which, from
what Mark has told me after looking at the core dumps, appears to have
something to do with garbage collection, so it seems likely that the
issue could pop up elsewhere. A lot of the bugs that look as though they
could be related suggested using a newer Ruby to circumvent the problem and
were marked CANTFIX. I do agree with your comments below about Ruby 1.8.7
still being supported and about being reluctant to ship a ruby193 Puppet, though.

On Monday, April 28, 2014 3:29:42 AM UTC-4, Dominic Cleal wrote:
> On 25/04/14 23:13, Mike McCune wrote:


> > This might be the case for Katello, but the bug was filed against
> > foreman-installer 1.4.0 which didn't contain compute resource support,
> > so no virtuals in this area.
> >
> Yes, it's not that virtual resources are necessarily the problem. In our
> case, this just seems particularly good at exposing the problem, which from
> Mark has told me after looking at the core dumps, appears to be something
> to do with garbage collection, and so it seems possible and likely the
> issue could pop up elsewhere. A lot of the bugs that appear as though they
> could be related suggested using a newer ruby to circumvent the problem and
> were marked CANTFIX. I do agree with your comments below about Ruby 1.8.7
> still being supported and being reluctant to ship a ruby193 puppet though.

There is one workaround. Set:

GC.disable

in /usr/bin/puppet, but make sure you have plenty of RAM, otherwise the Ruby
process will be killed by the OS.
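
For anyone trying this, a minimal sketch of what the change might look like. `GC.disable` and `GC.enable` are core Ruby methods; the exact placement inside /usr/bin/puppet (before the first `require`) is an assumption here, not something verified against the packaged script.

```ruby
# Sketch only: turn the garbage collector off for the rest of the process.
# GC.disable returns false if GC was still enabled up to this point,
# and true if it had already been disabled.
was_disabled = GC.disable
puts "GC was already disabled: #{was_disabled}"

# ... the rest of the script would run here with no garbage collection,
# so memory use only grows until the process exits ...

# If the memory cost is too high, GC can be switched back on afterwards.
# GC.enable returns true if GC had been disabled before the call.
GC.enable
```

This only avoids the crash by never running the collector, so it hides the underlying bug rather than fixing it.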

I've been hunting the bug today but I was unable to track it down.
Something is mangling memory; apparently either Ruby objects or the
pointers to them are wrong. I tried upgrading some native extensions
as well, with no luck. The objects that have corrupted content are on
the main Ruby heap, not in any of the extension memory blocks.

To me it looks like Puppet 3.4 now triggers GC much more often than EPEL
Puppet 2.7 did, so the bug was already there, but hidden. Since Ruby is
part of RHEL, we should really try hard to fix it.

-- Later,

Lukas “lzap” Zapletal
irc: lzap #theforeman

>>> This might be the case for Katello, but the bug was filed against
>>> foreman-installer 1.4.0 which didn't contain compute resource support,
>>> so no virtuals in this area.
>>>
>> Yes, it's not that virtual resources are necessarily the problem. In our
>> case, this just seems particularly good at exposing the problem, which from
>> Mark has told me after looking at the core dumps, appears to be something
>> to do with garbage collection, and so it seems possible and likely the
>> issue could pop up elsewhere. A lot of the bugs that appear as though they
>> could be related suggested using a newer ruby to circumvent the problem and
>> were marked CANTFIX. I do agree with your comments below about Ruby 1.8.7
>> still being supported and being reluctant to ship a ruby193 puppet though.
>
> There is one workaround, set:
>
> GC.disable
>
> in /usr/bin/puppet but make sure you have plenty of RAM otherwise Ruby
> process will be killed by the OS.

I can confirm this does help get around the problem.

>
> I've been haunting the bug today but I was unable to track it down.
> Something is mangling with memory, apparently either Ruby objects or
> pointers to those are wrong. I tried to upgrade some native extensions
> as well, with no luck. Those objects that has corrupted content are on
> the main Ruby heap, not on any of the extension memory blocks.
>
> To me it looks like Puppet 3.4 now toggles GC much more often than EPEL
> Ruby 2.7, the bug was there, but hidden. Since ruby is part of RHEL, we
> should really try hard to fix it.
>

Thanks all for the feedback on the proposal. Thumbs down on SCL-ing kafo
directly; let's see if we can track down and fix the GC bug instead.

Mike

On 04/28/2014 09:42 AM, Lukas Zapletal wrote:

