Foreman 1.13 and network interfaces

Hi,

We are running a Foreman instance on CentOS 7.2 which receives Puppet
reports from several Puppet 3.6 masters. We have ~1,100 hosts which run
Puppet every hour.

Yesterday I upgraded our Foreman instance from 1.11 to 1.12 and then to 1.13.

After the 1.13 upgrade, we had big problems with Foreman. After running
for some minutes (anywhere up to an hour), all Passenger workers were stuck
at 100% CPU with ever-increasing memory usage until they were killed or hit
the OOM killer.

I couldn't find anything in the logs to suggest what was causing this, but
through strace I was able to tie it back to four hosts. These hosts have
~500 IP addresses assigned to them.
We had problems with facter in the past where it took too long to iterate
the interfaces on these hosts, so we removed the interface fact .rb
scripts. A recent facter update had put them back.
When I looked in the Foreman database, there were over 10,000 fact values
relating to these hosts. I've removed the facter .rb files from these hosts
and deleted the associated DB records, and the Foreman box has recovered
to its former happy self.
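To give a flavour of the strace step: the hostnames below are made up, and in
production the trace came from attaching to a spinning worker, but the pipeline
for boiling a captured trace down to the offending hosts looked roughly like
this:

```shell
# In production the trace came from attaching strace to a stuck Passenger
# worker for a few seconds, roughly:
#   strace -f -s 256 -o /tmp/worker.trace -p <worker-pid>
# (worker PID taken from passenger-status or top).
# Here we fake a tiny trace so the pipeline below is self-contained;
# the hostnames are illustrative only.
cat > /tmp/worker.trace <<'EOF'
read(12, "facts for bighost1.example.com ...", 8192) = 8192
read(12, "facts for bighost1.example.com ...", 8192) = 8192
read(12, "facts for bighost2.example.com ...", 8192) = 8192
read(12, "facts for smallhost.example.com ...", 8192) = 8192
EOF

# Tally which hostnames dominate the trace; in the real data the four
# problem hosts stood out immediately at the top of this list.
grep -oE '[a-z0-9]+\.example\.com' /tmp/worker.trace | sort | uniq -c | sort -rn
```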

I'm curious to know how I would go about debugging this further, and
whether this is something that would be of interest to the Foreman devs.
I can reinstate those facts and gather sanitised data while the problem is
occurring if required.
I did enable debug logging in Foreman, but it didn't really help at all;
strace was the only thing that did.

J

Hey,

there were several bug fixes regarding fact upload performance in 1.12 and
1.13. These patches are all live AFAIK, so make sure you have the latest
minor release. Our code was particularly slow where there are many facts;
we made some micro-optimizations to speed this up.

I don't know of any report specifically about a host with 500 IP addresses,
but we have seen similar fact-volume issues reported on this list before.
Could this be a bug in facter?

LZ

Later,
Lukas @lzap Zapletal

Hi,

Our Foreman installation is on 1.13.1, so as far as I know it's as up to
date as it can be.

I don't know if the network interface issue has been reported previously.

I can report that since removing the network interface facts
(macaddress.rb, interfaces.rb, ipaddress.rb, netmask.rb, network.rb) we
have not seen a recurrence of the issue.
As stated before, I'm happy to re-enable those facts and try to reproduce
the problem if you can give me details on how to capture whatever debugging
information would be relevant to you.
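For completeness, the "removal" amounts to renaming those fact files out of
facter's load path. A minimal sketch (the facter lib directory varies by
install, so it is passed in here; on our CentOS boxes it was under the
vendored ruby tree, but verify locally — and renaming rather than deleting
makes it easy to revert):

```shell
# Rename the interface-related core facts so facter no longer loads them.
# The argument should be facter's lib directory, e.g. something like
# /usr/share/ruby/vendor_ruby/facter (path is an assumption -- check your
# own install before running this).
disable_interface_facts() {
  dir="$1"
  for fact in macaddress interfaces ipaddress netmask network; do
    if [ -e "$dir/$fact.rb" ]; then
      mv "$dir/$fact.rb" "$dir/$fact.rb.disabled"  # keep a copy for rollback
    fi
  done
}

# Usage: disable_interface_facts /path/to/facter/lib/dir
```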

You received this message because you are subscribed to a topic in the
Google Groups “Foreman users” group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/foreman-users/maksRtxBPho/unsubscribe.
To unsubscribe from this group and all its topics, send an email to
foreman-users+unsubscribe@googlegroups.com.
To post to this group, send email to foreman-users@googlegroups.com.
Visit this group at https://groups.google.com/group/foreman-users.
For more options, visit https://groups.google.com/d/optout.