Foreman 1.7.5 performance capacity limit

Andrew_Enstad · October 7, 2015, 9:21pm

We have the following:

Foreman 1.7.5
Openstack Icehouse

This Foreman instance has been running fine for many months, up until the
past few weeks. Now, if we submit just one job through Jenkins for a single
server build and also create a single VM through the Foreman
interface, Foreman basically goes "out to lunch" for a while. The OS is
not being taxed for CPU or memory, but Foreman is unresponsive for a
while. When this happened today I ran foreman-debug, which said it
uploaded a tar file for developers to view. File
name: foreman-debug-WEedA.tar.xz from Oct 7th at 13:43.

Is there anything I need to be looking at to fix this issue?

Thanks,

ohadlevy · October 8, 2015, 6:59am

Do you use SSH finish scripts or userdata? With SSH, Foreman actually waits
until the VM is responding to SSH.
You could run another API call at the same time without a problem. However,
an alternative would be to use userdata instead, which does not require
waiting for SSH.

Ohad

–
You received this message because you are subscribed to the Google Groups
“foreman-dev” group.
To unsubscribe from this group and stop receiving emails from it, send an
email to foreman-dev+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Andrew_Enstad · October 12, 2015, 10:22pm

It seems that this happens when some limit is reached within
Foreman, but I am not sure. I ran a test today and there does seem to be about
a 10-minute timeout during which Foreman's UI is unresponsive. During this time I
looked on the Openstack Controller node and didn't see anything that
would point to an issue on the Openstack side. I was even able to ssh into
the VM that Foreman was failing to reach over ssh, before the
10-minute timeout expired. It is really frustrating that this tool just
started acting like this in the past month. I have discussed this with
other engineers and they are not aware of any changes. The network is a
very simple flat network for this Foreman/Openstack/Nova Compute setup:

Foreman server
Openstack Controller (including everything except for the Compute services)
Openstack Compute servers (15 compute nodes)

All the firewalls are open for ssh (port 22) traffic between these servers.
Openstack ports haven't changed, they still have port 22 open for all the
tenants.

I have gone through all the /etc/foreman, /etc/foreman-proxy, /etc/nova
files and have not found anything that would point to why Foreman is
getting these ssh-timeouts.

Could there be a Passenger/Ruby limit that I am hitting?

Does anyone in the Foreman development community know of anything to check? Or
maybe I could work with someone to help troubleshoot this problem?

Thanks,
Andrew

Andrew_Enstad · October 8, 2015, 9:13pm

I am seeing ssh calls in /var/log/foreman/production.log, so I would
assume we are using ssh. Have there been any issues with ssh hitting a
capacity limit or timeout that causes the Foreman UI to go unresponsive? Which
method would be best for highest/quickest capacity? Where do I change it?

Gwmngilfen · October 13, 2015, 7:53am

> It seems that this happens when there is a limit that is reached within
> Foreman, just not sure. I did a test today and there does seem to be about
> a 10 minute time out when Foreman's UI is unresponsive. During this time I
> did look on the Openstack Controller node and didn't see anything that would
> point to an issue on the Openstack side. I even was able to ssh into the VM
> that Foreman had an issue doing an ssh connection to prior to the 10-minute
> timeout. This is getting really frustrating that this tool just starting
> acting like this in the past month. I have discussed this with other
> Engineers and they are not aware of any changes. The network is a very
> simple flat network for this Foreman/Openstack/Nova Compute setup:
>
> Foreman server
> Openstack Controller (including everything except for the Compute services)
> Openstack Compute servers (15 compute nodes)
>
> All the firewalls are open for ssh (port 22) traffic between these servers.
> Openstack ports haven't changed, they still have port 22 open for all the
> tenants.
>
> I have gone through all the /etc/foreman, /etc/foreman-proxy, /etc/nova
> files and have not found anything that would point to why Foreman is getting
> these ssh-timeouts.

OK, so that sounds like we're sure it is something in the SSH
system. That's progress. Try setting Foreman logging to DEBUG and tail
the logs while creating a VM. You should see it logging what it's
doing (which IP it's connecting to, and output from the SSH script if
it's running). See if the data matches your expectation (is the IP
right, etc.).
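
If the DEBUG log gets noisy, a small filter can make the SSH attempts easier to eyeball. The following is only a minimal Python sketch: the helper name `ssh_lines` and the keyword list are assumptions made for illustration, not anything Foreman ships, and you would adjust the keywords to whatever your log actually contains.

```python
# Assumed keywords; change them to match what your DEBUG output really logs.
KEYWORDS = ("ssh", "timeout")


def ssh_lines(lines, keywords=KEYWORDS):
    """Yield (line_number, text) for each line mentioning any keyword.

    Matching is case-insensitive; line numbers start at 1.
    """
    lowered = tuple(k.lower() for k in keywords)
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if any(k in text.lower() for k in lowered):
            yield number, text


# Example use against the log path mentioned earlier in this thread:
#
#     with open("/var/log/foreman/production.log", errors="replace") as log:
#         for number, text in ssh_lines(log):
#             print(f"{number}: {text}")
```

The matching lines should show which IP is being tried and roughly when, which you can compare with the address the VM actually received.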

> Could there be a Passenger/Ruby limit that I am hitting?

Unlikely, but possible. Let's try to exclude other issues first.

> Anyone on the Foreman Development community know of anything to check? Or
> maybe I could work with some to help troubleshoot this problem?

Logs of the SSH attempts as per above are definitely the next thing to
get hold of. You can also hop into #theforeman on Freenode if you want
to do some realtime troubleshooting. There are usually knowledgeable
people around, although that's more likely in Europe / East-Coast USA
office hours.

Cheers,
Greg

Gwmngilfen · October 9, 2015, 9:50am

> I am seeing ssh calls in the /var/log/foreman/production.log, so I would
> assume we are using ssh. Has there been any issues with using ssh with a
> capacity/timeout causing Foreman UI to go unresponsive?

Actually, it's always been this way, as Ohad says. The reason you
don't see it is that most setups for SSH-based image provisioning
use a fairly small script. The time taken for the VM to be created,
booted, SSH started, and have Foreman log in to run its script is
only a few seconds (e.g. if you're just adding a few SSH keys and
setting the hostname). The login part is UI-blocking, so it becomes
unresponsive, but not for very long.

Where it becomes obvious is when there's an issue. Foreman has (I
think) a ten-minute timeout on waiting for that SSH connection. If the
VM is created (failure at that stage would not even start the SSH
loop), but never opens SSH on the IP Foreman is expecting (perhaps due
to DHCP, or a firewall), then it will just sit there, blocking the UI.
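
To illustrate the shape of that wait, here is a minimal Python sketch of a blocking "poll until the port answers or give up" loop. It is not Foreman's code: the function name `wait_for_port`, the 5-second retry interval, and the 600-second default (chosen only to mirror the ten-minute figure I mentioned) are all assumptions. It can double as a quick check, run from the Foreman server, of whether port 22 is reachable on the IP Foreman is trying.

```python
import socket
import time


def wait_for_port(host, port=22, timeout=600.0, interval=5.0):
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass.

    Returns True if the port accepted a connection, False if time ran out.
    The caller is blocked for the whole wait, which is the point being shown.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Each attempt is itself capped at `interval` seconds.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            pass  # Refused, unreachable, or timed out: not up yet.
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))
```

If a call such as `wait_for_port("<the IP from the logs>", timeout=30)` returns False from the Foreman server while you can SSH in from elsewhere, that points at routing, a firewall, or Foreman using a different address than the one you are testing.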

> Which method would
> be best for highest/quickest capacity? Where do I change it?

If this was working, then I'd investigate the logs to see if your
VMs are somehow behaving differently now. Did your floating IP pool
run out? Has the default security group been changed?

Otherwise, you can't affect Foreman's handling of SSH connections
(without changing the code, anyway :P), but as Ohad says, you could
move to userdata-based provisioning, which is asynchronous and doesn't
block the UI.
