Redmine running slowly

Gwmngilfen · October 4, 2017, 4:18pm

Since the start of this week Redmine has been consistently slow.
The console shows a very high load (not dropping below 10, often above 20).
We haven't changed anything, so I suspect the v2 sunset that
happened at the end of last week.

I've opened a support case to see what's going on, but for now treat
Redmine with care.

Greg

lzap · October 5, 2017, 12:08pm

Ohad mentioned this at our coffee meeting, but why do we insist on
OpenShift? It's just a regular Rails app; we could easily drop it onto
one of our web hosts, and the performance might even be better
than it was.

In other words: no OpenShift v3, our own infra for now and forever. I'm
not sure how much work it would be to puppetize this, but it could simply
be (I know someone will eat me for these words) a pet server with no
automation.

/me runs away!

LZ

–
Later,
Lukas @lzap Zapletal

Gwmngilfen · October 5, 2017, 1:35pm

Our infra is already overloaded (see other mails on this list), so
something else would have to give way for it. Openshift is free and a
good fit for anything "simple" and self-contained, which is why
Redmine, PrProcessor, and the Etherpad run on it.

If we cannot get this resolved, or if the extra v3 resources aren't
granted, then we'll have to do exactly that, but I'll take whatever free
handouts we can get.

Current update, BTW: the load is dropping (now sitting around 7), and
I've opened a support ticket. Preliminary feedback is that this is likely
due to a large number of people upgrading their v2 accounts to Silver
to avoid the sunset, leading to increased load on the cluster.
Hopefully it'll settle down soon.

Greg

Dirk · October 10, 2017, 2:40pm

Now Redmine seems to be down completely. I've been getting only 404 or
502 errors for the last half hour.

Regards,
Dirk

akofink · October 10, 2017, 2:50pm

Me as well. It's quite difficult to work this way.

–
You received this message because you are subscribed to the Google Groups
“foreman-dev” group.
To unsubscribe from this group and stop receiving emails from it, send an
email to foreman-dev+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

–
Andrew Kofink
akofink@redhat.com
IRC: akofink
Associate Software Engineer
Red Hat Satellite

ekohl · October 10, 2017, 3:54pm

We deployed a new version, but that took longer than expected. We now use
bare git clones rather than full checkouts. This should save a lot
of IO, which is generally a limiting factor. Hopefully this helps enough
until we can migrate to the new platform.

https://github.com/theforeman/redmine/commit/cb4ccf049e0c892fcbba98861c904492e9833a67
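A minimal sketch of what "bare git clones rather than full checkouts" can look like in a refresh job, written in Python for illustration. It is not the code from the linked commit: `REPO_ROOT`, `mirror_commands` and `refresh` are hypothetical names, and the real job may be structured differently. A mirror clone has no working tree, so each update only writes new objects and refs instead of rewriting checked-out files, which is presumably where the IO saving comes from.

```python
import subprocess
from pathlib import Path

# Hypothetical directory holding the bare mirrors Redmine reads from;
# the real layout is defined in the linked commit, not here.
REPO_ROOT = Path("/var/lib/redmine/git")


def mirror_commands(name: str, url: str, root: Path = REPO_ROOT) -> list[list[str]]:
    """Return the git commands that create or update a bare mirror.

    A mirror clone (--mirror implies --bare) has no working tree, so an
    update only writes new objects and refs, never checked-out files.
    """
    target = root / f"{name}.git"
    if not target.exists():
        # First run: create the bare mirror.
        return [["git", "clone", "--mirror", url, str(target)]]
    # Later runs: fetch new refs and prune the ones deleted upstream.
    return [["git", f"--git-dir={target}", "remote", "update", "--prune"]]


def refresh(name: str, url: str, root: Path = REPO_ROOT,
            dry_run: bool = True) -> list[list[str]]:
    """Run (or, with dry_run=True, only return) the commands for one repo."""
    commands = mirror_commands(name, url, root)
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical repository; prints the commands without running them.
    for cmd in refresh("foreman", "https://github.com/theforeman/foreman.git"):
        print(" ".join(cmd))
```

By default `refresh` only returns the commands; pass `dry_run=False` to run them. Redmine would still need to pick up the new commits afterwards (for example via `Repository.fetch_changesets`), which this sketch leaves out.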

Gwmngilfen · October 10, 2017, 7:21pm

Yeah, I know

Openshift aren't saying much other than that this is mainly due to the
number of people that decided to upgrade to Silver Tier to avoid the
sunset of v2. That's putting a lot of load on the v2 cluster, which
obviously is hitting us.

As Ewoud said, we've made a change today in how we process the
underlying cron jobs that should reduce the amount of IO we were doing;
if there's any kind of quota-ing going on, that should help. We're
seeing that bring the time taken to run the cron down to about 10 mins
(starting at the top of the hour), so things should improve during
that period. Sadly I made a mistake during a manual part of the
changes that impacted the DB, but that should be resolved now.

Base load now seems to be down to around 7-9, which is better but still
too high. Sadly the v3 resources are unlikely to be available before
November, which is a limiting factor. If things are not better in the
next day or two, then on Thu or Fri I may migrate it to our Scaleway
account anyway, as we have capacity there, although I'd rather not
migrate twice…

Greg

lzap · October 11, 2017, 8:54am

Thanks, guys, for doing this.

If things go terribly wrong, we still have an account on EC2 where we
run our koji.

–
Later,
Lukas @lzap Zapletal

Gwmngilfen · October 11, 2017, 10:41am

Things are no better today, and I'm out of patience with Openshift.
I've spun up an instance on Scaleway that should be able to cope and am
in the process of creating a copy of Redmine there. Once it's ready
I'll stop Openshift, re-import the db and cut over the DNS.

Continue to use Redmine as normal for now, I'll update when we're ready
to do the final cutover.

Greg

Gwmngilfen · October 11, 2017, 2:40pm

Update on migration:

http://51.15.192.166 is now live for your testing pleasure, with a copy
of the DB from a few days ago. Email is currently disabled, so you
can't spam anyone. There are still a few small tasks to sort out that
will keep me busy, but if you want to see if you can break it, go
ahead.

Sadly, SMTP outbound is blocked by default on Scaleway, which I didn't
realise until about 20 mins ago. I've raised a ticket to lift this so
Redmine can send email, but until that's resolved we can't complete the
migration.

Once email is confirmed working, we'll schedule a maintenance window,
where I will stop the Openshift instance and make a final DB dump. Ohad
will then do a DNS switch, and as soon as it comes back you should all
be good to go.

Stay tuned for further updates. I'll announce the maintenance window
before I start.

Greg

ohadlevy · October 11, 2017, 4:36pm

> Update on migration:
>
> http://51.15.192.166 is now live for your testing pleasure, with a copy

Any chance to update/enable https in the process?

Thanks!

ekohl · October 11, 2017, 4:39pm

>
>Update on migration:
>
>http://51.15.192.166 is now live for your testing pleasure, with a copy
>
>
>Any chance to update/enable https in the process?

That is the plan, but first we want to know it's working. If you can
create DNS records, we can already test the process with Let's Encrypt:

redmine01.theforeman.org A 51.15.192.166
redmine01.theforeman.org AAAA 2001:bc8:4400:2300::5:e03
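Let's Encrypt's HTTP-01 challenge is fetched from whatever address the name resolves to, so it can help to confirm both records before requesting a certificate. A minimal standard-library sketch, assuming the local resolver is representative; `resolve`, `normalise` and `missing_records` are hypothetical helper names, and `EXPECTED` just restates the two records above. Addresses are canonicalised with `ipaddress` so that differently written IPv6 forms compare equal.

```python
import ipaddress
import socket

HOST = "redmine01.theforeman.org"
# The A and AAAA records requested above.
EXPECTED = {"51.15.192.166", "2001:bc8:4400:2300::5:e03"}


def resolve(host: str) -> set[str]:
    """Return every IPv4/IPv6 address the local resolver reports for host."""
    # Restricting to TCP keeps one entry per address instead of one per socket type.
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    # info[4] is the sockaddr tuple; its first element is the address string.
    return {info[4][0] for info in infos}


def normalise(addresses: set[str]) -> set[str]:
    """Canonicalise addresses so equivalent IPv6 spellings compare equal."""
    return {str(ipaddress.ip_address(a)) for a in addresses}


def missing_records(resolved: set[str], expected: set[str] = EXPECTED) -> set[str]:
    """Return the expected addresses that did not show up in `resolved`."""
    return normalise(expected) - normalise(resolved)
```

`missing_records(resolve(HOST))` returns an empty set once both records are visible to the resolver; `resolve` raises `socket.gaierror` while the name does not resolve at all.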

ohadlevy · October 11, 2017, 6:52pm

>
>>
>> Update on migration:
>>
>> http://51.15.192.166 is now live for your testing pleasure, with a copy
>>
>>
>> Any chance to update/enable https in the process?
>>
>
> That is the plan, but first we want to know it's working. If you can
> create DNS records we can already test the process with letsencrypt:
>
> redmine01.theforeman.org A 51.15.192.166
> redmine01.theforeman.org AAAA 2001:bc8:4400:2300::5:e03

Both should resolve now.

Ohad

Gwmngilfen · October 12, 2017, 9:08am

Thanks Ohad.

SMTP is now unblocked and confirmed working, so we're ready to switch
over. As soon as I can get a timeframe from Ohad for when the DNS
change can be done, I'll arrange a maintenance window.

Hang in there, guys, we're almost done.

Greg

Gwmngilfen · October 12, 2017, 9:48am

We're ready to switch. I'll take Redmine offline in about 30 mins (11:15
UK time), and we should be back up on the new host by 12 noon. Please
save your work.

Greg

Gwmngilfen · October 12, 2017, 10:50am

And we're back. Please test extensively and report issues.

In particular, we're now on Ruby 2.0 (up from 1.9 on Openshift), so I
suspect the plugins might have issues. If you hit any, please report
them here and we'll take a look. I checked quickly and didn't see any
errors, but I was in a hurry.

Things still to do:

* Add logrotate for the production.log files
* Add HTTPS (ewoud is on that)
* Upgrade to latest Redmine (I'm looking into it)
* Puppetize it so we can migrate more easily in future

We'll monitor whether the resources are sufficient; let me know if you
see any issues (no, Backlogs does not count, that thing is *slow* :P).

Thanks for your patience everyone.
Greg

ekohl · October 12, 2017, 8:08pm

We did have to make some changes to how git repos are imported. If you
merge a commit that references an issue with a keyword (refs or fixes)
and the issue isn't updated within an hour, please let us know.
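Redmine links commits to issues by scanning each imported commit message for its configured referencing and fixing keywords followed by issue numbers. The Python below is a simplified, hypothetical illustration of that idea, not Redmine's own parser (which is Ruby, with keyword lists configurable in the repository settings); `scan_commit_message` and the keyword tuples are assumptions based on the `refs`/`fixes` keywords mentioned above.

```python
import re

# Assumed keyword lists, taken from the message above; Redmine's real
# lists are configurable and may contain more keywords.
REF_KEYWORDS = ("refs",)
FIX_KEYWORDS = ("fixes",)

# A keyword, then one or more "#123" IDs separated by commas, spaces or "and".
_PATTERN = re.compile(
    r"\b(?P<kw>" + "|".join(REF_KEYWORDS + FIX_KEYWORDS) + r")\s+"
    r"(?P<ids>#\d+(?:(?:\s*,\s*|\s+and\s+|\s+)#\d+)*)",
    re.IGNORECASE,
)


def scan_commit_message(message: str) -> dict[str, set[int]]:
    """Map 'refs'/'fixes' to the issue IDs a commit message mentions."""
    found: dict[str, set[int]] = {"refs": set(), "fixes": set()}
    for match in _PATTERN.finditer(message):
        kind = "fixes" if match["kw"].lower() in FIX_KEYWORDS else "refs"
        found[kind].update(int(n) for n in re.findall(r"#(\d+)", match["ids"]))
    return found
```

For a message like `Fixes #20123 - handle nil host` followed by `Refs #19000, #19001 and #19002`, it reports issue 20123 under `fixes` and the other three under `refs`.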
