Katello capsule content sync super slow?

Hi all,

we have a katello 3.0.1 install with a katello capsule in our DMZ.

we've noticed that when we publish/promote content, the main katello server does
this reasonably quickly, but the tasks that sync the content out to the capsule
take hours to complete before the capsule matches the change on the main server.
Does anyone know why this is? Can it be improved?

Thanks!
Dylan

Dylan,

Are you able to determine the network traffic between the server and the
capsule? In general I think there are two possibilities: slow transfer of
data to the Capsule, or slowness on the Capsule itself. I would recommend
checking the network speed and the Capsule's resources to see whether it's
constrained on I/O or whether processes are eating the CPU.
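
For example (just a rough sketch; it assumes the sysstat and iperf3 packages are
available, and the hostname is a placeholder):

# On the Capsule, while a sync is running: CPU vs I/O pressure
sar -u 5 12          # CPU utilisation, 12 samples at 5 second intervals
sar -d 5 12          # per-device I/O
top -o %CPU          # which processes are using the CPU right now

# Rough throughput test between the two hosts
iperf3 -s                               # on the Katello server
iperf3 -c katello.example.com -t 30     # on the Capsule, against the server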

Eric

Eric D. Helms
Red Hat Engineering
Ph.D. Student - North Carolina State University

Hi Eric,

I noticed our main katello server was swapping like crazy

[root@wellkatello salt]# free -h
              total        used        free      shared  buff/cache   available
Mem:            15G         14G        457M        155M        594M        510M
Swap:          2.9G        2.9G        168K

and in /var/log/messages we're getting this (plus a lot more output):

Jul 28 09:51:38 wellkatello.niwa.local kernel: Out of memory: Kill process
5419 (ruby) score 34 or sacrifice child

Jul 28 09:51:38 wellkatello.niwa.local kernel: Killed process 5419 (ruby)
total-vm:1952116kB, anon-rss:642768kB, file-rss:0kB

Something seems to be consuming a lot of memory! Since rebooting at 10am
(~5 hours ago) we've gone from this

[root@wellkatello ~]# free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        4.3G        9.7G         46M        1.2G         10G
Swap:          2.9G          0B        2.9G
[root@wellkatello ~]#

which is what was available straight after booting + waiting for
katello/foreman services to start, to this

[root@wellkatello salt]# free -h
              total        used        free      shared  buff/cache   available
Mem:            15G         13G        650M         61M        837M        1.0G
Swap:          2.9G        855M        2.1G

Not sure if this is related to the slow capsule sync?
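
For reference, this is roughly how I've been watching where the memory goes
(plain procps, nothing Katello-specific):

ps aux --sort=-rss | head -n 10     # top resident-memory consumers
vmstat 5                            # watch swap in/out and free memory over time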

Hi Eric,

I see celery and mongod on the capsule using a lot of CPU time (celery jumps
between 50-100% and has accumulated 435:32.54 of CPU time) and mongod sits at
~30-40% with 286:18.74. Here's the current memory usage on the box:

free -h
              total        used        free      shared  buff/cache   available
Mem:           5.7G        3.5G        166M         37M        2.0G        1.9G
Swap:          7.8G        2.4G        5.4G

and the current I/O:
[root@wellcapsuleext foreman-proxy]# sar
Linux 3.10.0-327.el7.x86_64 (wellcapsuleext.niwa.co.nz)   07/28/2016   _x86_64_   (6 CPU)

12:00:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
12:10:01 AM     all      0.76      0.03      0.41      0.02      0.00     98.78
12:20:01 AM     all      1.11      0.03      0.51      0.01      0.00     98.35
12:30:01 AM     all      0.94      0.03      0.46      0.01      0.00     98.57
12:40:01 AM     all      0.77      0.03      0.41      0.01      0.00     98.78
12:50:01 AM     all      1.09      0.03      0.50      0.02      0.00     98.37
01:00:01 AM     all      0.95      0.03      0.47      0.01      0.00     98.54
01:10:01 AM     all      0.74      0.03      0.39      0.01      0.00     98.83
01:20:01 AM     all      1.07      0.02      0.48      0.02      0.00     98.40
01:30:01 AM     all      0.92      0.03      0.45      0.06      0.00     98.54
01:40:01 AM     all      0.77      0.03      0.41      0.01      0.00     98.79
01:50:01 AM     all      1.07      0.03      0.49      0.01      0.00     98.41
02:00:01 AM     all      1.00      0.03      0.49      0.08      0.00     98.41
02:10:01 AM     all      0.76      0.03      0.40      0.01      0.00     98.81
02:20:01 AM     all      1.09      0.02      0.50      0.01      0.00     98.38
02:30:01 AM     all      0.94      0.03      0.45      0.01      0.00     98.57
02:40:02 AM     all      0.79      0.03      0.42      0.01      0.00     98.76
02:50:01 AM     all      1.09      0.03      0.50      0.02      0.00     98.36
03:00:01 AM     all      0.93      0.03      0.46      0.01      0.00     98.57
03:10:01 AM     all      0.79      0.03      0.40      0.01      0.00     98.77
03:20:01 AM     all      1.09      0.03      0.50      0.01      0.00     98.37
03:30:01 AM     all      0.96      0.02      0.50      0.01      0.00     98.51
03:40:01 AM     all      1.15      0.06      1.96      1.39      0.00     95.44
03:50:01 AM     all      1.08      0.00      0.43      0.18      0.00     98.31
04:00:01 AM     all      0.99      0.00      0.40      0.18      0.00     98.43
04:10:01 AM     all      0.75      0.03      0.41      0.02      0.00     98.80
04:20:01 AM     all      1.06      0.03      0.50      0.03      0.00     98.39
04:30:01 AM     all      0.93      0.03      0.47      0.01      0.00     98.56
04:40:01 AM     all      0.74      0.03      0.39      0.02      0.00     98.83
04:50:01 AM     all      1.09      0.03      0.51      0.05      0.00     98.33
05:00:01 AM     all      0.91      0.03      0.45      0.02      0.00     98.60
05:10:01 AM     all      0.76      0.03      0.41      0.02      0.00     98.79
05:20:01 AM     all      1.06      0.03      0.49      0.02      0.00     98.41
05:30:01 AM     all      0.93      0.03      0.47      0.02      0.00     98.55
05:40:01 AM     all      0.74      0.02      0.40      0.02      0.00     98.82
05:50:01 AM     all      1.08      0.03      0.52      0.04      0.00     98.34
06:00:01 AM     all      0.96      0.02      0.47      0.03      0.00     98.52
06:10:01 AM     all      0.74      0.02      0.41      0.01      0.00     98.81
06:20:01 AM     all      1.06      0.03      0.49      0.01      0.00     98.41

06:20:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
06:30:01 AM     all      0.93      0.02      0.46      0.01      0.00     98.58
06:40:01 AM     all      0.75      0.03      0.40      0.01      0.00     98.81
06:50:02 AM     all      1.07      0.02      0.50      0.02      0.00     98.38
07:00:01 AM     all      0.96      0.02      0.46      0.01      0.00     98.54
07:10:02 AM     all      0.77      0.02      0.42      0.01      0.00     98.78
07:20:01 AM     all      1.06      0.03      0.48      0.01      0.00     98.42
07:30:01 AM     all      0.94      0.03      0.47      0.01      0.00     98.56
07:40:01 AM     all      0.78      0.03      0.41      0.01      0.00     98.77
07:50:01 AM     all      1.06      0.03      0.50      0.01      0.00     98.41
08:00:02 AM     all      0.96      0.03      0.47      0.01      0.00     98.53
08:10:01 AM     all      0.75      0.02      0.40      0.01      0.00     98.82
08:20:01 AM     all      1.08      0.03      0.49      0.01      0.00     98.40
08:30:02 AM     all     10.48      0.02      2.18      1.58      0.00     85.73
08:40:02 AM     all      9.51      0.02      2.08      1.99      0.00     86.40
08:50:01 AM     all     10.72      0.02      0.97      0.73      0.00     87.56
09:00:01 AM     all     10.93      0.02      1.71      1.16      0.00     86.18
09:10:01 AM     all     14.54      0.02      1.99      1.94      0.00     81.52
09:20:01 AM     all     14.60      0.02      2.07      1.82      0.00     81.50
Average:        all      2.10      0.02      0.63      0.21      0.00     97.04

There have now been various sync jobs running for nearly 24 hours; the last
one is below:

Id: cbda7f78-dd70-48f6-ae09-67f29b0a5746
Label: Actions::Katello::Repository::CapsuleGenerateAndSync
Name: Sync Repository on Capsule(s)
Owner: foreman_admin
Execution type: Delayed
Start at: 2016-07-27 20:39:57 UTC
Start before: -
Started at: 2016-07-27 20:39:57 UTC
Ended at:
State: running
Result: -
Params: {"services_checked"=>["pulp", "pulp_auth"], "locale"=>"en"}

I've upped the CPU count on the capsule and added some extra RAM, will see
how it goes…

Dylan


Hi again,

I've noticed in the dynflow console that katello seems to be waiting for
pulp to start the task -

12: Actions::Pulp::Consumer::SyncCapsule (waiting for Pulp to start the
task) [ 201571.83s / 2023.95s ] Cancel
<https://192.168.59.247/foreman_tasks/dynflow/48f70f23-1f1f-4e9f-8ca2-05db9e0da521/cancel/12>

Started at: 2016-08-01 00:06:17 UTC

Ended at: 2016-08-03 08:05:48 UTC

Real time: 201571.83s

Execution time (excluding suspended state): 2023.95s

Input:

---
capsule_id: 2
repo_pulp_id: NIWA-Oracle_Virtualbox-Virtualbox_rhel7_x86_64
remote_user: admin
remote_cp_user: admin
locale: en

Output:


---
pulp_tasks:
- exception:
  task_type: pulp.server.managers.repo.sync.sync
  _href: "/pulp/api/v2/tasks/20254460-c490-4669-bf02-bbcf0161172d/"
  task_id: 20254460-c490-4669-bf02-bbcf0161172d
  tags:
  - pulp:repository:NIWA-Oracle_Virtualbox-Virtualbox_rhel7_x86_64
  - pulp:action:sync
  finish_time:
  _ns: task_status
  start_time:
  traceback:
  spawned_tasks: []
  progress_report: {}
  queue: None.dq
  state: waiting
  worker_name:
  result:
  error:
  _id:
    "$oid": 579e9279b9867873daf31b4f
  id: 579e9279b9867873daf31b4f
  poll_attempts:
    total: 12369
    failed: 0
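
Since the task never gets a worker_name, I'm guessing the next thing to check is
whether the capsule's Pulp even thinks it has live workers. Something like this
should show it (Pulp 2's status endpoint doesn't need auth, as far as I can tell):

# on the capsule
curl -sk https://localhost/pulp/api/v2/status/ | python -m json.tool    # check known_workers
systemctl status pulp_workers pulp_celerybeat pulp_resource_manager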

Does this help anyone try and figure out what's going on?

Thanks :slight_smile:
Dylan

Dylan,

I wonder if things are a little stopped up in pulp/celery on the capsule.
Run this on the capsule:

systemctl stop pulp_celerybeat.service pulp_workers.service pulp_resource_manager.service
systemctl start pulp_celerybeat.service pulp_workers.service pulp_resource_manager.service
systemctl restart httpd

Sometimes this can help clear stuck pulp jobs. Not sure if that is your
issue, but worth a try :slight_smile:

Thanks,

John Mitsch
Red Hat Engineering
(860)-967-7285
irc: jomitsch


I did that, but it didn't seem to help. I've now done a few things:

  1. Removed all lifecycle environments from the capsule (via the katello GUI)
  2. Used foreman-console to delete the 2 running tasks that had been stuck
     at 62% for days:
     ForemanTasks::Task.find("7808217a-4057-4b43-a741-0c20a6c41307").destroy
     ForemanTasks::Task.find("ff0e613c-9160-49d6-bbc0-b8fcbb0f583a").destroy
     This put the capsule in a "synchronised" state - i.e. no running tasks
     according to katello
  3. Added a single lifecycle environment to the capsule
  4. Hit Synchronise
  5. It ran up to 48% super quickly, and now seems stuck

If I browse to the running task > running steps, there is a whole bunch of
waiting "Actions::Pulp::Repository::Destroy" actions, each in the state
"waiting for Pulp to start the task".
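
If it stays stuck like this, I suppose the waiting Pulp tasks could also be
listed and cancelled directly on the capsule, assuming the pulp-admin client is
installed there (it isn't by default):

pulp-admin -u admin -p '<admin password>' tasks list
pulp-admin -u admin -p '<admin password>' tasks cancel --task-id <task id>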

Is this a communication problem? I have the following ports open on the
katello server
443/tcp 80/tcp 8140/tcp 9090/tcp 5647/tcp 5646/tcp 5672/tcp 8080/tcp 67/udp
9000/tcp 68/udp 53/tcp 69/udp 4505-4506/tcp 53/udp 5671/tcp

and on the capsule:
443/tcp 80/tcp 8140/tcp 9090/tcp 5647/tcp 9000/tcp 8443/tcp 8000/tcp 69/udp
4505-4506/tcp

There is a physical firewall between the two servers, these ports are open
on that for the two servers to communicate
HTTP (80/tcp)
HTTPS (443/tcp)
9090/tcp
4505-4506/tcp
5646/tcp
5647/tcp
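
To rule the firewall out I can run a quick reachability test from each side
(the hostnames below are placeholders for our real ones):

# from the capsule towards the katello server
nc -zv katello.example.com 443
nc -zv katello.example.com 5646
nc -zv katello.example.com 5647
curl -sk https://katello.example.com/pulp/api/v2/status/ | python -m json.tool

# from the katello server towards the capsule
nc -zv capsule.example.com 443
nc -zv capsule.example.com 9090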


I took a look at https://access.redhat.com/solutions/2044393 and checked
the queue depth on the capsule

[root@wellcapsuleext ~]# qpid-stat --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt -b amqps://localhost:5671 -q resource_manager
Properties:
  Name              Durable  AutoDelete  Exclusive  FlowStopped  FlowStoppedCount  Consumers  Bindings
  =====================================================================================================
  resource_manager  Y        N           N          N            0                 1          2

Optional Properties:
  Property      Value
  ============================================================================
  arguments     {u'passive': False, u'exclusive': False, u'arguments': None}
  alt-exchange

Statistics:
Statistic Messages Bytes

queue-depth 388 637,944
total-enqueues 413 700,369
total-dequeues 25 62,425
persistent-enqueues 0 0
persistent-dequeues 0 0
transactional-enqueues 0 0
transactional-dequeues 0 0
flow-to-disk-depth 0 0
flow-to-disk-enqueues 0 0
flow-to-disk-dequeues 0 0
acquires 844
releases 431
discards-ttl-expired 0
discards-limit-overflow 0
discards-ring-overflow 0
discards-lvq-replace 0
discards-subscriber-reject 0
discards-purged 0
reroutes 0
[root@wellcapsuleext ~]#

checking over the course of a few minutes showed no change, so I ran the
workaround…

[root@wellcapsuleext ~]# for i in pulp_resource_manager pulp_workers pulp_celerybeat; do service $i restart; done
Redirecting to /bin/systemctl restart pulp_resource_manager.service
Redirecting to /bin/systemctl restart pulp_workers.service
Redirecting to /bin/systemctl restart pulp_celerybeat.service
[root@wellcapsuleext ~]#

the de-queues changed slightly, and the acquires/releases went up, but only
to a point; they are now stuck here…

[root@wellcapsuleext ~]# qpid-stat --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt -b amqps://localhost:5671 -q resource_manager
Properties:
  Name              Durable  AutoDelete  Exclusive  FlowStopped  FlowStoppedCount  Consumers  Bindings
  =====================================================================================================
  resource_manager  Y        N           N          N            0                 1          2

Optional Properties:
  Property      Value
  ============================================================================
  arguments     {u'passive': False, u'exclusive': False, u'arguments': None}
  alt-exchange

Statistics:
Statistic Messages Bytes

queue-depth 381 628,577
total-enqueues 417 707,065
total-dequeues 36 78,488
persistent-enqueues 0 0
persistent-dequeues 0 0
transactional-enqueues 0 0
transactional-dequeues 0 0
flow-to-disk-depth 0 0
flow-to-disk-enqueues 0 0
flow-to-disk-dequeues 0 0
acquires 1,235
releases 818
discards-ttl-expired 0
discards-limit-overflow 0
discards-ring-overflow 0
discards-lvq-replace 0
discards-subscriber-reject 0
discards-purged 0
reroutes 0
[root@wellcapsuleext ~]#

Thanks for your help!

Well, after a few hours it did actually finish the sync, which involved
removing the lifecycle environments I had deleted and then syncing the single
lifecycle environment I'd added.

It does seem to sit for a very long time just waiting for pulp - any idea
why?
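
One thing I'm wondering about is whether the capsule simply has too few Pulp
workers for the number of repositories being synced. As far as I understand it,
on Pulp 2 the worker count comes from /etc/default/pulp_workers, e.g.:

# /etc/default/pulp_workers on the capsule
PULP_CONCURRENCY=6          # defaults to the number of CPU cores

# restart the workers afterwards so the new count takes effect
systemctl restart pulp_workers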

Further, the capsule syncs seem to take swap space (even if RAM is/was
available) and never give it back, to the point where, while we were publishing
content views, tomcat failed and we started getting this in
/var/log/foreman/production.log…

2016-08-04 14:20:15 [app] [I] Imported report for niwa-35649.niwa.local in 0.12 seconds
2016-08-04 14:20:15 [app] [W] Action failed
> Katello::Errors::ConnectionRefusedException: A backend service [ Candlepin ] is unreachable
> /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.2/app/lib/katello/http_resource.rb:90:in `rescue in get'
> /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.2/app/lib/katello/http_resource.rb:80:in `get'
> /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.2/app/lib/katello/resources/candlepin.rb:100:in `get'
> /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.2/app/services/katello/candlepin/consumer.rb:17:in `block in <class:Consumer>'
> /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.2/app/lib/katello/lazy_accessor.rb:160:in `instance_eval'
> /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.2/app/lib/katello/lazy_accessor.rb:160:in `run_initializer'
> /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.2/app/lib/katello/lazy_accessor.rb:139:in `lazy_attribute_get'
> /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.2/app/lib/katello/lazy_accessor.rb:60:in `block (2 levels) in lazy_accessor'
> /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.2/app/services/katello/candlepin/consumer.rb:38:in `entitlement_status'
> /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.2/app/models/katello/subscription_status.rb:40:in `to_status'
> /usr/share/foreman/app/models/host_status/status.rb:64:in `update_status'
> /usr/share/foreman/app/models/host_status/status.rb:48:in `refresh'
> /usr/share/foreman/app/models/host_status/status.rb:42:in `refresh!'
> /usr/share/foreman/app/models/host/managed.rb:858:in `block in refresh_statuses'
> /opt/rh/rh-ruby22/root/usr/share/ruby/set.rb:283:in `each_key'
> /opt/rh/rh-ruby22/root/usr/share/ruby/set.rb:283:in `each'

and in /var/log/messages

Aug 4 14:20:07 wellkatello.niwa.local qdrouterd: Thu Aug 4 14:20:07 2016
ROUTER_LS (info) Router Link Lost - link_id=0
Aug 4 14:20:12 wellkatello.niwa.local kernel: diagnostic_con* invoked
oom-killer: gfp_mask=0x3000d0, order=2, oom_score_adj=0
Aug 4 14:20:12 wellkatello.niwa.local kernel: diagnostic_con* cpuset=/
mems_allowed=0-1
Aug 4 14:20:12 wellkatello.niwa.local kernel: CPU: 1 PID: 7900 Comm:
diagnostic_con* Not tainted 3.10.0-327.22.2.el7.x86_64 #1
Aug 4 14:20:12 wellkatello.niwa.local kernel: Hardware name: VMware, Inc.
VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00
04/14/2014
Aug 4 14:20:12 wellkatello.niwa.local kernel: ffff880415850b80
00000000555a1641 ffff880004e3fb68 ffffffff816360fc
Aug 4 14:20:12 wellkatello.niwa.local kernel: ffff880004e3fbf8
ffffffff8163109c ffff88041ac8fc20 ffff88041ac8fc38
Aug 4 14:20:12 wellkatello.niwa.local kernel: 018804fb00000206
fffeffff00000000 0000000000000001 ffffffff81128c03
Aug 4 14:20:12 wellkatello.niwa.local kernel: Call Trace:
Aug 4 14:20:12 wellkatello.niwa.local kernel: [<ffffffff816360fc>]
dump_stack+0x19/0x1b
Aug 4 14:20:12 wellkatello.niwa.local kernel: [<ffffffff8163109c>]
dump_header+0x8e/0x214
Aug 4 14:20:12 wellkatello.niwa.local kernel: [<ffffffff81128c03>] ?
delayacct_end+0x13/0xb0
Aug 4 14:20:12 wellkatello.niwa.local kernel: [<ffffffff8116d20e>]
oom_kill_process+0x24e/0x3b0
Aug 4 14:20:12 wellkatello.niwa.local kernel: [<ffffffff81088dae>] ?
has_capability_noaudit+0x1e/0x30
Aug 4 14:20:12 wellkatello.niwa.local kernel: [<ffffffff8116da36>]
out_of_memory+0x4b6/0x4f0
Aug 4 14:20:12 wellkatello.niwa.local kernel: [<ffffffff81173c26>]
__alloc_pages_nodemask+0xaa6/0xba0
Aug 4 14:20:12 wellkatello.niwa.local kernel: [<ffffffff81078d73>]
copy_process.part.25+0x163/0x1610
Aug 4 14:20:12 wellkatello.niwa.local kernel: [<ffffffff8107a401>]
do_fork+0xe1/0x320
Aug 4 14:20:12 wellkatello.niwa.local kernel: [<ffffffff8107a6c6>]
SyS_clone+0x16/0x20
Aug 4 14:20:12 wellkatello.niwa.local kernel: [<ffffffff81646bd9>]
stub_clone+0x69/0x90
Aug 4 14:20:12 wellkatello.niwa.local kernel: [<ffffffff81646889>] ?
system_call_fastpath+0x16/0x1b
Aug 4 14:20:12 wellkatello.niwa.local kernel: Mem-Info:
Aug 4 14:20:12 wellkatello.niwa.local kernel: Node 0 DMA per-cpu:


Aug 4 14:20:12 wellkatello.niwa.local kernel: Out of memory: Kill process
23388 (java) score 50 or sacrifice child
Aug 4 14:20:12 wellkatello.niwa.local kernel: Killed process 23388 (java)
total-vm:9879424kB, anon-rss:856892kB, file-rss:0kB
Aug 4 14:20:13 wellkatello.niwa.local systemd: tomcat.service: main
process exited, code=killed, status=9/KILL
Aug 4 14:20:13 wellkatello.niwa.local puppet-master[32492]: Compiled
catalog for niwa-35517.niwa.local in environment production in 0.03 seconds
Aug 4 14:20:15 wellkatello.niwa.local server: Java virtual machine used:
/usr/lib/jvm/jre/bin/java
Aug 4 14:20:15 wellkatello.niwa.local server: classpath used:
/usr/share/tomcat/bin/bootstrap.jar:/usr/share/tomcat/bin/tomcat-juli.jar:/usr/share/java/commons-daemon.jar
Aug 4 14:20:15 wellkatello.niwa.local server: main class used:
org.apache.catalina.startup.Bootstrap
Aug 4 14:20:15 wellkatello.niwa.local server: flags used:
Aug 4 14:20:15 wellkatello.niwa.local server: options used:
-Dcatalina.base=/usr/share/tomcat -Dcatalina.home=/usr/share/tomcat
-Djava.endorsed.dirs= -Djava.io.tmpdir=/var/cache/tomcat/temp
-Djava.util.logging.config.file=/usr/share/to$
Aug 4 14:20:15 wellkatello.niwa.local server: arguments used: stop
Aug 4 14:20:24 wellkatello.niwa.local pulp: celery.worker.strategy:INFO:
Received task:
pulp.server.async.tasks._queue_reserved_task[8a0acee7-a3ae-4eba-9b45-f5edb974d71e]
Aug 4 14:20:24 wellkatello.niwa.local pulp: celery.worker.strategy:INFO:
Received task:
pulp.server.managers.consumer.applicability.regenerate_applicability_for_consumers[a4f48860-5627-4931-abfa-9556309ce343]
Aug 4 14:20:24 wellkatello.niwa.local pulp: celery.worker.strategy:INFO:
Received task:
pulp.server.async.tasks._release_resource[26b8b00d-f769-4452-920f-cf350d5fafd7]
Aug 4 14:20:24 wellkatello.niwa.local server: Aug 04, 2016 2:20:24 PM
org.apache.catalina.startup.Catalina stopServer
Aug 4 14:20:24 wellkatello.niwa.local server: SEVERE: Could not contact
localhost:8005. Tomcat may not be running.
Aug 4 14:20:24 wellkatello.niwa.local pulp: celery.worker.job:INFO: Task
pulp.server.async.tasks._queue_reserved_task[8a0acee7-a3ae-4eba-9b45-f5edb974d71e]
succeeded in 0.318352089002s: None
Aug 4 14:20:24 wellkatello.niwa.local server: Aug 04, 2016 2:20:24 PM
org.apache.catalina.startup.Catalina stopServer
Aug 4 14:20:24 wellkatello.niwa.local server: SEVERE: Catalina.stop:

A reboot clears things up.

I'm not sure what to provide to try and troubleshoot this issue :frowning:
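
In the meantime, rather than a full reboot, I might try just bouncing the whole
Katello service stack. If I'm reading the docs right, Katello 3.x ships a
katello-service helper for exactly this:

katello-service status      # see which of the stack's services are up
katello-service restart     # restarts the whole stack (mongod, qpid, pulp, tomcat, foreman-tasks, httpd, ...)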

Dylan,

I would be curious to see a foreman-debug report of this. Is there any way
you could prepare one and send it to Eric and me?
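
(It should just be a matter of running the bundled tool on the Katello server;
it drops a tarball under /tmp that you can then attach or upload:)

foreman-debug
# the last lines of the output give the path of the generated archive,
# something like /tmp/foreman-debug-XXXXX.tar.xz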

-John

John Mitsch
Red Hat Engineering
(860)-967-7285
irc: jomitsch
