"Bind entitlements to an allocation" task stalled/suspended

Problem:
After importing a subscription manifest, Foreman started a task to “Bind entitlements to an allocation”. That task has not completed and has been running for over 21 hours. It also seems to have caused my sync of the Red Hat repos to fail overnight.

Expected outcome:
Timely subscription import.

Foreman and Proxy versions:
2.5.4
Foreman and Proxy plugin versions:
candlepin-4.0.9-1.el7.noarch
katello-4.1.4-1.el7.noarch
python3-pulpcore-3.14.9-1.el7.noarch

Distribution and version:
CentOS Linux release 7.9.2009 (Core)

Other relevant data:
Dynflow output for current task:

10: Actions::Candlepin::Owner::Import (waiting for Candlepin to finish the task) [ 78410.79s / 524.08s ]
Queue: default
Started at: 2022-05-16 15:51:26 UTC
Ended at: 2022-05-17 13:38:16 UTC
Real time: 78410.79s
Execution time (excluding suspended state): 524.08s
Input:

---
label: <ORGANIZATION_NAME>
path: "/tmp/0.6979887199982576.zip"
dependency:
  response: true
session_id: bd103c77-14a8-43c6-9f7d-8cbafbc6e4f3
remote_user: admin
remote_cp_user: admin
current_request_id: bd103c77-14a8-43c6-9f7d-8cbafbc6e4f3
current_timezone: America/New_York
current_organization_id: 4
current_location_id: 2
current_user_id: 8

Output:

---
task:
  created: '2022-05-16T15:51:26+0000'
  updated: '2022-05-16T15:51:26+0000'
  id: '003729a77f09c5660180cd9133772463'
  name: Import Manifest
  group: 
  origin: <HOSTNAME>
  executor: 
  principal: foreman_admin
  state: CREATED
  previousState: CREATED
  startTime: 
  endTime: 
  attempts: 0
  maxAttempts: 1
  statusPath: "/jobs/003729a77f09c5660180cd9133772463"
  resultData: 
  key: ImportJob
poll_attempts:
  total: 4868
  failed: 0
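
As a quick sanity check on the figures above, the following sketch recomputes the elapsed time from the "Started at" and "Ended at" timestamps and estimates how often Foreman polled Candlepin. All values are copied from the Dynflow output; the variable names are made up for illustration, and the poll interval is only an average derived from these numbers, not a documented setting.

```python
from datetime import datetime, timezone

# Timestamps copied from the Dynflow output above (both in UTC).
started = datetime(2022, 5, 16, 15, 51, 26, tzinfo=timezone.utc)
ended = datetime(2022, 5, 17, 13, 38, 16, tzinfo=timezone.utc)

elapsed = ended - started
elapsed_seconds = elapsed.total_seconds()

# poll_attempts.total from the YAML output above.
polls = 4868
seconds_per_poll = elapsed_seconds / polls

# Share of the real time the step spent suspended (waiting on Candlepin),
# using the "Real time" and "Execution time" figures from the output.
suspended_fraction = 1 - 524.08 / 78410.79

print(f"elapsed: {elapsed} ({elapsed_seconds:.0f}s)")
print(f"average poll interval: {seconds_per_poll:.1f}s")
print(f"suspended: {suspended_fraction:.1%}")
```

The elapsed time matches the reported "Real time" to within a second, and almost all of it was spent suspended, which is consistent with Foreman simply waiting on Candlepin rather than doing work itself.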

As far as I can tell, there are no cancel or skip options associated with this task. There are no errors listed in Dynflow or under the Errors tab at the “foreman_tasks/tasks/” URL for this task. Under the Running Steps tab, I have the following:

Action:
Actions::Candlepin::Owner::Import
State:suspended
Input:
{"label"=>"<ORGANIZATION_NAME>",
 "path"=>"/tmp/0.6979887199982576.zip",
 "dependency"=>{"response"=>true},
 "session_id"=>"bd103c77-14a8-43c6-9f7d-8cbafbc6e4f3",
 "remote_user"=>"admin",
 "remote_cp_user"=>"admin",
 "current_request_id"=>"bd103c77-14a8-43c6-9f7d-8cbafbc6e4f3",
 "current_timezone"=>"America/New_York",
 "current_organization_id"=>4,
 "current_location_id"=>2,
 "current_user_id"=>8}

Output:

{"task"=>
  {"created"=>"2022-05-16T15:51:26+0000",
   "updated"=>"2022-05-16T15:51:26+0000",
   "id"=>"003729a77f09c5660180cd9133772463",
   "name"=>"Import Manifest",
   "group"=>nil,
   "origin"=>"<HOSTNAME>",
   "executor"=>nil,
   "principal"=>"foreman_admin",
   "state"=>"CREATED",
   "previousState"=>"CREATED",
   "startTime"=>nil,
   "endTime"=>nil,
   "attempts"=>0,
   "maxAttempts"=>1,
   "statusPath"=>"/jobs/003729a77f09c5660180cd9133772463",
   "resultData"=>nil,
   "key"=>"ImportJob"},
 "poll_attempts"=>{"total"=>4918, "failed"=>0}}

From the GUI, it seems my only option is to Force Cancel, but that got me into trouble in the past, so I haven’t done that yet. Is there a way to nudge this process forward?

Thank you!
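
The job status in the output above suggests that Candlepin accepted the job but never started executing it: the state is still CREATED, there are zero attempts, and no start time or executor is recorded. A minimal sketch of that check, assuming the status has already been loaded into a Python dict with the same field names as the output (the function name `looks_never_started` is hypothetical, and the interpretation of the fields is an assumption based on this output only):

```python
def looks_never_started(status):
    """Return True if a Candlepin job status looks like it was never picked up.

    Heuristic only: the job is still in CREATED, has not been attempted,
    and has no start time or executor recorded.
    """
    task = status["task"]
    return (
        task["state"] == "CREATED"
        and task["previousState"] == "CREATED"
        and task["attempts"] == 0
        and task["startTime"] is None
        and task["executor"] is None
    )


# Relevant fields copied from the Dynflow output above.
status = {
    "task": {
        "id": "003729a77f09c5660180cd9133772463",
        "name": "Import Manifest",
        "state": "CREATED",
        "previousState": "CREATED",
        "startTime": None,
        "endTime": None,
        "executor": None,
        "attempts": 0,
        "maxAttempts": 1,
    },
    "poll_attempts": {"total": 4918, "failed": 0},
}

print(looks_never_started(status))
```

If this check stays true while `poll_attempts.total` keeps growing, the polling itself is working and the problem is more likely on the Candlepin side.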

Seems Candlepin never finished the manifest import for some reason.
Please check foreman-maintain service status for tomcat.
Also check /var/log/candlepin/candlepin.log and error.log for any related errors.
Thanks.


Regarding the foreman-maintain service status Tomcat output, I don’t see any errors:

/ displaying tomcat
● tomcat.service - Apache Tomcat Web Application Container
   Loaded: loaded (/usr/lib/systemd/system/tomcat.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2022-02-17 17:19:31 EST; 2 months 27 days ago
 Main PID: 1755 (java)
    Tasks: 108
   CGroup: /system.slice/tomcat.service
           └─1755 /usr/lib/jvm/jre-11/bin/java -Xms1024m -Xmx4096m -Djava.security.auth.login.config=/usr/share/tomcat/conf/login.config -classpath /usr/share/tomcat/bin/bootstrap.jar:/usr/share/tomcat/bin/tomcat-juli.jar:/usr/share/java/commons-daemon.jar -Dcatalina.base=/usr/share/tomcat -Dcatalina.home=/usr/share/tomcat -Djava.endorsed.dirs= -Djava.io.tmpdir=/var/cache/tomcat/temp -Djava.util.logging.config.file=/usr/share/tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager org.apache.catalina.startup.Bootstrap start

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

The only questionable thing I see in that output is related to PostgreSQL:

Warning: rh-postgresql12-postgresql.service changed on disk. Run 'systemctl daemon-reload' to reload units.

At the end of that output, it shows:

/ All services displayed
                                                                                
/ All services are running                                            [OK]

The error.log doesn’t have any entries from the 16th when I initiated the import. Here are the lines from the candlepin.log file when I initiated the import:

2022-05-16 11:51:26,204 [thread=http-bio-127.0.0.1-23443-exec-91] [req=1be79a43-f459-4c96-bc87-9a20ab928547, org=, csid=bd103c77-14a8-43c6-9f7d-8cbafbc6e4f3] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/owners/<ORGANIZATION>/imports/async
2022-05-16 11:51:26,301 [thread=http-bio-127.0.0.1-23443-exec-91] [req=1be79a43-f459-4c96-bc87-9a20ab928547, org=<ORGANIZATION>, csid=bd103c77-14a8-43c6-9f7d-8cbafbc6e4f3] INFO  org.candlepin.resource.OwnerResource - Running async import of archive /var/cache/tomcat/temp/pfx10562967600964487939sfx for owner <ORGANIZATION>
2022-05-16 11:51:26,865 [thread=http-bio-127.0.0.1-23443-exec-91] [req=1be79a43-f459-4c96-bc87-9a20ab928547, org=<ORGANIZATION>, csid=bd103c77-14a8-43c6-9f7d-8cbafbc6e4f3] INFO  org.candlepin.async.JobManager - Job queued: AsyncJobStatus [id: 003729a77f09c5660180cd9133772463, name: Import Manifest, key: ImportJob, state: QUEUED]

After that, there seems to just be a bunch of get requests like this:

2022-05-16 11:51:34,054 [thread=http-bio-127.0.0.1-23443-exec-84] [req=93b96e30-bba7-401e-b2b8-dd53060e9b1e, org=, csid=bd103c77-14a8-43c6-9f7d-8cbafbc6e4f3] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=GET, uri=/candlepin/jobs/003729a77f09c5660180cd9133772463?
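
Note that candlepin.log uses local time while Dynflow reports UTC, so 11:51:26 in the log corresponds to 15:51:26 UTC in the task output. The sketch below filters log lines for a given job ID, skips the repetitive GET polls, and converts the timestamps to UTC so they can be lined up with Dynflow. The fixed UTC-4 offset is an assumption (US Eastern daylight time, based on the America/New_York timezone in the task input), and the regular expression is written against the lines quoted above, not against any documented Candlepin log format.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: the server logs in US Eastern daylight time (UTC-4).
EDT = timezone(timedelta(hours=-4))

# Matches lines shaped like the candlepin.log excerpts quoted above.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} "
    r".*? (INFO|WARN|ERROR|DEBUG)\s+(\S+) - (.*)$"
)


def job_events(lines, job_id):
    """Return (utc_time, level, message) for non-poll lines mentioning job_id."""
    events = []
    for line in lines:
        if job_id not in line:
            continue
        match = LINE_RE.match(line)
        if not match:
            continue
        stamp, level, _logger, message = match.groups()
        if "verb=GET" in message:
            continue  # status polls from Foreman carry no state information
        local = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=EDT)
        events.append((local.astimezone(timezone.utc), level, message))
    return events


sample = [
    "2022-05-16 11:51:26,865 [thread=http-bio-127.0.0.1-23443-exec-91] "
    "[req=1be79a43-f459-4c96-bc87-9a20ab928547, org=<ORGANIZATION>, "
    "csid=bd103c77-14a8-43c6-9f7d-8cbafbc6e4f3] INFO  org.candlepin.async.JobManager - "
    "Job queued: AsyncJobStatus [id: 003729a77f09c5660180cd9133772463, "
    "name: Import Manifest, key: ImportJob, state: QUEUED]",
    "2022-05-16 11:51:34,054 [thread=http-bio-127.0.0.1-23443-exec-84] "
    "[req=93b96e30-bba7-401e-b2b8-dd53060e9b1e, org=, "
    "csid=bd103c77-14a8-43c6-9f7d-8cbafbc6e4f3] INFO  org.candlepin.common.filter.LoggingFilter - "
    "Request: verb=GET, uri=/candlepin/jobs/003729a77f09c5660180cd9133772463?",
]

events = job_events(sample, "003729a77f09c5660180cd9133772463")
for when, level, message in events:
    print(when.isoformat(), level, message)
```

In practice the `lines` argument could be an open handle on /var/log/candlepin/candlepin.log. Anything logged for the job after the "Job queued" line would be the interesting part.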

Foreman continues to happily wait for the process to finish. The progress bar is shown on the Subscriptions page:
[Screenshot: manifest import progress bar on the Subscriptions page]
This actually blocks all actions on that page so, for example, I can’t view the other defined subscriptions. Since initiating this import, I am also unable to sync the RHEL repos. Any attempt to do so throws the following error:

description: 403, message='Forbidden', url=URL('https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os')
  worker: "/pulp/api/v3/workers/1be4e556-a6f4-4402-8dee-f62f81fb518c/"

Just some further information in case it’s useful. I’m not familiar with using Candlepin directly so I’m not sure what may or may not be relevant/helpful.

Hi @cbcbcb ,

Here are some questions to help us understand your situation.
Were you able to import any manifest before?
Have there been any recent changes to your system?
Does the manifest that failed to import contain lots of entitlement certificates?
Do you see any errors in the Candlepin logs that didn’t make it back to Dynflow for some reason?
Can you check the PostgreSQL logs to see if it’s still trying to do anything?
Do you want to determine why it is stuck, or just cancel the task and move on?

Thanks.

Thanks for taking a look.

I did import a manifest in the past. I imported a RHEL Individual license for testing a while ago. In this case, I was trying to import a single standard RHEL license. The only thing I did differently this time is to select a single allocation rather than unlimited.

No, I don’t believe so. I was attempting to import a single RHEL license.

I’m not the main admin of the server. I have some access to it - like the foreman logs - but I am not involved in the day-to-day management. It doesn’t look like updates have been applied recently and the server hasn’t been rebooted recently.

I’ll have to work with the main admin for that. I don’t currently have access to those logs.

I’m happy to cancel the task and move on if that is safe. In the past, I’ve cancelled tasks and ended up with a bit of a mess, so I thought I’d stop by here first before possibly shooting myself in the foot. This manifest has to be imported though, so “moving on” here would mean cancelling the task and attempting the import again. Maybe this issue is a one-off error? If it continues to fail, then I’d have to dig a bit deeper.

I forcefully cancelled the job from the main Tasks page (Monitor → Tasks). Even then, the progress bar was still shown in Content → Subscriptions, and the result status for the task was still listed as pending. The system administrator then restarted the server. Upon coming back up, the manifest import task was stopped with a “warning” result. The error shown was:

AsyncJobStatus with id 003729a77f09c5660180cd9133772463 could not be found.

I can now interact with the Subscriptions page as normal.

I still cannot sync the RHEL repos. I assumed it was caused by the stuck import process, but maybe not. The error is the same as above:
Error: 403, message='Forbidden', url=URL('https://cdn.redhat.com/content/dist/rhel8/8/x86_64/codeready-builder/os')

I’ll continue to bang on this. But if anyone has insight into Candlepin, I’d appreciate it.

Attempted to reimport the subscription manifest that originally caused the issue. I got this error:

Owner has already imported from another subscription management application. The following conflicts were found: [ DISTRIBUTOR_CONFLICT ]

Which was true - I already had one subscription manifest imported and was trying to import another. Apparently, you’re not supposed to do this:

So, instead, I updated the existing subscription manifest (that I had imported successfully a few months ago) and re-imported it. This succeeded. I was then able to access the RHEL content on the Foreman server normally.