Failure to sync Fedora 32 repos - pulp yum_updater error command document too large

I’m curious if any of you have encountered this, or could offer a second opinion before I toss out a bug report.

Problem:
I’m attempting to sync Fedora 32 repositories from the Princeton mirror (I have tried another just to rule out the repo.) Unfortunately the sync fails with the exception Katello::Errors::PulpError: PLP0000: Importer indicated a failed response and the Error tab shows that the Pulp yum_importer failed with error command document too large.

I’ve found a handful of similar sounding bug reports, but they’re from a few years ago and claim to have fixed the issue. https://access.redhat.com/solutions/2957371

I have two reproduced this on two systems, Katello 3.15.3 and RH Satellite 6.7.2.

Expected outcome:
Repository to sync correctly as it did with Fedora 32.

Reproducible?:
If it’s not something with me, then you should just have to add the Fedora 32 Everything repo to a Product and attempt to sync. It’ll process metadata for a while and then error out.

Foreman and Proxy versions:
Foreman 2.0.1 w/ Katello 3.15.3
also produced the error on Satellite 6.7.2

Distribution and version:
Katello = CentOS 7.8
Satellite = RHEL 7.8

Other relevant data:
{“pulp_tasks”=>
[{“exception”=>nil,
“task_type”=>“pulp.server.managers.repo.sync.sync”,
“_href”=>"/pulp/api/v2/tasks/91caf2e6-6d0d-4dfb-abcd-d0e3879e40bb/",
“task_id”=>“91caf2e6-6d0d-4dfb-abcd-d0e3879e40bb”,
“tags”=>
[“pulp:repository:156f4383-2ae0-48cb-96ec-462eee4c4c22”,
“pulp:action:sync”],
“finish_time”=>“2020-08-10T20:39:33Z”,
ns"=>“task_status”,
“start_time”=>“2020-08-10T20:23:43Z”,
“traceback”=>
“Traceback (most recent call last):\n” +
" File “/usr/lib/python2.7/site-packages/celery/app/trace.py”, line 367, in trace_task\n" +
" R = retval = fun(*args, **kwargs)\n" +
" File “/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py”, line 686, in call\n" +
" return super(Task, self).call(*args, **kwargs)\n" +
" File “/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py”, line 108, in call\n" +
" return super(PulpTask, self).call(*args, **kwargs)\n" +
" File “/usr/lib/python2.7/site-packages/celery/app/trace.py”, line 622, in protected_call\n" +
" return self.run(*args, **kwargs)\n" +
" File “/usr/lib/python2.7/site-packages/pulp/server/controllers/repository.py”, line 855, in sync\n" +
" raise pulp_exceptions.PulpExecutionException(
(‘Importer indicated a failed response’))\n” +
“PulpExecutionException: Importer indicated a failed response\n”,
“spawned_tasks”=>,
“progress_report”=>
{“yum_importer”=>
{“content”=>
{“size_total”=>78681195936,
“items_left”=>11641,
“items_total”=>55327,
“state”=>“FAILED”,
“size_left”=>15786547076,
“details”=>
{“rpm_total”=>55327,
“rpm_done”=>43686,
“drpm_total”=>0,
“drpm_done”=>0},
“error”=>“command document too large”,
“error_details”=>},
“comps”=>{“state”=>“NOT_STARTED”},
“purge_duplicates”=>{“state”=>“NOT_STARTED”},
“distribution”=>
{“items_total”=>0,
“state”=>“NOT_STARTED”,
“error_details”=>,
“items_left”=>0},
“modules”=>{“state”=>“NOT_STARTED”},
“errata”=>{“state”=>“NOT_STARTED”},
“metadata”=>{“state”=>“FINISHED”}}},
“queue”=>“reserved_resource_worker-1@libkatello.library.unt.edu.dq2”,
“state”=>“error”,
“worker_name”=>“reserved_resource_worker-1@libkatello.library.unt.edu”,
“result”=>nil,
“error”=>
{“code”=>“PLP0000”,
“data”=>{},
“description”=>“Importer indicated a failed response”,
“sub_errors”=>},
“_id”=>{"$oid"=>“5f31accf9aae893f3610f6fd”},
“id”=>“5f31accf9aae893f3610f6fd”}],
“contents_changed”=>true,
“poll_attempts”=>{“total”=>80, “failed”=>1}}

Ack. I hate when you don’t see your typos until you submit. That should be Fedora 31. Everything works as advertised with the Fedora 31 repos.

@ShuffleBox

What is the size difference between the Fedora 31 repo and the 32 repo? Are you syncing with on_demand or immediate?

I had the Fedora 32 repo set up as on_demand. During testing I changed it to immediate to see if it would make a difference. It didn’t.
However, your comment prompted me to take a second look and I realized I setup my F30 and F31 repos as immediate. I decided to make a new F32 repo that was immediate from the start, and it feels like it’s making far more progress. If it syncs correctly, I should have something positive to report in an hour.
<time passed while sorting out answers to the questions about repo size …>
And the sync completed when the Download Policy is set to Immediate at the initial creation of the repo definition!

1 Like

That is awesome news on it finishing with immediate.

When you were getting the document too large error, did you notice anything odd in the journalctl from Mongo?

How large is the MongoDB database?

I had to remove the F32 repos in and recreate a new one setup with on_demand to get it to throw the error again. The existing one that was on_demand completed after a sync as well, and I assume it’s because the content from the repo set for immediate was present.

On this new repo sync, I watched ‘journalctl -f’ and did not see any errors or comments from mongodb, only pulp errors when it got to whatever the problem is.

Aug 11 15:20:04 host.fqdn pulp[6556]: pulp_rpm.plugins.importers.yum.sync:ERROR: [7181f260] (6556-80064) command document too large
Aug 11 15:20:04 host.fqdn pulp[6556]: pulp_rpm.plugins.importers.yum.sync:ERROR: [7181f260] (6556-80064) Traceback (most recent call last):
Aug 11 15:20:04 host.fqdn pulp[6556]: pulp_rpm.plugins.importers.yum.sync:ERROR: [7181f260] (6556-80064)   File "/usr/lib/python2.7/site-packages/pulp_rpm/plugins/importers/yum/sync.py", line 294, in run
Aug 11 15:20:04 host.fqdn pulp[6556]: pulp_rpm.plugins.importers.yum.sync:ERROR: [7181f260] (6556-80064)     self.update_content(metadata_files, url)
Aug 11 15:20:04 host.fqdn pulp[6556]: pulp_rpm.plugins.importers.yum.sync:ERROR: [7181f260] (6556-80064)   File "/usr/lib/python2.7/site-packages/pulp_rpm/plugins/importers/yum/sync.py", line 607, in update_content
Aug 11 15:20:04 host.fqdn pulp[6556]: pulp_rpm.plugins.importers.yum.sync:ERROR: [7181f260] (6556-80064)     self.download_rpms(metadata_files, rpms_to_download, url)
Aug 11 15:20:04 host.fqdn pulp[6556]: pulp_rpm.plugins.importers.yum.sync:ERROR: [7181f260] (6556-80064)   File "/usr/lib/python2.7/site-packages/pulp_rpm/plugins/importers/yum/sync.py", line 834, in download_rpms
Aug 11 15:20:04 host.fqdn pulp[6556]: pulp_rpm.plugins.importers.yum.sync:ERROR: [7181f260] (6556-80064)     unit = self.add_rpm_unit(metadata_files, unit)
Aug 11 15:20:04 host.fqdn pulp[6556]: pulp_rpm.plugins.importers.yum.sync:ERROR: [7181f260] (6556-80064)   File "/usr/lib/python2.7/site-packages/pulp_rpm/plugins/importers/yum/sync.py", line 759, in add_rpm_unit
Aug 11 15:20:04 host.fqdn pulp[6556]: pulp_rpm.plugins.importers.yum.sync:ERROR: [7181f260] (6556-80064)     unit.save()
Aug 11 15:20:04 host.fqdn pulp[6556]: pulp_rpm.plugins.importers.yum.sync:ERROR: [7181f260] (6556-80064)   File "/usr/lib/python2.7/site-packages/mongoengine/document.py", line 324, in save
Aug 11 15:20:04 host.fqdn pulp[6556]: pulp_rpm.plugins.importers.yum.sync:ERROR: [7181f260] (6556-80064)     object_id = collection.save(doc, **write_concern)
Aug 11 15:20:04 host.fqdn pulp[6556]: pulp_rpm.plugins.importers.yum.sync:ERROR: [7181f260] (6556-80064)   File "/usr/lib64/python2.7/site-packages/pymongo/collection.py", line 2180, in save
Aug 11 15:20:04 host.fqdn pulp[6556]: pulp_rpm.plugins.importers.yum.sync:ERROR: [7181f260] (6556-80064)     check_keys, False, manipulate, write_concern)
Aug 11 15:20:04 host.fqdn pulp[6556]: pulp_rpm.plugins.importers.yum.sync:ERROR: [7181f260] (6556-80064)   File "/usr/lib64/python2.7/site-packages/pymongo/collection.py", line 709, in _update
Aug 11 15:20:04 host.fqdn pulp[6556]: pulp_rpm.plugins.importers.yum.sync:ERROR: [7181f260] (6556-80064)     codec_options=self.codec_options).copy()
Aug 11 15:20:04 host.fqdn pulp[6556]: pulp_rpm.plugins.importers.yum.sync:ERROR: [7181f260] (6556-80064)   File "/usr/lib64/python2.7/site-packages/pymongo/pool.py", line 216, in command
Aug 11 15:20:04 host.fqdn pulp[6556]: pulp_rpm.plugins.importers.yum.sync:ERROR: [7181f260] (6556-80064)     self._raise_connection_failure(error)
Aug 11 15:20:04 host.fqdn pulp[6556]: pulp_rpm.plugins.importers.yum.sync:ERROR: [7181f260] (6556-80064)   File "/usr/lib64/python2.7/site-packages/pymongo/pool.py", line 343, in _raise_connection_failure
Aug 11 15:20:04 host.fqdn pulp[6556]: pulp_rpm.plugins.importers.yum.sync:ERROR: [7181f260] (6556-80064)     raise error
Aug 11 15:20:04 host.fqdn pulp[6556]: pulp_rpm.plugins.importers.yum.sync:ERROR: [7181f260] (6556-80064) DocumentTooLarge: command document too large

The pulp_database in mongodb is showing to be 13.94GB.

I’m not really setup to tinker with it, but I guess the next step would involve debugging the Pulp yum importer plugin and see what it’s trying to hand off to pymongo?

@ShuffleBox

I will tag the @Pulp team here so they can comment.

In the mean time are you able to snapshot this machine if it is virtual or back it up if the machine is physical and do a repair on Mongo to see if we can get the size down and see if that helps as well with the document issue?

To run a repair we want to do the following:

# mongod --dbpath /var/lib/mongodb --repair

Sure thing!
Here’s what I did and how it went:

  • power down and snapshot VM

  • power up

  • hop into database to get before stats with #> mongo. Then grabbed the output from “show dbs” and “db.stats()”

  • Pre-repair = pulp_database 13.947GB

  • attempt to run “mongod --dbpath /var/lib/mongodb --repair” but get told the server is already running (whoops)

  • shut down services with #> foreman-maintain service stop

  • run “mongod --dbpath /var/lib/mongodb --repair” and it churns a bit

  • restart services with #> foreman-maintain service start

  • attempt to hop into mongo to check after stats with #> mongo, but receive a “connection was refused” error message.

  • check in the web interface Administer -> About, Backend System Stawtus shows that pulp & pulp_auth failed to start

  • stop services & reboot server (because that fixes everything)

  • System came up with the same failures

  • noted in journalctl, mongodb saying it didn’t have permission to open local.ns in /var/lib/mongodb

  • found some of the db files in /var/lib/mongodb/ were owned by root:root instead of mongodb:mongodb. took a guess and #>chmod mongodb:mongodb *

  • restarted foreman services and Pulp came up successfully!

  • Post-repair = pulp_database 11.948GB

  • removed existing Fedora 32 repo

  • recreated Fedora 32 repo with On-Demand Download Policy

  • started syncronization

  • received the same pulp_rpm plugin failure as before.

Did you or @Pulp have any further thoughts on this? If not, I can write up a reply of “There’s a Pulp bug. use sync immediate.” and mark that as solved, since the end result of having a working Fedora 3X repo is probably what most people searching would care about.

Also, since it seems like a bug, should I also drop it in Katello’s space on Redmine?

@ShuffleBox if the Pulp team does not answer here, I will reach out to them and have them follow up. I am sorry about the delay this got in my todo.

1 Like

@ShuffleBox, could you check the number of packages in your F32 repository after it synced successfully with the immediate policy? Do you have 55326 or 55327?

The gist of the problem is: one package (sagemath-doc-en-9.0-2.fc32.x86_64.rpm) has a lot of files ~360 000 and it doesn’t fit in the max size of the mongodb document.
I think that the immediate policy continues sync process and likely prints a log statement that there was a failure for that package, while on_demand policy fails in such case and stops the sync process.

I filed a Pulp upstream issue https://pulp.plan.io/issues/7449, so other users were aware of the problem with this repository.
So to workaround the sync failure, as you found out, you can sync with immediate policy. If you need that specific package, unfortunately, there is no workaround for that case.

3 Likes

@goosemania, the F32 repository does have 55326 packages and sagemath-doc-en is not among them. Thank you for looking into this and finding the specific cause!

For fun I may compare the F31 and F32 versions of that package and see if there’s anything obviously wrong to take to the SageMath team. Since it’s in the F31 repo and syncs successfully, perhaps it’s just a packaging error.