Content view import fails on many AlmaLinux repositories

Problem:

The AlmaLinux 8 BaseOS, AppStream and PowerTools repositories fail during content view import on the downstream Katello. On the upstream server, sync, content view publish and content export all work fine.

The import command is something like:

chown -R pulp:pulp /var/lib/pulp/imports/*
hammer content-import version --organization="My Org" --path=/var/lib/pulp/imports/2022-07-08T06-41-04-03-00/

Some other repositories, like CentOS 8 BaseOS, AppStream and EPEL 8, import fine.

Expected outcome:

Successful content view import on downstream Katello.

Foreman and Proxy versions:

Foreman 3.3
Katello 4.5

This is a fresh installation, not an upgrade.

Foreman and Proxy plugin versions:

Distribution and version:

AlmaLinux 8.6

Other relevant data:

/var/log/messages

Jul  8 06:44:57 foreman pulpcore-worker-2[207240]: pulp [4a2c9d0c-b695-4c5e-8b0a-a338cfdf07ac]: pulpcore.app.tasks.importer:INFO: ...8 import-errors encountered importing {fpath}, attempt {curr_attempt}, retrying
Jul  8 06:44:57 foreman pulpcore-worker-2[207240]: pulp [4a2c9d0c-b695-4c5e-8b0a-a338cfdf07ac]: pulpcore.app.tasks.importer:ERROR: FATAL import-failure importing ./tmp6k_9ced7/repository-AlmaLinux_8_BaseOS-686750_1/pulp_rpm.app.modelresource.UpdateCollectionPackageResource.json
Jul  8 06:44:57 foreman pulpcore-worker-2[207240]: pulp [4a2c9d0c-b695-4c5e-8b0a-a338cfdf07ac]: pulpcore.tasking.pulpcore_worker:INFO: Task c93c331d-88af-44ef-88ad-c485d8b024fc failed (get() returned more than one UpdateCollectionPackage -- it returned 2!)
Jul  8 06:44:57 foreman pulpcore-worker-2[207240]: pulp [4a2c9d0c-b695-4c5e-8b0a-a338cfdf07ac]: pulpcore.tasking.pulpcore_worker:INFO:   File "/usr/lib/python3.9/site-packages/pulpcore/tasking/pulpcore_worker.py", line 410, in _perform_task
Jul  8 06:44:57 foreman pulpcore-worker-2[207240]:    result = func(*args, **kwargs)
Jul  8 06:44:57 foreman pulpcore-worker-2[207240]:  File "/usr/lib/python3.9/site-packages/pulpcore/app/tasks/importer.py", line 232, in import_repository_version
Jul  8 06:44:57 foreman pulpcore-worker-2[207240]:    for a_result in _import_file(os.path.join(rv_path, filename), res_class, retry=True):
Jul  8 06:44:57 foreman pulpcore-worker-2[207240]:  File "/usr/lib/python3.9/site-packages/pulpcore/app/tasks/importer.py", line 121, in _import_file
Jul  8 06:44:57 foreman pulpcore-worker-2[207240]:    a_result = resource.import_data(data, raise_errors=True)
Jul  8 06:44:57 foreman pulpcore-worker-2[207240]:  File "/usr/lib/python3.9/site-packages/import_export/resources.py", line 777, in import_data
Jul  8 06:44:57 foreman pulpcore-worker-2[207240]:    return self.import_data_inner(
Jul  8 06:44:57 foreman pulpcore-worker-2[207240]:  File "/usr/lib/python3.9/site-packages/import_export/resources.py", line 829, in import_data_inner
Jul  8 06:44:57 foreman pulpcore-worker-2[207240]:    raise row_result.errors[-1].error
Jul  8 06:44:57 foreman pulpcore-worker-2[207240]:  File "/usr/lib/python3.9/site-packages/import_export/resources.py", line 667, in import_row
Jul  8 06:44:57 foreman pulpcore-worker-2[207240]:    instance, new = self.get_or_init_instance(instance_loader, row)
Jul  8 06:44:57 foreman pulpcore-worker-2[207240]:  File "/usr/lib/python3.9/site-packages/import_export/resources.py", line 359, in get_or_init_instance
Jul  8 06:44:57 foreman pulpcore-worker-2[207240]:    instance = self.get_instance(instance_loader, row)
Jul  8 06:44:57 foreman pulpcore-worker-2[207240]:  File "/usr/lib/python3.9/site-packages/import_export/resources.py", line 352, in get_instance
Jul  8 06:44:57 foreman pulpcore-worker-2[207240]:    return instance_loader.get_instance(row)
Jul  8 06:44:57 foreman pulpcore-worker-2[207240]:  File "/usr/lib/python3.9/site-packages/import_export/instance_loaders.py", line 31, in get_instance
Jul  8 06:44:57 foreman pulpcore-worker-2[207240]:    return self.get_queryset().get(**params)
Jul  8 06:44:57 foreman pulpcore-worker-2[207240]:  File "/usr/lib/python3.9/site-packages/django/db/models/query.py", line 439, in get
Jul  8 06:44:57 foreman pulpcore-worker-2[207240]:    raise self.model.MultipleObjectsReturned(

The Pulp task API shows the same traceback, along with the description "get() returned more than one UpdateCollectionPackage -- it returned 2!":

# curl https://`hostname`/pulp/api/v3/tasks/ae884e70-f6f2-4deb-9077-90ae7c25d5e6/ --cert /etc/pki/katello/certs/pulp-client.crt --key /etc/pki/katello/private/pulp-client.key | json_reformat
{
    "pulp_href": "/pulp/api/v3/tasks/ae884e70-f6f2-4deb-9077-90ae7c25d5e6/",
    "pulp_created": "2022-07-08T04:53:52.530183Z",
    "state": "failed",
    "name": "pulpcore.app.tasks.importer.import_repository_version",
    "logging_cid": "b42b9160-cb38-43bc-829d-4add293f0f5a",
    "started_at": "2022-07-08T04:53:52.637896Z",
    "finished_at": "2022-07-08T04:57:19.130346Z",
    "error": {
        "traceback": "  File \"/usr/lib/python3.9/site-packages/pulpcore/tasking/pulpcore_worker.py\", line 410, in _perform_task\n    result = func(*args, **kwargs)\n  File \"/usr/lib/python3.9/site-packages/pulpcore/app/tasks/importer.py\", line 232, in import_repository_version\n    for a_result in _import_file(os.path.join(rv_path, filename), res_class, retry=True):\n  File \"/usr/lib/python3.9/site-packages/pulpcore/app/tasks/importer.py\", line 121, in _import_file\n    a_result = resource.import_data(data, raise_errors=True)\n  File \"/usr/lib/python3.9/site-packages/import_export/resources.py\", line 777, in import_data\n    return self.import_data_inner(\n  File \"/usr/lib/python3.9/site-packages/import_export/resources.py\", line 829, in import_data_inner\n    raise row_result.errors[-1].error\n  File \"/usr/lib/python3.9/site-packages/import_export/resources.py\", line 667, in import_row\n    instance, new = self.get_or_init_instance(instance_loader, row)\n  File \"/usr/lib/python3.9/site-packages/import_export/resources.py\", line 359, in get_or_init_instance\n    instance = self.get_instance(instance_loader, row)\n  File \"/usr/lib/python3.9/site-packages/import_export/resources.py\", line 352, in get_instance\n    return instance_loader.get_instance(row)\n  File \"/usr/lib/python3.9/site-packages/import_export/instance_loaders.py\", line 31, in get_instance\n    return self.get_queryset().get(**params)\n  File \"/usr/lib/python3.9/site-packages/django/db/models/query.py\", line 439, in get\n    raise self.model.MultipleObjectsReturned(\n",
        "description": "get() returned more than one UpdateCollectionPackage -- it returned 2!"
    },
    "worker": "/pulp/api/v3/workers/5162a23b-0ea6-43a8-b53f-22234d2a9190/",
    "parent_task": "/pulp/api/v3/tasks/d8148571-3467-4e4f-a9c9-01a9a59397e2/",
    "child_tasks": [

    ],
    "task_group": "/pulp/api/v3/task-groups/7d54c426-a33a-42ac-9b37-f57e05546457/",
    "progress_reports": [
        {
            "message": "Importing content for AlmaLinux_BaseOS_9-835432",
            "code": "import.repo.version.content",
            "state": "running",
            "total": null,
            "done": 0,
            "suffix": null
        }
    ],
    "created_resources": [

    ],
    "reserved_resources_record": [
        "/pulp/api/v3/repositories/rpm/rpm/bfc496ad-1391-409d-9c39-511f883fd1f9/"
    ]
}

The import succeeds if I create a very tight content view filter that includes only a small number of packages.
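For anyone who wants to try the same workaround, a tight include filter could be set up roughly like this. This is only a sketch: the function name, the filter name and the package names are placeholders I made up, and the `HAMMER` variable exists only so the commands can be printed with `HAMMER=echo` instead of being run.

```shell
# HAMMER defaults to the real CLI; set HAMMER=echo for a dry run that
# only prints the hammer arguments.
HAMMER="${HAMMER:-hammer}"

# create_tight_filter ORG CONTENT_VIEW FILTER_NAME PACKAGE...
# Creates an include-type RPM filter and adds one rule per package name.
# (Hypothetical helper, not part of hammer itself.)
create_tight_filter() {
    org="$1"; cv="$2"; filter="$3"
    shift 3

    $HAMMER content-view filter create \
        --organization "$org" \
        --content-view "$cv" \
        --name "$filter" \
        --type rpm \
        --inclusion true || return 1

    for pkg in "$@"; do
        $HAMMER content-view filter rule create \
            --organization "$org" \
            --content-view "$cv" \
            --content-view-filter "$filter" \
            --name "$pkg" || return 1
    done
}

# Example call (placeholder names):
#   create_tight_filter "Upstream" "AlmaLinux 8" "tight-filter" bash coreutils
```

The filter only takes effect in a newly published content view version, so publish and export again after creating it.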

These are the failing repositories:
https://repo.almalinux.org/almalinux/8/BaseOS/x86_64/os/
https://repo.almalinux.org/almalinux/8/AppStream/x86_64/os/
https://repo.almalinux.org/almalinux/8/PowerTools/x86_64/os/

Can someone take a look at this pulp issue?

I’ve filed https://github.com/pulp/pulp_rpm/issues/2648

Thank you for the quick reaction. If you need any more information, please ask me this week, as I will be mostly offline for the next five weeks.

Here is a simple test case to reproduce the failure.

Upstream Katello

# Create upstream organization
hammer organization create --name "Upstream"

# Create product for AlmaLinux 8
hammer product create \
  --organization "Upstream" \
  --name "AlmaLinux 8"

# Create repository for AlmaLinux 8
hammer repository create \
  --organization "Upstream" \
  --product "AlmaLinux 8" \
  --name "AlmaLinux 8 BaseOS" \
  --content-type "yum" \
  --download-policy "immediate" \
  --mirroring-policy "mirror_content_only" \
  --url "https://repo.almalinux.org/almalinux/8/BaseOS/x86_64/os/" \
  --arch "x86_64" \
  --ignorable-content "srpm" \
  --mirror-on-sync "no"

# Create content view
hammer content-view create \
  --organization "Upstream" \
  --name "AlmaLinux 8"

# Add repository to content view
hammer content-view add-repository \
  --organization "Upstream" \
  --name "AlmaLinux 8" \
  --product "AlmaLinux 8" \
  --repository "AlmaLinux 8 BaseOS"

# Synchronize the repository
hammer repository synchronize \
  --organization "Upstream" \
  --product "AlmaLinux 8" \
  --name "AlmaLinux 8 BaseOS"

# Publish new content view version
hammer content-view publish \
  --organization "Upstream" \
  --name "AlmaLinux 8"

# Export content view
hammer content-export complete version \
  --organization "Upstream" \
  --content-view "AlmaLinux 8" \
  --version 1

Downstream Katello

# Create downstream organization
hammer organization create --name "Downstream"

# Copy the exported content from upstream to downstream and do the import
chown -R pulp:pulp /var/lib/pulp/imports/*
hammer content-import version --organization="Downstream" --path=/var/lib/pulp/imports/2022-07-11T05-34-59-03-00/

The output of the above import command in my environment is:

[.............................................................................................................................] [100%]
Error: 1 subtask(s) failed for task group /pulp/api/v3/task-groups/ae01c7a9-a72b-4409-b7a4-5ce35fdc40af/.

The task result:

curl https://`hostname`/pulp/api/v3/task-groups/ae01c7a9-a72b-4409-b7a4-5ce35fdc40af/ --cert /etc/pki/katello/certs/pulp-client.crt --key /etc/pki/katello/private/pulp-client.key | json_reformat
{
    "pulp_href": "/pulp/api/v3/task-groups/ae01c7a9-a72b-4409-b7a4-5ce35fdc40af/",
    "description": "Import of None",
    "all_tasks_dispatched": true,
    "waiting": 0,
    "skipped": 0,
    "running": 0,
    "completed": 1,
    "canceled": 0,
    "failed": 1,
    "canceling": 0,
    "group_progress_reports": [
        {
            "message": "Importing repository versions",
            "code": "import.repo.versions",
            "total": 1,
            "done": 0,
            "suffix": null
        }
    ],
    "tasks": [
        {
            "pulp_href": "/pulp/api/v3/tasks/e6132d24-2222-4d1c-aea3-1c2b9411f4ab/",
            "pulp_created": "2022-07-11T02:56:09.349361Z",
            "name": "pulpcore.app.tasks.importer.pulp_import",
            "state": "completed",
            "started_at": "2022-07-11T02:56:09.537983Z",
            "finished_at": "2022-07-11T02:58:12.329416Z",
            "worker": "/pulp/api/v3/workers/0f533dd6-2a13-45a6-89db-79defee89780/"
        },
        {
            "pulp_href": "/pulp/api/v3/tasks/1edca81c-7010-4388-b989-02e035960cf6/",
            "pulp_created": "2022-07-11T02:58:11.666169Z",
            "name": "pulpcore.app.tasks.importer.import_repository_version",
            "state": "failed",
            "started_at": "2022-07-11T02:58:11.853416Z",
            "finished_at": "2022-07-11T03:03:55.538141Z",
            "worker": "/pulp/api/v3/workers/1b74d55f-15ab-4f79-ad7a-bf9384a26719/"
        }
    ]
}
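When a task group has many subtasks, the failed ones can be picked out of that JSON with a small filter. The sketch below is a hypothetical helper of mine (`failed_task_hrefs`), and it assumes pretty-printed input like the json_reformat output above, with one key per line and "pulp_href" appearing before "state" in each task. A real JSON tool such as jq would be more robust if it is installed.

```shell
# failed_task_hrefs: read a pretty-printed task-group JSON document on stdin
# and print the pulp_href of every task whose state is "failed".
# Relies on the line layout produced by json_reformat, not on real JSON parsing.
failed_task_hrefs() {
    awk -F'"' '
        /"pulp_href":/      { href = $4 }   # remember the most recent href
        /"state": "failed"/ { print href }  # emit it when that task failed
    '
}

# Usage, with the same curl options as above:
#   curl https://`hostname`/pulp/api/v3/task-groups/<id>/ --cert ... --key ... \
#     | json_reformat | failed_task_hrefs
```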

And the failed subtask result:

curl https://`hostname`/pulp/api/v3/tasks/1edca81c-7010-4388-b989-02e035960cf6/ --cert /etc/pki/katello/certs/pulp-client.crt --key /etc/pki/katello/private/pulp-client.key | json_reformat
{
    "pulp_href": "/pulp/api/v3/tasks/1edca81c-7010-4388-b989-02e035960cf6/",
    "pulp_created": "2022-07-11T02:58:11.666169Z",
    "state": "failed",
    "name": "pulpcore.app.tasks.importer.import_repository_version",
    "logging_cid": "7b34246f-3c70-441f-a38d-4ba8af708c24",
    "started_at": "2022-07-11T02:58:11.853416Z",
    "finished_at": "2022-07-11T03:03:55.538141Z",
    "error": {
        "traceback": "  File \"/usr/lib/python3.9/site-packages/pulpcore/tasking/pulpcore_worker.py\", line 410, in _perform_task\n    result = func(*args, **kwargs)\n  File \"/usr/lib/python3.9/site-packages/pulpcore/app/tasks/importer.py\", line 232, in import_repository_version\n    for a_result in _import_file(os.path.join(rv_path, filename), res_class, retry=True):\n  File \"/usr/lib/python3.9/site-packages/pulpcore/app/tasks/importer.py\", line 121, in _import_file\n    a_result = resource.import_data(data, raise_errors=True)\n  File \"/usr/lib/python3.9/site-packages/import_export/resources.py\", line 777, in import_data\n    return self.import_data_inner(\n  File \"/usr/lib/python3.9/site-packages/import_export/resources.py\", line 829, in import_data_inner\n    raise row_result.errors[-1].error\n  File \"/usr/lib/python3.9/site-packages/import_export/resources.py\", line 667, in import_row\n    instance, new = self.get_or_init_instance(instance_loader, row)\n  File \"/usr/lib/python3.9/site-packages/import_export/resources.py\", line 359, in get_or_init_instance\n    instance = self.get_instance(instance_loader, row)\n  File \"/usr/lib/python3.9/site-packages/import_export/resources.py\", line 352, in get_instance\n    return instance_loader.get_instance(row)\n  File \"/usr/lib/python3.9/site-packages/import_export/instance_loaders.py\", line 31, in get_instance\n    return self.get_queryset().get(**params)\n  File \"/usr/lib/python3.9/site-packages/django/db/models/query.py\", line 439, in get\n    raise self.model.MultipleObjectsReturned(\n",
        "description": "get() returned more than one UpdateCollectionPackage -- it returned 2!"
    },
    "worker": "/pulp/api/v3/workers/1b74d55f-15ab-4f79-ad7a-bf9384a26719/",
    "parent_task": "/pulp/api/v3/tasks/e6132d24-2222-4d1c-aea3-1c2b9411f4ab/",
    "child_tasks": [

    ],
    "task_group": "/pulp/api/v3/task-groups/ae01c7a9-a72b-4409-b7a4-5ce35fdc40af/",
    "progress_reports": [
        {
            "message": "Importing content for AlmaLinux_8_BaseOS-976221",
            "code": "import.repo.version.content",
            "state": "running",
            "total": null,
            "done": 0,
            "suffix": null
        }
    ],
    "created_resources": [

    ],
    "reserved_resources_record": [
        "/pulp/api/v3/repositories/rpm/rpm/00914bd9-0f3a-43bc-a1ae-00a34eab3018/"
    ]
}
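Since the traceback makes the task JSON hard to read, it can help to print only the error description. A minimal sketch, again assuming json_reformat-style output with one key per line; `task_error_description` is a name I picked for illustration, and a description containing escaped quotes would be printed with its backslashes intact.

```shell
# task_error_description: read a pretty-printed Pulp task JSON document on
# stdin and print the value of its "description" key (the error summary).
task_error_description() {
    sed -n 's/^[[:space:]]*"description": "\(.*\)",\{0,1\}[[:space:]]*$/\1/p'
}

# Usage, with the same curl options as above:
#   curl https://`hostname`/pulp/api/v3/tasks/<id>/ --cert ... --key ... \
#     | json_reformat | task_error_description
```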

Fixed here: https://github.com/pulp/pulp_rpm/issues/2648

To apply the fix, update the file /usr/lib/python3.9/site-packages/pulp_rpm/app/modelresource.py with the change from that issue, restart the Foreman services, and do a fresh export and import.
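Because this means editing a packaged file by hand, it is probably worth keeping a copy of the original first. A small sketch of how that might look; the helper name and the `.orig` suffix are my own choices, and the restart command in the comments assumes foreman-maintain is installed.

```shell
# Path taken from the post above; override MODELRESOURCE if yours differs.
MODELRESOURCE="${MODELRESOURCE:-/usr/lib/python3.9/site-packages/pulp_rpm/app/modelresource.py}"

# backup_before_patch FILE
# Copies FILE to FILE.orig once; later runs keep the first (pristine) copy.
backup_before_patch() {
    [ -f "$1" ] || { echo "missing: $1" >&2; return 1; }
    if [ -e "$1.orig" ]; then
        echo "backup already exists: $1.orig" >&2
        return 0
    fi
    cp -p "$1" "$1.orig"
}

# Typical sequence on the Katello server:
#   backup_before_patch "$MODELRESOURCE"
#   ...edit the file...
#   foreman-maintain service restart
```

A later package update of pulp_rpm will overwrite the manual edit, which is fine once the packaged version contains the fix.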

Thanks everyone.