"Error message: the server returns an error" during content view publish

Somehow I don’t quite understand the pulpcore database structure. I have tried to figure out whether there are already duplicate package groups for the EPEL repository, but I am missing the link between the remote, the repository and the package group…

Hi @gvde,

If I’m reading this right, the duplicate package group showed up after the first publish you did following the successful cleanup, not after a repository sync?

We’ll have to get some more information from the Pulp team about how their package groups work. My thought is that content shouldn’t be disappearing, unless it’s like errata, where updated units get new UUIDs and the old ones get orphaned.

@dralley, the gist of this issue is that @gvde has certain package groups that are being duplicated and eventually destroyed somehow without any DELETE commands. The content view in question has filters, so content is being copied. I’m not sure if dependency solving is involved, but I’m going to guess not?

To be precise: the new UUID first showed up in the httpd log (i.e. it was accessed through httpd for the first time) during the manual publish & promote after the cleanup. That was the first successful publish & promote. The previous publish ran that same morning from the cron job, but it failed due to the other duplicates.

The creation date of the new package group in the pulpcore database is the day before, the 30th at 18:48 UTC. The sync plan for the EPEL repositories runs every 6 hours at 02:33, 08:33, 14:33 and 20:33 (currently UTC+2).

Thus, the sync timeline is:

30th 20:33 UTC+2: EPEL sync created 6d3f1b3b-29b2-4a5b-81a8-63a26c0170cc.
31st 02:33 UTC+2: another sync
31st 08:33 UTC+2: another sync
31st 09:36 UTC+2: complete sync of epel7, followed by epel8 and epel8-next

31st 10:06 UTC+2: started the successful publish & promote
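The timestamps above mix UTC and UTC+2. Converting the sync plan to UTC is a quick sanity check that the 18:48 UTC creation time falls inside the 20:33 UTC+2 sync. A small Ruby sketch of that arithmetic (it only handles times of day and ignores DST changes; the constant names are made up for illustration):

```ruby
# Sync plan from the post: every 6 hours at 02:33, 08:33, 14:33 and 20:33 local time (UTC+2).
SYNC_HOURS_LOCAL = [2, 8, 14, 20].freeze
SYNC_MINUTE = 33
UTC_OFFSET_HOURS = 2

# Sync start times as minutes after midnight UTC, sorted.
def sync_starts_utc
  SYNC_HOURS_LOCAL.map { |h| ((h - UTC_OFFSET_HOURS) % 24) * 60 + SYNC_MINUTE }.sort
end

# Latest sync start at or before the given UTC time of day. Returns nil if the
# time falls before the first sync of the day (the relevant sync started the previous day).
def last_sync_before(hour, minute)
  created = hour * 60 + minute
  sync_starts_utc.select { |start| start <= created }.max
end

start = last_sync_before(18, 48) # creation time of the new package group, UTC
puts format("sync started %02d:%02d UTC, %d minutes earlier", start / 60, start % 60, 18 * 60 + 48 - start)
# prints "sync started 18:33 UTC, 15 minutes earlier"
```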

I cannot really tell when the new UUID showed up in Pulp. If the timestamp is right, it was created during a sync, and Katello only accessed it for the first time during the successful publish after the cleanup.

If someone tells me how to check the pulpcore database for those duplicates, I can check directly. Or maybe I can find something in the pulpcore worker logs?

If I knew how to check pulpcore for those duplicates, I could do the whole cleanup again and check regularly to see when they appear.

On the other hand, if that timestamp is correct, it happened during the sync. Maybe EPEL rearranges something in their metadata on the 30th of each month, causing pulpcore to create a duplicate?

O.K. For a start, I check the foreman database for duplicates in the library:

foreman=# select c.id,a.name from katello_package_groups a left join katello_repository_package_groups b on a.id = b.package_group_id left join katello_repositories c on b.repository_id = c.id where c.library_instance_id is null group by c.id,a.name having count(*) > 1;
 id |       name        
----+-------------------
  7 | Development tools
(1 row)
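The SQL above groups Katello's package-group associations by repository and name and keeps the combinations that occur more than once. A minimal Ruby sketch of the same check over rows already fetched from the database, assuming each row is a hash with hypothetical `:repository_id` and `:name` keys (the sample rows are invented, only the names come from this thread):

```ruby
# Find (repository_id, name) pairs that occur more than once, mirroring
# "GROUP BY c.id, a.name HAVING count(*) > 1" from the SQL query.
# The row keys :repository_id and :name are hypothetical, for illustration only.
def duplicate_package_groups(rows)
  rows
    .group_by { |row| [row[:repository_id], row[:name]] }
    .select { |_key, group| group.size > 1 }
    .keys
end

rows = [
  { repository_id: 7, name: "Development tools" },
  { repository_id: 7, name: "Development tools" },
  { repository_id: 7, name: "Xfce" }
]

p duplicate_package_groups(rows) # => [[7, "Development tools"]]
```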

I force canceled the paused publish task.

Now I run the cleanup:

[root@foreman ~]# foreman-rake console
Loading production environment (Rails 6.0.3.7)
irb(main):001:0> package_group_ids = [3772]
=> [3772]
irb(main):002:0> repository_ids = []
=> []
irb(main):003:1* ::Katello::PackageGroup.find(package_group_ids).each do |package_group|
irb(main):004:1*   repository_ids << package_group.repositories.pluck(:id)
irb(main):005:1*   package_group.destroy
irb(main):006:0> end
=> [#<Katello::PackageGroup id: 3772, name: "Development tools", pulp_id: "/pulp/api/v3/content/rpm/packagegroups/e69069bf-67...", description: "A basic development environment.", created_at: "2022-07-30 12:48:56", updated_at: "2022-07-30 12:48:56", migrated_pulp3_href: nil, missing_from_migration: false, ignore_missing_from_migration: false>]
irb(main):007:0> repository_ids = repository_ids.uniq.flatten.compact
=> [7, 3419, 30585, 42928, 104968, 105173, 105378, 105538, 105698]
irb(main):008:0> ::Katello::Repository.find(repository_ids).each { |r| r.index_content }
=> [#<Katello::Repository id: 7,...
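In the session above only one package group is destroyed, so the order of `uniq` and `flatten` makes no difference. When cleaning up several groups, each `pluck(:id)` call appends a whole array, and `uniq` before `flatten` only removes identical arrays, not repeated IDs, so a repository could be re-indexed more than once. A plain-Ruby illustration with made-up IDs:

```ruby
# Each element stands in for one package_group.repositories.pluck(:id) result.
# The IDs are made up for illustration.
repository_ids = [[7, 3419], [7, 30585]]

# uniq runs on the nested arrays, so ID 7 survives twice after flattening.
uniq_first = repository_ids.uniq.flatten.compact
# Flattening first lets uniq remove repeated repository IDs.
flatten_first = repository_ids.flatten.compact.uniq

p uniq_first    # => [7, 3419, 7, 30585]
p flatten_first # => [7, 3419, 30585]
```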

The duplicate is gone in the database:

foreman=# select c.id,a.name from katello_package_groups a left join katello_repository_package_groups b on a.id = b.package_group_id left join katello_repositories c on b.repository_id = c.id where c.library_instance_id is null group by c.id,a.name having count(*) > 1;
 id | name 
----+------
(0 rows)

Next I start a complete sync of EPEL 7. Still no duplicate found in the database.

Next I run the publish & promote of the centos7-epel7 content view which had the paused task before.

The SQL select above still doesn’t show anything. So at this point there seems to be no duplicate in the foreman/katello database.

If someone could tell me how to do the same check in the pulpcore database, i.e. one repository having more than one package group of the same name, I could check there, too. I have trouble figuring out which repository is the “library” instance and how that’s linked to the package groups…

@iballou @dralley It’s really weird. This morning I cleaned up and there were no duplicates. But it seems that during the very next sync of epel8-next one package group got duplicated again:

foreman=# select c.id,a.name from katello_package_groups a left join katello_repository_package_groups b on a.id = b.package_group_id left join katello_repositories c on b.repository_id = c.id where c.library_instance_id is null group by c.id,a.name having count(*) > 1;
  id   | name 
-------+------
 74315 | KDE
(1 row)

foreman=# select c.id, a.* from katello_package_groups a left join katello_repository_package_groups b on a.id = b.package_group_id left join katello_repositories c on b.repository_id = c.id where c.library_instance_id is null and c.id = 74315 and a.name = 'KDE';
  id   |  id  | name |                                   pulp_id                                    |                                                                                   description                                                                                    |         created_at         |         updated_at         | migrated_pulp3_href | missing_from_migration | ignore_missing_from_migration
-------+------+------+------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+----------------------------+---------------------+------------------------+-------------------------------
 74315 | 8806 | KDE  | /pulp/api/v3/content/rpm/packagegroups/bdea6506-0866-483f-b87b-fd8d8ca66d43/ | The KDE Plasma Workspaces, a highly-configurable graphical user interface which includes a panel, desktop, system icons and desktop widgets, and many powerful KDE applications. | 2022-09-14 12:35:08.483095 | 2022-09-14 12:35:08.483148 |                     | f                      | f
 74315 | 3747 | KDE  | /pulp/api/v3/content/rpm/packagegroups/a9f014e5-2d09-48bf-864a-b574489dbd8d/ | The KDE Plasma Workspaces, a highly-configurable graphical user interface which includes a panel, desktop, system icons and desktop widgets, and many powerful KDE applications. | 2022-07-30 12:45:26.614927 | 2022-07-30 12:45:26.614931 |                     | f                      | f
(2 rows)

2022-09-14 12:35:08.483148 (UTC time) was during the next epel7 sync. At the moment I can access both package groups through the pulp URIs. It seems there is actually a difference:

# diff kde?.json
36c36
<     "digest": "30ec35228a26155d12297a78f6765d7df01804ca73c50c77c2df0fc29fbf396f",
---
>     "digest": "c1b3622f5aa73a537d6f60b404a2e0544356b0b5384ec03c2c20560baab4e023",
550a551,556
>             "name": "sddm-x11",
>             "requires": null,
>             "type": 3
>         },
>         {
>             "basearchonly": null,
562,563c568,569
<     "pulp_created": "2022-07-30T12:42:27.721834Z",
<     "pulp_href": "/pulp/api/v3/content/rpm/packagegroups/a9f014e5-2d09-48bf-864a-b574489dbd8d/",
---
>     "pulp_created": "2022-09-14T12:34:49.781356Z",
>     "pulp_href": "/pulp/api/v3/content/rpm/packagegroups/bdea6506-0866-483f-b87b-fd8d8ca66d43/",
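The diff shows that the newer KDE group carries an extra `sddm-x11` entry and therefore a different digest. A small Ruby sketch that reports package names present in only one of two package group documents, assuming the package list sits under a `packages` key whose entries have a `name` field, as the diff suggests (the sample documents are trimmed and invented; with real data you would pass `File.read("kde1.json")` and `File.read("kde2.json")`):

```ruby
require "json"
require "set"

# Compare the package names of two Pulp package group documents.
# Assumes each document has a "packages" array of hashes with a "name" key,
# as the diff above suggests; this is not an official schema.
def package_name_diff(old_json, new_json)
  old_names = JSON.parse(old_json).fetch("packages", []).map { |pkg| pkg["name"] }.to_set
  new_names = JSON.parse(new_json).fetch("packages", []).map { |pkg| pkg["name"] }.to_set
  { added: (new_names - old_names).to_a.sort, removed: (old_names - new_names).to_a.sort }
end

# "plasma-desktop" is an invented sample entry.
old_doc = '{"packages": [{"name": "plasma-desktop", "type": 1}]}'
new_doc = '{"packages": [{"name": "plasma-desktop", "type": 1}, {"name": "sddm-x11", "type": 3}]}'

diff = package_name_diff(old_doc, new_doc)
puts "added: #{diff[:added].join(', ')}" # prints "added: sddm-x11"
```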

epel8-next was updated this morning. Right now I can see updated repodata on one mirror, while another mirror still shows last modified 2022-08-10.

I am wondering if this has to do with the additive mirroring policy. The EPEL repositories are among the few using additive. Because of additive, the updated package group gets added. Later (days or weeks, I think) something removes the old version of the package group in Pulp without Katello noticing…

Interesting, based on your last post I think we’re getting somewhere. I wonder if perhaps EPEL is updating the package group in a way that Pulp isn’t expecting.
I know when errata get updated, Pulp will create a new erratum and the old one gets left behind. Perhaps it’s the same case here?

Orphan cleanup may be what’s cleaning up the old package group. Katello has pulp destroy old content view versions that aren’t indexed in our DB, and so perhaps the duplicated package groups are being deleted with that.

I’ll update the issue I made a while back with some testing steps. It seems we should test an additive repository that has changing package groups with the same name. I don’t think we should be keeping the old package group around as if it were an RPM; we should be replacing it as if it were an erratum.
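To make the two behaviours being discussed concrete, here is a deliberately simplified Ruby model, not Pulp's actual code. Each sync brings in package group units identified by their digest. If units are only ever added (roughly what the additive hypothesis assumes for groups), two `KDE` units end up side by side; if an incoming unit replaces any existing unit with the same group name, as suggested above, only the newest one remains. The digests are shortened from the diff earlier in the thread.

```ruby
# Simplified model of a repository's package group content (NOT Pulp's implementation).
# A unit is identified by its digest; several units may share a group name.
Unit = Struct.new(:name, :digest)

# Keep every unit ever synced (duplicates by name can accumulate).
def sync_keep_all(current, incoming)
  (current + incoming).uniq(&:digest)
end

# Replace existing units that share a name with an incoming unit.
def sync_replace_by_name(current, incoming)
  incoming_names = incoming.map(&:name)
  current.reject { |unit| incoming_names.include?(unit.name) } + incoming
end

old_kde = Unit.new("KDE", "30ec3522") # digests shortened from the diff above
new_kde = Unit.new("KDE", "c1b3622f")

p sync_keep_all([old_kde], [new_kde]).map(&:digest)        # => ["30ec3522", "c1b3622f"]
p sync_replace_by_name([old_kde], [new_kde]).map(&:digest) # => ["c1b3622f"]
```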

Today I got the 404 error again, for the first time since my last post. It seems duplicates have accumulated:

foreman=# select c.id,a.name from katello_package_groups a left join katello_repository_package_groups b on a.id = b.package_group_id left join katello_repositories c on b.repository_id = c.id where c.library_instance_id is null group by c.id,a.name having count(*) > 1;
  id   |      name      
-------+----------------
   177 | KDE
     7 | Xfce
   177 | Xfce
     7 | Electronic Lab
 74315 | KDE
(5 rows)

foreman=# select c.id, a.* from katello_package_groups a left join katello_repository_package_groups b on a.id = b.package_group_id left join katello_repositories c on b.repository_id = c.id where c.library_instance_id is null and c.id = 74315 and a.name = 'KDE';
  id   |  id  | name |                                   pulp_id                                    |                                                                                   description                                                                                    |         created_at         |         updated_at         | migrated_pulp3_href | missing_from_migration | ignore_missing_from_migration
-------+------+------+------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+----------------------------+---------------------+------------------------+-------------------------------
 74315 | 8806 | KDE  | /pulp/api/v3/content/rpm/packagegroups/bdea6506-0866-483f-b87b-fd8d8ca66d43/ | The KDE Plasma Workspaces, a highly-configurable graphical user interface which includes a panel, desktop, system icons and desktop widgets, and many powerful KDE applications. | 2022-09-14 12:35:08.483095 | 2022-09-14 12:35:08.483148 |                     | f                      | f
 74315 | 3747 | KDE  | /pulp/api/v3/content/rpm/packagegroups/a9f014e5-2d09-48bf-864a-b574489dbd8d/ | The KDE Plasma Workspaces, a highly-configurable graphical user interface which includes a panel, desktop, system icons and desktop widgets, and many powerful KDE applications. | 2022-07-30 12:45:26.614927 | 2022-07-30 12:45:26.614931 |                     | f                      | f
(2 rows)

foreman=# select c.id, a.* from katello_package_groups a left join katello_repository_package_groups b on a.id = b.package_group_id left join katello_repositories c on b.repository_id = c.id where c.library_instance_id is null and c.id = 7 and a.name = 'Xfce';
 id |  id  | name |                                   pulp_id                                    |                              description                               |         created_at         |         updated_at         | migrated_pulp3_href | missing_from_migration | ignore_missing_from_migration
----+------+------+------------------------------------------------------------------------------+------------------------------------------------------------------------+----------------------------+----------------------------+---------------------+------------------------+-------------------------------
  7 | 3767 | Xfce | /pulp/api/v3/content/rpm/packagegroups/178d7527-4ef2-46ef-8ef4-8cefe4c9d17c/ | A lightweight desktop environment that works well on low end machines. | 2022-07-30 12:48:56.819516 | 2022-07-30 12:48:56.819522 |                     | f                      | f
  7 | 9649 | Xfce | /pulp/api/v3/content/rpm/packagegroups/d6dfd6b7-36a0-484a-9149-b67574bd7c95/ | A lightweight desktop environment that works well on low end machines. | 2022-09-18 06:47:31.339289 | 2022-09-18 06:47:31.339308 |                     | f                      | f
(2 rows)

foreman=# select c.id, a.* from katello_package_groups a left join katello_repository_package_groups b on a.id = b.package_group_id left join katello_repositories c on b.repository_id = c.id where c.library_instance_id is null and c.id = 7 and a.name = 'Electronic Lab';
 id |  id  |      name      |                                   pulp_id                                    |                     description                     |         created_at         |         updated_at         | migrated_pulp3_href | missing_from_migration | ignore_missing_from_migration
----+------+----------------+------------------------------------------------------------------------------+-----------------------------------------------------+----------------------------+----------------------------+---------------------+------------------------+-------------------------------
  7 | 9848 | Electronic Lab | /pulp/api/v3/content/rpm/packagegroups/b0ecdc3b-298a-46ee-872d-abeefdc015a3/ | Design and Simulation tools for hardware engineers. | 2022-09-21 12:47:29.458834 | 2022-09-21 12:47:29.458872 |                     | f                      | f
  7 | 1284 | Electronic Lab | /pulp/api/v3/content/rpm/packagegroups/b7414be6-4540-4369-af4b-bf553bb1eb5c/ | Design and Simulation tools for hardware engineers. | 2022-03-15 07:46:36.666898 | 2022-03-15 07:46:36.670276 |                     | f                      | f
(2 rows)

foreman=# select c.id, a.* from katello_package_groups a left join katello_repository_package_groups b on a.id = b.package_group_id left join katello_repositories c on b.repository_id = c.id where c.library_instance_id is null and c.id = 177 and a.name = 'Xfce'; 
 id  |  id  | name |                                   pulp_id                                    |                              description                               |         created_at         |         updated_at         | migrated_pulp3_href | missing_from_migration | ignore_missing_from_migration
-----+------+------+------------------------------------------------------------------------------+------------------------------------------------------------------------+----------------------------+----------------------------+---------------------+------------------------+-------------------------------
 177 | 3746 | Xfce | /pulp/api/v3/content/rpm/packagegroups/2be268c5-ff45-4ff8-96ac-17af587a9cd8/ | A lightweight desktop environment that works well on low end machines. | 2022-07-30 12:45:26.614919 | 2022-07-30 12:45:26.614923 |                     | f                      | f
 177 | 9627 | Xfce | /pulp/api/v3/content/rpm/packagegroups/87779c4a-003f-4f63-b4e6-0e3b6cf3b7b4/ | A lightweight desktop environment that works well on low end machines. | 2022-09-18 06:45:00.621548 | 2022-09-18 06:45:00.621576 |                     | f                      | f
(2 rows)

foreman=# select c.id, a.* from katello_package_groups a left join katello_repository_package_groups b on a.id = b.package_group_id left join katello_repositories c on b.repository_id = c.id where c.library_instance_id is null and c.id = 177 and a.name = 'KDE';
 id  |  id  | name |                                   pulp_id                                    |                                                                                   description                                                                                    |         created_at         |         updated_at         | migrated_pulp3_href | missing_from_migration | ignore_missing_from_migration
-----+------+------+------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+----------------------------+---------------------+------------------------+-------------------------------
 177 | 3747 | KDE  | /pulp/api/v3/content/rpm/packagegroups/a9f014e5-2d09-48bf-864a-b574489dbd8d/ | The KDE Plasma Workspaces, a highly-configurable graphical user interface which includes a panel, desktop, system icons and desktop widgets, and many powerful KDE applications. | 2022-07-30 12:45:26.614927 | 2022-07-30 12:45:26.614931 |                     | f                      | f
 177 | 9826 | KDE  | /pulp/api/v3/content/rpm/packagegroups/1c0fac5a-eddf-4bde-9dd5-5433852b6f9a/ | The KDE Plasma Workspaces, a highly-configurable graphical user interface which includes a panel, desktop, system icons and desktop widgets, and many powerful KDE applications. | 2022-09-21 06:44:43.927563 | 2022-09-21 06:44:43.927669 |                     | f                      | f
(2 rows)

Only one of them is missing from the pulpcore database at the moment. It’s the UUID which caused the 404:

foreman=# \c pulpcore
You are now connected to database "pulpcore" as user "postgres".
pulpcore=# select pulp_id,content_ptr_id from public.dblink('dbname=foreman', 'select substring(pulp_id from 40 for 36 ) from  katello_package_groups') as katello_package_groups(pulp_id uuid) full outer  join rpm_packagegroup on katello_package_groups.pulp_id = rpm_packagegroup.content_ptr_id where rpm_packagegroup.content_ptr_id is null;
               pulp_id                | content_ptr_id 
--------------------------------------+----------------
 178d7527-4ef2-46ef-8ef4-8cefe4c9d17c | 
(1 row)
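The dblink query takes the UUID out of Katello's `pulp_id` href with `substring(pulp_id from 40 for 36)`, which works because the prefix `/pulp/api/v3/content/rpm/packagegroups/` is exactly 39 characters long. The same comparison in Ruby, as a sketch that assumes the Katello hrefs and the pulpcore `content_ptr_id` values have already been exported into two lists (the export step is not shown; the sample values are taken from the Xfce rows above):

```ruby
# Katello stores package groups as Pulp hrefs; pulpcore stores bare UUIDs.
UUID_RE = /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/

# Return the UUIDs referenced by Katello that do not exist in pulpcore.
def missing_in_pulp(katello_hrefs, pulp_uuids)
  katello_uuids = katello_hrefs.map { |href| href[UUID_RE] }.compact
  katello_uuids - pulp_uuids
end

katello_hrefs = [
  "/pulp/api/v3/content/rpm/packagegroups/178d7527-4ef2-46ef-8ef4-8cefe4c9d17c/",
  "/pulp/api/v3/content/rpm/packagegroups/d6dfd6b7-36a0-484a-9149-b67574bd7c95/"
]
pulp_uuids = ["d6dfd6b7-36a0-484a-9149-b67574bd7c95"]

p missing_in_pulp(katello_hrefs, pulp_uuids) # => ["178d7527-4ef2-46ef-8ef4-8cefe4c9d17c"]
```

Like the SQL, this only reports one direction (referenced by Katello, absent in Pulp), which is the case that produces the 404.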

That’s the Xfce package group with id 3767. The KDE duplicate from my previous post still exists in pulpcore.

I’ll clean out 3767 with the rake console to get the content view published… then wait for the next error.

@gvde just to be sure, these duplicates exist within the same repository version? If you are able and have the data still, can you see the Pulp sync task output in Dynflow showing that package groups were removed and added?

Something like

{
  "pulp_href": "/pulp/api/v3/repositories/rpm/rpm/c56f7193-7fd9-4115-9ece-c9a0a78d3571/versions/2/",
  "pulp_created": "2022-09-26T21:23:42.501882Z",
  "number": 2,
  "repository": "/pulp/api/v3/repositories/rpm/rpm/c56f7193-7fd9-4115-9ece-c9a0a78d3571/",
  "base_version": null,
  "content_summary": {
    "added": {
      "rpm.packagegroup": {
        "count": 1,
        "href": "/pulp/api/v3/content/rpm/packagegroups/?repository_version_added=/pulp/api/v3/repositories/rpm/rpm/c56f7193-7fd9-4115-9ece-c9a0a78d3571/versions/2/"
      }
    },
    "removed": {
      "rpm.packagegroup": {
        "count": 1,
        "href": "/pulp/api/v3/content/rpm/packagegroups/?repository_version_removed=/pulp/api/v3/repositories/rpm/rpm/c56f7193-7fd9-4115-9ece-c9a0a78d3571/versions/2/"
      }
    },
    "present": {
      "rpm.packagegroup": {
        "count": 2,
        "href": "/pulp/api/v3/content/rpm/packagegroups/?repository_version=/pulp/api/v3/repositories/rpm/rpm/c56f7193-7fd9-4115-9ece-c9a0a78d3571/versions/2/"
      }
    }
  }
}
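When checking a sync, the interesting part of such a repository version is the `rpm.packagegroup` count under `added`, `removed` and `present`. A short Ruby sketch that pulls those counts out of the version JSON (the sample is trimmed from the example above; sections without package groups are reported as 0):

```ruby
require "json"

# Return the package group counts for each section of a repository version's
# content_summary; sections without package groups report 0.
def package_group_counts(version_json)
  summary = JSON.parse(version_json).fetch("content_summary", {})
  %w[added removed present].to_h do |section|
    [section, summary.dig(section, "rpm.packagegroup", "count") || 0]
  end
end

version_json = <<~JSON
  {
    "number": 2,
    "content_summary": {
      "added":   {"rpm.packagegroup": {"count": 1}},
      "removed": {"rpm.packagegroup": {"count": 1}},
      "present": {"rpm.packagegroup": {"count": 2}}
    }
  }
JSON

counts = package_group_counts(version_json)
puts counts.map { |section, n| "#{section}=#{n}" }.join(" ") # prints "added=1 removed=1 present=2"
```

One added and one removed with an unchanged `present` count is what a clean replacement of a single group looks like.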

I wasn’t able to reproduce any issue with package groups being updated on master with pulp-rpm 3.17.5; I’ll have to try on a real 4.4.1 box…

On my setup, I tried updating the package list for a package group and resynced. Pulp kicked out the old package group and brought in a new one. Katello indexed it accordingly. That was with additive too.

If duplicate package groups really do exist within the same repository version, that sounds like it might be a Pulp bug… Hopefully I can indeed reproduce it.

@iballou I am not sure what you mean exactly. For the current duplicates:

foreman=# select c.id,a.name from katello_package_groups a left join katello_repository_package_groups b on a.id = b.package_group_id left join katello_repositories c on b.repository_id = c.id where c.library_instance_id is null group by c.id,a.name having count(*) > 1;
  id   |      name      
-------+----------------
   177 | KDE
   177 | Xfce
     7 | Electronic Lab
 74315 | KDE
(4 rows)

The first column denotes the id from katello_repositories and I think that means it’s a specific version. Each duplicate package group references the same katello_repository id.

foreman=# select c.id, c.content_view_version_id, c.version_href, a.* from katello_package_groups a left join katello_repository_package_groups b on a.id = b.package_group_id left join katello_repositories c on b.repository_id = c.id where c.library_instance_id is null and c.id = 177 and a.name = 'Xfce';
 id  | content_view_version_id |                                     version_href                                     |  id  | name |                                   pulp_id                                    |                              description                               |         created_at         |         updated_at         | migrated_pulp3_href | missing_from_migration | ignore_missing_from_migration
-----+-------------------------+--------------------------------------------------------------------------------------+------+------+------------------------------------------------------------------------------+------------------------------------------------------------------------+----------------------------+----------------------------+---------------------+------------------------+-------------------------------
 177 |                       1 | /pulp/api/v3/repositories/rpm/rpm/d9268fee-1ddb-4dff-bc75-8dec273ef4ef/versions/390/ | 3746 | Xfce | /pulp/api/v3/content/rpm/packagegroups/2be268c5-ff45-4ff8-96ac-17af587a9cd8/ | A lightweight desktop environment that works well on low end machines. | 2022-07-30 12:45:26.614919 | 2022-07-30 12:45:26.614923 |                     | f                      | f
 177 |                       1 | /pulp/api/v3/repositories/rpm/rpm/d9268fee-1ddb-4dff-bc75-8dec273ef4ef/versions/390/ | 9627 | Xfce | /pulp/api/v3/content/rpm/packagegroups/87779c4a-003f-4f63-b4e6-0e3b6cf3b7b4/ | A lightweight desktop environment that works well on low end machines. | 2022-09-18 06:45:00.621548 | 2022-09-18 06:45:00.621576 |                     | f                      | f
(2 rows)

If you give me the commands for the rake console, I could run those, too.

I see duplicates on my EL8-based future production server with 4.5.0 as well. All I do is sync EPEL 7 from the /epel/7/x86_64 mirror directory with the immediate download policy and the additive mirroring policy. After a couple of days they pop up…

@iballou Where do I find this output? I have checked the dynflow console for one of the EPEL 7 syncs which added a package group according to the created_at field in the database, but I don’t see anything like that output you have posted.

Ah, I forgot it doesn’t show up directly in Dynflow. So on a sync task in Dynflow, in the JSON from the Pulp task, there should be a “created_resources” entry. If the sync resulted in a new repository version, it will show up there.

Then you should be able to view it with:

pulp show --href <repo version href>

If for some reason the Pulp CLI isn’t working, you can use the API:

curl https://`hostname`/<repo version href>   --cert /etc/pki/katello/certs/pulp-client.crt  --key /etc/pki/katello/private/pulp-client.key
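The content list endpoints take a repository version href as a filter, as the `href` values in the example version above show (`repository_version`, `repository_version_added`, `repository_version_removed`). A hypothetical Ruby helper that builds those URLs for a given version href, percent-encoding the query value; the host name is a placeholder:

```ruby
require "uri"

# The three repository-version filters seen in the content_summary hrefs above.
FILTERS = %w[repository_version repository_version_added repository_version_removed].freeze

# Build a package group list URL filtered by a repository version href.
# The host is a placeholder; pass your own Foreman/Katello server name.
def packagegroup_filter_url(host, version_href, filter)
  raise ArgumentError, "unknown filter: #{filter}" unless FILTERS.include?(filter)

  query = URI.encode_www_form(filter => version_href)
  "https://#{host}/pulp/api/v3/content/rpm/packagegroups/?#{query}"
end

href = "/pulp/api/v3/repositories/rpm/rpm/d9268fee-1ddb-4dff-bc75-8dec273ef4ef/versions/385/"
url = packagegroup_filter_url("foreman.example.com", href, "repository_version_removed")
puts url
```

The resulting URL still contains `?`, so quote it when passing it to curl in a shell.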

From the Katello DB output you gave me, I can tell that Katello thinks the package groups exist in the same repository version, but that could be a Katello indexing bug (additive issue?) if it’s false.

Querying the version that you sent above in your Katello DB paste should confirm the issue. Try this:

curl "https://`hostname`/pulp/api/v3/content/rpm/packagegroups/?repository_version=/pulp/api/v3/repositories/rpm/rpm/d9268fee-1ddb-4dff-bc75-8dec273ef4ef/versions/390/" --cert /etc/pki/katello/certs/pulp-client.crt  --key /etc/pki/katello/private/pulp-client.key

That should show all the package groups that are truly in that Pulp repository version. If there is truly only one KDE, Xfce, etc, then the Katello DB is out of sync.
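If the list is long, counting names by hand is error-prone. A Ruby sketch that counts group names in one page of a Pulp content list response and reports any name seen more than once; it assumes each result carries a `name` field and ignores pagination, so follow the `next` link if `count` exceeds the page size (the sample page is invented):

```ruby
require "json"

# Count package group names in one page of a Pulp content list response
# and return those that appear more than once. Pagination ("next") is ignored.
def duplicate_names(page_json)
  results = JSON.parse(page_json).fetch("results", [])
  results.map { |group| group["name"] }.tally.select { |_name, count| count > 1 }
end

page_json = <<~JSON
  {"count": 3, "next": null, "previous": null,
   "results": [{"name": "KDE"}, {"name": "Xfce"}, {"name": "KDE"}]}
JSON

p duplicate_names(page_json)
```

An empty result means Pulp holds only one group per name in that version, pointing to a Katello indexing problem rather than a Pulp one.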

O.K. Let’s see if I got this right. I check the duplicate for KDE in EPEL 8, as it’s one of the most recently created package groups (September 21st).

foreman=# select c.id, c.content_view_version_id, c.version_href, a.* from katello_package_groups a left join katello_repository_package_groups b on a.id = b.package_group_id left join katello_repositories c on b.repository_id = c.id where c.library_instance_id is null and c.id = 177 and a.name = 'KDE';
 id  | content_view_version_id |                                     version_href                                     |  id  | name |                                   pulp_id                                    |                                                                                   description                                                                                    |         created_at         |         updated_at         | migrated_pulp3_href | missing_from_migration | ignore_missing_from_migration
-----+-------------------------+--------------------------------------------------------------------------------------+------+------+------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+----------------------------+---------------------+------------------------+-------------------------------
 177 |                       1 | /pulp/api/v3/repositories/rpm/rpm/d9268fee-1ddb-4dff-bc75-8dec273ef4ef/versions/391/ | 3747 | KDE  | /pulp/api/v3/content/rpm/packagegroups/a9f014e5-2d09-48bf-864a-b574489dbd8d/ | The KDE Plasma Workspaces, a highly-configurable graphical user interface which includes a panel, desktop, system icons and desktop widgets, and many powerful KDE applications. | 2022-07-30 12:45:26.614927 | 2022-07-30 12:45:26.614931 |                     | f                      | f
 177 |                       1 | /pulp/api/v3/repositories/rpm/rpm/d9268fee-1ddb-4dff-bc75-8dec273ef4ef/versions/391/ | 9826 | KDE  | /pulp/api/v3/content/rpm/packagegroups/1c0fac5a-eddf-4bde-9dd5-5433852b6f9a/ | The KDE Plasma Workspaces, a highly-configurable graphical user interface which includes a panel, desktop, system icons and desktop widgets, and many powerful KDE applications. | 2022-09-21 06:44:43.927563 | 2022-09-21 06:44:43.927669 |                     | f                      | f
(2 rows)

I went into Tasks and searched for the EPEL 8 sync task running at 2022-09-21 06:44:43 UTC. Its Pulp action, Actions::Pulp3::Repository::Sync, finished about two minutes before that. The output was:

---
pulp_tasks:
- pulp_href: "/pulp/api/v3/tasks/5f030e6c-e992-44ea-8ad6-717b014bffa6/"
  pulp_created: '2022-09-21T06:34:22.423+00:00'
  state: completed
  name: pulp_rpm.app.tasks.synchronizing.synchronize
  logging_cid: 5f057e9b-8f9e-4cb1-a94d-f9f924a4e8a9
  started_at: '2022-09-21T06:34:22.672+00:00'
  finished_at: '2022-09-21T06:42:14.829+00:00'
  worker: "/pulp/api/v3/workers/9d239acd-442f-451f-a348-a75b98c45ef6/"
  child_tasks: []
  progress_reports:
  - message: Downloading Metadata Files
    code: sync.downloading.metadata
    state: completed
    done: 6
  - message: Downloading Artifacts
    code: sync.downloading.artifacts
    state: completed
    done: 14
  - message: Associating Content
    code: associating.content
    state: completed
    done: 25
  - message: Skipping Packages
    code: sync.skipped.packages
    state: completed
    total: 0
    done: 0
  - message: Parsed Packages
    code: sync.parsing.packages
    state: completed
    total: 9241
    done: 9241
  - message: Parsed Comps
    code: sync.parsing.comps
    state: completed
    total: 24
    done: 24
  - message: Parsed Advisories
    code: sync.parsing.advisories
    state: completed
    total: 3562
    done: 3562
  created_resources:
  - "/pulp/api/v3/repositories/rpm/rpm/d9268fee-1ddb-4dff-bc75-8dec273ef4ef/versions/385/"
  reserved_resources_record:
  - "/pulp/api/v3/repositories/rpm/rpm/d9268fee-1ddb-4dff-bc75-8dec273ef4ef/"
  - shared:/pulp/api/v3/remotes/rpm/rpm/6f1837c9-d88f-42c2-861d-ff7e931885b9/
create_version: true
task_groups: []
poll_attempts:
  total: 50
  failed: 0
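The created version number can also be read straight from the task data. A Ruby sketch using the standard `yaml` library on a trimmed copy of the output above; it assumes the structure shown there (`pulp_tasks` entries with a `created_resources` list of version hrefs):

```ruby
require "yaml"

# Return the repository version numbers created by the Pulp tasks in a
# Dynflow task output, based on the created_resources hrefs.
def created_version_numbers(task_yaml)
  data = YAML.safe_load(task_yaml)
  data.fetch("pulp_tasks", []).flat_map { |task| task.fetch("created_resources", []) }
      .filter_map { |href| href[%r{/versions/(\d+)/\z}, 1]&.to_i }
end

task_yaml = <<~YAML
  ---
  pulp_tasks:
  - pulp_href: "/pulp/api/v3/tasks/5f030e6c-e992-44ea-8ad6-717b014bffa6/"
    state: completed
    created_resources:
    - "/pulp/api/v3/repositories/rpm/rpm/d9268fee-1ddb-4dff-bc75-8dec273ef4ef/versions/385/"
  create_version: true
YAML

p created_version_numbers(task_yaml) # => [385]
```

An empty result would mean the sync produced no new repository version.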

So my understanding is that the sync at that time created version 385. Thus I check the href /pulp/api/v3/repositories/rpm/rpm/d9268fee-1ddb-4dff-bc75-8dec273ef4ef/versions/385/. curl gives me:

# curl --cert /etc/pki/katello/certs/pulp-client.crt --key /etc/pki/katello/private/pulp-client.key https://foreman.example.com/pulp/api/v3/repositories/rpm/rpm/d9268fee-1ddb-4dff-bc75-8dec273ef4ef/versions/385/ | python -m json.tool
{
    "base_version": null,
    "content_summary": {
        "added": {
            "rpm.advisory": {
                "count": 10,
                "href": "/pulp/api/v3/content/rpm/advisories/?repository_version_added=/pulp/api/v3/repositories/rpm/rpm/d9268fee-1ddb-4dff-bc75-8dec273ef4ef/versions/385/"
            },
            "rpm.package": {
                "count": 14,
                "href": "/pulp/api/v3/content/rpm/packages/?repository_version_added=/pulp/api/v3/repositories/rpm/rpm/d9268fee-1ddb-4dff-bc75-8dec273ef4ef/versions/385/"
            },
            "rpm.packagegroup": {
                "count": 1,
                "href": "/pulp/api/v3/content/rpm/packagegroups/?repository_version_added=/pulp/api/v3/repositories/rpm/rpm/d9268fee-1ddb-4dff-bc75-8dec273ef4ef/versions/385/"
            }
        },
        "present": {
            "rpm.advisory": {
                "count": 4768,
                "href": "/pulp/api/v3/content/rpm/advisories/?repository_version=/pulp/api/v3/repositories/rpm/rpm/d9268fee-1ddb-4dff-bc75-8dec273ef4ef/versions/385/"
            },
            "rpm.package": {
                "count": 15882,
                "href": "/pulp/api/v3/content/rpm/packages/?repository_version=/pulp/api/v3/repositories/rpm/rpm/d9268fee-1ddb-4dff-bc75-8dec273ef4ef/versions/385/"
            },
            "rpm.packagecategory": {
                "count": 1,
                "href": "/pulp/api/v3/content/rpm/packagecategories/?repository_version=/pulp/api/v3/repositories/rpm/rpm/d9268fee-1ddb-4dff-bc75-8dec273ef4ef/versions/385/"
            },
            "rpm.packageenvironment": {
                "count": 1,
                "href": "/pulp/api/v3/content/rpm/packageenvironments/?repository_version=/pulp/api/v3/repositories/rpm/rpm/d9268fee-1ddb-4dff-bc75-8dec273ef4ef/versions/385/"
            },
            "rpm.packagegroup": {
                "count": 22,
                "href": "/pulp/api/v3/content/rpm/packagegroups/?repository_version=/pulp/api/v3/repositories/rpm/rpm/d9268fee-1ddb-4dff-bc75-8dec273ef4ef/versions/385/"
            }
        },
        "removed": {
            "rpm.advisory": {
                "count": 1,
                "href": "/pulp/api/v3/content/rpm/advisories/?repository_version_removed=/pulp/api/v3/repositories/rpm/rpm/d9268fee-1ddb-4dff-bc75-8dec273ef4ef/versions/385/"
            },
            "rpm.packagegroup": {
                "count": 1,
                "href": "/pulp/api/v3/content/rpm/packagegroups/?repository_version_removed=/pulp/api/v3/repositories/rpm/rpm/d9268fee-1ddb-4dff-bc75-8dec273ef4ef/versions/385/"
            }
        }
    },
    "number": 385,
    "pulp_created": "2022-09-21T06:34:24.970345Z",
    "pulp_href": "/pulp/api/v3/repositories/rpm/rpm/d9268fee-1ddb-4dff-bc75-8dec273ef4ef/versions/385/",
    "repository": "/pulp/api/v3/repositories/rpm/rpm/d9268fee-1ddb-4dff-bc75-8dec273ef4ef/"
}
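The `content_summary` block above can be boiled down to per-type added/removed counts with a few lines of Ruby. This is a minimal sketch that assumes the full repository version JSON (as returned for `.../versions/385/`) has already been fetched into a string. `summarize_changes` is a hypothetical helper, not part of Pulp or Katello.

```ruby
require 'json'

# Hypothetical helper: reduce a Pulp repository version's content_summary
# to { "content type" => [added_count, removed_count] }.
def summarize_changes(version_json)
  summary = JSON.parse(version_json).fetch('content_summary')
  added   = summary.fetch('added', {})
  removed = summary.fetch('removed', {})

  # Union of all content types that changed in this version, sorted by name;
  # a type missing on one side counts as 0 there.
  (added.keys | removed.keys).sort.to_h { |type|
    [type, [added.dig(type, 'count') || 0, removed.dig(type, 'count') || 0]]
  }
end
```

For the part of version 385 shown above, this would yield one added and one removed `rpm.packagegroup`, i.e. one package group swapped for another.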

So it’s one package group added and one removed. I used the added and removed hrefs to find out which ones. First, the added one:

# curl --cert /etc/pki/katello/certs/pulp-client.crt --key /etc/pki/katello/private/pulp-client.key 'https://foreman.example.com/pulp/api/v3/content/rpm/packagegroups/?repository_version_added=/pulp/api/v3/repositories/rpm/rpm/d9268fee-1ddb-4dff-bc75-8dec273ef4ef/versions/385/' | python -m json.tool
{
    "count": 1,
    "next": null,
    "previous": null,
    "results": [
        {
            "biarch_only": false,
            "default": false,
            "desc_by_lang": {
...
            "pulp_created": "2022-09-21T06:42:10.499813Z",
            "pulp_href": "/pulp/api/v3/content/rpm/packagegroups/1c0fac5a-eddf-4bde-9dd5-5433852b6f9a/",
            "user_visible": false
        }
    ]
}

The href matches the href of the new package group in the database. Next, I checked the removed group:

# curl --cert /etc/pki/katello/certs/pulp-client.crt --key /etc/pki/katello/private/pulp-client.key 'https://foreman.example.com/pulp/api/v3/content/rpm/packagegroups/?repository_version_removed=/pulp/api/v3/repositories/rpm/rpm/d9268fee-1ddb-4dff-bc75-8dec273ef4ef/versions/385/' | python -m json.tool
{
    "count": 1,
    "next": null,
    "previous": null,
    "results": [
        {
            "biarch_only": false,
            "default": false,
            "desc_by_lang": {
...
            "pulp_created": "2022-07-30T12:42:27.721834Z",
            "pulp_href": "/pulp/api/v3/content/rpm/packagegroups/a9f014e5-2d09-48bf-864a-b574489dbd8d/",
            "user_visible": false
        }
    ]
}

This href matches the href of the older database entry.

So it looks to me as if Pulp removed the package group with UUID a9f014e5-2d09-48bf-864a-b574489dbd8d in that version, but Katello did not?
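The comparison above boils down to a set difference on UUIDs. A minimal sketch, assuming you have already collected the package group hrefs that Katello still references (how to get them out of the Katello database is left open here) and the hrefs from Pulp's `?repository_version=` listing for the latest version. `href_uuid` and `stale_package_groups` are hypothetical helpers.

```ruby
require 'set'

# Pulp content hrefs end in the unit's UUID, e.g.
# /pulp/api/v3/content/rpm/packagegroups/<uuid>/
def href_uuid(href)
  href.split('/').reject(&:empty?).last
end

# Hypothetical check: which package group hrefs does Katello still reference
# that are no longer present in the given Pulp repository version?
def stale_package_groups(katello_hrefs, pulp_present_hrefs)
  present = pulp_present_hrefs.map { |h| href_uuid(h) }.to_set
  katello_hrefs.reject { |h| present.include?(href_uuid(h)) }
end
```

With Katello holding both a9f014e5-… and 1c0fac5a-…, and Pulp's latest version containing only 1c0fac5a-…, the old a9f014e5-… href is the one reported as stale.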

I also queried the ?repository_version= href for different versions. Version 383 doesn’t exist anymore; versions 384 through 391 (the latest) do. a9f014e5-2d09-48bf-864a-b574489dbd8d is only in version 384. Versions 385 through 391 contain only the new UUID 1c0fac5a-eddf-4bde-9dd5-5433852b6f9a. This matches the task result above.
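The version-by-version check can be scripted the same way. This sketch assumes the package group hrefs of each version have already been fetched with the `?repository_version=` query (one request per version, like the curl commands above). `packagegroups_url` and `versions_containing` are hypothetical helpers.

```ruby
# Build the package group list URL for one repository version.
# repo_href is expected to end with "/", as Pulp's repository hrefs do.
def packagegroups_url(base, repo_href, version)
  "#{base}/pulp/api/v3/content/rpm/packagegroups/?repository_version=#{repo_href}versions/#{version}/"
end

# Given { version_number => [package group hrefs] }, return the sorted
# version numbers that contain a package group with the given UUID.
def versions_containing(groups_by_version, uuid)
  groups_by_version
    .select { |_version, hrefs| hrefs.any? { |h| h.include?("/#{uuid}/") } }
    .keys
    .sort
end
```

Filled with the results for versions 384 through 391, this should report only version 384 for a9f014e5-… and versions 385 through 391 for 1c0fac5a-….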

So I guess that once newer versions are created and Pulp removes version 384 from its database, it will also remove the package group behind that href, and then it’s gone, causing the 404?


Hey, I just reproduced this on Katello 4.5 and your report here matches my findings too.

It looks like Pulp is correctly removing the old package group, but Katello is not. I couldn’t reproduce this on master, and I think we fixed something similar recently, but I need to check to be sure.

Pulp must be clearing out the “orphaned” package group during some async task, which is why the 404 isn’t immediate. I think that happens at orphan cleanup time; I’m not sure when else it would.

Anyway, now that I’ve reproduced it and we confirmed it’s in Katello, we should identify a fix soon.

Thanks for the detailed report!


@sajha brought this issue up before, and I’ve confirmed that the fix for it solves this problem: Bug #35120: Retain packages on Repository removes RPMs from Pulp but not from Katello - Katello - Foreman

We should be able to cherry-pick it and get a 4.5.1 out. Maybe a 4.4.2 as well.


Does that pull request also clear out the existing duplicates in the database?

It won’t, but a resync (or, more specifically, a re-index) should remove the duplicates.

A “resync” being a “Complete sync” on the repository?

A normal sync wouldn’t work because IndexContent will likely be optimized out. However, if you want to avoid unnecessary work, you could try:

# e.g. from `foreman-rake console`
repo_ids = [] # fill in the IDs of the repos that need reindexing
repo_ids.each { |repo_id| ::Katello::Repository.find(repo_id).index_content(full_index: true) }

The other valid option is “complete sync” like you mentioned.

It seems as if the commit didn’t make it into 4.6.0: I have the same issue again, and the patch is missing in rubygem-katello-4.6.0-1.el8.noarch.

Looking on GitHub, it seems it has been added to the KATELLO-4.5 and 4.7 branches but not to the 4.6 branch…