Katello 4.0 content proxy upgrade disaster

Today I tried to upgrade my content proxy to Katello 4.0. It ended in a complete disaster, and for now I have reverted.

During the yum update I noticed an error from foreman-installer:

/etc/foreman-installer/scenarios.d/katello.migrations/200605154320-dont-use-pulpcore-rpm-on-upgrades.rb:9:in `block (2 levels) in load_migrations': undefined method `[]' for true:TrueClass (NoMethodError)
	from /opt/theforeman/tfm/root/usr/share/gems/gems/kafo-6.2.1/lib/kafo/migrations.rb:25:in `instance_eval'
	from /opt/theforeman/tfm/root/usr/share/gems/gems/kafo-6.2.1/lib/kafo/migrations.rb:25:in `block (2 levels) in load_migrations'
	from /opt/theforeman/tfm/root/usr/share/gems/gems/kafo-6.2.1/lib/kafo/migration_context.rb:10:in `instance_eval'
	from /opt/theforeman/tfm/root/usr/share/gems/gems/kafo-6.2.1/lib/kafo/migration_context.rb:10:in `execute'
	from /opt/theforeman/tfm/root/usr/share/gems/gems/kafo-6.2.1/lib/kafo/migrations.rb:38:in `block in run'
	from /opt/theforeman/tfm/root/usr/share/gems/gems/kafo-6.2.1/lib/kafo/migrations.rb:35:in `each'
	from /opt/theforeman/tfm/root/usr/share/gems/gems/kafo-6.2.1/lib/kafo/migrations.rb:35:in `run'
	from /opt/theforeman/tfm/root/usr/share/gems/gems/kafo-6.2.1/lib/kafo/configuration.rb:324:in `run_migrations'
	from /opt/theforeman/tfm/root/usr/share/gems/gems/kafo-6.2.1/lib/kafo/kafo_configure.rb:115:in `initialize'
	from /opt/theforeman/tfm/root/usr/share/gems/gems/clamp-1.1.2/lib/clamp/command.rb:132:in `new'
	from /opt/theforeman/tfm/root/usr/share/gems/gems/clamp-1.1.2/lib/clamp/command.rb:132:in `run'
	from /opt/theforeman/tfm/root/usr/share/gems/gems/kafo-6.2.1/lib/kafo/kafo_configure.rb:50:in `run'
	from /sbin/foreman-installer:8:in `<main>'
warning: %post(foreman-installer-katello-1:2.4.0-1.el7.noarch) scriptlet failed, exit status 1
Non-fatal POSTIN scriptlet failure in rpm package 1:foreman-installer-katello-2.4.0-1.el7.noarch

After the package update, I stopped all services and re-ran the installer, as I had done during previous upgrades:

# foreman-maintain service stop
# foreman-installer 

I also cleaned up, following the commands listed by this:

# foreman-maintain content remove-pulp2

i.e. I manually ran/adjusted the listed commands, because running it directly would also remove python-ldap, which I need for our ipa-client.
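Instead of eyeballing every transaction by hand, yum's protected-packages mechanism can guard python-ldap; a minimal sketch (the temp directory stands in for /etc/yum/protected.d/ on the real proxy, only so the example has no side effects):

```shell
# Mark python-ldap as protected: with this file under /etc/yum/protected.d/,
# yum aborts any transaction that would remove the package.
protdir=$(mktemp -d)   # on the proxy, use /etc/yum/protected.d instead
echo python-ldap > "$protdir/ipa-client.conf"
cat "$protdir/ipa-client.conf"
```

With the file in place, previewing the cleanup with `yum remove --assumeno <pulp2 packages>` shows the full dependency transaction without executing it.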

The installer didn’t complain, but afterwards none of my clients could access the repositories anymore.

For any repository (e.g. during yum repolist) I just got a message like this:

https://foreman-proxy.example.com/pulp/repos/ORG/Production/centos7-epel7/custom/centos7/extras_x86_64/repodata/repomd.xml: [Errno 14] HTTPS Error 404 - Not Found
Trying other mirror.

I guess it’s the same issue I saw after the pulp3 switchover on my main server, but as I had removed the old pulp2 directories, everything went missing instead of just being outdated.

I thought running an optimized sync on the proxy might fix this. However, the sync surfaced new issues: the task got stuck at ~70% and had already produced a couple of errors:

Task canceledTask canceledTask canceledTask canceledTask canceledduplicate key value violates unique constraint "rpm_variant_variant_id_uid_name_type_764da894_uniq"
DETAIL:  Key (variant_id, uid, name, type, packages, distribution_tree_id)=(AppStream, AppStream, AppStream, variant, Packages, 5e279f89-9fd6-40b6-8985-69cabafa8db8) already exists.
Task canceledduplicate key value violates unique constraint "rpm_variant_variant_id_uid_name_type_764da894_uniq"
DETAIL:  Key (variant_id, uid, name, type, packages, distribution_tree_id)=(AppStream, AppStream, AppStream, variant, Packages, 5e279f89-9fd6-40b6-8985-69cabafa8db8) already exists.
Task canceledTask canceledTask canceledduplicate key value violates unique constraint "rpm_variant_variant_id_uid_name_type_764da894_uniq"
DETAIL:  Key (variant_id, uid, name, type, packages, distribution_tree_id)=(AppStream, AppStream, AppStream, variant, Packages, 5e279f89-9fd6-40b6-8985-69cabafa8db8) already exists.
duplicate key value violates unique constraint "rpm_image_name_path_platforms_dist_188721ef_uniq"
DETAIL:  Key (name, path, platforms, distribution_tree_id)=(boot.iso, images/boot.iso, x86_64, b038687d-acf9-4894-a753-664413a8b4b1) already exists.
Task canceledTask canceledduplicate key value violates unique constraint "rpm_image_name_path_platforms_dist_188721ef_uniq"
DETAIL:  Key (name, path, platforms, distribution_tree_id)=(boot.iso, images/boot.iso, x86_64, b038687d-acf9-4894-a753-664413a8b4b1) already exists.
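The wall of repeated errors boils down to a handful of distribution trees. A quick filter to pull the distinct tree IDs out of a saved task log (a sketch; the sample input is a single DETAIL line from above, on a real log you would pipe the whole file through the same grep):

```shell
# Extract the distinct distribution_tree_id UUIDs from duplicate-key errors.
# Sample input: one DETAIL line copied from the failed sync task.
printf '%s\n' 'DETAIL:  Key (variant_id, uid, name, type, packages, distribution_tree_id)=(AppStream, AppStream, AppStream, variant, Packages, 5e279f89-9fd6-40b6-8985-69cabafa8db8) already exists.' \
  | grep -oE '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' \
  | sort -u
```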

Eventually, I canceled the task and reverted the proxy to the snapshot taken before the upgrade. Everything is fine at the moment, but of course this is a dead end.

O.K. Even worse: even before the upgrade, the sync to the old 2.3/3.18 content proxy showed an error:

PLP0000: Importer indicated a failed responsePLP0000: Importer indicated a failed responsePLP0000: Importer indicated a failed responsePLP0000: Importer indicated a failed responsePLP0000: Importer indicated a failed responsePLP0000: Importer indicated a failed responsePLP0000: Importer indicated a failed responsePLP0000: Importer indicated a failed responsePLP0000: Importer indicated a failed responsePLP0000: Importer indicated a failed response

According to the smart proxy page, the last sync dates from before the upgrade of the main server to Katello 4.0.

A complete sync from the Katello 4.0 main server to the 2.3.3/3.18.2 content proxy also shows errors:

RPM1008: Checksum type "sha256" is not available for all units in the repository. Make sure those units have been downloaded.
RPM1008: Checksum type "sha256" is not available for all units in the repository. Make sure those units have been downloaded.
RPM1008: Checksum type "sha256" is not available for all units in the repository. Make sure those units have been downloaded.
RPM1008: Checksum type "sha256" is not available for all units in the repository. Make sure those units have been downloaded.
RPM1008: Checksum type "sha256" is not available for all units in the repository. Make sure those units have been downloaded.
RPM1008: Checksum type "sha256" is not available for all units in the repository. Make sure those units have been downloaded.
PLP0000: Importer indicated a failed response
PLP0000: Importer indicated a failed response
PLP0000: Importer indicated a failed response
PLP0000: Importer indicated a failed response
PLP0000: Importer indicated a failed response
PLP0000: Importer indicated a failed response
PLP0000: Importer indicated a failed response
PLP0000: Importer indicated a failed response
PLP0000: Importer indicated a failed response
PLP0000: Importer indicated a failed response
PLP0000: Importer indicated a failed response
PLP0000: Importer indicated a failed response

@gvde Starting to look at this. I wonder if you are hitting this guy:

Bug: Smart Proxies do not sync (Katello 3.15 through 3.18 RC2) - Support - TheForeman

Today I tried it again: I updated my main Katello server and a content proxy from 3.18.3/2.3.5 to 4.0/2.4.1 and then to 4.1/2.5.1, both on the latest CentOS 7. It is still pretty much disastrous and not really working.

Things I have noticed:

  1. After promoting a CV to a lifecycle environment that is on the content proxy, there is no automatic synchronization of the smart proxy. I have to go to the smart proxy page, which tells me “2 environment(s) can be synchronized: Production, Testing”, and click Synchronize there…
  2. I ran an optimized sync, which I canceled after 30 minutes at 90% because there was no progress most of the time and the load on the proxy was close to 0.
  3. However, during the sync the proxy was busy at 100% for maybe 5-10 minutes (8 CPUs…). Before the upgrade, a sync took something like 2 minutes and didn’t fully load the VM…
  4. I also tried a “Complete Sync” for the proxy, which again leads to those duplicate key errors:
duplicate key value violates unique constraint "rpm_variant_variant_id_uid_name_type_764da894_uniq"
DETAIL:  Key (variant_id, uid, name, type, packages, distribution_tree_id)=(plus, plus, plus, variant, Packages, 8166c814-fbad-436e-88b6-bc0ca3249298) already exists.
duplicate key value violates unique constraint "rpm_variant_variant_id_uid_name_type_764da894_uniq"
DETAIL:  Key (variant_id, uid, name, type, packages, distribution_tree_id)=(plus, plus, plus, variant, Packages, 8166c814-fbad-436e-88b6-bc0ca3249298) already exists.
duplicate key value violates unique constraint "rpm_variant_variant_id_uid_name_type_764da894_uniq"
DETAIL:  Key (variant_id, uid, name, type, packages, distribution_tree_id)=(plus, plus, plus, variant, Packages, 8166c814-fbad-436e-88b6-bc0ca3249298) already exists.
duplicate key value violates unique constraint "rpm_variant_variant_id_uid_name_type_764da894_uniq"
DETAIL:  Key (variant_id, uid, name, type, packages, distribution_tree_id)=(extras, extras, extras, variant, Packages, 63db75c4-a258-4fbe-848e-32b485c0cde7) already exists.
duplicate key value violates unique constraint "rpm_variant_variant_id_uid_name_type_764da894_uniq"
DETAIL:  Key (variant_id, uid, name, type, packages, distribution_tree_id)=(HighAvailability, HighAvailability, High Availability, variant, Packages, 37aa4e10-0130-41e0-939b-8874427969d0) already exists.
duplicate key value violates unique constraint "rpm_variant_variant_id_uid_name_type_764da894_uniq"
DETAIL:  Key (variant_id, uid, name, type, packages, distribution_tree_id)=(HighAvailability, HighAvailability, High Availability, variant, Packages, 37aa4e10-0130-41e0-939b-8874427969d0) already exists.
duplicate key value violates unique constraint "rpm_variant_variant_id_uid_name_type_764da894_uniq"
DETAIL:  Key (variant_id, uid, name, type, packages, distribution_tree_id)=(HighAvailability, HighAvailability, High Availability, variant, Packages, 37aa4e10-0130-41e0-939b-8874427969d0) already exists.
NoneNoneNoneNoneNoneNoneNoneNoneduplicate key value violates unique constraint "rpm_checksum_path_checksum_distribution_tree_id_fd3fe409_uniq"
DETAIL:  Key (path, checksum, distribution_tree_id)=(images/boot.iso, sha256:c79921e24d472144d8f36a0d5f409b12bd016d9d7d022fd703563973ca9c375c, 3b51baae-5aea-4028-9abd-9ed5608c9e75) already exists.
duplicate key value violates unique constraint "rpm_checksum_path_checksum_distribution_tree_id_fd3fe409_uniq"
DETAIL:  Key (path, checksum, distribution_tree_id)=(images/boot.iso, sha256:c79921e24d472144d8f36a0d5f409b12bd016d9d7d022fd703563973ca9c375c, 3b51baae-5aea-4028-9abd-9ed5608c9e75) already exists.
duplicate key value violates unique constraint "rpm_checksum_path_checksum_distribution_tree_id_fd3fe409_uniq"
DETAIL:  Key (path, checksum, distribution_tree_id)=(images/boot.iso, sha256:c79921e24d472144d8f36a0d5f409b12bd016d9d7d022fd703563973ca9c375c, 3b51baae-5aea-4028-9abd-9ed5608c9e75) already exists.
duplicate key value violates unique constraint "rpm_variant_variant_id_uid_name_type_764da894_uniq"
DETAIL:  Key (variant_id, uid, name, type, packages, distribution_tree_id)=(PowerTools, PowerTools, PowerTools, variant, Packages, 08e43367-f103-41ff-9d36-f26ae8ca8e4e) already exists.
duplicate key value violates unique constraint "rpm_variant_variant_id_uid_name_type_764da894_uniq"
DETAIL:  Key (variant_id, uid, name, type, packages, distribution_tree_id)=(PowerTools, PowerTools, PowerTools, variant, Packages, 08e43367-f103-41ff-9d36-f26ae8ca8e4e) already exists.
duplicate key value violates unique constraint "rpm_variant_variant_id_uid_name_type_764da894_uniq"
DETAIL:  Key (variant_id, uid, name, type, packages, distribution_tree_id)=(PowerTools, PowerTools, PowerTools, variant, Packages, 08e43367-f103-41ff-9d36-f26ae8ca8e4e) already exists.
duplicate key value violates unique constraint "rpm_variant_variant_id_uid_name_type_764da894_uniq"
DETAIL:  Key (variant_id, uid, name, type, packages, distribution_tree_id)=(AppStream, AppStream, AppStream, variant, Packages, 43c46c2b-f232-4aeb-b6df-1aab889d703c) already exists.
duplicate key value violates unique constraint "rpm_variant_variant_id_uid_name_type_764da894_uniq"
DETAIL:  Key (variant_id, uid, name, type, packages, distribution_tree_id)=(AppStream, AppStream, AppStream, variant, Packages, 43c46c2b-f232-4aeb-b6df-1aab889d703c) already exists.
  5. All clients connected to the content proxy were unable to access the repositories; they only got 404 errors like these:
Status code: 404 for https://foreman-proxy.example.com/pulp/repos/ORG/Production/centos8-epel8/custom/centos8/AppStream_x86_64/repodata/repomd.xml
  6. At some point the URL changed to https://foreman-proxy.example.com/pulp/content/ORG/Production/centos8-epel8/custom/centos8/AppStream_x86_64/repodata/repomd.xml but still returned a 404.
  7. After I published a new version of all my CVs and ran the “Complete Sync” (which ended with those variant errors), at least the 404s were gone and the packages were listed in the repositories on the clients.
  8. However, as I cannot get any sync to finish, it seems I am unable to make a new repository available to the clients (I wanted to add the 2.5 foreman client repository to the content view).

So basically, the system is not usable, and I don’t see how I could proceed, so I will revert the VMs again. After multiple upgrade attempts, I have not once ended up with a usable system. Some issues, like the broken applicability for the el8 modules or the lengthy publish times of CVs with filters, I could live with for the time being…

  1. I have noticed that pulp2, mongodb, celery, etc. are still running on the content proxy with 4.1. Shouldn’t those be gone?

And the “Complete Sync” to the content proxy also fills up the memory. The VM currently has 24 GB of memory, but during the sync a process got killed:

kernel: gunicorn invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0

Possibly, that is the reason for the 404 errors…
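To see whether the 404s line up with OOM kills, the kernel log can be filtered for the process that triggered them; a sketch using the sample line above (on the proxy, run the same sed over `journalctl -k` or /var/log/messages):

```shell
# Pull the process name out of an oom-killer kernel message; the sample
# line is the gunicorn kill quoted above.
line='kernel: gunicorn invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0'
printf '%s\n' "$line" | sed -n 's/.*kernel: \(.*\) invoked oom-killer.*/\1/p'
```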

I reverted the VMs again yesterday (though I still have a snapshot of the latest attempt), as I wasn’t confident I would get basic content management working reliably.

However, in the aftermath, I had two thoughts:

  1. It seems to me as if the upgraded content proxy only delivers content through the /pulp/content/ URI and not through /pulp/repos/. I guess the latter was for backward compatibility with the pulp2 paths.

  2. The 404 on /pulp/content/ is probably because of the OOM. I can’t tell for sure, but I was never able to get a complete sync to finish without errors and OOM kills.
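The first thought is easy to probe from a client: a tiny rewrite from the old pulp2-style path to the pulp3 one (a sketch, using the hostname and path from the 404 quoted earlier):

```shell
# Map a pulp2-style /pulp/repos/ URL onto the pulp3-style /pulp/content/
# path, so the same repomd.xml can be re-tested on the upgraded proxy.
pulp3_url() { printf '%s\n' "$1" | sed 's|/pulp/repos/|/pulp/content/|'; }
old='https://foreman-proxy.example.com/pulp/repos/ORG/Production/centos8-epel8/custom/centos8/AppStream_x86_64/repodata/repomd.xml'
pulp3_url "$old"
```

On a client, `curl -sSI "$(pulp3_url "$old")"` then shows directly whether the new path serves the metadata or still returns a 404, without going through yum.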

The original proxy had 16 GB of memory. The upgraded one ran into OOM kills even with 24 GB.

This was my third upgrade attempt now, and it’s kind of sad. I have a pretty much perfectly running 3.18 system with pulp2, and the upgrade to 4.x and pulp3 sets me far back: lots of issues, slow performance, and problems long since sorted out in pulp2 (like the applicability) popping up again…

I don’t really want to start over from scratch; even recreating the repository sets for almost 300 servers would take a long time. I guess my next attempt will be to switch all content hosts over to the main server, upgrade only the main server, and start over with a fresh content proxy. Hopefully that will take care of the memory issues on the content proxy without requiring too much work…