Possible to use Leapp for Oracle 7 -> Oracle 8 migration?

Ah! Sorry, missed that one. I used those commands and removed the facts, and got a clean, successful reindex! Thank you so much again for your help, it’s very appreciated!


@JendaVodka I’m going to try to write up a script to delete the oldest dupes that you have. In the meantime, I recommend taking a backup if you don’t already have one.

Also, if reverting to a point before your upgrade is at all possible, I would recommend that.

I would not sync any repositories until you’re able to reindex the DB. The corruption may end up affecting other tables or Postgres DBs.

@iballou you are very kind! I have some backups and will take a new one, but I don’t think it would be possible to revert to before the upgrade. I hope that we will solve this issue somehow.


Here is my idea:

# Find all erratum packages that have module stream erratum packages. Assumes that module stream errata packages are not duplicated
erratum_packages_to_keep = ::Katello::ModuleStreamErratumPackage.all.pluck(:erratum_package_id)

# Mark all erratum package duplicates that do not have module stream erratum packages as deletable
to_delete = ::Katello::ErratumPackage.where.not(id: ::Katello::ErratumPackage.group(:erratum_id, :nvrea).select("max(id)")).where.not(id: erratum_packages_to_keep)

# Perform the deletion
to_delete.delete_all

# Check if there are remaining duplicates
# The query above treats the oldest packages as duplicates -- it's possible that the newest package is missing a module stream erratum package relation that it should have had
::Katello::ErratumPackage.where.not(id: ::Katello::ErratumPackage.group(:erratum_id, :nvrea).select("max(id)")).count

# If there are duplicates still, re-run but use the min(id)

to_delete = ::Katello::ErratumPackage.where.not(id: ::Katello::ErratumPackage.group(:erratum_id, :nvrea).select("min(id)")).where.not(id: erratum_packages_to_keep)

to_delete.delete_all

I would like an ACK from someone first, maybe @sajha?
This might not be optimized, but I believe it is not dangerous, and at least it should not be slow.


I chatted with @sajha elsewhere about this and we decided that it should be fine to run, but my assumption that the module stream erratum packages are not duplicated could be false. If that’s the case, we’ll still end up with some duplicates that are not deleted.

These duplicates can be found again with ::Katello::ErratumPackage.where.not(id: ::Katello::ErratumPackage.group(:erratum_id, :nvrea).select("max(id)")).count

Okay, so while the above cleanup should be helpful, a surefire method of cleanup would be to just wipe out all duplicates and essentially resync the bad repositories like I mentioned before. I’m not sure how many are affected, but it’s likely that only a select few are, depending on your sync schedule.
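
For completeness, the wipe-everything variant might look roughly like this (untested sketch; it clears the module stream references first so the delete can’t hit a foreign key violation, and assumes you resync the affected repositories afterwards):

# Untested sketch of the "wipe all duplicates" approach: delete every row that
# is not the newest in its (erratum_id, nvrea) group, clearing out any module
# stream references to those rows first. The affected repositories then need a
# resync so the deleted data is recreated.
dupes = ::Katello::ErratumPackage.where.not(id: ::Katello::ErratumPackage.group(:erratum_id, :nvrea).select("max(id)"))
::Katello::ModuleStreamErratumPackage.where(erratum_package_id: dupes.select(:id)).delete_all
dupes.delete_all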

I have an updated command to find the “bad” repositories:

errata_ids = ::Katello::ErratumPackage.where.not(id: ::Katello::ErratumPackage.group(:erratum_id, :nvrea).select("max(id)")).pluck(:erratum_id)

bad_repos = ::Katello::Repository.joins(:errata).where("katello_errata.id in (#{errata_ids.join(',')})")

bad_repos.count
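
If you want to see which repositories those are, a pluck on the relation (a hypothetical extra step) will list them for resyncing later:

# Hypothetical extra step: list the affected repositories by id and name so
# you know which ones to resync once the duplicates are cleaned up.
bad_repos.pluck(:id, :name)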

Hopefully this query won’t run as long as the first one I asked you to run did.

Hello, I ran the above, but it does not produce any output:

Loading production environment (Rails 6.1.6.1)
irb(main):001:1* errata_ids = ::Katello::ErratumPackage.where.not(id: ::Katello::ErratumPackage.group(:erratum_id, :nvrea).select("max(id)").pluck(:erratum_id)
irb(main):002:1> bad_repos = ::Katello::Repository.joins(:errata).where("katello_errata.id in (#{errata_ids.join(',')})")
irb(main):003:1> bad_repos.count
irb(main):004:1*

Apologies, there was a missing parenthesis. I updated the command.

Well, the first command ran for more than 2 hours and did not finish :cry:

If the prompt has the * at the end like that, it’s waiting for more input. Just Ctrl+C to get out of it :slight_smile:


Hello @iballou,
I finally managed to remove the duplicates from the katello_erratum_packages table, with a check against the katello_module_stream_erratum_packages table. Then reindexdb went smoothly. I don’t know if anything else needs to be done. I also fear that those duplicates will appear again.


@JendaVodka you don’t have to worry about the duplicates coming back since you reindexed. The bug happened due to the Postgres-related issue with the indexes.

When you have a minute, would you mind telling us briefly how you removed the duplicates?

I tried to understand your foreman-rake script and wrote a shell script based on it because I am not an SQL expert.
This script selected all the records from the katello_erratum_packages table and parsed them via sort and uniq. For lines that appeared more than once in the output, it then checked for each erratum_id and nvrea whether it existed in the katello_module_stream_erratum_packages table, and if not, deleted the duplicated row with the highest id. Altogether it deleted about 300 rows.
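
In Rails console terms, the logic was roughly the following (an illustrative sketch, not the script that actually ran):

# Illustrative sketch of the cleanup logic described above: for each duplicated
# (erratum_id, nvrea) pair, delete the row with the highest id unless it is
# still referenced from katello_module_stream_erratum_packages.
dupes = ::Katello::ErratumPackage.group(:erratum_id, :nvrea).having("count(*) > 1").count
dupes.each_key do |erratum_id, nvrea|
  newest = ::Katello::ErratumPackage.where(erratum_id: erratum_id, nvrea: nvrea).order(:id).last
  next if ::Katello::ModuleStreamErratumPackage.exists?(erratum_package_id: newest.id)
  newest.delete
end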


Thank you for the explanation, it’s really good data for future cleanup!

I’m stuck trying to do this same upgrade. I’m currently on Foreman 3.3 on Oracle Linux Server 7.9; I started with Foreman 1.19 and CentOS 7 and ended up here.

I applied the fixes mentioned by @brgerig, and Leapp stopped with this error:

2023-02-21 15:16:17.082284 [ERROR] Actor: dnf_transaction_check
Message: DNF execution failed with non zero exit code.
STDOUT:
Last metadata expiration check: 0:02:57 ago on Tue Feb 21 15:12:42 2023.
Package rubygem-foreman_maintain-1:1.1.10-1.el7.noarch is already installed.
Package foreman-installer-1:3.3.1-2.el7.noarch is already installed.

STDERR:
Warning: Package marked by Leapp to install not found in repositories metadata: libnsl2-devel rpcgen rubygem-hammer_cli rubygem-hammer_cli_foreman rubygem-foreman_puppet rpcsvc-proto-devel rubygem-hammer_cli_foreman_puppet
Warning: Package marked by Leapp to upgrade not found in repositories metadata: python2-leapp leapp-upgrade-el7toel8 leapp
Transaction check:

 Problem 1: package rh-ruby27-ruby-libs-2.7.6-131.el7.x86_64 requires libgdbm.so.4()(64bit), but none of the providers can be installed
  - package foreman-installer-1:3.3.1-2.el7.noarch requires rh-ruby27-ruby(release), but none of the providers can be installed
  - gdbm-1.10-8.el7.x86_64 does not belong to a distupgrade repository
  - conflicting requests
 Problem 2: package rh-postgresql12-postgresql-syspaths-12.11-1.el7.x86_64 conflicts with postgresql provided by postgresql-12.12-1.module+el8.6.0+20851+edfb83f8.x86_64
  - package rubygem-foreman_maintain-1:1.1.10-1.el7.noarch requires rh-postgresql12-postgresql-syspaths, but none of the providers can be installed
  - cannot install the best candidate for the job
  - conflicting requests

My Foreman installation is fully functional after all the upgrades; I just provisioned an Ubuntu 18.04 server to test it. I’m only upgrading because I need Ubuntu 22.04 support so I can rebuild all our 18.04 machines.

When upgrading our Satellite OS from RHEL 7 to 8, we stumbled across this as well. We executed the SQL query to clean up the puppetfactname table, but we still had the same issue.

After running the same commands again (but going with the interactive method), I noticed that the number of facts was higher than 0 (we still had Satellite/Foreman running).

I then proposed to use satellite-maintain service stop --exclude postgresql (replace satellite with foreman in your case) and re-ran the SQL query, and this time the number of facts in the table was 0.
Reindexing the table then went without any problems, and we are now enjoying RHEL 8 :slight_smile:

In JendaVodka’s case, I believe a sync was happening (Pulp), and that is why the problem occurred while reindexing the database.

We also ran into this, as we upgraded to EL8 on 3.2 (where the docs were also missing the DB reindex step), and only noticed because I initially stumbled upon this thread out of curiosity.
In our dev environment, we managed to get the Puppet facts indexed via the commands provided by @evgeni and then also stumbled upon the duplicate entries problem @JendaVodka encountered. We continued with the procedure provided by @iballou (adding some missing brackets), but the delete_all commands always returned with an exception after running for about 1.5 hours:

Traceback (most recent call last):
        2: from lib/tasks/console.rake:5:in `block in <top (required)>'
        1: from (irb):6
ActiveRecord::InvalidForeignKey (PG::ForeignKeyViolation: ERROR:  update or delete on table "katello_erratum_packages" violates foreign key constraint "katello_msep_erratum_package_id_fk" on table "katello_module_stream_erratum_packages")
DETAIL:  Key (id)=(975315) is still referenced from table "katello_module_stream_erratum_packages". 

From what I understand, this error should not have happened in the first place, since IDs referenced from katello_module_stream_erratum_packages should have been excluded from the start via the erratum_packages_to_keep part, but it still happened.
We then decided to go the DB way and fix it via SQL. From the foreman-rake console commands provided, I tried to extrapolate that to DB queries that should find all the duplicates that are not referenced by katello_module_stream_erratum_packages:
select * from katello_erratum_packages
  where erratum_id in (select erratum_id from katello_erratum_packages group by nvrea, erratum_id having count(nvrea) > 1)
  and nvrea in (select nvrea from katello_erratum_packages group by nvrea, erratum_id having count(nvrea) > 1)
  order by nvrea;

Assuming the higher IDs from that output are probably the erroneous duplicates (since they were being generated at a later time), we then continued to delete all of them via:

delete from katello_erratum_packages
  where id in (
    select max(id) from katello_erratum_packages
    where erratum_id in (select erratum_id from katello_erratum_packages group by nvrea, erratum_id having count(nvrea) > 1)
    and nvrea in (select nvrea from katello_erratum_packages group by nvrea, erratum_id having count(nvrea) > 1)
    and id not in (select erratum_package_id from katello_module_stream_erratum_packages)
    group by nvrea, erratum_id
  );

This ran through smoothly and pretty fast, and after that the reindex also went through without further errors.

While we achieved our goal (the reindex finishing without errors), and I felt somewhat comfortable tampering with the DB since this is a low-traffic dev environment and we have working backups, I would still appreciate some feedback on whether this is the “correct” way to handle this problem before we deploy it to production. Having to roll back to a backup there might have bigger implications, especially if we realize late that something is going wrong.


This method should work just fine. I had some concerns before about deleting the newest vs. the oldest duplicates, but it shouldn’t matter, since the ErratumPackages have limited information in them. The rows are duplicated on their nvrea, which is the most important bit of information on the model.
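
If you want extra reassurance before deleting, a quick spot check along these lines (untested sketch) can show whether the rows in a duplicated pair really are interchangeable copies:

# Untested sketch: grab one duplicated (erratum_id, nvrea) pair and compare the
# full attribute hashes of its rows; ideally they differ only in id.
pair, _count = ::Katello::ErratumPackage.group(:erratum_id, :nvrea).having("count(*) > 1").count.first
erratum_id, nvrea = pair
::Katello::ErratumPackage.where(erratum_id: erratum_id, nvrea: nvrea).map(&:attributes)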

The fact that no module stream erratum packages were involved makes it even safer.

So I suppose your production server is already in the errored state with duplicates? If not, and you upgrade with the reindex, you shouldn’t need to do this cleanup on your production machine anyway.


Thanks for the confirmation.
Yes, our production machine is also already in that state. A quick SQL query shows me 327 duplicate entries. It will probably take some days until we get a downtime window for it, but at least we now know how to fix it. :slight_smile:


One tip: try to limit repository syncing and filtered content view publishing until you can run the reindexes. Until the database is reindexed, there is a chance that any model that shouldn’t be duplicated can become duplicated. In one case we even saw Pulpcore records get duplicated, which could be really difficult to clean up.
