Katello 4.7.0 rc2 Upgrade

**Problem:** While upgrading to Foreman/Katello 3.5/4.7 I encountered this error:

```
2022-12-02 11:01:43 [NOTICE] [root] Loading installer configuration. This will take some time.
2022-12-02 11:01:49 [NOTICE] [root] Running installer with log based terminal output at level NOTICE.
2022-12-02 11:01:49 [NOTICE] [root] Use -l to set the terminal output log level to ERROR, WARN, NOTICE, INFO, or DEBUG. See --full-help for definitions.
2022-12-02 11:02:02 [NOTICE] [configure] Starting system configuration.
2022-12-02 11:02:20 [NOTICE] [configure] 250 configuration steps out of 1732 steps complete.
2022-12-02 11:02:25 [NOTICE] [configure] 500 configuration steps out of 1732 steps complete.
2022-12-02 11:02:29 [NOTICE] [configure] 750 configuration steps out of 1737 steps complete.
2022-12-02 11:02:30 [NOTICE] [configure] 1000 configuration steps out of 1744 steps complete.
2022-12-02 11:02:31 [NOTICE] [configure] 1250 configuration steps out of 1744 steps complete.
2022-12-02 11:03:09 [NOTICE] [configure] 1500 configuration steps out of 1744 steps complete.
2022-12-02 11:07:12 [ERROR ] [configure] 'pulpcore-manager migrate --noinput' returned 1 instead of one of [0]
2022-12-02 11:07:12 [ERROR ] [configure] /Stage[main]/Pulpcore::Database/Pulpcore::Admin[migrate --noinput]/Exec[pulpcore-manager migrate --noinput]/returns: change from 'notrun' to ['0'] failed: 'pulpcore-manager migrate --noinput' returned 1 instead of one of [0]
2022-12-02 11:07:18 [NOTICE] [configure] System configuration has finished.
```

**Expected outcome:** Migration completes without issues

**Foreman and Proxy versions:** 3.5.0rc2/4.7.0rc2

**Foreman and Proxy plugin versions:** 

**Distribution and version:** CentOS Stream 8

**Other relevant data:**
I run a self-subscribed instance, and it had an earlier failure while resetting the redis module (so I re-enabled the generic CentOS Stream baseos/appstream repos). The second time I ran `foreman-installer --scenario katello` I encountered the above error.
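For reference, the repo/module juggling looked roughly like this (repo IDs and the module stream are the CentOS Stream 8 defaults; treat it as a sketch rather than a recipe):

```
# Re-enable the stock Stream repos so the module switch has something to pull from
dnf config-manager --set-enabled baseos appstream
# Reset the redis module and enable the 6 stream
dnf module reset -y redis
dnf module enable -y redis:6
```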

I got the command to complete by running `pulpcore-manager migrate` in /etc/pulp as root. It's not clear why the code didn't pick up the settings file; it seems like it should have. When I first ran the command manually I got a lock/access error, so I ran `foreman-maintain service stop`, explicitly restarted postgresql and redis, and ran the migration manually. Once this was done, `foreman-installer --scenario=katello` ran successfully.
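Written out, the recovery was roughly the following (service names are the defaults on a Katello install; the pulpcore-manager invocation the installer itself uses comes up later in the thread):

```
foreman-maintain service stop         # stop everything so nothing holds locks on the database
systemctl start postgresql redis      # bring back only what the migration needs
cd /etc/pulp
pulpcore-manager migrate              # run as root from /etc/pulp - this is what worked for me
foreman-installer --scenario katello  # re-run the installer once the migration is through
```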

Hi @mhjacks,

Thanks for reporting the issue.
What version did you upgrade from?
Please provide the actual error from the pulpcore migration for debugging.

I was upgrading from 3.4.1.

I can’t find the original error in the logs - but when I ran the command manually, my recollection is that it couldn’t expand ANSIBLE_CONTENT_HOSTNAME properly, which pointed to a problem with reading /etc/pulp/settings.py. I worked out how to run this command outside of kafo to clear the error.
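For what it's worth, one way to check whether the settings file is actually being read is to print the value through the standard Django management shell; this is my own sketch (the `shell -c` form is stock Django, not anything Katello-specific):

```
sudo -u pulp PULP_SETTINGS=/etc/pulp/settings.py \
  pulpcore-manager shell -c 'from django.conf import settings; print(settings.ANSIBLE_CONTENT_HOSTNAME)'
```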

I had an additional problem with upgrading redis, due to self subscription (the repos weren’t available to install redis 6 while foreman/katello was down), so I re-enabled the public ones to allow kafo to complete.

Could you run the migration again and collect the logs:

```
sudo -u pulp PULP_SETTINGS='/etc/pulp/settings.py' pulpcore-manager migrate
```

There should have been a more detailed log in /var/log/foreman-installer/katello.log, but what @lfu suggested is what the installer actually runs.

Right - I worked out how to invoke that manually, and once I cleared that stage, `foreman-installer --scenario=katello` completes cleanly. (The mechanism I used to get past the error was to call the `pulpcore-manager` command, as root, inside the /etc/pulp directory.)

One thing I noticed in the logs (though I was surprised not to see a record of the original pulp migration failure) was this:

```
katello.20221202-105741.log:2022-12-02 10:51:37 [INFO  ] [configure] +ANSIBLE_API_HOSTNAME = "https://srv-katello.imladris.lan"
katello.20221202-105741.log:2022-12-02 10:51:37 [INFO  ] [configure] ANSIBLE_CONTENT_HOSTNAME = "https://srv-katello.imladris.lan/pulp/content"
```
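The same values can be checked directly in the installer-managed settings file (assuming the same /etc/pulp/settings.py path as above):

```
grep -E 'ANSIBLE_(API|CONTENT)_HOSTNAME' /etc/pulp/settings.py
```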

The other errors I’ve hit are deadlock issues - which might have to do with edge cases in how and when I’m running the configurator (as reflected by the redis upgrade problem).

Which certainly seems relevant if, on one side or the other of the migration, it's expecting (or not expecting) the scheme to be attached to the hostname.

To be clear, my upgrade process is normally:

  1. Add the new versions of foreman, katello, foreman_plugins, candlepin, and pulpcore to my foreman server content view
  2. DNF upgrade the new packages
  3. Run `foreman-installer --scenario=katello` to complete the configuration steps (see the sketch below)
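In command terms that's roughly the following; step 1 happens against my content view via the Katello UI/hammer, so only the package and installer steps are sketched here (package selection depends on your setup):

```
# Step 2: pull in the new foreman/katello/pulpcore packages from the updated content view
dnf upgrade -y
# Step 3: run the configuration steps
foreman-installer --scenario katello
```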

This process errored out the first time I ran it because no repos were available to execute the redis upgrade, and I wound up manually figuring things out from there. The pulpcore issue I spotted surprised me a bit, but I'm starting to think the reason I hit it was that foreman-installer bombed out in a strange spot, and the REAL issue that hit me was the deadlock (which happened because the foreman and related services were still running).

Specifically, this is the error recorded in the katello log after I did the upgrade:

```
2022-12-02 11:07:12 [INFO  ] [configure] /Stage[main]/Pulpcore::Database/Pulpcore::Admin[migrate --noinput]/Exec[pulpcore-manager migrate --noinput]/returns:   Apply all migrations: ansible, auth, certguard, container, contenttypes, core, deb, file, ostree, python, rpm, sessions
2022-12-02 11:07:12 [INFO  ] [configure] /Stage[main]/Pulpcore::Database/Pulpcore::Admin[migrate --noinput]/Exec[pulpcore-manager migrate --noinput]/returns: Running migrations:
2022-12-02 11:07:12 [INFO  ] [configure] /Stage[main]/Pulpcore::Database/Pulpcore::Admin[migrate --noinput]/Exec[pulpcore-manager migrate --noinput]/returns:   Applying core.0090_char_to_text_field...Traceback (most recent call last):
2022-12-02 11:07:12 [INFO  ] [configure] /Stage[main]/Pulpcore::Database/Pulpcore::Admin[migrate --noinput]/Exec[pulpcore-manager migrate --noinput]/returns:   File "/usr/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
2022-12-02 11:07:12 [INFO  ] [configure] /Stage[main]/Pulpcore::Database/Pulpcore::Admin[migrate --noinput]/Exec[pulpcore-manager migrate --noinput]/returns:     return self.cursor.execute(sql, params)
2022-12-02 11:07:12 [INFO  ] [configure] /Stage[main]/Pulpcore::Database/Pulpcore::Admin[migrate --noinput]/Exec[pulpcore-manager migrate --noinput]/returns: psycopg2.errors.DeadlockDetected: deadlock detected
2022-12-02 11:07:12 [INFO  ] [configure] /Stage[main]/Pulpcore::Database/Pulpcore::Admin[migrate --noinput]/Exec[pulpcore-manager migrate --noinput]/returns: DETAIL:  Process 302931 waits for AccessExclusiveLock on relation 1335880 of database 23875; blocked by process 294471.
2022-12-02 11:07:12 [INFO  ] [configure] /Stage[main]/Pulpcore::Database/Pulpcore::Admin[migrate --noinput]/Exec[pulpcore-manager migrate --noinput]/returns: Process 294471 waits for AccessShareLock on relation 24124 of database 23875; blocked by process 302931.
2022-12-02 11:07:12 [INFO  ] [configure] /Stage[main]/Pulpcore::Database/Pulpcore::Admin[migrate --noinput]/Exec[pulpcore-manager migrate --noinput]/returns: HINT:  See server log for query details.
2022-12-02 11:07:12 [INFO  ] [configure] /Stage[main]/Pulpcore::Database/Pulpcore::Admin[migrate --noinput]/Exec[pulpcore-manager migrate --noinput]/returns:
2022-12-02 11:07:12 [INFO  ] [configure] /Stage[main]/Pulpcore::Database/Pulpcore::Admin[migrate --noinput]/Exec[pulpcore-manager migrate --noinput]/returns:
2022-12-02 11:07:12 [INFO  ] [configure] /Stage[main]/Pulpcore::Database/Pulpcore::Admin[migrate --noinput]/Exec[pulpcore-manager migrate --noinput]/returns: The above exception was the direct cause of the following exception:
2022-12-02 11:07:12 [INFO  ] [configure] /Stage[main]/Pulpcore::Database/Pulpcore::Admin[migrate --noinput]/Exec[pulpcore-manager migrate --noinput]/returns:
2022-12-02 11:07:12 [INFO  ] [configure] /Stage[main]/Pulpcore::Database/Pulpcore::Admin[migrate --noinput]/Exec[pulpcore-manager migrate --noinput]/returns: Traceback (most recent call last):
2022-12-02 11:07:12 [INFO  ] [configure] /Stage[main]/Pulpcore::Database/Pulpcore::Admin[migrate --noinput]/Exec[pulpcore-manager migrate --noinput]/returns:   File "/usr/bin/pulpcore-manager", line 33, in <module>
```
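For anyone hitting the same deadlock, one way to see what is still holding locks before retrying is to ask postgres directly; this is a sketch on my part (the `pulpcore` database name is assumed from my install, and `pg_blocking_pids()` needs PostgreSQL 9.6+):

```
sudo -u postgres psql pulpcore -c \
  "SELECT pid, pg_blocking_pids(pid) AS blocked_by, state, left(query, 60) AS query
     FROM pg_stat_activity
    WHERE datname = 'pulpcore' AND pid <> pg_backend_pid();"
```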

That deadlock came after the redis 6 problem and may have had to do with me fiddling with services afterwards (which wasn't strictly in the context of foreman-installer). So I'm pretty confident this is what actually caused the error I referenced at the top of the post, and it seems quite possible that the ANSIBLE* errors I ran into were red herrings, because I wasn't being careful to run pulpcore-manager the same way the puppet code runs it. When I more or less lucked into running it so it could see the settings (but with the rest of the foreman/katello services shut down), it ran fine. I saw the locking issues on a manual run, shut the other services down with foreman-maintain, started just postgresql and redis back up, ran the migration, and from there on all was well (and continues to be).

TL;DR: I'm coming to the conclusion that I reacted incorrectly to the initial failure (which was definitely due to not being able to upgrade redis during the foreman-installer run) and attributed the error to the wrong cause (not being able to see the settings file), since that's what I saw immediately when I ran the command manually. Once I ran the pulp migration manually (with the foreman services shut down, and with the redis 6 repo/module visible), the installation proceeded without issue.

I wonder if a special upgrade note would be enough: since the redis 6 upgrade happens when foreman-installer runs, it needs to be accounted for on self-subscribed installations. (I normally run with the upstream repos disabled, including baseos and appstream for CentOS Stream 8 itself.)