Unsuccessful upgrade from F2.1+K3.16 to F2.2+K3.17, every time

Problem:

Every time I attempt to upgrade Foreman from 2.1 to 2.2 (and Katello from 3.16 to 3.17), it fails and results in an unusable stack.

Expected outcome:

A functional Foreman & Katello stack.

Foreman and Proxy versions:

Before:

$ rpm -q foreman katello
foreman-2.1.4-1.el7.noarch
katello-3.16.2-1.el7.noarch

After:

$ rpm -q foreman katello
foreman-2.2.3-1.el7.noarch
katello-3.17.3-1.el7.noarch

Foreman and Proxy plugin versions:

The standard set of plugins that come with the base install. I haven’t installed any plugins myself. Any dependencies came along for the ride during the upgrade.

However, I did notice that the remote execution plugin got installed as part of the upgrade (though I didn’t specify it). Is it a new dependency? Any time I’ve tried installing it in the past, I was unsuccessful, as it made my Foreman+Katello install unusable and I had to roll back.

Distribution and version:

CentOS Linux release 7.9.2009 (Core)

Other relevant data:

These are the errors summarized at the end of /var/log/foreman-installer/katello.log:

[ERROR 2021-04-27T16:52:59 main] Errors encountered during run:
[ERROR 2021-04-27T16:52:59 main] foreman-maintain packages is-locked --assumeyes failed! Check the output for error!
[ERROR 2021-04-27T16:52:59 main]  /Stage[main]/Pulpcore::Static/Pulpcore::Admin[collectstatic --noinput]/Exec[pulpcore-manager collectstatic --noinput]: Failed to call refresh: 'pulpcore-manager collectstatic --noinput' returned 1 instead of one of [0]
[ERROR 2021-04-27T16:52:59 main]  /Stage[main]/Pulpcore::Static/Pulpcore::Admin[collectstatic --noinput]/Exec[pulpcore-manager collectstatic --noinput]: 'pulpcore-manager collectstatic --noinput' returned 1 instead of one of [0]
[ERROR 2021-04-27T16:52:59 main]  Command exceeded timeout
[ERROR 2021-04-27T16:52:59 main]  /Stage[main]/Pulpcore::Database/Pulpcore::Admin[migrate --noinput]/Exec[pulpcore-manager migrate --noinput]/returns: change from 'notrun' to ['0'] failed: Command exceeded timeout
[ERROR 2021-04-27T16:52:59 main]  /Stage[main]/Pulpcore::Database/Pulpcore::Admin[migrate --noinput]/Exec[pulpcore-manager migrate --noinput]: Failed to call refresh: Command exceeded timeout
[ERROR 2021-04-27T16:52:59 main]  /Stage[main]/Pulpcore::Database/Pulpcore::Admin[migrate --noinput]/Exec[pulpcore-manager migrate --noinput]: Command exceeded timeout

…and these errors get repeated indefinitely in a loop in /var/log/foreman/production.log:

2021-04-28T18:17:04 [W|app|] Creating scope :completer_scope. Overwriting existing method Organization.completer_scope.
2021-04-28T18:17:05 [W|app|] Scoped order is ignored, it's forced to be batch order.
2021-04-28T18:17:05 [W|app|] Creating scope :completer_scope. Overwriting existing method Location.completer_scope.
2021-04-28T18:17:05 [W|app|] Could not create role 'Remote Execution Manager': ERF73-0602 [Foreman::PermissionMissingException]: some permissions were not found: ["view_audit_logs", "view_hosts", "view_smart_proxies", "view_job_templates", "create_job_templates", "edit_job_templates", "edit_remote_execution_features", "destroy_job_templates", "lock_job_templates", "create_job_invocations", "view_job_invocations", "create_template_invocations", "cancel_job_invocations", "filter_autocompletion_for_template_invocation", :view_job_templates, :view_job_invocations, :create_job_invocations, :create_template_invocations, :view_hosts, :view_smart_proxies, :cancel_job_invocations, :destroy_job_templates, :edit_job_templates, :create_job_templates, :lock_job_templates, :view_audit_logs, :filter_autocompletion_for_template_invocation, :edit_remote_execution_features]
2021-04-28T18:17:05 [E|app|] Cannot continue because some permissions were not found, please run rake db:seed and retry
2021-04-28T18:17:05 [I|app|] Rails cache backend: File

Does anyone have any insights that would help me figure out what’s broken about my installation?
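
For what it’s worth, I can also try re-running the failing installer steps by hand to get the real traceback. This is just my guess at the invocation (the pulp user and settings path are assumptions on my part, based on how the installer appears to run these):

$ sudo -u pulp env PULP_SETTINGS=/etc/pulp/settings.py pulpcore-manager collectstatic --noinput
$ sudo -u pulp env PULP_SETTINGS=/etc/pulp/settings.py pulpcore-manager migrate --noinput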

Did you try what the error message suggests?

Sorry, I forgot to mention that. I did run rake db:seed, but that didn’t make any difference. It just resulted in a different error, which I couldn’t get past. I’ll run through it again and post the results here later.

Hi @pdelong42

Can you run the rake db:seed with the --trace option and post the output?

Did the db:migrate work ok?

In a nutshell, I get the “User with login admin already exists, not seeding as admin.” message from trying a db:seed. I also ran the db:migrate, and it seemed to run without any issues (since the first attempt produced no output, I ran it again with --trace and checked the exit code, FWIW). See the detailed output below:

$ cat try.sh
#!/bin/sh

# Usage: script -c 'sudo sh try.sh' $(date +%F_%T).log

set -xeuo pipefail

foreman-rake db:seed --trace
foreman-rake db:migrate

$ script -c 'sudo sh try.sh' $(date +%F_%T).log
Script started, file is 2021-04-29_11:40:35.log
+ foreman-rake db:seed --trace
** Invoke db:seed (first_time)
** Invoke db:load_config (first_time)
** Invoke environment (first_time)
** Execute environment
** Execute db:load_config
** Execute db:seed
** Invoke db:abort_if_pending_migrations (first_time)
** Invoke db:load_config 
** Execute db:abort_if_pending_migrations
** Invoke dynflow:abort_if_pending_migrations (first_time)
** Invoke environment 
** Execute dynflow:abort_if_pending_migrations
User with login admin already exists, not seeding as admin.
+ foreman-rake db:migrate
Script done, file is 2021-04-29_11:40:35.log
$ sudo foreman-rake db:migrate --trace
** Invoke db:migrate (first_time)
** Invoke db:load_config (first_time)
** Invoke environment (first_time)
** Execute environment
** Execute db:load_config
** Invoke plugin:refresh_migrations (first_time)
** Invoke environment 
** Execute plugin:refresh_migrations
** Execute db:migrate
** Invoke db:_dump (first_time)
** Execute db:_dump
** Invoke dynflow:migrate (first_time)
** Invoke environment
** Execute dynflow:migrate
$ echo $?
0

I did also find this related issue (which left me none the wiser):

…though it isn’t entirely clear to me whether that fellow ever actually got over the hurdle he described.

What other helpful diagnostic data can I provide? I’m stopping short of just uploading a tarball of my /var/log directory, seeing as this is a public forum.

Also, would there be any merit in trying to upgrade directly to Foreman v2.3 and Katello v3.18 (on the off chance that the bugs I’m tripping over have been fixed in that version of the installer)? The phrasing used in the upgrade documentation seems a bit ambiguous on that count:

Katello supports upgrades from the previous two versions only. Upgrades should be performed sequentially without skipping versions in between.

Is it just me, or do those two sentences contradict each other? (cf. Foreman :: Plugin Manuals)

Hi @pdelong42

Can you generate a foreman-debug and email it to me at chrobert @ redhat.com? I will take a look today.

Thanks @cintrix84 , I’ll be sending it to you shortly (in the next five minutes). I’ve included the output of foreman-debug runs for both before and after the upgrade (in case a wonky condition existed prior to upgrading).

Hi @cintrix84 ,

Any luck? I hope the email found you well (I didn’t get a bounce, anyway).

I do appreciate you having a look, seeing as I’m on the free product.

Hi @pdelong42

Super sorry about the delay, I never got anything, which is why things stalled on my end. I didn’t see anything in the spam folder either, so it’s possible our email filter just dropped the message. I can give you another spot to upload it, or if you want to upload it somewhere like Dropbox and send me the link, that would work too.

Cool, I’m glad you were able to use the OneDrive link I just sent.

One of our mail servers must have nuked the email based on the attachment. Though I can’t imagine it was on the basis of size (20 MiB isn’t super large by today’s standards).

  • Paul

I think at this point I’m going to have to nuke the VM and reinstall from scratch, probably starting with Foreman 2.3 and Katello 3.18 (I’d go straight to 4.0, but I’m super wary of .0 releases).

The only thing stopping me from this drastic approach is that I’ve already gotten moderately invested in the current install. But there have been enough weird issues with my CentOS 8 repos that the pros might outweigh the cons at this point.

Also, I’m not sure I won’t run into these same issues the next time I attempt an upgrade. But I’m not sure what my alternatives are in any case (short of cherry-picking and rolling out the individual components myself, e.g., rolling my own Pulp3 server).

Are there any good ways to:

  • export (and later, re-import) repository configs, so I don’t need to manually cut-and-paste each one from the present configuration (rough hammer guess below);
  • avoid generating a new consumer certificate (and key), so I don’t have to reinstall it on all of the clients I’ve registered so far?

(Is that latter item as simple as taking the cert and key files from their locations on the present server, and placing them into the same location on the new server?)
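
For the first item, my rough hammer guess (untested, and the organization name below is just a placeholder) would be to dump the repo definitions in a machine-readable form and re-create them on the new box:

$ hammer --output json repository list --organization "MyOrg"
$ hammer --output json repository info --id 1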

Hi Paul,

Sorry for the delay, I was not feeling well so I took a few days off. I am back now and can look over the files you sent me, if you want to keep the VM. If you want to start over, you can just move your SSL certs over and create a new instance, passing the certs as params during the installer run.
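
If you go the reinstall route, the exact certificate-related flag names can be listed straight from the installer itself; something like this should show them (just a way to look them up, not a full recipe):

# foreman-installer --scenario katello --full-help | grep -- --certs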

Let me know what you want to do, sorry again.

Sorry to hear, and I hope you’re feeling better.

If you are still able to look into the logs and see if there’s any smoking gun, then I’m certainly interested in knowing what you find. I haven’t yet wiped the VM and started the reinstall, so anything you find would potentially save me some trouble.

And thanks again for looking into this. You’re clearly going out of your way to help me here.

Apologies, I got a tad discouraged earlier this week when I tried to add the CentOS 8 Stream repos and ran into an additional roadblock. Pulp seems to have trouble recognizing .xz files in the repo metadata, and I thought they may have fixed that in Pulp 3 (hence the thought of cutting my losses and just jumping to a later version). When I get some time this week or next, I’ll file that in their issue tracker, if there isn’t already an issue for it.

In any case, thanks for the info about the key-pair. I still have that in my back pocket as a contingency plan.

Hi Paul,

Looking through the tar file you sent me, I see a few things going on here:

It looks like, during the upgrade, the Pulp 3 migration tooling was upset about the Debian plugin and about some permissions on the Pulp directory:

PermissionError: [Errno 13] Permission denied: '/var/lib/pulp/assets/admin/css/autocomplete.css'

Is that set to pulp:pulp? Also try a # restorecon -Rv on it to see if something got messed up with SELinux.
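
Something like this should show both the ownership and the current SELinux context, before and after the relabel (paths taken from the error above):

# ls -lZ /var/lib/pulp/assets/admin/css/autocomplete.css
# restorecon -Rv /var/lib/pulp/assets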

Are you using Debian files with your Katello instance?

I see in /etc/foreman-installer/scenarios.d/katello-answers.yml that it is enabled:
enable_deb: true

I do see Pulp 3 enabled as well in the same file:
pulpcore_enabled: true

If you are not, we can verify that no Debian repos/files are being used by doing the following:

# foreman-rake console
Katello::Deb.count
Katello::Repository.where(:root => Katello::RootRepository.where(:content_type => "deb")).count

If both of these are 0, then you can safely disable deb content in Katello before the migration by running foreman-installer --katello-enable-deb=false

Then we can try the upgrade again.

I see there was an issue filed for this as well:

https://projects.theforeman.org/issues/31875

Okay, I ran restorecon on that CSS file, but there’s no change. According to ls -Z, it had no context before and it still has none afterwards. Also, it’s owned by root:root. Should I change that to pulp:pulp?

Now, that leads to a larger can of worms, as it seems that everything in /var/lib/pulp/assets is owned by root:root. Should I chown -Rh pulp:pulp /var/lib/pulp/assets (rather than just that one CSS file)?

As for Debian files: no, I’m not using any, and I can certainly afford to disable support for them (we’re pretty much a .rpm shop). Also, both of those counts return 0. So it’s looking like a safe bet to disable .deb content.

If not later this afternoon, then I’ll probably attempt another upgrade early next week, using the foreman-installer argument you cited above (would there be any harm in also appending --katello-use-pulp-2-for-deb=false?).
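
Before then, I’ll check whether that second flag even exists on my installer version; my guess is that something like this will list the deb-related options:

$ sudo foreman-installer --full-help | grep deb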

Hi Chris,

I tried doing the upgrade again, with the permission changes to /var/lib/pulp/assets, and with Debian content support disabled, but it doesn’t look like it’s made much of a difference. I still end up with the same failure mode.

Here’s the script I use to automate as many steps as possible: go.sh.gz (613 Bytes)

Any pointers on where I should look next?

Could any specific repositories be throwing a spanner in the works? The fact that my CentOS 8 repos have never been stable seems like it could be a related issue. Periodically, I need to re-publish the content view containing my CentOS 8 repos, because the repomd.xml files from one of the repos (usually EPEL) will suddenly become inaccessible (from the point of view of a client, at any rate).

That’s just a hunch, though. I don’t have enough familiarity with the moving parts to know how to pinpoint the issue.

  • Paul

Do you have selinux disabled?

'tis not enabled:

$ sudo getenforce
Disabled