Foreman 3.3 -> 3.4 upgrade - success, unknown records in database

ikonia · September 15, 2022, 3:44pm

Problem:
upgrade from 3.3 to 3.4 successful, but leaving unknown records in database.

Unknown records found in the database

Please contact your administrator, the request id for finding the logs: “b13ee575-0180-4232-b8e9-16ee2097e43f”

no errors in database clean up tasks.
incompatible upgrade with expire_hosts plugin, plugin removed and upgrade successful

Expected outcome:

successful upgrade - list of hosts and their status similar to pre-upgrade

Foreman and Proxy versions:

version 3.3 to 3.4 - error in 3.4

Foreman and Proxy plugin versions:

foreman-proxy-3.4.0-1.el8.noarch
foreman-3.4.0-1.el8.noarch

Distribution and version:
CentOS 8 stream
centos-stream-release-8.6-1.el8.noarch

Other relevant data:
upgrade was clean - had an initial error due to the expire_host plugin, this plugin was legacy for me and no longer needed so I removed it and the upgrade was successful with no errors or warnings.

ikonia · September 17, 2022, 7:54pm

does anyone know where this identifier should theoretically be in the foreman database ? I can’t find any identifier that look like that, I’ve been looking in the host table as a starting point.

ikonia · September 17, 2022, 8:22pm

I did pg_dump and grepped for that string, it’s not in the database, so I have no idea why foreman thinks that uuid is in the database

gvde · September 17, 2022, 9:14pm

It‘s a little bit difficult to understand if you don‘t say what you do exactly to get this error and where you find it.

As the error says: more information should be in the logs from what you did there, depending only what that was…

ikonia · September 18, 2022, 1:02pm

sorry - this is on the default host page, I can’t list any hosts, as any menu/dashboard that displays a host errors with this message, hence why I thought that this entry would be in the hosts table (uuid column) as it appears anything that queries then displays hosts fails due to this UUID type identifier

for example hosts → all hosts generates this error.

it’s worth pointing out now that I’ve spent a little more time on it, the uuid it’s referencing is changing, so this time it was c0e6ce88-5d36-43e6-a3ad-76ed578501c8

interestingly (my postgres isn’t strong enough to understand this behaviour)

if I run select name,uuid from hosts to list the hosts on my test instance I get the friendly name and the uuid, only 5 hosts have a uuid

     name           |                 uuid                 
-------------------------±-------------------------------------
el9.no-dns.co.uk archer.no-dns.co.uk sonofanton.no-dns.co.uk wopr.no-dns.co.uk wesley.no-dns.co.uk mcp.no-dns.co.uk anton.no-dns.co.uk dnstest.no-dns.co.uk nog.no-dns.co.uk trip.no-dns.co.uk wptest1.no-dns.co.uk picard.no-dns.co.uk tain.no-dns.co.uk router.no-dns.co.uk lcars.no-dns.co.uk spock.no-dns.co.uk riker.no-dns.co.uk ezri.no-dns.co.uk odo.no-dns.co.uk garak.no-dns.co.uk tpol.no-dns.co.uk locutus.no-dns.co.uk hole.no-dns.co.uk kirk.no-dns.co.uk dukat.no-dns.co.uk jake.no-dns.co.uk jarvis.no-dns.co.uk | fdf88dd2-ab5a-4161-ab79-9d12c64eb8a6
| 0e9b4946-3933-4976-84d8-d7a82805177f
|
|
|
|
|
| 629db89b-efbf-4a69-8ba5-0f0daf3ab3a4
|
|
| d976a6db-d61d-4d49-bd92-b073ce8ae5ed
|
|
|
|
|
|
|
|
|
|
|
| 103d88cf-6058-433e-8929-a909d22e2157
|
|
|
|

this seems odd in itself, and as you can see the records in the two example error messages are not in this list

if I run the same query a second time, seconds apart, the ordering changes

     name           |                 uuid                 
-------------------------±-------------------------------------
archer.no-dns.co.uk | 0e9b4946-3933-4976-84d8-d7a82805177f
sonofanton.no-dns.co.uk |
wopr.no-dns.co.uk |
wesley.no-dns.co.uk |
mcp.no-dns.co.uk |
anton.no-dns.co.uk |
dnstest.no-dns.co.uk | 629db89b-efbf-4a69-8ba5-0f0daf3ab3a4
nog.no-dns.co.uk |
trip.no-dns.co.uk |
wptest1.no-dns.co.uk | d976a6db-d61d-4d49-bd92-b073ce8ae5ed
picard.no-dns.co.uk |
tain.no-dns.co.uk |
router.no-dns.co.uk |
lcars.no-dns.co.uk |
spock.no-dns.co.uk |
riker.no-dns.co.uk |
ezri.no-dns.co.uk |
odo.no-dns.co.uk |
garak.no-dns.co.uk |
tpol.no-dns.co.uk |
locutus.no-dns.co.uk |
hole.no-dns.co.uk | 103d88cf-6058-433e-8929-a909d22e2157
el9.no-dns.co.uk | fdf88dd2-ab5a-4161-ab79-9d12c64eb8a6
kirk.no-dns.co.uk |
dukat.no-dns.co.uk |
jake.no-dns.co.uk |
jarvis.no-dns.co.uk |
(27 rows)

I’m assuming (researching at the same time) that postgres will display random orders unless I issue an orderby/sortby type statement in the query.

odd though that only 5 have uuid’s and of those uuid’s none of them are the ones the error message is complaining about

ikonia · September 18, 2022, 1:13pm

two points of interest in the log.

The first is with no-gui interaction the production log displays these lines

/usr/share/gems/gems/logging-2.3.1/lib/logging/diagnostic_context.rb:474:in `block in create_with_logging_context’
2022-09-18T14:05:41 [I|app|622bdfa9] Rendered api/v2/errors/standard_error.json.rabl within api/v2/layouts/error_layout (Duration: 6.4ms | Allocations: 5744)
2022-09-18T14:05:41 [I|app|622bdfa9] Rendered layout api/v2/layouts/error_layout.json.erb (Duration: 12.7ms | Allocations: 11430)
2022-09-18T14:05:41 [I|app|622bdfa9] Completed 500 Internal Server Error in 80ms (Views: 14.1ms | ActiveRecord: 15.5ms | Allocations: 21936)

I’m not sure what’s calling this function as it looks like an interface function - but I’m the only one using this test instance and I’ve not actually clicked anything

actually clicking the hosts-> all hosts generates this in the log

2022-09-18T14:09:47 [I|app|00752c3a] Rendered common/confirm_class_clean_up.html.erb within layouts/application (Duration: 4.0ms | Allocations: 5734)
2022-09-18T14:09:47 [I|app|00752c3a] Rendered layouts/base.html.erb (Duration: 34.9ms | Allocations: 28424)
2022-09-18T14:09:47 [I|app|00752c3a] Rendered layout layouts/application.html.erb (Duration: 50.7ms | Allocations: 46951)
2022-09-18T14:09:47 [I|app|00752c3a] Completed 200 OK in 113ms (Views: 49.6ms | ActiveRecord: 15.2ms | Allocations: 67612)
2022-09-18T14:09:47 [I|app|5b034db5] Started GET “/notification_recipients” for 10.11.216.109 at 2022-09-18 14:09:47 +0100
2022-09-18T14:09:47 [I|app|5b034db5] Processing by NotificationRecipientsController#index as JSON
2022-09-18T14:09:47 [W|tax|5b034db5] Current user is set, but not expected. Clearing
2022-09-18T14:09:47 [I|app|5b034db5] Completed 200 OK in 13ms (Views: 0.1ms | ActiveRecord: 2.1ms | Allocations: 4221)
2022-09-18T14:09:57 [I|app|a880b082] Started GET “/notification_recipients” for 10.11.216.109 at 2022-09-18 14:09:57 +0100
2022-09-18T14:09:57 [I|app|a880b082] Processing by NotificationRecipientsController#index as JSON
2022-09-18T14:09:57 [I|app|a880b082] Completed 200 OK in 14ms (Views: 0.1ms | ActiveRecord: 2.7ms | Allocations: 4177)]]

verified with a second request

2022-09-18T14:11:31 [I|app|02ff4ba5] Rendered common/confirm_class_clean_up.html.erb within layouts/application (Duration: 4.5ms | Allocations: 5800)
2022-09-18T14:11:31 [I|app|02ff4ba5] Rendered layouts/base.html.erb (Duration: 35.9ms | Allocations: 28423)
2022-09-18T14:11:31 [I|app|02ff4ba5] Rendered layout layouts/application.html.erb (Duration: 62.9ms | Allocations: 47016)
2022-09-18T14:11:31 [I|app|02ff4ba5] Completed 200 OK in 129ms (Views: 61.7ms | ActiveRecord: 15.8ms | Allocations: 67673)
2022-09-18T14:11:31 [I|app|2292ab2a] Started GET “/notification_recipients” for 10.11.216.109 at 2022-09-18 14:11:31 +0100
2022-09-18T14:11:31 [I|app|2292ab2a] Processing by NotificationRecipientsController#index as JSON
2022-09-18T14:11:31 [I|app|2292ab2a] Completed 200 OK in 13ms (Views: 0.1ms | ActiveRecord: 2.2ms | Allocations: 4193)

which looks ok - the .109 IP is my desktop I’m working from, so without upping the debug level, as far as foreman is concerned my request was ‘ok’ yet the error is as above in the original message

gvde · September 18, 2022, 2:23pm

It’s a request id. It’s different each time. Search the logs for c0e6ce88. I think the logs only show the first four bytes of the uuid for brevity.

ikonia · September 18, 2022, 3:25pm

I’d not read it like that, which now makes sense why it’s changing, I’d read it as request for ID XXXX not request ID = XXXX hence why I was trying to make it to some sort of UUID

thanks for that, really useful

ikonia · September 18, 2022, 3:30pm

this makes more sense now,

I get the full ‘clean’ snap of the log with no noise from other requests in between it

production.log:2022-09-18T16:28:18 [I|app|52b0bde3] Started GET “/hosts” for 10.11.216.109 at 2022-09-18 16:28:18 +0100
production.log:2022-09-18T16:28:18 [I|app|52b0bde3] Processing by HostsController#index as HTML
production.log:2022-09-18T16:28:18 [W|app|52b0bde3] Action failed
production.log:2022-09-18T16:28:18 [I|app|52b0bde3] Backtrace for ‘Action failed’ error (ActiveRecord::SubclassNotFound): The single-table inheritance mechanism failed to locate the subclass: ‘HostStatus::ExpirationStatus’. This error is raised because the column ‘type’ is reserved for storing the class in case of inheritance. Please rename this column if you didn’t intend it to be used for storing the inheritance class or overwrite HostStatus::Status.inheritance_column to use another column for that information.
production.log:2022-09-18T16:28:18 [I|app|52b0bde3] Rendered common/confirm_class_clean_up.html.erb within layouts/application (Duration: 6.6ms | Allocations: 5801)
production.log:2022-09-18T16:28:18 [I|app|52b0bde3] Rendered layouts/base.html.erb (Duration: 65.3ms | Allocations: 28432)
production.log:2022-09-18T16:28:18 [I|app|52b0bde3] Rendered layout layouts/application.html.erb (Duration: 91.2ms | Allocations: 47031)
production.log:2022-09-18T16:28:18 [I|app|52b0bde3] Completed 200 OK in 200ms (Views: 89.1ms | ActiveRecord: 26.1ms | Allocations: 67733)

the expire status failing is because I removed the expire host plugin as that was failing in the upgrade process and I never use the feature so it made more sense to remove something I no longer needed, I didn’t realise it would have this impact.

I guess the real question then is how to clean this up, I don’t believe just putting the expire host plugin back will work - and if I don’t need it, that seems the wrong approach.

bmdennis · September 18, 2022, 10:58pm

I ran into this problem, and also don’t use the expire hosts plugin, so I was able to fix this on my install by removing all records with type = ‘HostStatus::ExpirationStatus’ from the host_status table in the foreman postgres database:

# Switch to postgres user
su - postgres
# Connect to foreman database
psql foreman
# Remove all host expiration entries from host_status
delete from host_status where type = 'HostStatus::ExpirationStatus';

Everything looks to be working okay so far after doing so.

ikonia · September 19, 2022, 5:05am

this looks really good, I’d just started to explore the host_status table, this appears to have saved me a lot of effort. Thanks a lot.