Feedback for Foreman 3.11 & Katello 4.13

  • All the RejectedExecutionException errors should be benign; they happen when tomcat is being shut down.
  • Both the candlepin.log and the catalina.log seem to end prematurely. I would expect further messages for both of those.
    Can you check the /var/log/tomcat/localhost.log (not catalina)

Hmm okay, same as earlier, there is no new log in localhost.log since a long time, the latest message is still the one from hours before.

To add on that, I have been watching the whole /var/log/tomcat/*.log until now, all messages seen from that directory until now (even until now), are in the last messages above. Same for /var/log/candlepin/*.log
As well as foreman-tail, where I picked out the important messages, because there are also a few clients talking.

tbh, it looks like Tomcat fully starts up without error and also starts at least part of the candlepin application, but just halts somewhere then.
Also took a look about the port 61613, looks like that never comes up, at least also now after the hours there is still nothing listening there.

Hi all !

I found this post while trying to figure out why I can’t upgrade to Foreman 3.11.1/Katello 4.13.1 on a RockyLinux 8 server.

I was able to migrate another server to Foreman 3.11.1/Katello 4.13 .0 few days before but here I was stuck with a lot of errors during foreman-installer upgrade phase (errors 500, connection to database lost, huge stack traces in /usr/share/gems/gems/stomp-1.4.10/lib/connection/netio.rb …) and the whole process was very very slow until it failed

Will digging into the katello.log upgrade file, I saw a lot of errors about Katello::Errors::CandlepinNotRunning and Connection refused - connect(2) for "localhost" port 61613 . Netstat confirm the port is not open unlike the other server that have the port 61613 hold by the tomcat process.

I then checked catalina log and did not see any error at all, just a few lines at each startup/restart of tomcat:

26-Jul-2024 21:51:21.110 WARNING [main] org.apache.tomcat.util.net.SSLUtilBase.getEnabled Tomcat interprets the [ciphers] attribute in a manner consistent with the latest OpenSSL development branch. Some of the specified [ciphers] are not supported by the configured SSL engine for this connector (which may use JSSE or an older OpenSSL version) and have been skipped: [[TLS_ECDH_ECDSA_WITH_AES_256_GCM_SHA384, TLS_ECDH_RSA_WITH_AES_256_GCM_SHA384, TLS_ECDH_ECDSA_WITH_AES_128_GCM_SHA256, TLS_ECDH_RSA_WITH_AES_128_GCM_SHA256]]
26-Jul-2024 21:51:22.272 INFO [main] org.apache.tomcat.util.net.AbstractEndpoint.logCertificate Connector [https-jsse-nio-127.0.0.1-23443], TLS virtual host [_default_], certificate type [UNDEFINED] configured from keystore [/etc/candlepin/certs/keystore] using alias [tomcat] with trust store [/etc/candlepin/certs/truststore]
26-Jul-2024 21:51:22.304 INFO [main] org.apache.catalina.startup.Catalina.load Server initialization in [3820] milliseconds
26-Jul-2024 21:51:22.432 INFO [main] org.apache.catalina.core.StandardService.startInternal Starting service [Catalina]
26-Jul-2024 21:51:22.432 INFO [main] org.apache.catalina.core.StandardEngine.startInternal Starting Servlet engine: [Apache Tomcat/9.0.87]
26-Jul-2024 21:51:22.462 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory [/var/lib/tomcat/webapps/candlepin]
26-Jul-2024 21:51:28.104 INFO [main] org.apache.jasper.servlet.TldScanner.scanJars At least one JAR was scanned for TLDs yet contained no TLDs. Enable debug logging for this logger for a complete list of JARs that were scanned but no TLDs were found in them. Skipping unneeded JARs during scanning can improve startup time and JSP compilation time.
26-Jul-2024 21:59:59.650 INFO [main] org.apache.catalina.core.AprLifecycleListener.lifecycleEvent The Apache Tomcat Native library which allows using OpenSSL was not found on the java.library.path: [/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib]

On the other server, after the same lines there was a bunch of liquibase logs that were not present here…

I checked candlepin packages on both servers when I see @evgeni talking about the 4.4.10 version and @lumarel stating that it failed again with 4.4.12:

  • faulty server has candlepin-4.4.12-1.el8.noarch / candlepin-selinux-4.4.12-1.el8.noarch
  • working server is still on candlepin-selinux-4.4.10-1.el8.noarch / candlepin-4.4.10-1.el8.noarch

After downgrading candlepin from 4.4.12 to 4.4 .10 (dnf downgrade candlepin candlepin-selinux) the tomcat service started correctly and produced the liquibase logs:

26-Jul-2024 22:06:51.318 WARNING [main] org.apache.tomcat.util.net.SSLUtilBase.getEnabled Tomcat interprets the [ciphers] attribute in a manner consistent with the latest OpenSSL development branch. Some of the specified [ciphers] are not supported by the configured SSL engine for this connector (which may use JSSE or an older OpenSSL version) and have been skipped: [[TLS_ECDH_ECDSA_WITH_AES_256_GCM_SHA384, TLS_ECDH_RSA_WITH_AES_256_GCM_SHA384, TLS_ECDH_ECDSA_WITH_AES_128_GCM_SHA256, TLS_ECDH_RSA_WITH_AES_128_GCM_SHA256]]
26-Jul-2024 22:06:52.536 INFO [main] org.apache.tomcat.util.net.AbstractEndpoint.logCertificate Connector [https-jsse-nio-127.0.0.1-23443], TLS virtual host [_default_], certificate type [UNDEFINED] configured from keystore [/etc/candlepin/certs/keystore] using alias [tomcat] with trust store [/etc/candlepin/certs/truststore]
26-Jul-2024 22:06:52.567 INFO [main] org.apache.catalina.startup.Catalina.load Server initialization in [4054] milliseconds
26-Jul-2024 22:06:52.708 INFO [main] org.apache.catalina.core.StandardService.startInternal Starting service [Catalina]
26-Jul-2024 22:06:52.709 INFO [main] org.apache.catalina.core.StandardEngine.startInternal Starting Servlet engine: [Apache Tomcat/9.0.87]
26-Jul-2024 22:06:52.794 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory [/var/lib/tomcat/webapps/candlepin]
26-Jul-2024 22:06:58.788 INFO [main] org.apache.jasper.servlet.TldScanner.scanJars At least one JAR was scanned for TLDs yet contained no TLDs. Enable debug logging for this logger for a complete list of JARs that were scanned but no TLDs were found in them. Skipping unneeded JARs during scanning can improve startup time and JSP compilation time.
26-Jul-2024 22:07:02.307 INFO [main] liquibase.database.null Set default schema name to public
26-Jul-2024 22:07:02.476 INFO [main] liquibase.changelog.null Reading from public.databasechangelog
26-Jul-2024 22:07:04.710 INFO [main] liquibase.lockservice.null Successfully acquired change log lock
26-Jul-2024 22:07:04.712 INFO [main] liquibase.command.null Using deploymentId: 2024424712
26-Jul-2024 22:07:04.715 INFO [main] liquibase.changelog.null Reading from public.databasechangelog
26-Jul-2024 22:07:04.783 INFO [main] liquibase.changelog.null Columns anonymous(boolean),claimed(boolean) added to cp_owner
26-Jul-2024 22:07:04.787 INFO [main] liquibase.changelog.null ChangeSet db/changelog/20230725171517-add-anonymous-claimed_column.xml::20230725171517-1::sbakaj ran successfully in 13ms
26-Jul-2024 22:07:04.811 INFO [main] liquibase.snapshot.null Creating snapshot
26-Jul-2024 22:07:04.867 INFO [main] liquibase.changelog.null Table cp_anonymous_cloud_consumers created
[... redacted for readability ...]
26-Jul-2024 22:07:07.922 INFO [main] liquibase.util.null UPDATE SUMMARY
26-Jul-2024 22:07:07.923 INFO [main] liquibase.util.null Run:                         33
26-Jul-2024 22:07:07.923 INFO [main] liquibase.util.null Previously run:             611
26-Jul-2024 22:07:07.923 INFO [main] liquibase.util.null Filtered out:               207
26-Jul-2024 22:07:07.923 INFO [main] liquibase.util.null -------------------------------
26-Jul-2024 22:07:07.923 INFO [main] liquibase.util.null Total change sets:          851
26-Jul-2024 22:07:07.923 INFO [main] liquibase.util.null FILTERED CHANGE SETS SUMMARY
26-Jul-2024 22:07:07.923 INFO [main] liquibase.util.null DBMS mismatch:              207
26-Jul-2024 22:07:07.931 INFO [main] liquibase.util.null Update summary generated
26-Jul-2024 22:07:07.938 INFO [main] liquibase.command.null Update command completed successfully.
26-Jul-2024 22:07:07.942 INFO [main] liquibase.lockservice.null Successfully released change log lock
26-Jul-2024 22:07:07.943 INFO [main] liquibase.command.null Command execution complete

Also, the production.log file was not flooded with errors anymore !
I then restarted foreman-installer to finally complete correctly the upgrade and now my Foreman instance seems fine and running as intended…
I will do some checks to see if it is stable enough or if I need to revert the VM snapshot to get back to Foreman 3.10/Katello 4.12.

Hope this will help find out what happened between 4.4.10 and 4.4.12 !

Best Regards,
Nicolas.

1 Like

I can confirm that candlepin-4.4.12 causes serious issues with the whole foreman/katello system including issues during upgrade to 3.11/4.13. We have ran into the same issue in Upgrade from 3.10.0/4.12.1 to 3.11.1/4.13.1 makes foreman unusable. Downgrading candlepin and restarting tomcat seems to solve the problems.

I have created 2300323 – candlepin-4.4.12-1 DB upgrade gets stuck to track this issue

1 Like

Hi folks, we’re having trouble reproducing the issue. We believe the state of the DB has something to do with the issue. Can someone provide a DB dump to help us reproduce?

If only the dump of the candlepin db is necessary, I can send that to you right away :+1: (sent)

1 Like

Even with the db dump, we are unable to reproduce the issue locally. So I’m at a loss for what to do. I noticed one of the other people that reproduced saw a change in candlepin.conf when upgrading (here: Foreman 3.11 upgrade - #2 by lumarel). Could you try to upgrade again, and check for any abnormalities/changes in candlepin.conf?

I have checked on my upgraded snapshot. candlepin.conf looks fine to me. It’s identical except for a few blank lines and this addition:

> candlepin.db.database_manage_on_startup=Manage

I have got everything stopped on the snapshot except postgresql and then started tomcat.service. Candlepin is still stuck, doesn’t respond on port 23443. The last line logged from tomcat.service is

org.apache.jasper.servlet.TldScanner.scanJars At least one JAR was scanned for TLDs yet contained no TLDs

(which is always there, so not a problem) With the previous candlepin version, it continued with

liquibase.database.null Set default schema name to public

The connection to the database is active, but according to pg_stat_activity, idle in transaction with select setting from pg_settings where name = 'edb_redwood_date'. I think that means that is the last query executed and now it’s waiting for the next but nothings coming.

After removing the line

candlepin.db.database_manage_on_startup=Manage

from candlepin.conf, candlepin starts again and connections to port 23443 are possible again. I don’t have time to test further but at the moment it looks to me as if that added line could be the culprit…

2 Likes

Interesting. This was added here: Support Candlepin 4.4 by ehelms · Pull Request #259 · theforeman/puppet-candlepin · GitHub

I too didn’t really see any differences in the /etc/candlepin and /var/lib/candlepin directories,
so seems like doing the candlepin 4.4.10 → 4.4.12 update doesn’t touch the /etc/candlepin/candlepin.conf file.

But removing the line has definitely a positive effect, don’t see any errors after commenting that out.
Looks like 4.4.10 treats that line differently than 4.4.12.

This definitely points us in a specific direction, and from the logs we’ve seen while digging into it a bit here I have a good guess as to what’s going wrong, but I do want to clarify what this particular configuration is:

This controls, as the name implies, if or how Candlepin will manage its database schema. When set to “manage”, “halt”, or “report”, Candlepin invokes our schema management library (Liquibase at the moment) to check if there are changesets present that haven’t been run, and, depending on that setting, run the change sets, halt Candlepin with an error message, or issue a warning and continue on.

If set to “None”, or the setting is removed entirely as is the case here, Candlepin will skip all of that and just assume the DB schema is being managed externally and go on its way without ever invoking Liquibase at all. This is fine for a very short-term workaround, but as soon as we make backward-incompatible schema changes, things will get very crashy, very quickly.

Now, given what we’ve seen from logs shared by some of y’all hitting this and the fact that shutting off Liquibase checks and management gets around the issue, we can make a pretty strong assertion that something is going sideways during the step where Liquibase checks what changes have been run.

One thing that might be helpful here is manually invoking the Liquibase upgrade and seeing if that kicks out any error that may be getting suppressed or otherwise lost when Candlepin is invoking it. I’ll work with our Katello friends to see if/how we can go about doing that from a proper Foreman environment and update the thread accordingly.

3 Likes

Sounds like the old cpdb command is still around to manually invoke the schema update. Those of you hitting this issue, try running the following command and see if it halts/hangs in the same way or, better yet, errors out:

sudo /usr/share/candlepin/cpdb --update-schema

Thanks for the explenation!
With that knowledge, I tried to set it to report, but looks like it also gets stuck there,
definitely seeing the org.candlepin.database.MigrationManager - Liquibase startup management set to Report line though in the candlepin log. (same with Halt)
Though it’s the last message that pops up there.
When setting it to None it continues, as when completely removed :+1:

Hmm already waiting for 6min+, that’s all output yet:

# sudo /usr/share/candlepin/cpdb --update-schema
Configuring PostgreSQL with JDBC URL: jdbc:postgresql:candlepin
Updating Candlepin schema
--driver=org.postgresql.Driver --classpath=/var/lib/tomcat/webapps/candlepin/WEB-INF/lib/postgresql-42.7.3.jar:/var/lib/tomcat/webapps/candlepin/WEB-INF/classes/ --changelog-file=db/changelog/changelog-update.xml --url="jdbc:postgresql:candlepin" --username=$DBUSERNAME             --headless=true --log-level=severe

Didn’t see any log in the tomcat and candlepin directories.
Not sure if it’s doing anything.

Update, still running after 10min+, think it’s stuck.

Additionally have to say, I reverted the config to =Managed before running the schema update :+1:
Will try now with none.

Hrmm… that’s not good. But at least we can reproduce without having to reboot Candlepin.

Though, saying that, if you were to stop tomcat and run that command with --verbose, does it output anything of value or get further along?

Also, what version of postgresql is present on these systems? Specifically, the postgresql server.