Luna-nightly-rpm-pipeline 332 failed

Luna nightly pipeline failed:

https://ci.theforeman.org/job/luna-nightly-rpm-pipeline/332/

foreman-pipeline-luna-nightly-centos7-install (passed) (remote job)
foreman-pipeline-luna-nightly-centos8-stream-install (passed) (remote job)
foreman-pipeline-luna-nightly-centos8-stream-upgrade (failed) (remote job)
foreman-pipeline-luna-nightly-centos7-upgrade (passed) (remote job)

@Marek_Hulan @lzap I think this failure is the same one we discussed in “Katello-nightly-rpm-pipeline 1220 failed”, but now we’re seeing Katello pass. I wonder if @Marek_Hulan was right and there is some ordering involved in reproducing the failure.

Is there a chance that Katello creates the OS during the repo sync in some kind of a transaction while we also receive facts from Puppet? Is there a way to keep that instance and further investigate? Do we have production.log with SQL logs enabled from such tests? That could better reveal what’s going on.

I’m spinning up a setup on my own machine. Using forklift you can run:

LANG=en_US.UTF-8 ansible-playbook pipelines/upgrade_pipeline.yml -e forklift_state=up -e pipeline_os=centos8-stream -e pipeline_type=luna -e pipeline_version=nightly

Not right now because we use the default logging levels. I’m sure we could enable that, but I’m first waiting for the local installation so I can poke around in the DB and hopefully reproduce it.

Today I spun up 3 pipelines on my machine but none of them reproduced the failure. If we continue seeing this, perhaps we can modify the pipeline to enable SQL logs. @evgeni any other ideas?

I do not, no.

I think SQL logging might be a good idea for the pipelines either way, what would we need to modify for that?

@Justin_Sherrill provided the key insight here.

First of all, the upgrade pipeline starts at Foreman 3.0 where Puppet was enabled by default. This means the agent checks in and creates the OS.

Then if we look at what Foreman 3.1 (and 3.0) create:

 id | major |  name  | minor | nameindicator |         created_at         |         updated_at         | release_name |  type  |   description   | password_hash |      title
----+-------+--------+-------+---------------+----------------------------+----------------------------+--------------+--------+-----------------+---------------+-----------------
  1 | 8     | CentOS |       |               | 2022-02-07 18:05:55.944924 | 2022-02-07 18:05:55.944924 |              | Redhat | CentOS Stream 8 | SHA256        | CentOS Stream 8

Then compare this to Foreman nightly:

 id | major |     name      | minor | nameindicator |         created_at         |         updated_at         | release_name |  type  |   description   | password_hash |      title
----+-------+---------------+-------+---------------+----------------------------+----------------------------+--------------+--------+-----------------+---------------+-----------------
  1 | 8     | CentOS_Stream |       |               | 2022-02-07 17:30:00.279439 | 2022-02-07 17:30:00.279439 |              | Redhat | CentOS Stream 8 | SHA256        | CentOS Stream 8

And if we closely look at the error:

error (ActiveRecord::RecordInvalid): Validation failed: Description has already been taken, Title has already been taken

We can indeed see that the description and title have been taken, but the name is certainly different.
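To make the collision concrete, here is a minimal plain-Ruby sketch. It is not Foreman's model code: `OsRecord`, `ValidationError` and `find_or_create_os` are hypothetical names, and the sketch assumes the OS is looked up by name and major while description and title must be unique, which is what the error message suggests.

```ruby
# Hypothetical stand-in for an operating system row; not Foreman's model.
OsRecord = Struct.new(:name, :major, :description, :title, keyword_init: true)

class ValidationError < StandardError; end

# Assumption: lookup is by name and major; description and title are unique.
def find_or_create_os(table, name:, major:, description:)
  existing = table.find { |os| os.name == name && os.major == major }
  return existing if existing

  errors = []
  errors << "Description has already been taken" if table.any? { |os| os.description == description }
  errors << "Title has already been taken" if table.any? { |os| os.title == description }
  raise ValidationError, "Validation failed: #{errors.join(', ')}" unless errors.empty?

  record = OsRecord.new(name: name, major: major, description: description, title: description)
  table << record
  record
end

# Row created before the upgrade, with the old name.
table = [OsRecord.new(name: "CentOS", major: "8",
                      description: "CentOS Stream 8", title: "CentOS Stream 8")]

# After the upgrade the same host reports a different name, so the lookup
# misses and the create step collides on description and title.
begin
  find_or_create_os(table, name: "CentOS_Stream", major: "8", description: "CentOS Stream 8")
rescue ValidationError => e
  puts e.message
end
```

With the old name the lookup would hit the existing row and nothing would fail, which fits the failure only appearing once the name changed.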

My theory for why it only showed up sometimes (mostly in CI) is that it has to do with the speed of the machine and the fact that Puppet runs on a timer. A run every 30 minutes with a certain splay gives us the variation.

I’ve opened a PR to revert the change to allow us to safely branch Foreman 3.2 this week:

To add to this, I was able to reproduce it by hand:

  1. install Katello 4.3 (Foreman 3.1)
  2. yum install /var/www/html/pub/katello-ca-consumer-latest.noarch.rpm
  3. subscription-manager register
  4. foreman-installer --enable-foreman-plugin-puppet --enable-foreman-cli-puppet --foreman-proxy-puppet true --foreman-proxy-puppetca true --foreman-proxy-content-puppet true --enable-puppet --puppet-server true --puppet-server-foreman-ssl-ca /etc/pki/katello/puppet/puppet_client_ca.crt --puppet-server-foreman-ssl-cert /etc/pki/katello/puppet/puppet_client.crt --puppet-server-foreman-ssl-key /etc/pki/katello/puppet/puppet_client.key --disable-system-checks
  5. puppet agent -t -v
  6. upgrade Katello and Foreman to nightly
  7. puppet agent -t -v

I was looking into this all evening and I am unable to reproduce it with facter 3.14; it never sends the word “Stream” at all. So the question is: what creates the title “CentOS Stream 8”? It’s not Puppet. It must be RHSM:

[root@s17 ~]# facter | grep Stream

[root@s17 ~]# subscription-manager facts | grep Stream
distribution.name: CentOS Stream

However, when I try this (I clean out all OSes and call subscription-manager facts --update), I never get CentOS Stream 8; I always get CentOS_Stream 8.

So I am not sure what is causing this.

I guess I need to try installing Katello 4.3 on CentOS Stream tomorrow.

Can you give more details on the reproducer, Justin? Install Katello 4.3, but on which platform, and does it matter? Were you installing Katello or Foreman? The installer command looks like Foreman. Do you run puppet agent -t -v on the system itself or on a Stream client?

I tried again this morning with a clean Foreman 3.1 and Puppet. I installed a Stream 8 client with Puppet, configured it against the server and signed the certificate, and a new OS was created: CentOS 8. After the upgrade to nightly, a proper new OS is created:

2022-02-10T08:37:24 [I|app|e7507d1c] Import facts for 's17.nuc.local' completed. Added: 0, Updated: 6, Deleted 0 facts
2022-02-10T08:37:24 [I|app|e7507d1c] ForemanWebhooks::EventSubscriber: host_updated.event.foreman event received
2022-02-10T08:37:24 [I|aud|e7507d1c] Operatingsystem (4) create event on major 8
2022-02-10T08:37:24 [I|aud|e7507d1c] Operatingsystem (4) create event on name CentOS_Stream
2022-02-10T08:37:24 [I|aud|e7507d1c] Operatingsystem (4) create event on minor
2022-02-10T08:37:24 [I|aud|e7507d1c] Operatingsystem (4) create event on nameindicator
2022-02-10T08:37:24 [I|aud|e7507d1c] Operatingsystem (4) create event on release_name
2022-02-10T08:37:24 [I|aud|e7507d1c] Operatingsystem (4) create event on description
2022-02-10T08:37:24 [I|aud|e7507d1c] Operatingsystem (4) create event on password_hash SHA256
2022-02-10T08:37:24 [I|aud|e7507d1c] Operatingsystem (4) create event on title CentOS_Stream 8
2022-02-10T08:37:24 [I|aud|e7507d1c] Host::Base (12) update event on operatingsystem_id 3, 4

Unless I get a more reliable reproducer, I am not really sure how I can help.

And @ezr-ondrej figured it out: you need to install redhat-lsb-core on the client for facter to report the offending “Stream”. Then I can see it:

2022-02-10T08:42:22 [I|app|e731e4d2] Import facts for 's17.nuc.local' completed. Added: 14, Updated: 9, Deleted 0 facts
2022-02-10T08:42:22 [I|app|e731e4d2] ForemanWebhooks::EventSubscriber: host_updated.event.foreman event received
2022-02-10T08:42:22 [I|aud|e731e4d2] Operatingsystem (4) update event on description , CentOS Stream 8
2022-02-10T08:42:22 [I|aud|e731e4d2] Operatingsystem (4) update event on title CentOS_Stream 8, CentOS Stream 8
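The title change in these audit lines can be pictured with a small sketch. This is an illustration only, not Foreman's callback: `derived_title` is a hypothetical helper, and it assumes the title falls back to the OS name and major when no description is set (the minor version is ignored here).

```ruby
# Hypothetical helper: the title follows the description when one is set,
# otherwise it is built from the OS name and major version.
def derived_title(name:, major:, description: nil)
  return description unless description.nil? || description.empty?

  "#{name} #{major}"
end

# Without redhat-lsb-core: no description, so the title comes from name + major.
puts derived_title(name: "CentOS_Stream", major: "8")

# With redhat-lsb-core: the reported description takes over as the title.
puts derived_title(name: "CentOS_Stream", major: "8", description: "CentOS Stream 8")
```

Under that assumption the same OS row ends up with the title “CentOS_Stream 8” or “CentOS Stream 8” depending only on whether the client has the package installed.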

My final conclusion: we need to rename the OS from CentOS to CentOS_Stream for every OS that has “CentOS Stream” or “CentOS_Stream” in its title or description.

The reason is stated in the migration itself:

      # When redhat-lsb-core package is installed, puppet creates
      # description/title "CentOS Stream 8", however, when the package is not
      # present description is unset and title is set to "CentOS_Stream 8" from
      # the OS name and major by the ActiveRecord callback. Let's migrate both
      # cases.
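A rough sketch of that rename rule in plain Ruby, working on hashes rather than database rows. This is not the actual migration: `STREAM_PATTERN`, `stream_os?` and `rename_stream_oses` are hypothetical names, and the sample rows are made up for illustration.

```ruby
# Matches both spellings mentioned above: "CentOS Stream" and "CentOS_Stream".
STREAM_PATTERN = /CentOS[ _]Stream/

# True when either the title or the description mentions CentOS Stream.
def stream_os?(os)
  [os[:title], os[:description]].compact.any? { |text| text.match?(STREAM_PATTERN) }
end

# Returns new rows; only OSes named "CentOS" that look like Stream are renamed.
def rename_stream_oses(rows)
  rows.map do |os|
    os[:name] == "CentOS" && stream_os?(os) ? os.merge(name: "CentOS_Stream") : os
  end
end

# Illustrative rows only, not taken from a real database.
rows = [
  { name: "CentOS", major: "8", description: "CentOS Stream 8", title: "CentOS Stream 8" },
  { name: "CentOS", major: "8", description: nil, title: "CentOS_Stream 8" },
  { name: "CentOS", major: "7", description: nil, title: "CentOS 7" }
]

rename_stream_oses(rows).each { |os| puts "#{os[:name]} #{os[:major]}" }
```

The first two rows come out as CentOS_Stream and the plain CentOS 7 row is left alone. A real migration would also have to deal with a CentOS_Stream row that already exists for the same major, which this sketch does not attempt.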