Proposal: Merging Foreman pipelines with pipelines of other plugins

The current release pipelines are problematic for a few reasons:

  • Katello nightly is arguably the least stable of all environments, as Foreman (or other plugin) releases can get pushed to nightly and break Katello.
  • Foreman ships the Katello installer bits as part of its repos, so these get shipped out separately from (and ahead of) Katello.

I’d like to propose that we combine the Foreman and Katello pipelines into one. This would bring other plugins, such as REX, into the main release pipeline. I think it’s also fine to add additional plugins to the tests as well (like the luna pipeline).

I can picture the pipeline doing:

  • standalone Foreman install & upgrade tests
  • Katello or luna installation & upgrade tests with a wider set of plugins selected
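
To make that shape concrete, here is a minimal sketch of what a combined nightly run could look like. This is Python pseudo-staging for illustration only; the stage names, ordering, and plugin selection are my assumptions, not an agreed design.

    # Hypothetical sketch of a combined nightly pipeline; stage names and the
    # exact plugin selection are illustrative assumptions only.
    COMBINED_NIGHTLY_STAGES = [
        "build Foreman, plugin and Katello RPMs",
        "standalone Foreman install test",
        "standalone Foreman upgrade test",
        "Katello (or luna) install test with a wider plugin set",
        "Katello (or luna) upgrade test with a wider plugin set",
        "publish all repos together",
    ]

    def run_nightly(stages=COMBINED_NIGHTLY_STAGES):
        """Every stage gates the single combined release: a failure anywhere
        holds back publishing for Foreman, Katello and the selected plugins."""
        for stage in stages:
            # a real pipeline would abort here on failure, holding back all repos
            print("running:", stage)

    if __name__ == "__main__":
        run_nightly()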

The benefits I see:

  • Better stability across the ecosystem, in nightlies and releases
  • The pipelines gate releases, so it’s not possible to ship a Foreman release that is incompatible with the current plugins
  • Foreman & Katello are released together, on the same date and at the same time

The downsides:

  • Longer pipeline runs
  • Large changes, like Ruby or Rails upgrades, would require all plugins included in the pipeline to be updated before any change lands in nightly (this could be seen as a benefit)
  • Requires more coordination between Katello and Foreman for release dates and z-releases (this could also be seen as a benefit).
    • A Foreman release could be delayed by a bug/issue in a selected plugin (but if these are running together nightly, ideally we would have already caught it).

I’m sure I’m missing other downsides or challenges, but I’m eager for discussion :slight_smile:

Justin


I’m wondering about this and whether it’s going too far or not far enough.

First of all, I think the model of one (yum) repository per pipeline is a good one. Tying multiple repositories together can make it impossible to easily resolve dependency conflicts.

This is only partly related, but I’ve mentioned it a few times: I think we should drop the Katello version number from the (yum) repositories. That means we would no longer say katello/4.3 but rather use the corresponding Foreman version number (which is 3.1, I think). The benefit to users is that they no longer need to map the two versions to each other. On the packaging side, branching also becomes easier.
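
As a purely illustrative sketch, the difference for users would look roughly like this; the directory layout below is my assumption based on the public yum.theforeman.org structure, not a decided scheme.

    # Illustration only: how the Katello repo path could adopt the Foreman
    # version number. The paths are assumptions, not a final layout.
    FOREMAN_VERSION = "3.1"
    KATELLO_VERSION = "4.3"

    # today: users have to know both numbers and map them to each other
    current_katello_repo = (
        f"https://yum.theforeman.org/katello/{KATELLO_VERSION}/katello/el8/x86_64/"
    )

    # proposed: only the Foreman version number appears in the path
    proposed_katello_repo = (
        f"https://yum.theforeman.org/katello/{FOREMAN_VERSION}/katello/el8/x86_64/"
    )

    print(current_katello_repo)
    print(proposed_katello_repo)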

That does raise the question: would it make sense to drop the Katello repository altogether and move to a single repository? That would also imply that a single pipeline makes much more sense. To do so, we would need to move at least a few plugins to the main repository, which could suggest dropping the plugins repository altogether. The downside is that any cherry-pick of a plugin to a stable branch would then need the full release pipeline and would need to be signed. That in turn would solve Feature #4788: Plugin rpms not signed - Packaging - Foreman, but bring additional overhead to cherry-picking.

I also raised this in a private discussion yesterday: today we have a release owner for Foreman and a release owner for Katello. However, could we end up with a single release owner who takes care of an entire release? We already have an installer maintainer taking care of the installer and rely on core maintainers to take care of Foreman itself. The Smart Proxy rarely changes these days, but you could have another group responsible for that. Then there are the various plugin maintainers; Katello could be in that category.

On the other hand: Foreman also runs the Katello unit tests for every core PR, but they break all the time and personally I ignore them more often than not. We may end up in a similar situation where Katello is blocking things and nobody knows how to solve them. Time zones are an additional problem: most Katello developers are in the US, and if you’re blocked until halfway through the day, it quickly becomes very annoying.

So in short: I’m advocating for making Katello less special and more of a generic plugin. However, there are serious concerns over availability/reliability, on both the human and the technical side.

I love all those ideas! Let’s do them :slight_smile:

In my experience, among the failures I typically see about 60% CI issues and 40% random test failures. When we see random test failures we typically try to fix them, but they still pop up from time to time. I’ve also noticed that sometimes the Katello failures on Foreman PRs seem to be different from the failures on Katello PRs (which is strange). Anyway, I think that if we as Katello developers take it even more seriously (investigate and fix each and every one), we can knock down that 40%. Let’s start filing issues for breakages (or at least bringing them up in #theforeman-dev) and get them knocked out. As for the CI issues, that’s a bit out of my wheelhouse; maybe we can adjust some slave sizes or reduce the number of simultaneous jobs?

I don’t fully have answers to this; as a global project we already have these issues. Having a bit of delay with stable nightlies is better than moving faster with broken nightlies (and I get that Foreman nightlies are the stable ones in these cases).

I’m a bit reminded of the story of Toyota vs. the American automakers in the late 1980s and early 1990s. At the time, American carmakers would fight hard not to stop the assembly line. Any issue that popped up would just be pushed on through and fixed at the end. Cars came out of the factory half assembled and had to be fixed. It was seen as an individual failure if the assembly line ‘stopped’ due to an issue at your station. As the Japanese automakers rose in popularity, they used a different approach. Any issue on the line caused the line to stop immediately, and the problem was seen as an opportunity for process improvement. Cars came out complete, and there were fewer one-offs that had to be fixed after the fact. Because everyone was so focused on improving the process and quality, their quality levels far outpaced American cars at the time.

This isn’t a one-to-one match for our situation by any means (please don’t read too much into it or try to align any particular aspect with our situation), but I think it points to what would help: a more singular focus on the nightlies.

During the release meeting today, we had a discussion about how to progress with the potential merging of the Foreman + plugin pipelines. One suggestion is to do a trial period of gating just the Foreman nightly pipeline on Katello nightly. The code change to have the Katello process move both the Foreman and Katello RPMs from Koji is reportedly small, so the commitment would be small as well. The downside of this trial is that the Foreman and Katello pipelines wouldn’t be perfectly synced, so a broken Foreman could make it through (if I understood correctly, @evgeni :slight_smile:). However, the trial wouldn’t be for numbered releases, so that shouldn’t be a problem.

With this trial, the Katello team can get experience with the more urgent need to take care of broken pipelines. We can also see whether the longer build times are problematic.

You understood correctly. To elaborate a bit.

Today we have two absolutely distinct flows:

  • generate a Foreman repo snapshot from Koji, test that snapshot, publish that snapshot
  • generate a Katello repo snapshot from Koji, test that snapshot, publish that snapshot

The cheap trick is to move all the publishing to the second flow, getting:

  • generate a Foreman repo snapshot from Koji, test that snapshot
  • generate a Katello repo snapshot from Koji, test that snapshot (plus the latest Foreman one, as that’s a dependency), publish the Foreman and Katello snapshots

The problem I was describing is that when the second workflow publishes both snapshots, it doesn’t know whether the Foreman snapshot it is publishing actually passed its CI (it probably did, as that’s what triggered the second flow in most cases, but there are exceptions).

The “real” solution would require either running both flows in parallel (thus pseudo-guaranteeing they both see the same snapshots) or staging the Foreman result in an intermediate repo somehow. But we don’t have the infrastructure or capacity for either in place today, so I’d aim for the “cheap” solution for now.
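
A minimal sketch of the “cheap” flow described above, with all helper functions as hypothetical stand-ins rather than the project’s real tooling:

    # Sketch of the "cheap" combined publish step. Every helper here is a
    # hypothetical stand-in; only the control flow mirrors the description above.

    def snapshot_from_koji(name):
        """Stand-in for generating (or picking up) a repo snapshot from Koji."""
        return f"{name}-nightly-snapshot"

    def run_ci(snapshot, deps=()):
        """Stand-in for the install/upgrade test jobs; returns True on success."""
        print("testing", snapshot, "with deps", list(deps))
        return True

    def publish(snapshot):
        print("publishing", snapshot)

    def foreman_flow():
        # flow 1: generate and test only; publishing has moved to the Katello flow
        snapshot = snapshot_from_koji("foreman")
        run_ci(snapshot)

    def katello_flow():
        # flow 2: test Katello against the latest Foreman snapshot, then publish
        # both. Caveat from above: this flow does not actually know whether that
        # Foreman snapshot passed its own CI; it only assumes so because a green
        # Foreman run is usually what triggered this flow in the first place.
        latest_foreman = snapshot_from_koji("foreman")   # stand-in for "pick up the latest"
        katello_snapshot = snapshot_from_koji("katello")
        if run_ci(katello_snapshot, deps=[latest_foreman]):
            publish(latest_foreman)
            publish(katello_snapshot)

    if __name__ == "__main__":
        foreman_flow()   # runs on its own trigger; its result is not checked below
        katello_flow()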
