CI Speed Up Thread

This thread is dedicated to discussion and improvement projects that speed up parts of our CI, with the goal of giving developers faster feedback across our projects and testing ecosystem. With this thread I want to:

  • Bring discussions and ideas together in a central place for further discussion and, hopefully, some implementation
  • Highlight any work that improves the speed of CI tests, since these changes pay for themselves time and time again in our developer workflows

I’d like to start by highlighting some discussion and work around speeding up NPM-based actions.

First, thanks to @evgeni for noticing an ancient workaround that, once fixed, shaved ~20-25 minutes off every Katello PR and source build job: use a normal version of grunt-bower-task, not a git checkout by evgeni · Pull Request #10158 · Katello/katello · GitHub

We have been adjusting the NPM jobs to show more clearly where time is spent by splitting the dependency generation step (aka creation of package-lock.json) from the step that downloads the NPM modules. That is, we run the following as two steps:

  • npm install --package-lock-only --no-audit
  • npm ci
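
To make the time spent in each of the two steps visible in the job log, each command could be wrapped in a small timing helper. This is only a sketch: `timed_step` is a hypothetical function, not something that exists in our job definitions, and it assumes a POSIX shell with `date +%s` available.

```shell
#!/bin/sh
# Hypothetical helper: run a command and report how long it took.
timed_step() {
    label="$1"
    shift
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "step '${label}' took $((end - start))s (exit ${status})"
    # Pass the wrapped command's exit code through so failures still fail the job.
    return "$status"
}

# In a CI job the two steps would then be invoked as:
#   timed_step "generate package-lock.json" npm install --package-lock-only --no-audit
#   timed_step "download modules" npm ci
```

Because the helper returns the exit status of the wrapped command, a failing `npm ci` would still fail the step.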

So far we have opened PRs to do this for Foreman and Katello:

Additionally, we are looking into using some of the techniques that speed up Debian NPM installs where plugins are involved, starting with Katello:

So far these investigations are centered around Foreman and Katello, as they are our most churned-on and longest-running CI jobs. If any plugin maintainers read this, please reach out to identify places in your test suites where we could apply similar logic.

The “purge useless npm deps” PR drops another ~20 minutes from the Katello pipeline. This is mainly because the “assets precompile” step needs only build dependencies, but we were also installing all the test deps (multiplied across foreman, katello, rex and tasks, as katello pulls those in too).

It reminded me that I had opened a Redmine issue back when I was working on the Debian speedups: Bug #33317: better differentiate between build, develop and test dependencies for JavaScript - Foreman

So let me bring that up again – is there a way we can better “categorize” the dependencies we (core and plugins) have? Or can we at least agree on a simple nomenclature given the current package.json constraints (there are only dependencies and devDependencies)? Something like dependencies is what we need for building the assets and devDependencies is what we need for development and tests? (I know, that’s not how Node defines dependencies, but we don’t really have node packages anyways here, we just abuse package.json as a way to express dependencies.)

This change has been merged! And I took it for a spin with this PR which ran this job and it ran in 30 minutes!

Another offender is the sassc gem, which needs to compile libsass on every installation (see sassc is very slow to compile and install · Issue #189 · sass/sassc-ruby · GitHub for details). That takes 2-3 minutes on every run :frowning:.

I think we should be able to fix that by using project-specific, not project-build-specific, RVM gemsets. That means that e.g. katello-pr-test would always run in the 2.7.0@katello-pr-test gemset, and not create a new 2.7.0@katello-pr-test-$JOB_ID gemset every time.

So, uh, like this: do not create empty gemsets for everty job run by evgeni · Pull Request #224 · theforeman/jenkins-jobs · GitHub
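
As an illustration of the naming difference only (the real change is in the PR above), the two schemes could be expressed like this. The helper names are hypothetical, and the Ruby version, job name and job ID are supplied by the caller.

```shell
#!/bin/sh
# Hypothetical helpers that build RVM gemset names.

# Per-build scheme: a fresh, empty gemset for every run,
# so gems like sassc get compiled again each time.
per_build_gemset() {
    printf '%s@%s-%s\n' "$1" "$2" "$3"
}

# Per-project scheme: the same gemset is reused across runs of a job,
# so already-installed gems stay available.
per_project_gemset() {
    printf '%s@%s\n' "$1" "$2"
}

# A job would then select the gemset with something like:
#   rvm use "$(per_project_gemset 2.7.0 katello-pr-test)" --create
```

With the per-project name, `bundle install` only has to install gems that changed since the previous run of that job.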

Another thing that I wanted to mention: right now, getting the “numbers” is rather annoying, as you need to look at the individual job runs and click a lot. I was thinking we should try something like the Jenkins OpenTelemetry plugin and let it report against (their free tier should be totally sufficient for us) or similar.

I think the per-build gemset exists because we clean up the gemset at the end, and with a shared one we could face race conditions and clashes. Maybe we should consider dropping that cleanup altogether? I also wondered about installing to a central spot for caching on the node, e.g. ‘bundle install --path ~/.rubygems’

Let’s give telemetry and Honeycomb a try.

I’ve configured OpenTelemetry, but reading the traces is rather cumbersome, as many steps are just “sh”…

I’ve started adding proper labels to them in

To get better data about Foreman PR tests, I have started rewriting the Foreman PR tests as pipelines. I am taking a different approach than we have previously and would appreciate feedback from maintainers of CI as well as developers who work in the Foreman code base.

At this point, the following has occurred around Foreman PRs:

  • Introduced a new pipeline-based dedicated unit test job
  • Introduced a new pipeline-based dedicated integration test job
  • Katello PR job converted to a pipeline

An example PR running the old and the new:

Links to those specific runs:

The old foreman job took 35 minutes to run.
The new unit test job took 20 minutes to run.
The new integration test job took 21 minutes to run.
The new Katello job took 22 minutes to run.
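
Taking the Foreman numbers above, the trade-off between total time and wall-clock time can be checked with a little arithmetic. This sketch assumes the unit and integration jobs start at the same moment and ignores queueing time; the Katello job is left out because it is a separate job.

```shell
#!/bin/sh
# Run times in minutes, taken from the job runs listed above.
old_job=35
unit_job=20
integration_job=21

# Total machine time spent by the two new jobs.
total=$((unit_job + integration_job))

# Wall-clock time is that of the slower job, assuming both start at once.
if [ "$unit_job" -gt "$integration_job" ]; then
    wall_clock=$unit_job
else
    wall_clock=$integration_job
fi

# Time saved for the contributor compared to the old single job.
saved=$((old_job - wall_clock))

echo "total=${total}m wall_clock=${wall_clock}m saved=${saved}m"
# prints: total=41m wall_clock=21m saved=14m
```

So the split jobs use 41 minutes of machine time instead of 35, but a contributor waits about 21 minutes instead of 35.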

Splitting out the unit and integration test jobs takes more total time, but because they run in parallel, contributors get results ~15 minutes sooner. Given this, I would like to move forward with dropping the old test jobs:

Not mentioned here (since it’s not a speedup), but this also fixes the long-standing issue where Foreman’s stable branches were checking out Katello master, which made the test completely pointless. Now it looks up the correct branch.