CI Testing Strategies: Parallel vs. Fail Fast

ehelms · February 7, 2018, 1:06pm

There has been some effort to slowly start rewriting some of our CI jobs using newer technologies within Jenkins. One of the features of Jenkins 2 pipelines is the ability to execute discrete actions in parallel with ease, for example running unit tests, rubocop and UI tests in parallel. This leads to a question of strategy.

From my perspective there are two strategies: fail fast or comprehensive.

The idea behind fail fast is that we largely avoid parallelism and instead focus on sequential steps that start with the fastest stage and leave the longest to the end. For example, running rubocop at the start and the Ruby test suite at the end. This has the benefits of giving feedback to a developer as quickly as possible and freeing up Jenkins resources. The downside is that a developer only sees the first piece of the testing pipeline that failed, not everything that would fail. This could increase PR churn, but might also encourage developers to run all testing locally first.
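As a rough illustration of the fail fast ordering described above, here is a minimal Python sketch. It is not the Jenkins pipeline itself: the `Stage` type, the `run_fail_fast` function and the duration estimates are all hypothetical names and numbers chosen for the example.

```python
from typing import Callable, List, NamedTuple, Optional


class Stage(NamedTuple):
    name: str
    estimated_seconds: float  # assumed, illustrative estimate
    run: Callable[[], bool]   # returns True when the stage passes


def run_fail_fast(stages: List[Stage]) -> Optional[str]:
    """Run stages cheapest-first and stop at the first failure.

    Returns the name of the failing stage, or None if every stage passed.
    """
    for stage in sorted(stages, key=lambda s: s.estimated_seconds):
        if not stage.run():
            # Later, slower stages never start, which frees up the executor.
            return stage.name
    return None
```

With rubocop estimated at a minute and the Ruby test suite at half an hour, a style failure is reported before the test suite ever starts, which is exactly why the developer does not learn whether the tests would also have failed.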

The idea behind comprehensive is to run as much in parallel as possible to increase execution speed while giving an overview of every aspect of testing/linting/etc. that failed as part of the PR. This gives developers an overview of all failures and allows them to fix all of them in one go. This has the partial downside of putting more load on Jenkins (longer running jobs) and encouraging developers to use PRs as a substitute for running tests locally.
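The comprehensive strategy can be sketched in the same spirit. This is only an illustration under assumptions: `run_comprehensive` is a hypothetical name, each check is modelled as a plain Python callable returning `True` on success, and threads stand in for parallel Jenkins stages.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_comprehensive(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run every check concurrently and return the names of all that failed."""
    if not checks:
        return []
    with ThreadPoolExecutor(max_workers=len(checks)) as pool:
        # Submit everything up front so no check waits on another.
        futures = {name: pool.submit(check) for name, check in checks.items()}
        # Wait for all results; nothing is cancelled when one check fails.
        return sorted(name for name, fut in futures.items() if not fut.result())
```

Every check runs to completion regardless of the others, so the result is the full list of failures, at the cost of occupying as many workers as there are checks.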

As this influences pipeline design heavily, I’d appreciate feedback from long-time developers and new developers with an eye towards efficiency, best use of resources and barrier to entry.

Eric

sean797 · February 7, 2018, 4:05pm

I have never run all tests locally; it takes too long on my laptop. I tend to run the tests I think may be broken, or ones I have written, for example the ones relating to the controller or model I have changed. Jenkins then provides that extra feedback, so I know what other tests I should fix or run locally.

I guess it’s obvious that parallel provides the best experience to developers. So I would ask: do we think we have capacity for fully parallel tests? If not, would we if we made other things more efficient, e.g. don’t run rake tests on changes to the /script directory in Foreman?

lzap · February 7, 2018, 4:33pm

Running unit tests or rubocop is super easy and we should not abuse an overloaded Jenkins to do this. The reality is different though: our tests are so slow that many of us tend to file PRs without running all the tests locally.

So comprehensive is what I want. And fail fast is perhaps what we can afford, unless the new setup is fast enough.

Here is an idea: if it’s possible to run unit, (functional), UI and rubocop in parallel, can we create some tooling or scripts around this so people can run it locally as well (without Jenkins)? That could increase the number of local test runs in the wild. I assume that each testing process will have its own rake task (called by Jenkins), so we would only need one extra rake task orchestrating everything for local users.
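To make the local orchestration idea concrete, here is a small sketch written in Python rather than as a rake task, purely for illustration. The function name `run_all` and the commands in `LOCAL_COMMANDS` are assumptions; the real rake task names would be whatever the Jenkins jobs end up calling.

```python
import subprocess
from typing import Dict, List

# Hypothetical commands; substitute the rake tasks the CI jobs actually call.
LOCAL_COMMANDS: Dict[str, List[str]] = {
    "rubocop": ["bundle", "exec", "rubocop"],
    "tests": ["bundle", "exec", "rake", "test"],
}


def run_all(commands: Dict[str, List[str]]) -> Dict[str, int]:
    """Start every command at once, wait for all, return exit codes by name."""
    # Popen returns immediately, so all processes run side by side.
    procs = {name: subprocess.Popen(argv) for name, argv in commands.items()}
    # Output is inherited from the parent, so lines from different commands
    # may interleave on the terminal.
    return {name: proc.wait() for name, proc in procs.items()}
```

Calling `run_all(LOCAL_COMMANDS)` from a checkout would then give a developer roughly the same overview as the parallel CI run, with any non-zero exit code marking a failed check.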

ehelms · February 7, 2018, 4:35pm

This is another area I mean to write about as part of this. I think it applies in either case, and I know @ohadlevy has been requesting it from us for a few months (working on it!): being smarter about our tests and only running the suites affected by the code areas changed. This could have an impact on this discussion, but I do think it is broad enough to have as a directional goal for our CI.
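One way to picture running only the affected suites is a lookup from changed paths to suites. The sketch below is an assumption-heavy illustration: the `RULES` table, the suite names and the `suites_for` function are hypothetical and do not reflect Foreman's real directory-to-suite mapping. Only the `script/` rule echoes the example raised earlier in the thread.

```python
from typing import FrozenSet, Iterable, Set, Tuple

ALL_SUITES: FrozenSet[str] = frozenset({"rubocop", "unit", "functional", "ui"})

# Hypothetical rules: (path prefix, suites to run). First matching prefix wins.
RULES: Tuple[Tuple[str, FrozenSet[str]], ...] = (
    ("script/", frozenset({"rubocop"})),  # lint only, no rake tests
    ("app/", frozenset({"rubocop", "unit", "functional"})),
    ("webpack/", frozenset({"ui"})),
)


def suites_for(changed_files: Iterable[str]) -> Set[str]:
    """Return the union of suites triggered by the changed files."""
    selected: Set[str] = set()
    for path in changed_files:
        for prefix, suites in RULES:
            if path.startswith(prefix):
                selected |= suites
                break
        else:
            # Unknown path: be conservative and run everything.
            return set(ALL_SUITES)
    return selected
```

Falling back to the full set for unrecognised paths keeps the optimisation safe: a missing rule costs time, not coverage.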

ehelms · February 7, 2018, 4:42pm

This is a general CI philosophy I would like any jobs we do to move towards, refactoring towards it where we are not immediately able to adopt it. That is to say, keeping as much as possible of the code that runs as tooling that is designed to run anywhere and is simply orchestrated by Jenkins. So yes, in either case I think we can help in this area.

Marek_Hulan · February 7, 2018, 11:27pm

I’d prefer running jobs in parallel and letting them finish so I can see all the errors I need to fix. Currently the job is killed if a new version is pushed; if we keep that, I hope it will not cause higher load. Running tests locally is fine, but running them on all 3 DBs is hard. Also, enabling/disabling plugins locally is something I’d like to avoid. If we start testing core with plugins, these should imho also be parallel runs.

ekohl · February 8, 2018, 12:07am

I’d suggest a hybrid: run rubocop and unit tests in parallel (step 1) and if that passes run the other jobs in parallel (step 2).
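The hybrid suggestion can be sketched as steps that run in order, with the checks inside each step running in parallel. As before this is only an illustration: `run_step` and `run_hybrid` are hypothetical names, and checks are modelled as Python callables returning `True` on success.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Check = Callable[[], bool]


def run_step(checks: Dict[str, Check]) -> List[str]:
    """Run one step's checks concurrently; return the names that failed."""
    if not checks:
        return []
    with ThreadPoolExecutor(max_workers=len(checks)) as pool:
        futures = {name: pool.submit(check) for name, check in checks.items()}
        return sorted(name for name, fut in futures.items() if not fut.result())


def run_hybrid(steps: List[Dict[str, Check]]) -> List[str]:
    """Run steps in order; stop after the first step with any failure."""
    for step in steps:
        failed = run_step(step)
        if failed:
            # Later steps are skipped, but every failure in this step is reported.
            return failed
    return []
```

With rubocop and unit tests in step 1 and the remaining jobs in step 2, a developer sees all step 1 failures together, while the expensive jobs only run once the cheap ones pass.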