There are so many items to unpack here that I am going to break my response into categories. If I miss a key point, please call it out so I can follow up on it.
Data Analysis
First, let's start with some actual data from our test runs. Let's take a
sample from a recent test run:
PR Upgrade Test: 16 minutes
PR Develop Test: 57 minutes
PR Katello Test: 40 minutes
These are run in parallel, so to speed things up we have to address the bottleneck: the longest-running job.
For "PR Develop Test", we run the following matrix with times:
Ruby 2.2 – postgresql: 48 minutes
Ruby 2.2 – mysql: 30 minutes
Ruby 2.2 – sqlite3: 26 minutes
Ruby 2.3 – postgresql: 56 minutes
Ruby 2.4 – postgresql: 57 minutes
These, in theory, all run in parallel as well, and as we can see it's the postgresql tests that take the longest. Part of what explains this is that we run the integration tests only against postgresql. If we break down this job:
Tests: 2697 seconds (45 minutes) at 2.44 tests/second for 6606 tests
Setup: 12 minutes
That's 12 minutes just to install gems, install npm modules, and create and migrate the database multiple times. But the bottleneck is still the 45 minutes it takes to run the unit plus integration tests. To put things in perspective, even if you could run each of the 6606 tests in 100 ms, it would still take about 11 minutes.
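The arithmetic above can be sanity-checked in a few lines of plain Ruby:

```ruby
# Quick sanity check of the numbers above.
total_tests  = 6606
test_seconds = 2697.0

rate = total_tests / test_seconds            # tests per second
puts format('%.2f tests/second', rate)

# Even at a hypothetical 100 ms per test, the full suite stays long:
ideal_minutes = (total_tests * 0.100) / 60
puts format('%.1f minutes at 100 ms/test', ideal_minutes)
```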
Removing Unit Tests and Their Value
Whenever I question how I am thinking about testing, I usually turn to Uncle Bob (Robert Martin) [1] and Sandi Metz [2] for enlightenment. The two links below are just a sampling, but given their authority in the field and their ability to make good arguments, I tend to fall in line with their thinking. I think it's important to take away from Uncle Bob what a unit test vs. an integration test is and their different values, and from Sandi, how to think about what a unit test should be testing.
To Lukas' point, there is value in removing unit tests, but not because they have not broken in a while. We don't merge code unless all tests are passing, so the argument to remove tests because they have not broken does not hold much water. In a PR-based world, that is the entire point of these tests: to ensure that a developer's changes do not break assumptions within the code base and do not cause regressions in expected behavior. Our tests are the most user-facing guardian of this. However, we can be smarter about our tests, and that's why I would encourage watching Sandi's talk, finding other talks, and focusing on making our tests better. Let's remove tests not because they don't break, but because they aren't providing any real value.
Let's also focus on where the bottlenecks lie. If we remove 20 unit tests
because they are old and don't break, but run in 100 ms each, we've only
shaved 2 seconds off of a 45 minute test run. Jenkins output for a given PR
gives a view of what the slowest test suites are. I'd encourage developers
to start there and tackle the bottlenecks [3].
Some Solutions
I'll throw out some of my own thoughts on solutions, as well as echo some previously raised ones.
- Run Test Suite Subsets Based on Changes
This is Ohad's original idea, and I think it's a good one given that we have different languages and different test sets within our codebase. He and I have chatted about this before, and I am attempting to prioritize it. The gist of the breakdown: don't run Ruby-related testing if only JS changes, and vice versa. As we move towards more of a SPA-style application, breaking the UI bits into their own repository would allow them to iterate faster due to a faster test suite. The same might be said of the Ruby bits if we can find a logical separation (per Ohad's other suggestion).
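A minimal sketch of the selection logic (the path patterns and suite names here are my own illustrative assumptions, not Foreman's actual layout rules):

```ruby
# Hypothetical sketch: choose which suites to run from a PR's changed files.
# The path patterns below are illustrative assumptions, not actual rules.
def suites_for(changed_files)
  suites = []
  suites << :ruby if changed_files.any? { |f| f.end_with?('.rb') || f == 'Gemfile' }
  suites << :js   if changed_files.any? { |f| f.end_with?('.js') || f.start_with?('webpack/') }
  # When we can't classify the change, stay safe and run everything.
  suites.empty? ? %i[ruby js] : suites
end

# In CI the file list would come from something like `git diff --name-only`.
puts suites_for(['app/models/host.rb']).inspect
puts suites_for(['webpack/index.js']).inspect
puts suites_for(['README.md']).inspect
```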
- Re-examine Test Parallelism
Perhaps this is already being done as well as we can, but we should look again at whether we can run our test suites in a more parallel fashion to break through the 45 minute bottleneck, for example by splitting it into three 15 minute chunks.
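One way to sketch the splitting (hash-based bucketing is my own assumption here; a real setup would balance chunks by recorded runtimes rather than by hash):

```ruby
# Illustrative sketch: deterministically split test files across N executors.
# A production setup would balance chunks by recorded runtime, not by hash.
require 'zlib'

def chunk_for(test_file, chunk_count)
  Zlib.crc32(test_file) % chunk_count
end

files = %w[test/models/host_test.rb
           test/models/architecture_test.rb
           test/controllers/hosts_controller_test.rb]

# Each executor runs only its own slice; together they cover every file.
3.times do |i|
  slice = files.select { |f| chunk_for(f, 3) == i }
  puts "executor #{i}: #{slice.inspect}"
end
```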
- "Static" Environments
This comes from both a trend in the Jenkins world and a question from Timo
to move towards essentially Docker containers with pre-populated
environments. The idea would be to ensure these are routinely updated with
baseline dependencies installed so that setup time is reduced by having
most assets (e.g. gems and npm modules) already on the container.
- Quick to Fail
This is the idea that the test suite bails out on the first failure, freeing up Jenkins and giving earlier feedback that the PR has issues. I am not a huge fan of this, as it has downsides for developers who rely on Jenkins to tell them what issues their work has rather than running tests locally first.
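The mechanics are simple to sketch (this is a toy runner, not any real framework's API; RSpec, for instance, exposes the behavior as `--fail-fast`):

```ruby
# Toy sketch of fail-fast semantics: stop the suite at the first failure.
def run_suite(tests)
  tests.each do |name, test|
    unless test.call
      puts "FAIL: #{name} -- bailing out, remaining tests skipped"
      return false
    end
    puts "PASS: #{name}"
  end
  true
end

suite = [['first', -> { true }], ['second', -> { false }], ['third', -> { true }]]
run_suite(suite) # 'third' never runs, so the executor frees up sooner
```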
- Thoroughly Investigate Slow Tests
I mentioned this in the previous section, but dedicate some developer time to examining why some test suites run slowly and whether there are things we can change. For example: finding and suggesting tests to remove that provide no value, looking for less resource-intensive ways to do some tests, or finding flaws in how tests were crafted to begin with. An example of that last one is a test that seeds the database with a lot of extraneous data, causing it to run slowly due to database operations.
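A contrived illustration of that last flaw, with a fake `create_record` (my own stand-in, not an actual Foreman helper) simulating the per-insert cost of real database writes:

```ruby
# Contrived illustration: seeding far more records than the assertion needs.
# `create_record` stands in for an ActiveRecord create (one INSERT each).
require 'benchmark'

def create_record(attrs)
  sleep 0.001 # pretend each insert costs ~1 ms of database round trip
  attrs
end

heavy_setup = Benchmark.realtime { 100.times { |i| create_record(name: "host#{i}") } }
light_setup = Benchmark.realtime { create_record(name: 'host0') }

# The assertion would be identical either way; only the setup cost differs.
puts format('seed 100 records: %.3fs; seed 1 record: %.3fs', heavy_setup, light_setup)
```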
- Reduce Support Matrix
Granted, when there is no significant load on Jenkins, all of our testing runs in parallel across Jenkins executors; in practice, however, our infrastructure is usually under load. Further, developers have expressed a desire to increase the amount of testing we do by adding plugins into the matrix.
That being said, today we support 3 databases and 3 versions of Ruby. We attempt to give users a choice in production between postgres and mysql, and provide developers a lightweight database in sqlite. Further, we support a single RPM-based production setup and multiple Debian-based ones, giving us a range of Rubies. This is compounded by wanting to test against potential upgrades to our Rails and Ruby stack.
With choice comes a burden on infrastructure and testing. I'd ask that we consider being more opinionated and reducing this matrix. For example, if we centralize on Forklift-based development environments, we could drop sqlite. I will say up front that I am less knowledgeable about Debian, but the packaging repository makes it appear we support 4 different versions. Perhaps we consider, if such a thing exists, locking in on LTS-type versions, or dropping support sooner to focus on a few hardened environments.
Eric
[1] http://blog.cleancoder.com/uncle-bob/2017/05/05/TestDefinitions.html
[2] https://www.youtube.com/watch?v=URSWYvyc42M
[3] http://ci.theforeman.org/job/test_develop_pr_core/14972/database=postgresql,ruby=2.3,slave=fast/testReport/(root)/
On Tue, Nov 7, 2017 at 5:35 AM, Lukas Zapletal wrote:
We can define these tests as a separate set that gets only a weekly run, but deletion is IMHO completely the wrong way.
https://rbcs-us.com/documents/Why-Most-Unit-Testing-is-Waste.pdf
A cleanup is reasonable, not a completely wrong way. I bet that a week after we move these into a 2nd tier they will start breaking and nobody will care, because the code would already have been merged. That's the point here.
I think both are needed. Integration tests are much slower than unit
tests, in general.
And by the way, our unit tests are not fast, really. They are slow as hell, mainly because most of them are not really unit tests. I am speaking to you, model tests. Let me give you an example:
test/models/architecture_test.rb
Not a unit test, not an integration test, not a system test; it hasn't failed for a very long time and is useless if we have the most important stuff covered in an integration or system test, which we do.
–
Lukas @lzap Zapletal
–
Eric D. Helms
Red Hat Engineering