How do we do UI tests, and should we continue that way?

Hi everyone,

Since we don’t really have documentation on how to write UI tests, what’s expected of them, or reference guides for them, I would like to write something up for us to make it easier to contribute.

Before doing that I want to make sure we agree about tests, so I would like to hear if anyone has thoughts about our UI tests.

Two issues we have around UI testing are:

  1. We have ~500 snapshot files that we will have to somehow remove at some point if we want to update to React 17+ (snapshot tests/Enzyme can’t run on React 17+)

  2. While doing reviews and writing new components we didn’t insist on tests, and so some areas are not tested at all.

The way I see it, the tests we need are:

  1. Unit tests using React Testing Library (React Testing Library | Testing Library). These sometimes prove difficult to write, for example:

  2. Integration tests using Capybara (Capybara README, teamcapybara/capybara, master).
    I think we wrote even fewer of these for the UI, at least in the last few years, which is why I can’t point out pros or cons. I will say that the tests we do have helped me while making PRs.

So discussion questions are:

  1. Do you have tips on writing tests in RTL or Capybara?

  2. What do you think we do right? What do we do wrong?

  3. Any suggestions on how to replace the snapshot tests? (Removing them without a replacement is not a good option in my opinion, as they still provide feedback on whether a component renders at all and whether something becomes undefined)

  4. Should we start having guidelines for new PRs on which tests they should have? (And enforce them more strictly)

I don’t think we are going to replace Capybara or RTL, as that would leave us with another testing library to document and maintain, and we don’t have that capacity.

Thanks :slight_smile:

PS: we’re working on making plugins’ UI tests run from Foreman core: Fixes #37636 - remove usage of "@theforeman/test" by MariaAga · Pull Request #10239 · theforeman/foreman · GitHub

ping @dosas & @thorbend

The tests we write with React Testing Library are not unit tests. Rather, they are closer to integration tests but really somewhere in between. From Guiding Principles | Testing Library - “The more your tests resemble the way your software is used, the more confidence they can give you.” I strongly agree with this philosophy.

An example of a true unit test would be testing a single JavaScript function, feeding it inputs, and asserting the expected outputs. We don’t really have a lot of those in our JS code.
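
As a minimal sketch of what such a test could look like: the function `formatHostCount` and the tiny `expectEqual` helper below are made up for illustration and are not taken from the Foreman code base. In a Jest suite the checks would normally be written with `expect(...).toBe(...)` instead.

```typescript
// Hypothetical pure function: the same input always gives the same output.
function formatHostCount(count: number): string {
  if (count === 0) return 'No hosts';
  if (count === 1) return '1 host';
  return `${count} hosts`;
}

// Minimal assertion helper so the sketch runs without a test framework.
function expectEqual<T>(actual: T, expected: T): void {
  if (actual !== expected) {
    throw new Error(`expected ${String(expected)}, got ${String(actual)}`);
  }
}

// Feed inputs, assert outputs. No rendering, no network, no mocks.
expectEqual(formatHostCount(0), 'No hosts');
expectEqual(formatHostCount(1), '1 host');
expectEqual(formatHostCount(2), '2 hosts');
```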

As for snapshots, those are closer to unit tests (since React components are functions nowadays anyway, we’re just asserting the output of the function) but I don’t see much use in them because they’re too automated. When snapshots fail, it’s too easy to update them and just say “this is fine.” You don’t have to go through the mental exercise of “how should this output be changing, given the change in my code?” I mean, you should do that, and I’m sure you, Maria, do that :wink: but most people don’t even know that you’re supposed to do that with snapshots.

What are the solutions? I’m not sure, I just know I like RTL and want to keep it, and I strongly dislike snapshots.

Another issue with Capybara tests that I forgot: I can’t run them in my more complicated environment (the one that has all the plugins). I can’t find the error right now, but I usually have to run integration tests on a simple Foreman with no plugins.

Was less catchy than unit tests :smiley:

Thanks for the feedback, it’s also important to hear that we want to keep it!

I’m not a fan of mocks because they give a false sense of safety. You can easily spend more time maintaining a proper mock than the actual code. Sure, for very specific unit tests it can be useful but I tend to lean more to integration tests.

Technically Rails started to move browser tests into system testing, though Foreman’s code base is old enough to predate that. That’s why we still have integration tests that fire up a browser. Though when I look at integration testing it also does some basic client-like things so I’m a bit unclear where the difference really is.

Overall I like these kinds of tests because they stress the code base as a whole. There are plenty of images that clearly show why you need some integration testing. Stuff like:

For Rails I’d recommend Testing Rails Applications — Ruby on Rails Guides. I see there are also screenshot helpers and I can see those being useful. If there’s demand for that then we should look at archiving them in our CI. We should limit ourselves to the failed cases.

While it may be more work to write, I think the behavior testing that RTL provides is better than snapshot testing. In my experience tests are also a way of codifying expected behavior and when you don’t touch a piece of code for a long time (as is common in long running projects such as Foreman) then it’s great to look back at. Especially when you refactor things.

I think it’s hard to have good guidelines on testing because in practice it’s always a bit of a feeling instead of an exact science. I don’t expect us to implement formal verification of our code, because for UI especially it’s extremely difficult. Still, you can have a look at QuickCheck in Every Language - Hypothesis. What I took away from those formal methods was at least a mindset of writing code that is testable. This is really good for unit tests.

After I wrote this @jeremylenz replied so I’m not going to rewrite the above, but:

My reply very closely aligns with this.

I haven’t worked with RTL but :+1: on disliking snapshots.

It’s been a while since I contributed to the JS with tests,
but writing JS tests was always super confusing (and annoying) for me; seeing tests like this one didn’t help much :smiley:

And Capybara tests: If I was lucky enough to get the environment working, it was easy to write them. However, problems with the browser configuration occurred very often, so I tried to avoid them.

I want to ask one thing:
When we say integration tests, what exactly do we mean?
For me, it’s running tests in the browser with the whole app, Rails + compiled React.
Does everyone have the same understanding? Maybe we should first talk about our expectations from tests and whatnot.

QE note: I would add QEs to this discussion; they have many browser tests already. We may have some duplicate tests in our codebases, and we could join the efforts.

Yes please.

Rails 5.1 started to refer to this as system testing and I like that. Integration tests could also be testing integration of multiple components without doing a full stack test.

Foreman’s testing stack predates this which is why we still call them integration tests. Perhaps as part of standardizing this we should migrate our tests over?

Yes, let’s unify the naming and, if possible, migrate full-stack integration tests to system tests.

Additionally, I don’t see many comments here from our QEs. For example, one thing we could discuss is: why do we have system tests in Foreman, and also similar tests in Robottelo?

I, for example, would say that we, as developers, should focus on unit and integration tests only, leaving the full-stack system tests (or as some call them, E2E tests) to QA engineers and their framework. Whether it’s a good idea or not is a matter for debate, but I’d like to hear opinions from all sides.

I think we want as much feedback as possible while reviewing a PR. Ideally when CI is green you can be reasonably sure it works. How we get there is an implementation detail.

The role of QE is really something that is Red Hat specific and shouldn’t influence upstream. In upstream we only have the person submitting a PR and the reviewers. One logical conclusion can be that QEs are reviewers, which is OK.

It does touch on another topic: where do the tests live? If they’re in another repository it will be hard to coordinate. In other words, it will be easier if the tests live in the same repository as the code so you can atomically make a change: update both the code and the tests in the same PR.

  • +1 for not liking snapshots and preferring RTL.
  • In my opinion, RTL is quite clear and well documented, so it’s not difficult to understand. However, it has its issues.
  • The mocks are complicated, but @MariaAga, do I understand correctly that we will eventually get rid of those?
  • +1 for some kind of guidelines/tutorial/tips document. I’m personally confused about when to use RTL or Capybara tests. After reading the conversation under this post, I think I also don’t get the exact difference between unit and integration testing. :smiley:

A good unit test only tests a very small piece of code. Ideally an individual function. Many poorly designed functions have lots of interactions with other code and can’t be tested independently. That’s where people come up with mocks and stubs. Often it’s better to rewrite code so it’s testable without mocks by decoupling.
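
To illustrate the decoupling idea with a small, made-up example (the function names are hypothetical and not from Foreman): a function that reads the clock itself can only be tested deterministically by mocking `Date.now()`, while a version that receives the current time as an argument needs no mock at all.

```typescript
// Coupled: reads the clock itself, so a deterministic test has to mock Date.now().
function isExpiredCoupled(expiresAtMs: number): boolean {
  return Date.now() >= expiresAtMs;
}

// Decoupled: the current time is just another input.
function isExpired(expiresAtMs: number, nowMs: number): boolean {
  return nowMs >= expiresAtMs;
}

// The decoupled version is tested with plain values.
const checks: Array<[boolean, boolean]> = [
  [isExpired(1000, 999), false], // one millisecond before expiry
  [isExpired(1000, 1000), true], // exactly at expiry
  [isExpired(1000, 5000), true], // long after expiry
];

for (const [actual, expected] of checks) {
  if (actual !== expected) {
    throw new Error(`expected ${expected}, got ${actual}`);
  }
}

// Production code still uses the real clock, but only at the outermost call site.
const alreadyExpired = isExpired(0, Date.now());
```

The coupled variant is kept only for contrast. The point is that the logic worth testing moved into a function with no hidden inputs.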

The problem with mocks and stubs is that you can spend a lot of time writing them but what are you really testing? Your code or your mocks/stubs?

A level higher you have integration testing where you test the interaction between your functions. Here mocks and stubs can make sense. For example, to explicitly inject failures. Recorded API responses work great here and can be seen as mocks. You’re still testing an isolated set of functionality, but you’re starting to get a feel for the larger system.
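
A rough sketch of that idea, under stated assumptions: the `FetchBody` type, the `/api/hosts` path, the JSON shape, and all function names below are illustrative only, and the client is kept synchronous for brevity where a real HTTP client would be asynchronous. The test exercises parsing, counting, and message building together, once against a recorded response and once with an injected failure.

```typescript
interface Host {
  name: string;
  status: 'ok' | 'error';
}

// Stand-in for an HTTP client: takes a path, returns the raw response body.
type FetchBody = (path: string) => string;

function parseHosts(body: string): Host[] {
  const parsed = JSON.parse(body) as { results: Host[] };
  return parsed.results;
}

// Code under test: fetching, parsing, and message building working together.
function failedHostsMessage(fetchBody: FetchBody): string {
  try {
    const hosts = parseHosts(fetchBody('/api/hosts'));
    const failed = hosts.filter((host) => host.status === 'error').length;
    return failed === 0 ? 'All hosts are fine' : `${failed} host(s) need attention`;
  } catch {
    return 'Could not load hosts';
  }
}

// A "recorded" response body (hypothetical shape), replayed instead of calling a server.
const recordedBody =
  '{"results":[{"name":"a.example.com","status":"ok"},' +
  '{"name":"b.example.com","status":"error"}]}';

const fromRecording: FetchBody = () => recordedBody;

// Explicitly injected failure, which is hard to trigger against a real server.
const failing: FetchBody = () => {
  throw new Error('500 Internal Server Error');
};

const happyPath = failedHostsMessage(fromRecording);
const errorPath = failedHostsMessage(failing);
```

Here `happyPath` is built from the replayed recording and `errorPath` from the injected failure, so both branches are covered without a running server.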

But in the end, nothing beats testing the real thing end to end. That’s what Rails calls system tests.

For Foreman I’d say that RTL falls under integration tests and I’d expect it’s common to stub out the Foreman API responses. If you really want Foreman to return API responses, you need Foreman running. At that point it makes more sense to use Capybara and make it system tests.

One thing to keep in mind is that you can test the same code at all different levels. A unit test will run very fast and is excellent at giving feedback when it fails, but there are many things a unit test can’t test. An integration test can test more variations, but will be slower. System tests take that up a notch and are able to really stress your code while also being a lot slower.

Real life is often messier and nothing is pure X or Y, but this is how I see it from a very high level.