How do we do UI tests, and should we continue that way?

MariaAga · April 16, 2025, 7:35am

Hi everyone,

Since we don’t really have documentation on how to write UI tests, what’s expected of them, or reference guides for them, I would like to write something up for us to make it easier to contribute.

Before that, I want to make sure we agree on how we test, and I would like to hear any thoughts you have about our UI tests.

Two issues we have around UI testing are:

We have ~500 snapshot files that we will have to somehow remove at some point if we want to update to React 17+ (snapshot tests/Enzyme can’t run on React 17+)
While doing reviews and writing new components we didn’t insist on tests, and so some areas are not tested at all.

The way I see it, the tests we need are:

Unit tests using React Testing Library (React Testing Library | Testing Library) - these sometimes prove difficult to write, for example:
- Typing in our template editor is difficult/not possible in some cases: trigger react-ace onChange via React Testing Library (RTL) · Issue #923 · securingsincity/react-ace · GitHub
- Having to guess where to put advanceTimersByTime to handle popouts closing: AdvancedFields.test.js example or GraphQL HostsAndInputs.test.js example
- I saw a few different ways to do API mocks. I’m not sure if it’s a bad thing, but it probably makes it harder for new people to reuse them (mocking the store, mocking the selectors, mocking the function that calls axios)
Integration tests using Capybara File: README — Documentation for teamcapybara/capybara (master)
I think here we wrote even fewer tests for the UI, in the last few years at least, which is why I can’t point out pros or cons, but I will say that the existing tests have helped me while making PRs.
- Lately I noticed more randomly failing tests, that are probably just not “waiting” enough, example: Fixes #38321 - taxonomy rb tests randomly failing by MariaAga · Pull Request #10498 · theforeman/foreman · GitHub
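
To illustrate the advanceTimersByTime point above: the sketch below is a hand-rolled fake clock, not Jest’s implementation, and `FakeClock`, `Popout` and the 300 ms close delay are all hypothetical names and values made up for the example. It only tries to show why the position and size of the "advance" call matters: nothing scheduled on a fake clock runs until the test moves time forward far enough.

```typescript
// Minimal fake clock, loosely modelled on the idea behind fake timers.
// FakeClock and Popout are hypothetical and exist only for this sketch.
type Callback = () => void;

class FakeClock {
  private now = 0;
  private pending: { due: number; run: Callback }[] = [];

  // Schedule a callback `delayMs` after the current fake time.
  setTimeout(run: Callback, delayMs: number): void {
    this.pending.push({ due: this.now + delayMs, run });
  }

  // Move fake time forward and run every callback that became due, in order.
  advanceBy(ms: number): void {
    const target = this.now + ms;
    for (;;) {
      const dueNow = this.pending
        .filter((t) => t.due <= target)
        .sort((a, b) => a.due - b.due);
      if (dueNow.length === 0) break;
      const next = dueNow[0];
      this.pending = this.pending.filter((t) => t !== next);
      this.now = next.due;
      next.run();
    }
    this.now = target;
  }
}

// A popout that closes some time after close() is requested (assumed: 300 ms).
class Popout {
  isOpen = true;
  private clock: FakeClock;
  private closeDelayMs: number;

  constructor(clock: FakeClock, closeDelayMs = 300) {
    this.clock = clock;
    this.closeDelayMs = closeDelayMs;
  }

  close(): void {
    this.clock.setTimeout(() => {
      this.isOpen = false;
    }, this.closeDelayMs);
  }
}

const clock = new FakeClock();
const popout = new Popout(clock);
popout.close();

clock.advanceBy(299);
const openBeforeDelay = popout.isOpen; // still open: the timer has not fired yet

clock.advanceBy(1);
const openAfterDelay = popout.isOpen; // closed: 300 ms of fake time have elapsed
```

If the test asserted right after `popout.close()`, or advanced by less than the component’s internal delay, it would see the popout still open. That is the guessing game: the test author has to know (or find out) which delays the component uses internally.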

So discussion questions are:

Do you have tips on writing tests in RTL or capybara?
What do you think we do right? wrong?
Any suggestions on how to replace the snapshot tests? (Removing them without a replacement is not a good option in my opinion, as they still provide feedback on whether a component is rendered at all and whether something becomes undefined)
Should we start having guidelines for new PRs on which tests they should have? (And enforce them more strictly)

I don’t think we are going to replace capybara or RTL as that will cause us to have another testing library to document and maintain, and we don’t have that capacity.

Thanks

PS: we’re working on making plugin UI tests run from Foreman core: Fixes #37636 - remove usage of "@theforeman/test" by MariaAga · Pull Request #10239 · theforeman/foreman · GitHub

maximilian · April 16, 2025, 8:26am

ping @dosas & @thorbend

jeremylenz · April 16, 2025, 12:58pm

The tests we write with React Testing Library are not unit tests. Rather, they are closer to integration tests but really somewhere in between. From Guiding Principles | Testing Library - “The more your tests resemble the way your software is used, the more confidence they can give you.” I strongly agree with this philosophy.

An example of a true unit test would be testing a single JavaScript function, feeding it inputs, and asserting the expected outputs. We don’t really have a lot of those in our JS code.
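
A minimal sketch of what such a unit test could look like, assuming a made-up helper (`pluralizeHosts` is hypothetical, not a function from our code base) and a tiny hand-written assertion helper so that no test framework is needed:

```typescript
// Hypothetical helper: formats a host count for display.
function pluralizeHosts(count: number): string {
  if (Math.floor(count) !== count || count < 0) {
    throw new RangeError(`count must be a non-negative integer, got ${count}`);
  }
  return count === 1 ? "1 host" : `${count} hosts`;
}

// Tiny assertion helper so the example does not depend on a test framework.
function expectEqual<T>(actual: T, expected: T): void {
  if (actual !== expected) {
    throw new Error(`expected ${String(expected)}, got ${String(actual)}`);
  }
}

// Feed inputs, assert outputs. No rendering, no store, no API.
expectEqual(pluralizeHosts(0), "0 hosts");
expectEqual(pluralizeHosts(1), "1 host");
expectEqual(pluralizeHosts(42), "42 hosts");
```

In a real test file the three calls at the bottom would be Jest `expect` assertions instead; the shape of the test stays the same.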

As for snapshots, those are closer to unit tests (since React components are functions nowadays anyway, we’re just asserting the output of the function) but I don’t see much use in them because they’re too automated. When snapshots fail, it’s too easy to update them and just say “this is fine.” You don’t have to go through the mental exercise of “how should this output be changing, given the change in my code?” I mean, you should do that, and I’m sure you, Maria, do that but most people don’t even know that you’re supposed to do that with snapshots.
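
A rough sketch of the difference, with a plain function standing in for a component (`renderHostRow` and its markup are invented for the example; this is not how Jest stores snapshots internally):

```typescript
// Hypothetical stand-in for a component: props in, markup text out.
interface HostRowProps {
  name: string;
  status: "ok" | "error";
}

function renderHostRow(props: HostRowProps): string {
  const css = props.status === "error" ? "row row-danger" : "row";
  return `<tr class="${css}"><td>${props.name}</td><td>${props.status}</td></tr>`;
}

function contains(text: string, part: string): boolean {
  return text.indexOf(part) !== -1;
}

// Snapshot style: compare the whole output with a stored string.
// Any markup change fails this check, and the usual "fix" is to
// regenerate the stored string without thinking about what changed.
const storedSnapshot =
  '<tr class="row row-danger"><td>web01</td><td>error</td></tr>';
const snapshotMatches =
  renderHostRow({ name: "web01", status: "error" }) === storedSnapshot;

// Behavior style: spell out the expectations a reader should care about.
const output = renderHostRow({ name: "web01", status: "error" });
const showsHostName = contains(output, "web01");
const flagsError = contains(output, "row-danger");
const leaksUndefined = contains(output, "undefined");
```

Renaming a CSS class or adding a wrapper element would break `snapshotMatches` but leave `showsHostName` intact, while the explicit checks still document what the row is supposed to do.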

What are the solutions? I’m not sure, I just know I like RTL and want to keep it, and I strongly dislike snapshots.

MariaAga · April 16, 2025, 1:09pm

Another issue with Capybara tests that I forgot: I can’t run them in my more complicated env (the one that has all the plugins). I can’t find the error right now, but I usually have to run integration tests on a simple Foreman with no plugins.

MariaAga · April 16, 2025, 1:10pm

Was less catchy than unit tests

Thanks for the feedback, it’s also important to hear that we want to keep it!

ekohl · April 16, 2025, 1:34pm

I’m not a fan of mocks because they give a false sense of safety. You can easily spend more time maintaining a proper mock than the actual code. Sure, for very specific unit tests it can be useful but I tend to lean more to integration tests.

Technically Rails started to move browser tests into system testing, though Foreman’s code base is old enough to predate that. That’s why we still have integration tests that fire up a browser. Though when I look at integration testing it also does some basic client-like things so I’m a bit unclear where the difference really is.

Overall I like these kinds of tests because they stress the code base as a whole. There are plenty of images that clearly show why you need some integration testing. Stuff like:

For Rails I’d recommend Testing Rails Applications — Ruby on Rails Guides. I see there are also screenshot helpers and I can see those being useful. If there’s demand for that then we should look at archiving them in our CI. We should limit ourselves to the failed cases.

While it may be more work to write, I think the behavior testing that RTL provides is better than snapshot testing. In my experience tests are also a way of codifying expected behavior and when you don’t touch a piece of code for a long time (as is common in long running projects such as Foreman) then it’s great to look back at. Especially when you refactor things.

I think it’s hard to have good guidelines on testing because in practice it’s always a bit of a feeling instead of an exact science. I don’t expect us to implement formal verification of our code because especially for UI it’s extremely difficult. Still, you can have a look at QuickCheck in Every Language - Hypothesis. What I took away from those formal methods was at least a mindset of writing code that is testable. This is really good for unit tests.

After I wrote this @jeremylenz replied so I’m not going to rewrite the above, but:

My reply very closely aligns with this.

I haven’t worked with RTL, but +1 on disliking snapshots.

lstejska · April 17, 2025, 1:12pm

It’s been a while since I contributed to the JS with tests,
but writing JS tests was always super confusing (and annoying) for me; seeing tests like this one didn’t help much

And Capybara tests: If I was lucky enough to get the environment working, it was easy to write them. However, problems with the browser configuration occurred very often, so I tried to avoid them.

I want to ask one thing:
When we say integration tests, what exactly do we mean?
For me, it’s running tests in the browser with the whole app, Rails + compiled React.
Does everyone have the same understanding? Maybe we should first talk about our expectations from tests and whatnot.

QE note: I would add QEs to this discussion; they have many browser tests already. We may have some duplicate tests in our codebases, and we could join the efforts.

Yes please.

ekohl · April 22, 2025, 4:24pm

Rails 5.1 started to refer to this as system testing and I like that. Integration tests could also be testing integration of multiple components without doing a full stack test.

Foreman’s testing stack predates this which is why we still call them integration tests. Perhaps as part of standardizing this we should migrate our tests over?

lstejska · April 24, 2025, 12:24pm

Yes, let’s unify the naming and, if possible, migrate full-stack integration tests to system tests.

Additionally, I don’t see many comments here from our QEs. For example, one thing we could discuss is: why do we have system tests in Foreman, and also similar tests in Robottelo?

I, for example, would say that we, as developers, should focus on unit and integration tests only, leaving the full-stack system tests (or as some call them, E2E tests) to QA engineers and their framework. Whether it’s a good idea or not is a matter for debate, but I’d like to hear opinions from all sides.

ekohl · April 24, 2025, 12:59pm

I think we want as much feedback as possible while reviewing a PR. Ideally when CI is green you can be reasonably sure it works. How we get there is an implementation detail.

The role of QE engineers is really something that is Red Hat specific and shouldn’t influence upstream. In upstream we only have the person submitting a PR and reviewers. One logical conclusion can be that they’re reviewers, which is ok.

It does touch on another topic: where do the tests live? If they’re in another repository it will be hard to coordinate. In other words, it will be easier if the tests live in the same repository as the code so you can atomically make a change: update both the code and the tests in the same PR.

kmalyjur · April 28, 2025, 12:22pm

+1 for not liking snapshots and preferring RTL.
In my opinion, RTL is quite clear and well documented, so it’s not difficult to understand; however, it has its issues.
The mocks are complicated, but @MariaAga, do I understand correctly that we will eventually get rid of those?
+1 for some kind of guidelines/tutorial/tips document. I’m personally confused about when to use RTL or Capybara tests. After reading the conversation under this post, I think I also don’t get the exact difference between unit and integration testing.

ekohl · April 28, 2025, 2:04pm

A good unit test only tests a very small piece of code. Ideally an individual function. Many poorly designed functions have lots of interactions with other code and can’t be tested independently. That’s where people come up with mocks and stubs. Often it’s better to rewrite code so it’s testable without mocks by decoupling.
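
As a small sketch of that kind of decoupling (the function names and the 30 minute threshold are made up for illustration): the first version reads the clock itself, so a test would have to mock `Date.now`; the second takes the current time as an argument and can be tested with plain numbers.

```typescript
const THIRTY_MINUTES_MS = 30 * 60 * 1000; // assumed threshold, for illustration only

// Coupled: depends on the real clock, so its result changes over time
// and a test has to mock Date.now to get a stable answer.
function isReportStaleCoupled(lastReportMs: number): boolean {
  return Date.now() - lastReportMs > THIRTY_MINUTES_MS;
}

// Decoupled: "now" and the threshold are inputs, which makes this a pure function.
function isReportStale(
  lastReportMs: number,
  nowMs: number,
  maxAgeMs: number = THIRTY_MINUTES_MS
): boolean {
  return nowMs - lastReportMs > maxAgeMs;
}

// Testable with fixed values, no mocks involved.
const exactlyAtLimit = isReportStale(0, THIRTY_MINUTES_MS); // false: not older than the limit
const justOverLimit = isReportStale(0, THIRTY_MINUTES_MS + 1); // true
```

Production code would call `isReportStale(lastReport, Date.now())` from a thin wrapper, so the only untested line is the one that reads the clock.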

The problem with mocks and stubs is that you can spend a lot of time writing them but what are you really testing? Your code or your mocks/stubs?

A level higher you have integration testing where you test the interaction between your functions. Here mocks and stubs can make sense. For example, to explicitly inject failures. Recorded API responses work great here and can be seen as mocks. You’re still testing an isolated set of functionality, but you’re starting to get a feel for the larger system.
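
A sketch of that idea, under a few loud assumptions: the `Host` shape below is invented and is not Foreman’s real API schema, `describeBuildQueue` is a hypothetical function, and the transport is synchronous only to keep the example short (a real client would return a Promise).

```typescript
// Invented response shape for illustration; not Foreman's real API schema.
interface Host {
  name: string;
  build: boolean;
}

type ApiResult =
  | { kind: "success"; results: Host[] }
  | { kind: "failure"; status: number };

// The transport is injected, so a test can hand in a recorded response
// or an explicit failure instead of talking to a running server.
type FetchHosts = () => ApiResult;

function describeBuildQueue(fetchHosts: FetchHosts): string {
  const response = fetchHosts();
  if (response.kind === "failure") {
    return `Could not load hosts (HTTP ${response.status})`;
  }
  const building = response.results.filter((h) => h.build).length;
  return `${building} of ${response.results.length} hosts in build mode`;
}

// "Recorded" response: in practice captured once from a real server
// and stored as a fixture file next to the test.
const recordedResponse: ApiResult = {
  kind: "success",
  results: [
    { name: "web01.example.com", build: true },
    { name: "db01.example.com", build: false },
  ],
};

// Happy path driven by the recording.
const happyPath = describeBuildQueue(() => recordedResponse);

// Injected failure: something that is awkward to provoke on a real server.
const failurePath = describeBuildQueue(() => ({ kind: "failure", status: 500 }));
```

The recording keeps the test honest about what the server actually returned at the time it was captured, while the injected failure covers a branch that a full system test would struggle to reach.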

But in the end, nothing beats testing the real thing end to end. That’s what Rails calls system tests.

For Foreman I’d say that RTL falls under integration tests and I’d expect it’s common to stub out the Foreman API responses. If you really want Foreman to return API responses, you need Foreman running. At that point it makes more sense to use Capybara and make it system tests.

One thing to keep in mind is that you can test the same code at all different levels. A unit test will run very fast and is excellent at giving feedback when it fails, but there are many things a unit test can’t test. An integration test can test more variations, but will be slower. System tests take that up a notch and are able to really stress your code while also being a lot slower.

Real life is often messier and nothing is pure X or Y, but this is how I see it from a very high level.

Marek_Hulan · April 29, 2025, 3:33pm

Where should the test live if it’s E2E covering the functionality that lives in 2 or more repos?

ekohl · April 29, 2025, 4:02pm

That’s a good question, but maybe we’re getting out of scope now. Right now we’re investigating tmt as an option. It can pull in metadata from other repositories so you can compose the tests. For example, in smart_proxy_remote_execution_ssh you would pull in the tests from foreman_remote_execution as a regression test.

pondrejk · May 12, 2025, 9:49am

Hi, Satellite QE here, first some informational notes on UI testing situation downstream:

robottelo has around 14% of tests dedicated to UI scenarios, those fall into the category of system tests (user pov, blackbox, run via browser, often longer e2e scenarios), sometimes system integration tests (Satellite talking to other systems)
overall the strategy is to have some basic UI scenarios for the main procedures + stuff that can’t be done in CLI/API, there’s no aim for full coverage
robottelo uses airgun (GitHub - SatelliteQE/airgun: AirGun is a Python library that is built over Widgetastic and navmazing to make Satellite 6 UI testing easier.) to model how the UI elements are identified in the browser
these tests require a disproportionate share of maintenance, often because upstream UI changes need to be worked in (changed ids and locators, different nesting structure, PatternFly updates)
they are prone to hiccups (selenium timeouts, mysterious stuff), so the results can be trickier to analyze, otoh there is quite some infrastructure around to ease this (e.g. stored screencasts)

What does it mean for upstream testing?
I don’t know

I suppose if we want to do a similar kind of testing upstream, we should first explore the possibility of reusing downstream tests for this before writing something new.
As for PR verification, I suppose it would be trickier to use robottelo there (the test matching the PR’s changes might not exist, the test could also fail due to reasons unrelated to the PR’s changes), but nothing (technically) prevents QE from running tests against the patched system upon request.

Also, it would be great for QEs to somehow get notified beforehand of UI changes that could break airgun, not sure how that could work though.

Hope this helps

ekohl · May 12, 2025, 10:05am

Fundamentally one challenge of airgun is that it’s written in Python. This means a developer may need to write something in 3 languages: JavaScript, Ruby and Python.

I’d love to have full feature parity between the UI and API so you can always test the system.

It’s also in another git repository so you need to coordinate merges across repos. So to effectively test a change you may need to somehow tell the test infrastructure to combine various PRs at the same time, which is tricky.

That also makes cherry picks harder.