GitHub app for detecting "flaky" build sources


#1

A GitHub engineer I know developed a tool that detects “flaky” builds.

A “flaky” build is a build that fails intermittently with no detectable changes to the source code itself.

The tool is in an alpha state, according to the engineer, but I’ve asked for access so I can give feedback on what is useful to teams using it.

I think the detection of flaky builds isn’t the big problem, but identifying specific sources of that flakiness might be. For example, are there tests that fail now and then? Do we know which ones are suspect?
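To make the idea concrete, here is a minimal sketch (illustrative Python, not from the tool itself; the test names and record shape are made up) of how suspect tests might be identified: any test that both passed and failed on the same commit is intermittent, and such tests can be ranked by overall failure rate.

```python
from collections import defaultdict

def rank_suspect_tests(runs):
    """Given (commit_sha, test_name, passed) records from CI history,
    return tests that both passed and failed on the same commit,
    ordered by overall failure rate (most flaky first).

    Illustrative sketch only -- not how any particular tool works.
    """
    per_commit = defaultdict(set)          # (sha, test) -> outcomes seen
    totals = defaultdict(lambda: [0, 0])   # test -> [failures, runs]
    for sha, test, passed in runs:
        per_commit[(sha, test)].add(passed)
        totals[test][0] += 0 if passed else 1
        totals[test][1] += 1
    # A test is suspect if some single commit saw both outcomes.
    suspects = {t for (sha, t), seen in per_commit.items() if len(seen) == 2}
    return sorted(suspects,
                  key=lambda t: totals[t][0] / totals[t][1],
                  reverse=True)
```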

The tool is at https://github.com/apps/buildpulse, but access is invite-only at this time. Let me know your thoughts.


#2

I think if it’s not a ton of overhead to try it out, we should do so. Would love to see an example of the kind of output it generates! Thanks for sharing.


#3

We rely on Jenkins for most tests. It does post to GitHub statuses, so those may be sufficient, but it’d be interesting to see how it works out.

I don’t mind installing it and trying it out. Sending an email for an invite.


#4

+1 to fixing our flaky tests; they are a daily headache, and we don’t currently have full confidence in our test suite.

That tool looks promising, I am curious as to what it reports and how it works. If it works with our CI system I think we should definitely try it out.

@jjeffers What are the next steps?


#5

@John_Mitsch After speaking with the tool’s author, I think the next step is to provide feedback to him about what use we are getting from it.

While we already know that we have flaky builds, what we really want is to identify the causes. A report or alert that indicates what caused a given failure would be great. I don’t yet know how much the service will cost. I believe the tool is still under active “alpha” development, so we probably have a good chance to have a real voice in its features.


#6

@jjeffers Is the tool enabled in Katello’s repo already?


#7

It is ‘installed’, but I’m not sure exactly how to use it. When I try to log in to their website it says I don’t have an invite, nor do I see much poking around on GitHub. Do we need individual invites for each person that wants to access it?


#8

@Justin_Sherrill I will find out re access.


#9

@Justin_Sherrill @jasonrudolph is travelling at the moment, but he said if you send me GitHub usernames he can add those people to the invite list, and then they should be able to access the BuildPulse output.


#10

:wave: Hi folks! I’m the developer behind BuildPulse, and I’m excited to have y’all trying it out.

jjeffers: Thanks so much for kicking off this discussion! :bowing_man:‍♂

If it works with our CI system I think we should definitely try it out.

@John_Mitsch: Thanks for being willing to try it out! Good news: As of last week, BuildPulse is actively monitoring Katello/katello’s builds. :sweat_smile:

Do we need individual invites for each person that wants to access it?

@Justin_Sherrill: That’s right. During this early alpha phase, each person needs an invite. I’ve added you, @John_Mitsch, ekohl, and jjeffers to the early access list, so you should be able to sign in and see Katello/katello’s flake monitoring at buildpulse.io. If you have any trouble signing in, please let me know.

After speaking with the tool’s author, I think the next step is to provide feedback to him about what use we are getting from it. … I believe the tool is still under active “alpha” development. So we probably have a good chance to get a lot of voice in it’s features.

jjeffers: Indeed! The current app description offers a glimpse of what I want BuildPulse to ultimately provide, but I’m challenging myself to share this super early incarnation of the app with a few real-world dev teams now. I’m excited to have you all try it out to see where BuildPulse can help and what would make it more useful to teams that are fighting flakes.

Please send me any feedback, questions, or feature requests that come to mind. I read and respond to every message.


#11

Due to a team meetup I couldn’t share my experiences earlier.

My biggest feedback is that it triggers on our PR processor, but I think those failures are more of a linting matter: the author didn’t properly specify the Redmine issue, or something similar.

Now there are 2 ways to interpret this. The first is that it’s a user error and they should do a better job. The other is that we don’t do a good job of communicating that requirement.

Right now I don’t know who should improve. A quick glance tells me that it’s 91% stable so maybe that’s good enough? Is there even something to improve?


#12

@ekohl: Thanks for sharing that experience. :bow:

I’ve attempted to summarize the behavior you’re seeing below, but I’ll offer a quick tl;dr for starters: I think this experience reveals the need for a way to tell BuildPulse that certain CI checks (like prprocessor) don’t perform any analysis on the repository’s code, and therefore those CI checks should be excluded when monitoring for flaky builds.

———

If I’m understanding correctly, the prprocessor check analyzes the commit message and the pull request description, but it doesn’t analyze the code in any way. Is that right?

If that’s the case, then I imagine that the following series of events is taking place:

  1. Someone opens a PR
  2. prprocessor sees that the PR description or commit messages violate the guidelines, and prprocessor reports a failing CI status
  3. Without changing the code in any way, the PR author fixes the violation in the PR description or commit message
  4. prprocessor reports a passing CI status
  5. BuildPulse notices that it has seen both a failing CI status and a passing CI status for the same code, and BuildPulse flags this as a flaky build

In the case of prprocessor, the check doesn’t involve the underlying code, so it seems like we’d want to exclude it from the flake analysis.
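The sequence above boils down to: for a single commit, the same check reported both a failure and a success, with no code change in between. A minimal sketch of that heuristic (illustrative Python; this is my reading of the behavior described here, not BuildPulse’s actual implementation):

```python
from collections import defaultdict

def flaky_checks(statuses):
    """Flag CI checks that reported both 'failure' and 'success' for
    the same commit -- i.e. the outcome changed without the code
    changing.

    `statuses` is a list of (commit_sha, check_context, state)
    tuples, a simplified stand-in for GitHub commit-status events.
    """
    seen = defaultdict(set)
    for sha, context, state in statuses:
        seen[(sha, context)].add(state)
    # A check is suspect if any single commit saw mixed outcomes.
    return sorted({context for (sha, context), states in seen.items()
                   if {"failure", "success"} <= states})
```

With this heuristic, a prprocessor failure that later flips to success on the same commit (steps 1–4 above) gets flagged even though no code analysis was involved, which is exactly the false positive being discussed.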

I can imagine some other CI checks that might behave similarly. For example, some teams use a CI check to verify that the PR author has signed the project’s contributor license agreement. If you haven’t signed it when you open the PR, that CI check will fail. Once you sign it, the CI check will pass, even though the underlying code hasn’t changed.

I’m opening an issue to explore options for addressing this need within BuildPulse.

Thanks again for sharing your observations!


#13

That’s a very good analysis of the problem and sounds like a valid solution.


#14

:wave: @ekohl: Following up on this discussion, BuildPulse now supports the ability to ignore specific types of CI checks. BuildPulse will now ignore the prprocessor checks when monitoring for flaky builds in @theforeman’s repositories and @Katello’s repositories.

If you have any additional CI checks that should be ignored, just let me know, and I’ll happily add them for you. In the short term, I’ll handle this kind of configuration manually via support requests. Longer term, it may very well be something that teams are able to configure on their own.
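Conceptually, that configuration amounts to a per-repository ignore list applied before flake analysis. A hypothetical sketch (the ignore list and its application here are assumptions for illustration, not BuildPulse’s real internals):

```python
# Hypothetical per-repository ignore list, currently maintained
# service-side via support requests rather than user configuration.
IGNORED_CONTEXTS = {"prprocessor"}

def relevant_statuses(statuses, ignored=IGNORED_CONTEXTS):
    """Drop commit statuses from checks that don't analyze the code
    itself (here statuses are (sha, check_context, state) tuples),
    so those checks never count toward flaky-build detection."""
    return [s for s in statuses if s[1] not in ignored]
```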

Thanks again for sharing the feedback that revealed the need for this improvement! :bow:


#15

Thanks for that! Some more feedback, this time about the UI :slight_smile:

When I open the app, I see 2 organizations: Katello and theforeman. However, when I click through, Katello is empty and in theforeman I see one repository. The pages feel very empty; perhaps this could be slightly optimized by merging them into one view (low priority).

The Foreman repository now has 100% stable builds (see below) and that means I have no controls that do anything.

Perhaps the organization / repository part could be links?

I also wondered about other dates so perhaps it needs a date picker if that data is available?

I’ll also add some repositories so there’s a bit more data.

Longer term I wonder if it’s possible to dive into test results. I know this is hard because you want to integrate with anything and there’s no way to upload test results into GitHub, but it’d be great to automatically identify specific flaky tests.


#16

Thanks, @ekohl! This is helpful! :bow:

When I open the app, I see 2 organizations: Katello and theforeman. However, when I click through Katello is empty and in theforeman I see one repository.

Thanks for pointing this out! Currently, BuildPulse is installed on the theforeman organization with access to one repository (theforeman/foreman) and on the Katello organization with access to one repository (Katello/katello). With that in mind, it makes sense to me that you see one repository under the theforeman organization, but I would have expected you to also see one repository under the Katello organization.

BuildPulse (like all GitHub apps) uses this API endpoint to determine which repositories a user can access. The endpoint returns “repositories that the authenticated user has explicit permission (:read, :write, or :admin) to access for an installation.” When fetching the list of accessible repositories for your user account for the Katello installation, the API returns an empty array. :thinking: From what I see so far, I think your account might be lacking explicit permission to Katello/katello. You at least have implicit read access to it, because it’s a public repository. To help figure out whether I’m interpreting this situation correctly, are you able to see what permissions you have for Katello/katello on GitHub?
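For reference, the shape of that check is roughly the following (a sketch against the documented response of GitHub’s `GET /user/installations/{installation_id}/repositories` endpoint; the payload in the example is made up):

```python
def accessible_repo_names(api_response):
    """Extract full repository names from GitHub's 'list repositories
    accessible to the user for an installation' response. Only repos
    where the user holds explicit read/write/admin permission appear
    here; implicit read access to a public repo does not count, which
    is why the list can come back empty for a public repository."""
    return [repo["full_name"]
            for repo in api_response.get("repositories", [])]
```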

The pages feel very empty and perhaps this can be slightly optimized by merging it into one (low priority).

You’re right: These pages are so barebones right now. :see_no_evil: Providing a richer list view is definitely something I want to do.

The Foreman repository now has 100% stable builds (see below) and that means I have no controls that do anything. Perhaps the organization / repository part could be links? I also wondered about other dates so perhaps it needs a date picker if that data is available?

Congratulations on 100% stable builds! :trophy:

I agree that it would help for the page to provide more interactivity, and selecting various date ranges seems really useful to me.

I’ll also add some repositories so there’s a bit more data.

Cool! So far, I’ve focused on the experience for a repository that already has at least a week’s worth of flaky build analysis, and I haven’t yet focused on the “blank slate” experience. When you add more repositories, BuildPulse will need to monitor commit statuses for a few days before you’ll start seeing a useful UI. So at first, you’ll see a quite underwhelming blank slate, but after about a week, you should have a richer UI like you currently see for theforeman/foreman.

Longer term I wonder if it’s possible to dive into test results. … it’d be great to automatically identify specific flaky tests.

I agree wholeheartedly! I think that would make a big impact for teams that are battling flaky tests.