GitHub app for detecting "flaky" build sources


#1

A GitHub engineer I know developed a tool that detects “flaky” builds.

A “flaky” build is a build that fails intermittently with no detectable changes to the source code itself.

The tool is in an alpha state, according to the engineer, but I’ve asked for access so I can give feedback on what is useful to teams using it.

I think the detection of flaky builds isn’t the big problem, but identifying specific sources of that flakiness might be. For example, are there tests that fail now and then? Do we know which ones are suspect?
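To make the idea concrete, here is a minimal sketch (illustrative Python, not from the tool itself; the test names and record shape are made up) of how suspect tests might be identified: any test that both passed and failed on the same commit is intermittent, and such tests can be ranked by overall failure rate.

```python
from collections import defaultdict

def rank_suspect_tests(runs):
    """Given (commit_sha, test_name, passed) records from CI history,
    return tests that both passed and failed on the same commit,
    ordered by overall failure rate (most flaky first).

    Illustrative sketch only -- not how any particular tool works.
    """
    per_commit = defaultdict(set)          # (sha, test) -> outcomes seen
    totals = defaultdict(lambda: [0, 0])   # test -> [failures, runs]
    for sha, test, passed in runs:
        per_commit[(sha, test)].add(passed)
        totals[test][0] += 0 if passed else 1
        totals[test][1] += 1
    # A test is suspect if some single commit saw both outcomes.
    suspects = {t for (sha, t), seen in per_commit.items() if len(seen) == 2}
    return sorted(suspects,
                  key=lambda t: totals[t][0] / totals[t][1],
                  reverse=True)
```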

The tool is at https://github.com/apps/buildpulse, but access is invite-only at this time. Let me know your thoughts.


#2

I think if it’s not a ton of overhead to try it out, we should do so. Would love to see an example of the kind of output it generates! Thanks for sharing.


#3

We rely on Jenkins for most tests. It does post to GitHub statuses, so those may be sufficient, but it’d be interesting to see how it works out.

I don’t mind installing it and trying it out. Sending an email for an invite.


#4

+1 to fixing our flaky tests; they are a daily headache, and we don’t currently have full confidence in our test suite.

That tool looks promising, I am curious as to what it reports and how it works. If it works with our CI system I think we should definitely try it out.

@jjeffers What are the next steps?


#5

@John_Mitsch After speaking with the tool’s author, I think the next step is to provide feedback to him about what use we are getting from it.

While we already know that we have flaky builds, what we really want is to identify the causes. A report or alert that indicates what caused a given failure would be great. I don’t yet know how much the service will cost. I believe the tool is still under active “alpha” development, so we probably have a good chance to have a real voice in its features.


#6

@jjeffers Is the tool enabled in Katello’s repo already?


#7

It is ‘installed’, but I’m not sure exactly how to use it. When I try to log in to their website it says I don’t have an invite, nor do I see much poking around on GitHub. Do we need individual invites for each person that wants to access it?


#8

@Justin_Sherrill I will find out re access.


#9

@Justin_Sherrill @jasonrudolph is travelling at the moment, but he said if you send me GitHub usernames he can add those people to the invite list, and then they should be able to access the BuildPulse output.


#10

:wave: Hi folks! I’m the developer behind BuildPulse, and I’m excited to have y’all trying it out.

jjeffers: Thanks so much for kicking off this discussion! :bowing_man:‍♂

If it works with our CI system I think we should definitely try it out.

@John_Mitsch: Thanks for being willing to try it out! Good news: As of last week, BuildPulse is actively monitoring Katello/katello’s builds. :sweat_smile:

Do we need individual invites for each person that wants to access it?

@Justin_Sherrill: That’s right. During this early alpha phase, each person needs an invite. I’ve added you, @John_Mitsch, ekohl, and jjeffers to the early access list, so you should be able to sign in and see Katello/katello’s flake monitoring at buildpulse.io. If you have any trouble signing in, please let me know.

After speaking with the tool’s author, I think the next step is to provide feedback to him about what use we are getting from it. … I believe the tool is still under active “alpha” development. So we probably have a good chance to get a lot of voice in it’s features.

jjeffers: Indeed! The current app description offers a glimpse of what I want BuildPulse to ultimately provide, but I’m challenging myself to share this super early incarnation of the app with a few real-world dev teams now. I’m excited to have you all try it out to see where BuildPulse can help and what would make it more useful to teams that are fighting flakes.

Please send me any feedback, questions, or feature requests that come to mind. I read and respond to every message.


#11

Due to a team meetup I couldn’t share my experiences earlier.

My biggest feedback is that it triggers on our PR processor, but I think those failures are more of a linting matter: the author didn’t properly specify the Redmine issue, or something similar.

Now there are 2 ways to interpret this. The first is that it’s a user error and they should do a better job. The other is that we don’t do a good job of communicating that requirement.

Right now I don’t know who should improve. A quick glance tells me that it’s 91% stable so maybe that’s good enough? Is there even something to improve?


#12

@ekohl: Thanks for sharing that experience. :bow:

I’ve attempted to summarize the behavior you’re seeing below, but I’ll offer a quick tl;dr for starters: I think this experience reveals the need for a way to tell BuildPulse that certain CI checks (like prprocessor) don’t perform any analysis on the repository’s code, and therefore those CI checks should be excluded when monitoring for flaky builds.

———

If I’m understanding correctly, the prprocessor check analyzes the commit message and the pull request description, but it doesn’t analyze the code in any way. Is that right?

If that’s the case, then I imagine that the following series of events is taking place:

  1. Someone opens a PR
  2. prprocessor sees that the PR description or commit messages violate the guidelines, and prprocessor reports a failing CI status
  3. Without changing the code in any way, the PR author fixes the violation in the PR description or commit message
  4. prprocessor reports a passing CI status
  5. BuildPulse notices that it has seen both a failing CI status and a passing CI status for the same code, and BuildPulse flags this as a flaky build

In the case of prprocessor, the check doesn’t involve the underlying code, so it seems like we’d want to exclude it from the flake analysis.
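The sequence above boils down to: for a single commit, the same check reported both a failure and a success, with no code change in between. A minimal sketch of that heuristic (illustrative Python; this is my reading of the behavior described here, not BuildPulse’s actual implementation):

```python
from collections import defaultdict

def flaky_checks(statuses):
    """Flag CI checks that reported both 'failure' and 'success' for
    the same commit -- i.e. the outcome changed without the code
    changing.

    `statuses` is a list of (commit_sha, check_context, state)
    tuples, a simplified stand-in for GitHub commit-status events.
    """
    seen = defaultdict(set)
    for sha, context, state in statuses:
        seen[(sha, context)].add(state)
    # A check is suspect if any single commit saw mixed outcomes.
    return sorted({context for (sha, context), states in seen.items()
                   if {"failure", "success"} <= states})
```

With this heuristic, a prprocessor failure that later flips to success on the same commit (steps 1–4 above) gets flagged even though no code analysis was involved, which is exactly the false positive being discussed.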

I can imagine some other CI checks that might behave similarly. For example, some teams use a CI check to verify that the PR author has signed the project’s contributor license agreement. If you haven’t signed it when you open the PR, that CI check will fail. Once you sign it, the CI check will pass, even though the underlying code hasn’t changed.

I’m opening an issue to explore options for addressing this need within BuildPulse.

Thanks again for sharing your observations!


#13

That’s a very good analysis of the problem and sounds like a valid solution.


#14

:wave: @ekohl: Following up on this discussion, BuildPulse now supports the ability to ignore specific types of CI checks. BuildPulse will now ignore the prprocessor checks when monitoring for flaky builds in @theforeman’s repositories and @Katello’s repositories.

If you have any additional CI checks that should be ignored, just let me know, and I’ll happily add them for you. In the short term, I’ll handle this kind of configuration manually via support requests. Longer term, it may very well be something that teams are able to configure on their own.
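Conceptually, that configuration amounts to a per-repository ignore list applied before flake analysis. A hypothetical sketch (the ignore list and its application here are assumptions for illustration, not BuildPulse’s real internals):

```python
# Hypothetical per-repository ignore list, currently maintained
# service-side via support requests rather than user configuration.
IGNORED_CONTEXTS = {"prprocessor"}

def relevant_statuses(statuses, ignored=IGNORED_CONTEXTS):
    """Drop commit statuses from checks that don't analyze the code
    itself (here statuses are (sha, check_context, state) tuples),
    so those checks never count toward flaky-build detection."""
    return [s for s in statuses if s[1] not in ignored]
```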

Thanks again for sharing the feedback that revealed the need for this improvement! :bow:


#15

Thanks for that! Some more feedback, this time about the UI :slight_smile:

When I open the app, I see 2 organizations: Katello and theforeman. However, when I click through, Katello is empty and in theforeman I see one repository. The pages feel very empty; perhaps this could be slightly optimized by merging them into one view (low priority).

The Foreman repository now has 100% stable builds (see below) and that means I have no controls that do anything.

Perhaps the organization / repository part could be links?

I also wondered about other dates so perhaps it needs a date picker if that data is available?

I’ll also add some repositories so there’s a bit more data.

Longer term I wonder if it’s possible to dive into test results. I know this is hard because you want to integrate with anything and there’s no way to upload test results into GitHub, but it’d be great to automatically identify specific flaky tests.


#16

Thanks, @ekohl! This is helpful! :bow:

When I open the app, I see 2 organizations: Katello and theforeman. However, when I click through Katello is empty and in theforeman I see one repository.

Thanks for pointing this out! Currently, BuildPulse is installed on the theforeman organization with access to one repository (theforeman/foreman) and on the Katello organization with access to one repository (Katello/katello). With that in mind, it makes sense to me that you see one repository under the theforeman organization, but I would have expected you to also see one repository under the Katello organization.

BuildPulse (like all GitHub apps) uses this API endpoint to determine which repositories a user can access. The endpoint returns “repositories that the authenticated user has explicit permission (:read, :write, or :admin) to access for an installation.” When fetching the list of accessible repositories for your user account for the Katello installation, the API returns an empty array. :thinking: From what I see so far, I think your account might be lacking explicit permission to Katello/katello. You at least have implicit read access to it, because it’s a public repository. To help figure out whether I’m interpreting this situation correctly, are you able to see what permissions you have for Katello/katello on GitHub?
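For reference, the shape of that check is roughly the following (a sketch against the documented response of GitHub’s `GET /user/installations/{installation_id}/repositories` endpoint; the payload in the example is made up):

```python
def accessible_repo_names(api_response):
    """Extract full repository names from GitHub's 'list repositories
    accessible to the user for an installation' response. Only repos
    where the user holds explicit read/write/admin permission appear
    here; implicit read access to a public repo does not count, which
    is why the list can come back empty for a public repository."""
    return [repo["full_name"]
            for repo in api_response.get("repositories", [])]
```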

The pages feel very empty and perhaps this can be slightly optimized by merging it into one (low priority).

You’re right: These pages are so barebones right now. :see_no_evil: Providing a richer list view is definitely something I want to do.

The Foreman repository now has 100% stable builds (see below) and that means I have no controls that do anything. Perhaps the organization / repository part could be links? I also wondered about other dates so perhaps it needs a date picker if that data is available?

Congratulations on 100% stable builds! :trophy:

I agree that it would help for the page to provide more interactivity, and selecting various date ranges seems really useful to me.

I’ll also add some repositories so there’s a bit more data.

Cool! So far, I’ve focused on the experience for a repository that already has at least a week’s worth of flaky build analysis, and I haven’t yet focused on the “blank slate” experience. When you add more repositories, BuildPulse will need to monitor commit statuses for a few days before you’ll start seeing a useful UI. So at first, you’ll see a quite underwhelming blank slate, but after about a week, you should have a richer UI like you currently see for theforeman/foreman.

Longer term I wonder if it’s possible to dive into test results. … it’d be great to automatically identify specific flaky tests.

I agree wholeheartedly! I think that would make a big impact for teams that are battling flaky tests.