Improving PR merge velocity

tbrisker · February 24, 2019, 3:44pm

Hello,

Several developers have recently voiced their frustration at the amount of time it takes to get a PR merged. This issue is especially visible in Foreman core, where we have been standing at around 120 open prs for quite a while, but is also true for many of the other repos in the Foreman and Katello organizations. The purpose of this (rather long) post is to try and identify some possible causes for this and start a discussion on what steps we can consider to improve the situation.

I think it goes without saying that getting PRs merged faster will lead to better developer happiness and productivity, and value gained by our users by having features added and issues resolved quickly. For first time contributors, getting their contribution accepted quickly is a very important step to becoming returning contributors.

Some of the causes I’ve identified so far from what I’ve seen, feel free to add more if I missed something:

Foreman core team has 14 members. If we look at the committer statistics, we see that in the past two years, 76% of the commits were merged by just 5 of these, three of which don’t currently work on Foreman full time. As some people have moved on or taken additional responsibilities over the last few years, we have lost many core committers, with very few people who have stepped in to take this responsibility (we added @mmoll and @Shimon_Shtein in Dec '17 and @ekohl in July '18). In the same time frame, the number of developers working full or part time on foreman has grown significantly, producing more new PRs all the time.
This is also true for plugins - the number of plugins and people working on them continues to grow faster than the rate at which new maintainers are added.
Many of the repos have just one maintainer, who has other responsibilities and can’t review all PRs properly. In these situations, the maintainer also has a very hard time getting their own code reviewed, often resorting to merging their own code with no other eyes on it - sometimes leading to bugs that could have been caught by even a short review.
Maintainers are very reluctant to reject PRs, instead avoiding reviewing them until eventually the bot closes them out after half a year.
In other cases, reviewers may decide that the review cycle has dragged for long enough and compromise on quality to get a PR merged, possibly planning to fix up any issues in follow up PRs (which often never get the priority they need to be done), leading to increasing technical debt with no clear plan on how or when to pay it back.
The longer the review cycle(s) take, the more difficult it is for both reviewers and contributors to get back to the code and remember why certain things were done in a certain way or what was the requested change etc.
Some PRs touch areas in which no one has expertise (for example, they were written by past contributors who are no longer active), leading to fear of breaking something, or even not understanding the change, by reviewers. These can often be spotted as the PRs that stay open for a long time with no comments on them - as no reviewer feels comfortable enough reviewing them at all.
Bigger changes are harder to review. A while ago I saw a talk by Greg Kroah-Hartman, where he mentioned that they usually don’t accept patches longer than about 100 line change into the stable kernel. This got me thinking, and I’ve asked @Gwmngilfen to scientifically look into our data regarding the relation between size of the PR and the time-to-merge. He’ll share the results once he’s done, but my hypothesis is that the difficulty of reviewing a PR grows exponentially by the size of the changes.

Now that I’ve outlined some of the causes, and if you’ve made it this far, let’s look at some possible options as to how we can improve the situation:

Adopting a more liberal merge policy - for example, merging everything that passes tests. I don’t think this option is feasible, considering how critical stability is for our users and how many areas don’t have proper test coverage.
Encourage more people to do reviews. As mentioned in 1 above, very few people do the majority of reviews. I’ve checked several other larger repos and the situation is similar or even worse. Also, the number of non-committers who do many reviews is very small, even though that makes it easier for committers to actually accept patches (as they had already been reviewed), as well as making it possible for them to become committers eventually. I don’t really have any good idea how to promote this other than constantly reminding people to do it, suggestions are welcome.
Increase the number of committers in the various repos. To become a committer, we usually require people to do a certain amount of good reviews first. While we could become more lenient with this requirement, I’m concerned it may again lead to lower stability due to less experienced committers. Perhaps we could have some process in which people who are interested in becoming committers go through some sort of mentoring by experienced committers.
Speeding up reviews by doing them together with the contributor. In many cases, weeks of back-and-forth correspondence can be saved by sitting together (in person or by video) for half an hour and discussing the changes. Any comments raised during this review should be written down in case other reviewers are involved, but this can help the reviewer understand the change much better and save the repeated context switching and possible misunderstandings of written communication. I’ve experimented with doing this several times, and it has led to much faster turnaround and both sides understanding the code better and feeling better about the experience.
Splitting up larger PRs into smaller, atomic changes. A small PR will always be much easier to understand and review properly. In the past we’ve been reluctant to merge “partial” work, mostly due to the reasonable concern that it may lead to dead code that is never used, and also because in some cases doing a change gradually in several steps requires more code in total and more planning than doing a larger change all at once. For the first point, we could ask for the “next-steps” PRs to be open at the same time, or perhaps even better, not - it might actually save a lot of effort going down a certain path if during review of the first step a better path is found, and in fact save the effort of having to rewrite big parts of a PR following the review. For the second point, while the total coding effort might be bigger, the actual time to delivery would be shorter. Since I believe the time to merge is exponential to the size of the change, five 50-line changes would actually be merged much faster than one 200-line change. I believe this will also lead to less bugs, as many regressions we’ve had were a result of changes that are too large for any single person to properly understand and review them, and were just missed between the hundreds or thousands of lines in a single PR.
Limit the number of “in the air” prs for developers. This is obviously possible only for people paid to work on Foreman, but I know some teams have experimented with placing certain limits on the number of PRs allowed in waiting for review, trying by that to encourage their members to put more effort into making sure their work is reviewed and merged before they start something new. I’d be happy to hear from those teams how this worked out for them.
Have specific times (perhaps parts of the release cycle?) dedicated to merging PRs - a week or more where all developers are requested to only work on getting open PRs merged and not create any new ones. I think we’ve had a couple of times where we tried doing such “merge parties” in the past, but never for more than a day AFAIR. Perhaps we could try doing this for a week and see how that goes.
Be more explicit in closing out PRs that we don’t feel have a possibility of being merged, instead of leaving them open for the bot to close. While this may lead to hard feelings on the contributors’ side, so does having your PR open and ignored for half a year until the bot comtes. It may be better, like a band-aid, to close it quickly rather than spend several cycles of reviews and effort trying to improve it only to eventually let it rot for the bot. If we decide to do this, we need to find a way to do so with the needed sensitivity.
The above point is also true for issues on our tracker - we don’t close any requests right now, and this leads to many open issues that will never be fixed, or people implementing fixes that perhaps shouldn’t have been implemented at all, just because they are in the tracker.

Wow, this came out much longer than I initially expected, thank you for reading all of this!

If you have any further suggestions or disagree with anything I’ve written above, please add your comments below. Please also reply with what of the above options you think we should go for.
Pinging @core and @katello - since you are doing much of the reviews, your opinions are especially important in this discussion.

lzap · February 25, 2019, 4:29pm

Well said. Few words mostly for full-time devs who hesitate to start reviewing:

Don’t treat time spent on review as time you are not working. It’s actually the opposite, reviews are important and the time invested is multiplied by the time spent by the author of the work.
Do not hesitate to review patches you don’t necessary understand. The key thing is testing not reading the code. It’s much better if you merge some code that does not look nice, can be improved/refactored or there’s a typo in method name than if you merge a patch that:
- Does not work.
- Does not fix the bug or implement what’s said in the issue.
- Creates a regression.
Merging is a responsibility, yes, but you are not alone. If you break master/develop branch, we will help you! The original author will help. It’s bunch of us to the rescue. In the worst case we can revert a patch and it’s not the end of the world. Happens to the best of us
All you really need to test PRs is a working Foreman/Katello instance and some time to test it and understand what it should do.
Sometimes you need to seed your database with data or access to a particular compute resource. Just ask others in this case, there are several ways to achieve reasonable solutions to that.
If I remember correctly, Forklift has a job/process that you feed with a PR number and it will actually create you a working instance with the PR applied.

ekohl · February 25, 2019, 4:48pm

and I’d also say that I focus on the things I don’t understand. If I don’t understand it, it can mean I’m stupid and missed something. In this case I learn. It can also mean that it lacks documentation or is simply a bug. When you’re reading this now, there’s also a big chance other developers have this issue when they read the code later on. After it’s merged, you’re less likely to change it.

Reviewing is a process of collaboration. You learn from each other. Don’t worry too much about asking stupid questions because most questions aren’t stupid.

AFAIK this is only true for installer PRs. Testing PRs using RPM packages is still open.

Marek_Hulan · February 26, 2019, 5:34pm

It’s hard to reply to such long post as I’m sure I’ll miss something I agree with we should (constantly) encourage more people to review, even if they don’t have commit access. I agree that doing a review with the contributor is much more efficient. And I agree with being more explicit with closing. So for options 2, 4, 8.

The ones I would challenge are 5,6.

For 5 I think there are some PRs, that will be big. I think it’s good to know that even big changes can get it. E.g. the templates refactoring. Imagine at some point we’d like to rewrite taxonomies, that would be huge change. I agree it’s better to split to smaller things if possible, but guidelines on big PRs would be also good. E.g. splitting in separate commits, try to get commitment from the reviewer in the beginning so they can follow the evolution of the code, communicate the design early before the implementation.

For 6, I don’t think I should be limited by the fact, some of my PRs are open, I believe valid but no one reviewed them.

I think for option 7 I also see, there’s a potential for “committers” and reviewers better organizing their work. I tend to pick PRs that I think I can understand. And that’s not good, since I’m not learning new things and I’m relying on reviewer $x to always review PRs from area $y. Perhaps we should have a dashboard of PRs that are open longer than 2 weeks with no reviewer assigned and we should assign someone for the review. And we should meet (e.g. just on irc) and talk who has capacity to review next PR in the queue and commit to it.

On thing I’d also like to mention is, we should let people know, they can ask. I don’t think many people know they can “ping @theforeman/core”. Also there’s no process for what happens if no one replies after such ping. Typically we struggle in areas that are not well known by many, e.g. react code base, authorization, integration with external services. So we should talk about how to address this.

Thanks for bringing this topic up Tomer!

TimoGoebel · February 27, 2019, 8:41am

Thanks (again), Tomer for bringing this up. I believe I’m one of the people complaining about long merge times and probably should have brought this up myself.

PRs that aren’t reviewed is wasted effort in my opinion. And that’s something we as a project cannot afford.

I think both Tomer and Marek have identified most of the pain points.
I’d like to add two more:

Packaging causes delays. I think we should introduce a better process than pinging Ewoud and Evegni on Github. Maybe work with labels or something. And I’d like to get rid of code owners. That’s too strict for my liking. The bot could just add a labels “Needs packaging” and “Waiting on packaging” and “Packaging done”, so it’s not the reviewers/committers job to micro manage the packaging effort.
Testing is hard and is a larger barrier. I’m wondering if we could spin up a demo box from a PR so you can test a PR even without having your own dev environment.

I’ll reply with a full answer soon, I’m in a bit of a hurry currently.

lzap · February 27, 2019, 2:52pm

I like it, I was playing with an idea of a demo site that would reset every midnight accessible even for public community to look around and test things. We’d need to load it with some reasonable (fake) data but that’s something we can do via hammer/API.

TimoGoebel · February 28, 2019, 9:23am

Sorry for adding my comments in pieces, but maybe it’s actually a good thing because it makes them easier to digest.

In addition, I think we should add a CI job for every PR so we see if the PR has an effect on packaging. I believe this will help with transparency and a developer can easily validate if required changes in packaging actually work with the core PR after they have been merged in packaging.

I guess we all agree that packaging is a vital part in shipping the product, it’s part of our definition of done. So let’s add a guard to our PR ci process.

Another thing I’d really like to get rid of is the Katello ci job. Most of the time it’s red anyways. I think we should add a test for the example plugin instead as it uses the main plugin API endpoints, tests should be a lot faster and and does not override any vital core features. In addition most of the plugins are based on the example plugin so, it’s a good cross-section of all the plugins out there. And all core developers have merge access to the example plugin so they can make changes fast when necessary. And the example plugin is easy enough to understand and does not need a special dev environment to make changes.

tbrisker · February 28, 2019, 10:14am

Thanks for the input @TimoGoebel! one comment:

While I agree it’s annoying that katello tests are unstable, I don’t think this actually prevents code from being merged quickly - it take an extra minute from the reviewer to check the output, but i think all core reviewers are already familiar with the intermittent katello failures and ignore them.
I would much rather replace the katello test with a test that tests a bunch of plugins together - that would give us a better idea of when we break plugins with a change. I don’t think red plugins test should necessarily be blocking merges, but they will give awareness and allow both the contributor and the plugin maintainers know that something changed and needs to be fixed.

TimoGoebel · February 28, 2019, 10:18am

I always ignore the Katello test, so yes. That’s not what’s time consuming. I believe having a test with a stable (example plugin, under the responsibility of the same maintainers) can actually increase confidence in the patch and cause faster merge times.

Marek_Hulan · February 28, 2019, 12:24pm

And would force plugin maintainers to add proper plugin DSL and extension points instead of doing monkeypatches. I like that idea. For quick core tests, this is a good compromise.

ehelms · March 6, 2019, 11:24pm

This can generally help. I’d say my experience is that folks need to feel a sense of ownership and tied to that is a reason to feel ownership (e.g. related to what they are working on, code they see or use everyday). This can be hard to achieve when code sprawls across multiple repositories. I know plugins are our model for adding types of functionality but they could be hurting development efforts by splitting the focus of developers across many properties.

If there are ways to achieve this with the contributor I completely agree. In an async working world, this can be hard to achieve.

+1000 – A smaller, focused PR is easier for both the reviewer to understand and spend time on and for the contributor to make corresponding changes to. Certainly number of lines is a good metric as it indicates how much code had to be written and how much has to be read. However, I have found that conceptual sizing is more important. For example, if a change requires model, API and UI updates. Attempt to split this change up into a model PR or model + API PR and a separate UI PR. This could be less intimidating as a reviewer who feels comfortable with models and database may not feel comfortable with the UI portion and vice versa.

I say let developers handle their own number of PRs and work in progress as we all work differently. Related to this is the things that lead developers to have multiple PRs in progress. For myself and some others I have talked to, when I get blocked on something (waiting on review or some other factor) I move on to another.

I’d be interested to know what the lag-time between PR open and first review are compared to amount of time PR stays open. As well, time between comments to updated commits to comments lag.

Not a bad idea honestly. If everyone is focused on PR update - review cycle then the liklihood goes up that a week of that will reduce overall numbers prior to releases.

I tend to lean towards being gardeners vs. gatekeepers but if something is honestly not going to get in due to design, or scope. We should be honest, explain that and close out PRs.

I’ll add a number 10 and 11 from my perspective.

Look across PRs to see how often reviewers are working on giving contributors a “path to contribution”. Instead of constant back and forth, early on stating to a contributor do X,Y and Z and this PR will be good to go. That gives them an end to aim for, a contract to achieve. And if after that something else comes up the reviewer thinks of, that reviewer should apologize and point out what additional changes are needed and why. Or, the reviewer should opt to push that additional change to a follow up PR per the ‘contract’.
Are there development tool or architectural changes we can make to our app and plugins to enable easier contribution?

John_Mitsch · March 7, 2019, 5:00pm

A big +1 to this topic and I like many of the changes suggested

This ties in with something I’ve been thinking about. Having automation running actual user workflows on an actual foreman/katello instance on the PR level would help give confidence to reviewers when merging PRs.

I find myself repeating the same workflows very often when reviewing PRs.

We have improved our testing infrastructure quite a bit and I think there is more of a possiblity to run something similar to bats testing on the PR level. I know there is talk of UI automation as well. These are bigger changes that require more discussion and I know there are limitations, but I think it would improve PR merge velocity in addition to the changes being discussed.

evgeni · March 11, 2019, 11:49am

(only replying to two sub-topics, +1 on the rest of the existing discussion)

I like that idea, especially as I tried something similar recently downstream in our Satellite team

Let me dump a few thoughts (not particularly only mine):

full week might be too long/scary for some in terms of “not doing other work”
(bi-weekly?) “review day” might be an interesting compromise between “not doing coding work” and “having dedicated time” for reviews
review timeframe should not mean that we don’t do reviews all the other time, this is more to tackle old/big/otherwise underreviewed pieces
gamification/stats is always interesting, how about achievements like “helped review the oldest PR ever” or “… the biggest …”?

I’d be down for trying out a “Review Thursday” every other week, just to see how that goes.

+1000

Adding a step in our tests that analyzes changes to bundler.d and checks them against foreman-packaging is probably quite easy to write and should add visibility quite nicely.

ekohl · March 11, 2019, 11:58am

To limit the numer of “in the air PRs” it may be good to use the draft PRs. This indicates to reviewers it’s not ready for review. It also doesn’t run CI. This can be great to share work with a small number of people. In search you can filter them out with -is:draft.

ekohl · March 11, 2019, 12:02pm

This right here. A lot of big projects do this. If you know there’s no regression (because a test proved it) then I’m much more comfortable merging. Right now I do a lot of mental gymnastics to figure out if it would break a use case. Automation can reduce this.

My solution is almost there. It does need a git diff somewhere so it doesn’t submit a PR with just a revbump.

github.com

ekohl/theforeman-plumbing/blob/699e8b49deb12c40a9e4bc1b412e2cb180d294fd/fm-packaging-update-deps#L24-L56


      
          update_gem_deps() {
          	local checkout="$1"
          	local spec_file="$2"
          
          	if [[ -f "${checkout}/Gemfile" ]] ; then
          		$PACKAGING_DIR/get-gemfile-deps "${checkout}/Gemfile" | $PACKAGING_DIR/update-requirements specfile - "$spec_file"
          	fi
          }
          
          update_npm_deps() {
          	local checkout="$1"
          	local spec_file="$2"
          
          	if [[ -f "${checkout}/package.json" ]] ; then
          		$PACKAGING_DIR/update-requirements npm "${checkout}/package.json" "$spec_file"
          	fi
          }
          
          update_foreman_spec() {
          	local checkout="${CLONE_DIR}/foreman"

This file has been truncated. show original

ehelms · March 12, 2019, 4:48pm

We could scratch build the RPM and perform a repoclosure with a status on the PR job. That leaves little room for doubt as we would test the actual thing we care about.

ekohl · March 12, 2019, 4:56pm

This was actually my plan. The first steps (build an automated way to convert dependencies) is present and finished. The next step was to get this into the nightly process.

A separate step which I outlined in Testing PRs using RPM packages was building a job in Jenkins so you can say [scratch rpm] to get a koji job ID with the changes included. This job could be triggered automatically after the entire test matrix completed successfully. We could do a repoclosure check. The forklift work is ready so you can easily spin up a pipeline with it as well.