Several developers have recently voiced their frustration at the amount of time it takes to get a PR merged. This issue is especially visible in Foreman core, where we have been standing at around 120 open PRs for quite a while, but it is also true for many of the other repos in the Foreman and Katello organizations. The purpose of this (rather long) post is to try to identify some possible causes and start a discussion on what steps we can consider to improve the situation.
I think it goes without saying that getting PRs merged faster will lead to better developer happiness and productivity, and to more value for our users, who get features and fixes sooner. For first-time contributors, having their contribution accepted quickly is a very important step towards becoming returning contributors.
Here are some of the causes I've identified so far from what I've seen; feel free to add more if I've missed something:
1. Foreman core team has 14 members. If we look at the committer statistics, we see that in the past two years, 76% of the commits were merged by just 5 of these, three of whom don't currently work on Foreman full time. As some people have moved on or taken additional responsibilities over the last few years, we have lost many core committers, with very few people stepping in to take this responsibility (we added @mmoll and @Shimon_Shtein in Dec '17 and @ekohl in July '18). In the same time frame, the number of developers working full or part time on Foreman has grown significantly, producing more new PRs all the time.
2. This is also true for plugins - the number of plugins and people working on them continues to grow faster than the rate at which new maintainers are added.
3. Many of the repos have just one maintainer, who has other responsibilities and can't review all PRs properly. In these situations, the maintainer also has a very hard time getting their own code reviewed, often resorting to merging their own code with no other eyes on it - sometimes leading to bugs that could have been caught by even a short review.
4. Maintainers are very reluctant to reject PRs, instead avoiding reviewing them until eventually the bot closes them out after half a year.
5. In other cases, reviewers may decide that the review cycle has dragged on long enough and compromise on quality to get a PR merged, possibly planning to fix any issues in follow-up PRs (which often never get the priority they need), leading to increasing technical debt with no clear plan on how or when to pay it back.
6. The longer the review cycle(s) take, the more difficult it is for both reviewers and contributors to get back to the code and remember why certain things were done a certain way, what change was requested, etc.
7. Some PRs touch areas in which no one has expertise (for example, they were written by past contributors who are no longer active), leading to reviewers fearing they might break something, or not even understanding the change. These can often be spotted as the PRs that stay open for a long time with no comments on them, as no reviewer feels comfortable enough reviewing them at all.
8. Bigger changes are harder to review. A while ago I saw a talk by Greg Kroah-Hartman, where he mentioned that they usually don't accept patches longer than about a 100-line change into the stable kernel. This got me thinking, and I've asked @Gwmngilfen to look scientifically into our data regarding the relation between the size of a PR and its time-to-merge. He'll share the results once he's done, but my hypothesis is that the difficulty of reviewing a PR grows exponentially with the size of the changes.
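To make the kind of analysis mentioned in cause 8 a bit more concrete, here is a minimal sketch (not @Gwmngilfen's actual method) that pulls recently closed PRs from GitHub and buckets the median time-to-merge by change size. It assumes the PyGithub library and a personal access token in the GITHUB_TOKEN environment variable.

```python
# Rough sketch: median time-to-merge bucketed by PR size (additions + deletions).
# Assumes PyGithub (`pip install PyGithub`) and a token in $GITHUB_TOKEN.
import os
from statistics import median

from github import Github

repo = Github(os.environ["GITHUB_TOKEN"]).get_repo("theforeman/foreman")
buckets = {}  # size bucket -> list of times-to-merge in days

# Look at the most recently updated closed PRs; merged_at is None for rejected ones.
for pr in repo.get_pulls(state="closed", sort="updated", direction="desc")[:500]:
    if pr.merged_at is None:
        continue
    size = pr.additions + pr.deletions
    bucket = "<100" if size < 100 else "<500" if size < 500 else ">=500"
    days = (pr.merged_at - pr.created_at).total_seconds() / 86400
    buckets.setdefault(bucket, []).append(days)

for bucket, days in sorted(buckets.items()):
    print(f"{bucket:>6} lines: {len(days):4d} PRs, median {median(days):.1f} days to merge")
```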
Now that I've outlined some of the causes, and if you've made it this far, let's look at some possible options for improving the situation:
1. Adopting a more liberal merge policy - for example, merging everything that passes tests. I don't think this option is feasible, considering how critical stability is for our users and how many areas don't have proper test coverage.
2. Encourage more people to do reviews. As mentioned in cause 1 above, very few people do the majority of reviews. I've checked several other larger repos and the situation is similar or even worse. Also, the number of non-committers who do many reviews is very small, even though that makes it easier for committers to actually accept patches (as they have already been reviewed), as well as making it possible for those reviewers to become committers eventually. I don't really have any good idea how to promote this other than constantly reminding people to do it; suggestions are welcome.
3. Increase the number of committers in the various repos. To become a committer, we usually require people to do a certain amount of good reviews first. While we could become more lenient with this requirement, I'm concerned it may again lead to lower stability due to less experienced committers. Perhaps we could have some process in which people who are interested in becoming committers go through some sort of mentoring by experienced committers.
4. Speeding up reviews by doing them together with the contributor. In many cases, weeks of back-and-forth correspondence can be saved by sitting together (in person or by video) for half an hour and discussing the changes. Any comments raised during this review should be written down in case other reviewers are involved, but this can help the reviewer understand the change much better and save the repeated context switching and possible misunderstandings of written communication. I've experimented with doing this several times, and it has led to much faster turnaround, with both sides understanding the code better and feeling better about the experience.
5. Splitting up larger PRs into smaller, atomic changes. A small PR will always be much easier to understand and review properly. In the past we've been reluctant to merge "partial" work, mostly due to the reasonable concern that it may lead to dead code that is never used, and also because in some cases doing a change gradually in several steps requires more code in total and more planning than doing a larger change all at once. For the first point, we could ask for the "next-steps" PRs to be opened at the same time - or perhaps even better, not: it might actually save a lot of effort going down a certain path if, during review of the first step, a better path is found, and in fact save the effort of having to rewrite big parts of a PR following the review. For the second point, while the total coding effort might be bigger, the actual time to delivery would be shorter. Since I believe the time to merge is exponential in the size of the change, five 50-line changes would actually be merged much faster than one 200-line change. I believe this will also lead to fewer bugs, as many regressions we've had were the result of changes that were too large for any single person to properly understand and review, and were simply missed among the hundreds or thousands of lines in a single PR.
6. Limit the number of "in the air" PRs per developer. This is obviously possible only for people paid to work on Foreman, but I know some teams have experimented with placing certain limits on the number of PRs allowed to be waiting for review, trying to encourage their members to put more effort into making sure their work is reviewed and merged before they start something new. I'd be happy to hear from those teams how this worked out for them.
7. Have specific times (perhaps parts of the release cycle?) dedicated to merging PRs - a week or more where all developers are asked to only work on getting open PRs merged and not create any new ones. I think we've had a couple of times where we tried doing such "merge parties" in the past, but never for more than a day AFAIR. Perhaps we could try doing this for a week and see how that goes.
8. Be more explicit in closing out PRs that we don't feel have a chance of being merged, instead of leaving them open for the bot to close. While this may lead to hard feelings on the contributors' side, so does having your PR open and ignored for half a year until the bot comes. It may be better, like ripping off a band-aid, to close it quickly rather than spend several cycles of reviews and effort trying to improve it only to eventually let it rot for the bot. If we decide to do this, we need to find a way to do so with the needed sensitivity.
9. The above point is also true for issues on our tracker - we don't close any requests right now, and this leads to many open issues that will never be fixed, or to people implementing fixes that perhaps shouldn't have been implemented at all, just because they are in the tracker.
Wow, this came out much longer than I initially expected - thank you for reading all of this!
If you have any further suggestions or disagree with anything I've written above, please add your comments below. Please also reply with which of the above options you think we should go for.
Pinging @core and @katello - since you do much of the reviewing, your opinions are especially important in this discussion.
Well said. A few words, mostly for full-time devs who hesitate to start reviewing:
Don't treat time spent on review as time you are not working. It's actually the opposite: reviews are important, and the time you invest is multiplied by the time the author has already spent on the work.
Do not hesitate to review patches you don't necessarily understand. The key thing is testing, not reading the code. It's much better to merge code that doesn't look nice, can be improved/refactored, or has a typo in a method name than to merge a patch that:
- Does not work.
- Does not fix the bug or implement what's said in the issue.
- Creates a regression.
Merging is a responsibility, yes, but you are not alone. If you break the master/develop branch, we will help you! The original author will help. There's a bunch of us to the rescue. In the worst case we can revert a patch, and it's not the end of the world. It happens to the best of us.
All you really need to test PRs is a working Foreman/Katello instance and some time to test it and understand what it should do.
Sometimes you need to seed your database with data or need access to a particular compute resource. Just ask others in this case - there are several reasonable ways to solve that.
If I remember correctly, Forklift has a job/process that you feed with a PR number and it will actually create a working instance for you with the PR applied.
And I'd also say that I focus on the things I don't understand. If I don't understand it, it can mean I'm stupid and missed something - in that case I learn. It can also mean that it lacks documentation or is simply a bug. If you have this issue reading it now, there's also a big chance other developers will have it when they read the code later on. After it's merged, you're less likely to change it.
Reviewing is a process of collaboration. You learn from each other. Don't worry too much about asking stupid questions, because most questions aren't stupid.
It's hard to reply to such a long post, as I'm sure I'll miss something. I agree we should (constantly) encourage more people to review, even if they don't have commit access. I agree that doing a review with the contributor is much more efficient. And I agree with being more explicit about closing. So that's a +1 for options 2, 4 and 8.
The ones I would challenge are 5 and 6.
For 5, I think there are some PRs that will just be big. I think it's good to know that even big changes can get in, e.g. the templates refactoring. Imagine at some point we'd like to rewrite taxonomies - that would be a huge change. I agree it's better to split into smaller pieces where possible, but guidelines on big PRs would also be good: e.g. splitting into separate commits, trying to get commitment from the reviewer at the beginning so they can follow the evolution of the code, communicating the design early, before the implementation.
For 6, I don't think I should be limited by the fact that some of my PRs are open - ones I believe are valid but no one has reviewed.
For option 7, I also see potential for "committers" and reviewers to organize their work better. I tend to pick PRs that I think I can understand. And that's not good, since I'm not learning new things and I'm relying on reviewer $x to always review PRs from area $y. Perhaps we should have a dashboard of PRs that have been open longer than 2 weeks with no reviewer assigned, and we should assign someone for the review. And we should meet (e.g. just on IRC) and talk about who has capacity to review the next PR in the queue and commit to it.
One thing I'd also like to mention is that we should let people know they can ask. I don't think many people know they can ping @theforeman/core. Also, there's no process for what happens if no one replies after such a ping. Typically we struggle in areas that are not well known by many, e.g. the React code base, authorization, integration with external services. So we should talk about how to address this.
Thanks (again), Tomer, for bringing this up. I believe I'm one of the people complaining about long merge times and probably should have brought this up myself.
PRs that aren't reviewed are wasted effort in my opinion. And that's something we as a project cannot afford.
I think both Tomer and Marek have identified most of the pain points.
I'd like to add two more:
Packaging causes delays. I think we should introduce a better process than pinging Ewoud and Evgeni on GitHub - maybe work with labels or something. And I'd like to get rid of code owners; that's too strict for my liking. The bot could just add labels like "Needs packaging", "Waiting on packaging" and "Packaging done", so it's not the reviewers'/committers' job to micromanage the packaging effort (a rough sketch of such a bot step follows below).
Testing is hard and is a big barrier. I'm wondering if we could spin up a demo box from a PR so you can test a PR even without having your own dev environment.
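To make the labelling idea from the packaging point above a bit more concrete, here is a minimal sketch of what such a bot step could look like. It assumes PyGithub, a token in GITHUB_TOKEN, and the PR number in a PR_NUMBER environment variable; the label names are just the ones suggested above, and the list of dependency paths is a guess, not existing tooling.

```python
# Rough sketch of a labelling bot step: if a PR touches dependency files,
# add a "Needs packaging" label so packagers can find it without manual pings.
# Assumes PyGithub, a token in $GITHUB_TOKEN and the PR number in $PR_NUMBER.
import os

from github import Github

DEP_PATHS = ("Gemfile", "bundler.d/", "package.json")  # guess at files that affect packaging

repo = Github(os.environ["GITHUB_TOKEN"]).get_repo("theforeman/foreman")
pr_number = int(os.environ["PR_NUMBER"])
pr = repo.get_pull(pr_number)

if any(f.filename.startswith(DEP_PATHS) for f in pr.get_files()):
    # Labels live on the issue side of a PR in the GitHub API.
    repo.get_issue(pr_number).add_to_labels("Needs packaging")
    print(f"Labelled PR #{pr_number} with 'Needs packaging'")
```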
I'll reply with a full answer soon, I'm in a bit of a hurry currently.
I like it. I was playing with the idea of a demo site that would reset every midnight, accessible even for the public community to look around and test things. We'd need to load it with some reasonable (fake) data, but that's something we can do via hammer/API.
Sorry for adding my comments in pieces, but maybe it's actually a good thing because it makes them easier to digest.
In addition, I think we should add a CI job for every PR so we see if the PR has an effect on packaging. I believe this will help with transparency, and a developer can easily validate whether the required changes in packaging actually work with the core PR after they have been merged in packaging.
I guess we all agree that packaging is a vital part of shipping the product; it's part of our definition of done. So let's add a guard to our PR CI process.
Another thing I'd really like to get rid of is the Katello CI job. Most of the time it's red anyway. I think we should add a test for the example plugin instead: it uses the main plugin API endpoints, the tests should be a lot faster, and it does not override any vital core features. In addition, most of the plugins are based on the example plugin, so it's a good cross-section of all the plugins out there. All core developers have merge access to the example plugin, so they can make changes quickly when necessary. And the example plugin is easy enough to understand and does not need a special dev environment to make changes.
While I agree it's annoying that the Katello tests are unstable, I don't think this actually prevents code from being merged quickly - it takes an extra minute for the reviewer to check the output, but I think all core reviewers are already familiar with the intermittent Katello failures and ignore them.
I would much rather replace the Katello test with a test that exercises a bunch of plugins together - that would give us a better idea of when we break plugins with a change. I don't think red plugin tests should necessarily block merges, but they would give awareness and let both the contributor and the plugin maintainers know that something changed and needs to be fixed.
I always ignore the Katello test, so yes, that's not what's time consuming. I believe having a test with a stable plugin (the example plugin, under the responsibility of the same maintainers) can actually increase confidence in the patch and lead to faster merge times.
And it would force plugin maintainers to add proper plugin DSL and extension points instead of doing monkey patches. I like that idea. For quick core tests, this is a good compromise.
This can generally help. I'd say my experience is that folks need to feel a sense of ownership, and tied to that is a reason to feel ownership (e.g. it relates to what they are working on, or code they see or use every day). This can be hard to achieve when code sprawls across multiple repositories. I know plugins are our model for adding functionality, but they could be hurting development efforts by splitting the focus of developers across many properties.
If there are ways to achieve this with the contributor I completely agree. In an async working world, this can be hard to achieve.
+1000 - A smaller, focused PR is easier both for the reviewer to understand and spend time on and for the contributor to make corresponding changes to. Certainly the number of lines is a good metric, as it indicates how much code had to be written and how much has to be read. However, I have found that conceptual sizing is more important. For example, if a change requires model, API and UI updates, attempt to split it up into a model PR (or model + API PR) and a separate UI PR. This can be less intimidating, as a reviewer who feels comfortable with models and the database may not feel comfortable with the UI portion, and vice versa.
I say let developers handle their own number of PRs and work in progress, as we all work differently. Related to this are the things that lead developers to have multiple PRs in progress: for myself and some others I have talked to, when I get blocked on something (waiting on a review or some other factor), I move on to another.
I'd be interested to know what the lag time between a PR opening and its first review is, compared to the amount of time the PR stays open. As well as the lag between comments, updated commits, and further comments.
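For the first of those numbers, a rough approximation is easy to script. Below is a minimal sketch (again assuming PyGithub and a token in GITHUB_TOKEN, not an existing report) that computes the median time from PR creation to the first submitted review.

```python
# Rough sketch: median time from PR creation to its first submitted review, in days.
# Assumes PyGithub and a token in $GITHUB_TOKEN.
import os
from statistics import median

from github import Github

repo = Github(os.environ["GITHUB_TOKEN"]).get_repo("theforeman/foreman")
lags = []

for pr in repo.get_pulls(state="closed", sort="updated", direction="desc")[:200]:
    reviews = [r for r in pr.get_reviews() if r.submitted_at is not None]
    if not reviews:
        continue  # never formally reviewed
    first_review = min(r.submitted_at for r in reviews)
    lags.append((first_review - pr.created_at).total_seconds() / 86400)

print(f"{len(lags)} reviewed PRs, median {median(lags):.1f} days to first review")
```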
Not a bad idea, honestly. If everyone is focused on the PR update/review cycle, then the likelihood goes up that a week of that will reduce overall numbers prior to releases.
I tend to lean towards being gardeners vs. gatekeepers, but if something is honestly not going to get in due to design or scope, we should be honest, explain that, and close out the PR.
I'll add a number 10 and 11 from my perspective:
10. Look across PRs to see how often reviewers are working on giving contributors a "path to contribution". Instead of constant back and forth, state early on to a contributor: do X, Y and Z and this PR will be good to go. That gives them an end to aim for, a contract to achieve. And if after that something else comes up that the reviewer thinks of, that reviewer should apologize and point out what additional changes are needed and why. Or the reviewer should opt to push that additional change to a follow-up PR, per the "contract".
11. Are there development tooling or architectural changes we can make to our app and plugins to enable easier contribution?
A big +1 to this topic, and I like many of the changes suggested.
This ties in with something I've been thinking about: having automation run actual user workflows on an actual Foreman/Katello instance at the PR level would help give reviewers confidence when merging PRs.
I find myself repeating the same workflows very often when reviewing PRs.
We have improved our testing infrastructure quite a bit, and I think there is more of a possibility to run something similar to bats testing at the PR level. I know there is talk of UI automation as well. These are bigger changes that require more discussion, and I know there are limitations, but I think it would improve PR merge velocity in addition to the changes being discussed.
(only replying to two sub-topics, +1 on the rest of the existing discussion)
I like that idea, especially as I tried something similar recently downstream in our Satellite team.
Let me dump a few thoughts (not necessarily only mine):
- a full week might be too long/scary for some in terms of "not doing other work"
- a (bi-weekly?) "review day" might be an interesting compromise between "not doing coding work" and "having dedicated time" for reviews
- a review timeframe should not mean that we don't do reviews all the other time; this is more to tackle old/big/otherwise under-reviewed pieces
- gamification/stats is always interesting - how about achievements like "helped review the oldest PR ever" or "… the biggest …"?
I'd be down for trying out a "Review Thursday" every other week, just to see how that goes.
+1000
Adding a step in our tests that analyzes changes to bundler.d and checks them against foreman-packaging is probably quite easy to write and should add visibility quite nicely.
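As a very rough sketch of what such a step could do: collect the gem names declared in bundler.d and flag any that don't show up in a local checkout of foreman-packaging. The paths, the regex, and the "mentioned in any spec file" heuristic are all assumptions about the layout, not the actual tooling we would ship.

```python
# Rough sketch: flag gems declared in bundler.d/ that are not mentioned anywhere
# in a local foreman-packaging checkout. Paths and the matching are assumptions.
import re
from pathlib import Path

FOREMAN = Path("foreman")              # local checkout of theforeman/foreman
PACKAGING = Path("foreman-packaging")  # local checkout of theforeman/foreman-packaging (rpm branch)

# Collect gem names from `gem 'name', ...` lines in bundler.d/*.rb
gem_re = re.compile(r"""^\s*gem\s+['"]([^'"]+)['"]""", re.MULTILINE)
gems = set()
for rb in (FOREMAN / "bundler.d").glob("*.rb"):
    gems.update(gem_re.findall(rb.read_text()))

# Consider a gem "packaged" if any spec file in foreman-packaging mentions rubygem-<name>.
specs = "\n".join(p.read_text() for p in PACKAGING.rglob("*.spec"))
missing = sorted(g for g in gems if f"rubygem-{g}" not in specs)

if missing:
    print("Gems possibly missing from foreman-packaging:", ", ".join(missing))
else:
    print("All bundler.d gems appear to be covered in foreman-packaging")
```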
To limit the number of "in the air" PRs, it may be good to use draft PRs. This indicates to reviewers that a PR is not ready for review. It also doesn't run CI. This can be great for sharing work with a small number of people. In search you can filter them out with -is:draft.
This right here. A lot of big projects do this. If you know there's no regression (because a test proved it), then I'm much more comfortable merging. Right now I do a lot of mental gymnastics to figure out whether it would break a use case. Automation can reduce this.
My solution is almost there. It does need a git diff somewhere so it doesn't submit a PR with just a revbump.
We could scratch build the RPM and perform a repoclosure with a status on the PR job. That leaves little room for doubt as we would test the actual thing we care about.
This was actually my plan. The first step (building an automated way to convert dependencies) is present and finished. The next step was to get this into the nightly process.
A separate step, which I outlined in Testing PRs using RPM packages, was building a job in Jenkins so you can say [scratch rpm] to get a Koji job ID with the changes included. This job could be triggered automatically after the entire test matrix completes successfully. We could do a repoclosure check there. The forklift work is ready, so you can easily spin up a pipeline with it as well.