Releases and nightlies stabilization effort

Marek_Hulan · June 15, 2018, 10:04am

Hello,

As you all probably know, our recent releases were delayed and some even contained critical bugs that we missed. We also very often have broken nightlies: last week the assets issue was fixed there, but all plugin pages with React components are still broken. Looking at the results of the community survey, it’s clear that the community sees this (see the Stability chart) and wants us to fix bugs (see the What Next chart). We have already taken some countermeasures to improve the situation, e.g. we have the list of sanity checks we perform on 1.18 RC1, we report breakages on Discourse, etc., but we’d like to go further.

So, we’ve decided to dedicate one person fully to keeping releases and nightly builds stable and delivered on time. We’ve found a volunteer for this new role and I believe that it’s not a big surprise who: it’s @tbrisker, who was partly doing it already.

He will be responsible for monitoring all user inputs, such as Redmine, the Support category on Discourse, #theforeman on IRC, GitHub PRs and the security mailing list, and for considering the impact of reported issues. If a reported issue is critical for the release, it’s his responsibility to make sure it’s fixed, either by fixing it himself or by asking a specific person for help. This is where we need your help: if Tomer asks you to fix an issue, review a PR or test a fix, please try to work on that with the highest priority.

Since he will also be responsible for the stability of the release, we decided to split the current release nanny role into two: a release manager and a release engineer. The release manager will be responsible for sending branching announcements, enforcing the schedule, and deciding what should be merged into core projects and what is too risky and should be left in the develop branch. The release engineer will focus only on transforming the code into packages, basically following the rest of the release process. The release manager will of course be in touch with the release engineer for core, and will also coordinate releases with the release engineers/maintainers of the main plugins. We’ll split this for 1.19: Tomer will be the release manager, while @ekohl volunteered for the release engineer role. 1.17 and 1.18 will continue to be maintained by @Ondrej_Prazak until they are no longer supported.

The last big area of responsibility is around nightlies. Tomer should ensure our nightlies are building and are usable, not just installable. If he finds a problem, he will alert us and ask people to help with fixing the breakage.

In order for this to be successful, I’d kindly ask everyone to help Tomer with anything he needs. Please share your thoughts and recommendations. I’m personally looking forward to the outcomes of this and wish Tomer a minimal number of critical issues to resolve.

dLobatog · June 15, 2018, 1:10pm

Best of luck @tbrisker - having you fully dedicated to this will surely make a big impact on our ecosystem.

TimoGoebel · June 18, 2018, 10:07am

I believe this is very good news. I’m looking forward to more stable releases and wish @tbrisker a good time in this role.

Two thoughts that I’d like to share:

In my opinion the PR list on foreman core is getting hard to navigate. Should we introduce some more labels to make PRs stick out? I’d really like priority/important & priority/critical-urgent labels to mark important PRs that need attention.

How are we going to handle cherry-picks to stable releases? In the past, it was enough to mark the desired release in Redmine. The cherry-picks were done by the release manager. In the last couple of weeks, we have started doing cherry-pick PRs for every branch and commit separately. It made sure the fix was cherry-picked, but it further made the PR list harder to navigate.

tbrisker · June 18, 2018, 1:12pm

I’m definitely open to suggestions regarding this. I tried to implement target milestones as an indication of PRs that are blockers for a certain version, but we can think of other methods as well. Another thing is that the bot that used to close stale PRs after 6 months of inactivity seems to have stopped working, so perhaps there are a bunch that can be auto-closed if someone fixes that.

The main benefit of opening a cherry-pick PR is that we get tests to run on it, but perhaps this isn’t always needed? Open to feedback here.

ekohl · June 18, 2018, 1:28pm

I like running tests. Perhaps we could automatically label it with the prprocessor? In foreman-packaging we already label on rpm/deb depending on the branch. We can probably do the exact same thing with .*-stable => cherry-pick.
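A minimal sketch of that rule, purely for illustration. This is not the actual prprocessor code: the function name `labels_for_branch` and the example branch name `1.18-stable` are assumptions, and only the `.*-stable => cherry-pick` mapping comes from the suggestion above.

```python
import re
from typing import List

# Rule from the suggestion above: any target branch matching ".*-stable"
# gets the "cherry-pick" label. The whole branch name must match.
STABLE_BRANCH = re.compile(r".*-stable")


def labels_for_branch(base_branch: str) -> List[str]:
    """Return the labels to add to a PR opened against `base_branch` (hypothetical helper)."""
    if STABLE_BRANCH.fullmatch(base_branch):
        return ["cherry-pick"]
    return []


if __name__ == "__main__":
    # "1.18-stable" is an assumed branch name, used only as an example.
    for branch in ("1.18-stable", "develop"):
        print(branch, labels_for_branch(branch))
```

Using `fullmatch` means a branch that merely contains `-stable` somewhere in the middle would not be labeled, which seems to be the intent of the pattern.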

tbrisker · June 18, 2018, 2:49pm

Definite +1 from me for adding more automation to the PR bot. Regarding the CP-PRs, if we only open them for tests to run, I think it makes sense that the release engineer/manager can also merge their own code/PR in this context once the tests are green, so we can merge them faster.

ekohl · June 18, 2018, 2:50pm

I agree that the release manager can merge their own PRs, but having the label will allow others to skip them.

Gwmngilfen · June 19, 2018, 9:51am

more labels == easier analysis work, so +1 from me

lzap · June 19, 2018, 12:31pm

More labels on GitHub would improve my PR picking process a lot. Not only release-process related labels but also components or areas. This would let me easily re-review PRs I have already unsubscribed from and return to those which have obviously stalled and where I know I could help.

TimoGoebel · June 19, 2018, 12:32pm

All right, if you guys don’t mind I’ll open a new thread with some suggestions.

dLobatog · June 19, 2018, 1:37pm

We get tests running on every cherry-pick, with or without a pull request.

A few versions ago, I started doing releases. Before making the release, I would go over all issues slated for that version, and cherry-pick them all at the same time. Then I’d start the release. And a few hours after that, I’d figure out that the release couldn’t work.

How did I figure that out? http://ci.theforeman.org/job/test_1_18_stable/ and http://ci.theforeman.org/job/test_proxy_1_18_stable/ .

After you run a cherry-pick against a stable branch, those two jobs will run. If you do it through a PR, you ensure you’re not going to break it. If you cherry-pick without the PR, you might break it. It’s not a big deal to break these 2 jobs; they’re just a way to make sure your PRs work.

Once I found myself breaking these jobs a couple of times right before the release, I changed the strategy. It pays off to cherry-pick as soon as possible, as there’s more time to fix the stable branch if needed.

tl;dr - in my experience it’s alright to cherry-pick to stable branches directly, as long as you do it early (not 5 minutes before running the release pipeline). If you’re afraid of breaking the tests by cherry-picking directly, submit a PR.

iNecas · June 22, 2018, 3:27pm

The biggest benefit of cherry-picks via PR I see is that it opens a way for the authors of the original patches to send the rebased version, instead of leaving it to the release nanny, who might not have the full knowledge.

dLobatog · June 22, 2018, 3:48pm

In my experience, 99% of the time there are no conflicts when doing this kind of cherry-pick. If there are any, of course it makes more sense to publish that.