Foreman 3.9/Katello 4.11/Pulpcore 3.39 retro

This is the wiki for the retro that will be held for the most recent release process. Given the various challenges we ran into, we decided it would be a good idea to have a retro, so we can identify successes and pain points, and see what we can do about improving things.

Feel free to edit this and add to the retro as you can. The retro will be held Dec 11, at 9:00am EST.

  • What should the team start doing?
    • When a chunk of release work is done, especially if checkboxes are checked on the branching/release community post, add some notes to the post. Add any opened PRs and blockers.
    • Provide a container image with a standard operating environment for release work, including, for example, theforeman-rel-eng and tool_belt. Delays are sometimes caused by tools breaking after Fedora updates.
    • Investigate which processes in the Katello release could be safely scripted.
    • Give repo merge permissions to release people with a track record of supporting releases reliably.
    • Have an official ‘tester’ role for releases. The intent is to ensure we get feedback on the quality of the RCs. It could also be a good way of getting QE involved and getting people more familiar with releases.
  • What should the team stop doing?
    • One month before Katello GA, branch Pulpcore nightly and commit to that version. We should not start building a Pulp sooner than a month before GA.
  • What should the team continue doing?
  • Other comments/thoughts
    • Re: GitHub projects: could a number of templated cards be created for each release? It would remove the manual labor of creating cards every time. Perhaps it could be done via script with the API, or maybe a GitHub action somehow.
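
To make the templated cards idea a bit more concrete, here is a rough sketch of what such a script could look like. It assumes a GitHub Projects (v2) board and uses the GraphQL `addProjectV2DraftIssue` mutation to create draft items. The card titles, the `render_cards`/`build_payloads`/`send` helpers, and the project ID are all made up for illustration; the real list would come from our release procedures.

```python
import json
import urllib.request

# Hypothetical card templates; the real list would mirror the release checklist.
CARD_TEMPLATES = [
    "Branch {project} {version}",
    "Build {project} {version} RC1",
    "Test {project} {version} RC1",
    "Release {project} {version} GA",
]

# GraphQL mutation that adds a draft item to a Projects (v2) board.
MUTATION = """
mutation($projectId: ID!, $title: String!) {
  addProjectV2DraftIssue(input: {projectId: $projectId, title: $title}) {
    projectItem { id }
  }
}
"""


def render_cards(project, version):
    """Fill the templates in for one project and version."""
    return [t.format(project=project, version=version) for t in CARD_TEMPLATES]


def build_payloads(project_id, titles):
    """Build one GraphQL request body per card title (nothing is sent here)."""
    return [
        {"query": MUTATION, "variables": {"projectId": project_id, "title": title}}
        for title in titles
    ]


def send(payload, token):
    """POST a single payload to the GitHub GraphQL endpoint."""
    request = urllib.request.Request(
        "https://api.github.com/graphql",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Dry run: only print the titles that would be created.
    for title in render_cards("Katello", "4.11"):
        print(title)
```

Running it as-is only prints the card titles. Actually creating the cards would mean looping over `build_payloads(...)` and calling `send` with a token that is allowed to write to the project, which could also be done from a GitHub Action.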

I’ll be out on Monday, so I’ll leave my comments here.

One pain point I noticed is that it’s hard to keep track of where a release stands.

First of all, for people outside of the team: there was a delay and it was discussed in the meeting notes, but I was told it wasn’t entirely clear to everyone. It might be a good idea to start adding more comments to the branch/release process posts with the same information, but there can also be other solutions. (cc @Marek_Hulan & @aruzicka)

Then, inside the team: it’s hard to keep track of all the PRs and see what’s going on. For Ruby 3 I’ve created a GitHub project (Ruby 3 support · GitHub) and think it works fairly well. You do need to set the project on every PR/issue, but you can get cross-project overviews this way. There may be other alternatives (post more comments on procedures), but I don’t have a strong preference for exactly how we solve it.

We should also be more critical of our own procedures and optimize them where possible. Some steps can be simplified by writing some scripts, mostly in Katello. I’ve also been told there are many assumptions about my specific environment, which I didn’t notice. As a start I think we should file issues (Issues · theforeman/theforeman-rel-eng · GitHub) for them. Then try to address them, so we iterate on our procedures.

I won’t be there either, so here are some thoughts:
We ended up doing a lot of important work during low-availability times (people on PTO, at meetups, Thanksgiving), which meant longer turn-around times and less concentration.

As I did not participate in the overall release procedure, I cannot fully see/explain why this happened, but a few things stood out:

  • Pulpcore 3.39 was available rather late for testing and contained bugs we had to decide action on (ignore/block/etc).
  • Part of that delay was that we had packages on day X, but the related Katello PR wasn’t merged due to Thanksgiving and nobody outside the US had the right permissions for a merge.

Based on that, I’d like to propose the following:

  • Pulp-related changes should land before stabilization, so that the stabilization period can be used to identify problems and fixes. (You could also read this as: Katello should have a stabilization period, matching the Foreman one.)
  • Have people who can unblock things more globally. Not sure whether this should be in the form of a “second release owner/engineer/etc” type allocation, or a more generic “we need people with org-admin permissions in other geos”.

I think having someone who could have jumped in on a specific task plus a better “this needs to be done” overview (see what Ewoud has written) could have saved us quite a few days.
