RFC: Pulp 4 preparation - migration from hrefs to PRNs

Context and Problem Statement

Pulp 4 is somewhere on the horizon, and hrefs are being replaced by PRNs (Pulp Resource Names). In Katello, we store Pulp hrefs for every Pulp entity we need to keep track of. For example, repository version hrefs are stored on repository records in Katello so we know which Pulp repository maps to which Katello repository.

We need to stop using hrefs and instead use PRNs.

Proposal

I’m proposing to perform the PRN migration within a single release. Originally I thought it would need to happen across multiple releases due to how slow it would be to look up each Pulp entity via the API. However, the PRN values can be computed.

Example href: /pulp/api/v3/repositories/rpm/rpm/0198a4fd-deac-75fe-8942-a8fbc8476481/
Matching PRN: prn:rpm.rpmrepository:0198a4fd-deac-75fe-8942-a8fbc8476481

The UUID values will always match between the href and the PRN. Each href prefix also matches up nicely to a PRN prefix. With this information, we can use a static prefix mapping from href to PRN to compute the values.
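A minimal sketch of that computation, written in Python purely for illustration (Katello itself is Ruby) and assuming the default /pulp/api/v3/ API root. Only the rpm repository entry comes from the example above; the remote entry and the name `href_to_prn` are assumptions, and a real mapping would need one entry per Pulp resource type Katello tracks.

```python
import re

# Static href-prefix -> PRN-prefix mapping. Only the first entry is taken from
# the example above; the second is an assumed illustration of another type.
HREF_PREFIX_TO_PRN_PREFIX = {
    "/pulp/api/v3/repositories/rpm/rpm/": "rpm.rpmrepository",
    "/pulp/api/v3/remotes/rpm/rpm/": "rpm.rpmremote",  # assumption
}

UUID_RE = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
# An href that ends in "<uuid>/"; everything before the UUID is the prefix.
HREF_RE = re.compile(rf"^(?P<prefix>/.+/)(?P<uuid>{UUID_RE})/$")


def href_to_prn(href):
    """Return the computed PRN for href, or None if it cannot be computed."""
    match = HREF_RE.match(href)
    if match is None:
        return None  # e.g. hrefs that do not end in a UUID
    prn_prefix = HREF_PREFIX_TO_PRN_PREFIX.get(match.group("prefix"))
    if prn_prefix is None:
        return None  # unknown resource type: leave the PRN unset
    return f"prn:{prn_prefix}:{match.group('uuid')}"
```

Returning None for anything the mapping does not cover keeps the computation from writing a guessed PRN into the database.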

The only special case is repository versions - they look like /pulp/api/v3/repositories/rpm/rpm/0198a4fd-deac-75fe-8942-a8fbc8476481/versions/7/. The issue is that the version href only includes the repository UUID, not the version UUID. So, for repository versions only, we’ll need to look the PRN up via the API (or, if we really need speed, via a direct connection to the Pulp DB).
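One way to route these to the lookup path is to recognize the version href shape up front. The sketch below (Python, for illustration only; the function name is hypothetical) splits a version href into the repository href and the version number, which is the information a lookup would have to key on.

```python
import re

# Matches "<repository href>versions/<number>/" and captures both parts.
VERSION_HREF_RE = re.compile(r"^(?P<repo_href>/.+/)versions/(?P<number>\d+)/$")


def split_version_href(href):
    """Return (repository_href, version_number), or None for other hrefs."""
    match = VERSION_HREF_RE.match(href)
    if match is None:
        return None
    return match.group("repo_href"), int(match.group("number"))
```

Any href for which this returns a tuple would skip the static mapping and wait for the lookup instead.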

The steps to develop the migration would thus be:

  1. Begin indexing PRNs on Katello records with Pulp hrefs
  2. Populate PRNs for existing records
  3. Remove href fields and begin using PRNs for all Pulp entities

Testing:

  1. Measure the performance of the PRN migration on older hardware to ensure it truly is fast enough for a single upgrade
  2. Update Robottelo tests to stop relying on hrefs

Once Pulp 4 is out, Katello will be already using PRNs, so there will be no concern about href fields no longer being available.

Alternative Designs

Looking up all PRN values via the API is an alternative, as mentioned before. It is likely much slower: “computing” the PRN only requires looking up hrefs in the Katello DB, performing string manipulation following the mapping, and inserting the new PRN values, whereas fetching the values via the Pulp API means going through Pulp’s entire stack. The only benefit is that the logic would be simpler. However, a lookup table for href->PRN is relatively simple as well, and even if the resource names change, we only need to maintain the mapping for a single release.

Decision Outcome

With no suggestions otherwise, we shall continue with the current plan of migrating to using PRNs in a single release, which will likely be Katello 4.19. This is dependent on how performant the migration ends up being.

Impacts

There should be minimal impact on the upgrade - we’re hoping to keep it to under a half hour for this migration.

How will you implement the migration? As a DB migration that performs it, or as a migration that adds an additional column with a separate task to populate it? The separate task has the downside that the Pulp code needs to handle both code paths, but can result in less downtime. Then you’ll also need a migration in a later version to drop the old column, with some safeguard that the PRN is populated.

We have a PR out currently with the latest implementation work: Fixes #38751 - Populate PRN fields for DB records in a migration by sjha4 · Pull Request #11503 · Katello/katello · GitHub

The general implementation strategy for upgrade is to:

  1. rubygem-katello is updated on the user’s system, which means our code now completely relies on PRNs (no installer run yet)
  2. Installer runs, which creates the PRN fields and runs the new PRN calculation as a DB migration.
    a. Repository version PRNs are left empty (and it works nicely that we don’t have existing NULL / uniqueness constraints on repository version fields).
  3. Populate repository version PRNs via an (automated) post upgrade task that fetches data from the Pulp API.

Indeed, we’ll need to clear out the href fields in a later release.
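Steps 2 and 3 can be sketched end to end against an in-memory SQLite database. This is Python for illustration only, not the Katello migration: the `repos` table, its column names, and `fake_pulp_lookup` are all hypothetical, and the version PRN string is a placeholder rather than real data.

```python
import sqlite3

REPO_HREF = "/pulp/api/v3/repositories/rpm/rpm/0198a4fd-deac-75fe-8942-a8fbc8476481/"
VERSION_HREF = REPO_HREF + "versions/7/"


def compute_repo_prn(href):
    # Simplified to the single rpm repository prefix from the RFC example.
    uuid = href.rstrip("/").rsplit("/", 1)[-1]
    return f"prn:rpm.rpmrepository:{uuid}"


def fake_pulp_lookup(version_hrefs):
    # Stand-in for a Pulp API call; the PRN value is a placeholder, not real data.
    return {href: "prn:core.repositoryversion:<version-uuid>" for href in version_hrefs}


conn = sqlite3.connect(":memory:")
# Hypothetical table; Katello's real schema differs.
conn.execute("CREATE TABLE repos (id INTEGER PRIMARY KEY, repo_href TEXT, version_href TEXT)")
conn.execute("INSERT INTO repos (repo_href, version_href) VALUES (?, ?)", (REPO_HREF, VERSION_HREF))

# Step 2: the DB migration adds PRN columns and fills only the computable ones.
conn.execute("ALTER TABLE repos ADD COLUMN repo_prn TEXT")
conn.execute("ALTER TABLE repos ADD COLUMN version_prn TEXT")
for row_id, href in conn.execute("SELECT id, repo_href FROM repos").fetchall():
    conn.execute("UPDATE repos SET repo_prn = ? WHERE id = ?", (compute_repo_prn(href), row_id))
after_migration = conn.execute("SELECT repo_prn, version_prn FROM repos").fetchone()

# Step 3: the post-upgrade task fills the version PRNs that were left NULL.
pending = [row[0] for row in conn.execute("SELECT version_href FROM repos WHERE version_prn IS NULL")]
for href, prn in fake_pulp_lookup(pending).items():
    conn.execute("UPDATE repos SET version_prn = ? WHERE version_href = ?", (prn, href))
after_task = conn.execute("SELECT repo_prn, version_prn FROM repos").fetchone()
```

Selecting on `version_prn IS NULL` makes the post-upgrade task safe to re-run, and the same query could serve as the safeguard before a later release drops the href columns.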

Which of these upgrade steps is most likely to extend upgrade time? And do you have any preliminary data on what that will look like time-wise?

Fetching the PRNs for repo version hrefs is the only question mark in the migration. The “calculation” part was reportedly incredibly fast; I think @sajha mentioned testing with a million records of some sort.

For the repo version hrefs, we need to do some performance testing with, say, 10,000 repository versions. Unless a user never cleans up any CVs and publishes them every day (with filters), I wouldn’t expect repo version counts to exceed the order of tens of thousands (even that I would find surprising).

We’ll get the timing numbers before it’s committed to Katello 4.19 and make a call from there about whether we need to move any part of the upgrade to the background.

We just merged the version_href migration. On @sajha's dev box (which seemed quite slow), it took only 3 minutes to migrate 5187 repositories with a page size of 2 requested from the Pulp API. For a page size of 2000, it took 12 seconds. Fixes #38778 - Populate PRN columns for Repository versions by sjha4 · Pull Request #11511 · Katello/katello · GitHub

This tells me that performance should not be a concern for running the migration as part of the installation process.
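Much of the gap between the two runs likely comes down to request count. As a rough illustration (Python; `fetch_page` is a hypothetical stand-in for a limit/offset-paginated Pulp API call, not a real client function), the loop below pages through a fake result set and counts the round trips:

```python
def fetch_all(fetch_page, page_size):
    """Collect every record using offset/limit paging; return (records, requests)."""
    records, requests, offset = [], 0, 0
    while True:
        page = fetch_page(offset=offset, limit=page_size)
        requests += 1
        records.extend(page)
        if len(page) < page_size:  # a short page means there is nothing left
            return records, requests
        offset += page_size


def make_fake_api(total):
    # Hypothetical stand-in for the Pulp API: serves integers 0..total-1.
    def fetch_page(offset, limit):
        return list(range(offset, min(offset + limit, total)))
    return fetch_page


TOTAL = 5187  # record count from the dev box test above
records_small, requests_small = fetch_all(make_fake_api(TOTAL), page_size=2)
records_large, requests_large = fetch_all(make_fake_api(TOTAL), page_size=2000)
```

For 5187 records this works out to 2594 requests at a page size of 2 versus 3 requests at a page size of 2000.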

An update on this effort: our code for indexing PRNs and migrating them into our DB is nearly complete.

We learned today that Pulp is likely not going to completely get rid of hrefs - as implementation started, they realized the switch may be a bad idea.

As such, this work was technically not necessary. However, there are still reasons to keep the PRNs around:

  1. The PRN data is in the API responses we request from Pulp already, so saving the record is cheap.
  2. PRNs could become more important in the future, so we’ll be prepared for changes.
  3. PRNs are smaller than hrefs, so using them in features like content view copying means we could use higher batch sizes and thus increase performance.
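To put a rough number on point 3, the sketch below compares the example href and PRN from the RFC (Python, for illustration; the 8000-character budget is an arbitrary assumption, not a Pulp limit, and other resource types will have different href lengths):

```python
HREF = "/pulp/api/v3/repositories/rpm/rpm/0198a4fd-deac-75fe-8942-a8fbc8476481/"
PRN = "prn:rpm.rpmrepository:0198a4fd-deac-75fe-8942-a8fbc8476481"

BUDGET = 8000  # assumed character budget per request, for illustration only


def max_batch(identifier, budget=BUDGET):
    # +1 accounts for a separator between identifiers.
    return budget // (len(identifier) + 1)


href_batch = max_batch(HREF)
prn_batch = max_batch(PRN)
```

Under any fixed size budget, the shorter identifier allows more entries per batch; how much that matters in practice would need measuring.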

Once Pulp 4 does come around, we’ll look into switching from using hrefs to using PRNs where it makes sense.