RFC: handling the upcoming lengthy pulp-container migration

Hello community,

Katello 4.13 is going to include a version of pulp-container (likely 2.20) that contains the change from pulp/pulp_container Pull Request #1532, “Distinguish between the nature of images” by lubosmj.

This change exposes information about container manifests so we can tell them apart. Specifically, labels are exposed, along with nicer boolean fields on the pulp-container API to tell if the container image is bootable or if it holds flatpak content. Eventually this information will be shown in the Katello UI as part of renovating our container content support.
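As a rough illustration of what those boolean fields could boil down to, here is a minimal Python sketch that derives them from a manifest's labels. The label keys and the function name are assumptions made up for this example, not confirmed pulp-container behavior.

```python
# Assumed label keys for illustration only; the real detection logic
# lives in pulp-container and may look at different data.
BOOTABLE_LABEL = "containers.bootc"
FLATPAK_LABEL = "org.flatpak.ref"


def classify_manifest(labels):
    """Return illustrative is_bootable / is_flatpak flags for a manifest's labels."""
    labels = labels or {}  # content without label data is treated as having none
    return {
        "is_bootable": BOOTABLE_LABEL in labels,
        "is_flatpak": FLATPAK_LABEL in labels,
    }


print(classify_manifest({"containers.bootc": "1"}))
```

Content that has no label data yet simply comes back with both flags false in this sketch.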

To do this, (as I understand) pulp-container needs to delve into each synced manifest, find that info, and get it into their database. This could have meant lengthy downtime, but the Pulp folks included a new django-admin command to pre-migrate this content. The feature will work for newly synced content out of the box, but older content will need to be migrated.

Eventually pulp-container will force this migration to run during an upgrade, but it won’t happen in Katello until users have had at least one release to do the pre-migration.

To make this smooth and clear, I’m proposing the following:

  1. Create a rake script as a shell for the django-admin command so that users can pre-migrate their Pulp container data.
  2. Make the new feature clear in our release notes.
  3. Add upgrade documentation explaining how to start the pulp-container pre-migration.
  4. Add upgrade documentation explaining why the pulp-container pre-migration is necessary.
  5. Add upgrade documentation explaining who needs to run the pulp-container pre-migration.
    → If you don’t have a large amount of container content, or if you don’t mind > 30 mins of downtime, there’s no reason to run the pre-migration.
  6. Add rough timing information about the pre-migration to the docs above.
  7. Add information about how the pre-migration might affect the performance of the running Foreman server to the docs above.
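To make item 1 a bit more concrete, here is a minimal sketch of a thin wrapper around the django-admin command, written in Python rather than as a rake script. It assumes `pulpcore-manager` is the django-admin entry point on the system, and the command name is a placeholder because the real name is not given here.

```python
import shlex
import subprocess

# Placeholder: substitute the command name that pulp-container actually ships.
PREMIGRATION_COMMAND = "<pre-migration-command>"


def build_command(manager="pulpcore-manager", command=PREMIGRATION_COMMAND):
    """Build the argument list for the (hypothetical) pre-migration invocation."""
    return [manager, command]


def run_premigration(dry_run=True):
    """Print the command in dry-run mode; otherwise run it and return its exit code."""
    argv = build_command()
    if dry_run:
        print("would run:", shlex.join(argv))
        return 0
    # check=False so the caller decides how to react to a non-zero exit code.
    return subprocess.run(argv, check=False).returncode


run_premigration(dry_run=True)
```

Defaulting to a dry run is a deliberate choice in this sketch, since the real command may put load on a running server.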

The goal here is to make the database migration as smooth and fast as possible while keeping downtime during a future upgrade to a minimum.

I’m curious to hear if anyone has any ideas about this, or if there are any concerns about the strategy.

Smart proxies with content will also need this pre-migration run on them with the current solution. I don’t think we will ever use these label fields in Pulp on smart proxies, however.

@lubosmj it wouldn’t be possible / reasonable to make this migration forever optional, would it? The new fields seem purely informational.

The rake script won’t work on smart proxies unless it’s possible to remotely trigger the migration via Pulp API.

If that inconsistency must exist, then perhaps we shouldn’t ship a rake task at all.

To do this, (as I understand) pulp-container needs to delve into each synced manifest, find that info, and get it into their database.

Yes, we are going to dig up every single manifest and config blob available.

@lubosmj it wouldn’t be possible / reasonable to make this migration forever optional, would it? The new fields seem purely informational.

Correct, it is neither sustainable nor practical to support this “pre-migration” over an extended period. The team reached a consensus on supporting 2-3 pulp-container Y-releases with the said “pre-migration”.

The plan is to ship additional django-admin commands besides the one for telling image nature apart (i.e., one for artifactless manifests and one for artifactless config blobs) across multiple pulp-container releases (even those that will not land in the current Katello build). We are now considering maintaining two functionally different code paths to support both migrated and not-yet-migrated content. At the end of the day, we would kindly ask our users to run the real DB migration (populating all newly accumulated DB fields) if they could not run the optional django-admin commands beforehand.
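To illustrate what “two code paths” could mean in practice, here is a minimal Python sketch that prefers the migrated database field and falls back to reading the config blob otherwise. The dictionary shape and the `read_labels_from_blob` callable are hypothetical and do not reflect pulp-container's actual models.

```python
def manifest_labels(manifest, read_labels_from_blob):
    """Return labels from the migrated field, or fall back to the legacy lookup."""
    labels = manifest.get("labels")
    if labels is not None:
        # Migrated path: the field was already populated in the database.
        return labels
    # Legacy path: content that has not been migrated yet.
    return read_labels_from_blob(manifest)


migrated = {"labels": {"maintainer": "example"}}
legacy = {"labels": None}
print(manifest_labels(migrated, lambda m: {}))
print(manifest_labels(legacy, lambda m: {"source": "blob"}))
```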

I realized that, instead of a rake task, we can ship the pre-migration in foreman-maintain. That will work on both smart proxies and Foreman.

Since most upstream users don’t use foreman-maintain, will this cause problems?

I think that among Katello users specifically, a higher share is likely to be using foreman-maintain as well.

Also, if I look at how the Pulp 2 to 3 migration is documented, it uses a mix of both foreman-rake and foreman-maintain.

So I think both are valid, but this time I would prefer having all commands use the same base command if possible.

I’m aiming for this to be the case by pushing for Katello 4.13 to include all of the upcoming pulp-container migrations. That way we can have one migration command in foreman-maintain.

This migration is also much simpler than the big Pulp 2 to 3 migration since Katello won’t need to do anything in its own database.

Another tip: this migration won’t be a big deal for users who don’t have a lot of container content. They could even completely ignore the command and just let the migration run when Pulp forces us to completely migrate. The command is just around for big container users to avoid downtime.

While Foreman users don’t use foreman-maintain to upgrade like Satellite users do, they do use foreman-maintain for other reasons like backup & restore, service restarting, health checks, etc. Plus they needed it for the Pulp 2 to 3 migration, as Dirk mentioned, so hopefully they all have it in their environments by now :)

It sounds like the Pulp artifactless config blobs migration might need to make it into Katello 4.14 instead – if that’s the case, we’ll try to make sure that it’s included in the same foreman-maintain command. Then users would just need to run it again after the 4.14 upgrade. If they did it already during 4.13, then it’ll just run the last migration. If they forgot to do it, then it’ll run all migrations (without downtime).
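The “run it again and it only does what’s left” behavior can be sketched roughly like this in Python. The step names and the way completed steps are tracked are made up for illustration; the real foreman-maintain command may work quite differently.

```python
# Hypothetical step names, in the order they would be applied.
ALL_STEPS = ["image_nature", "artifactless_manifests", "artifactless_config_blobs"]


def pending_steps(completed):
    """Return the steps still to run, preserving the canonical order."""
    return [step for step in ALL_STEPS if step not in completed]


# A user who already ran the command during 4.13 only has the last step left.
print(pending_steps({"image_nature", "artifactless_manifests"}))
# A user who never ran the command gets every step.
print(pending_steps(set()))
```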

Hopefully this sounds like a convenient balance between upgrade complexity and freedom to decide when to run the migration. Let us know otherwise :)

We just did some further design work around this – for Katello to make use of the labels, the old content will eventually need to be indexed into our database. We’re currently thinking that the foreman-maintain command will also reindex the manifest label information from Pulp into Katello.

Do current containers have label information in them that we need to show?

If we didn’t migrate all existing containers, would new containers or new syncs grab the info?

Newly synced container manifests will get the label information as of Katello 4.13. Old containers simply won’t show the label information until the Pulp migration is run.