Pulp 3 migration

Hello all,

I’ve been working on coming up with a plan for a migration to Pulp 3 in Katello and have been working with the pulp team to identify gaps that need to be filled.

You can find a list of those initial integration gaps here: Issues - Pulp

The current state of the pulp beta:

  • supports the following content types:
    • file
    • ansible (unused currently by katello)
    • python (unused currently by katello)
  • Currently has no rpm packaging
  • introduces many new concepts; I would recommend familiarizing yourself with the new terms: https://docs.pulpproject.org/en/3.0/nightly/overview/index.html
  • Is postgresql based

After some discussion, we arrived at a preferred migration strategy:
  • Require running pulp 2 and pulp 3 together for some period of time (potentially 2-3 releases)

  • Start with a migration of the file content type and then migrate further types as they are available
  • once all types are migrated, drop pulp 2

Pre-requisites:

Before Pulp 3 can be utilized for any content types, we must:

  • Migrate sync plans to dynflow (in progress)

Before Pulp 2 can be fully removed we must:

  • Ensure there is a replacement for goferd-based actions

The general migration path would look like:

Technical Details can be found here: Pulp 3 Migration Technical Details - Google Docs

I would like to keep general ‘migration’ discussion here, but any technical discussion should go in the google doc.

Justin


I would find this pattern of slowly pulling pulp-3 in to be very valuable.

Just want to mention that one of my main concerns is to make sure that we all understand that Pulp 3 is Python 3 only compatible while Pulp 2 relies on Python 2. I believe there are dependencies on Django too for Pulp 3? So making sure that the installer “knows” where to pull each version of Python will be very important.

Also, with the phased implementation, will there be 2 different configurations for APIs that call Pulp 2 x Pulp 3? If so, what is the plan for current users upgrading to a new version of Katello and then for when dropping Pulp 2 in the future? I’m talking about configuration files and database migrations.

Og_Maciel https://community.theforeman.org/u/og_maciel
July 20

Just want to mention that one of my main concerns is to make sure that
we all understand that Pulp 3 is Python 3 only compatible while Pulp 2
relies on Python 2. I believe there are dependencies on Django too for
Pulp 3? So making sure that the installer “knows” where to pull each
version of Python will be very important.

Yes, depending on our deployment we will either use the python3 scl or
some containerization strategy for pulp 3.

Also, with the phased implementation, will there be 2 different
configurations for APIs that call Pulp 2 x Pulp 3? If so, what is the
plan for current users upgrading to a new version of Katello and then
for when dropping Pulp 2 in the future? I’m talking about
configuration files and database migrations.

Could you explain a bit what you mean here? The installer (in some way
or another) would write the pulp 2 and pulp 3 configuration files and
handle any database or content migrations.

Justin

Hi,

Thanks Justin for the update. Let me put my 2 cents in.

Unfortunately, I don’t really see why we should change to pulp 3 now. It looks like no new content type of pulp 3 must be supported by katello asap, so there is no pressure.

Pulp 3 has some really interesting concepts like repository versions and direct lifecycle management (see https://docs.pulpproject.org/en/3.0/nightly/overview/from-pulp-2.html). This would move some katello features back to pulp - which I really like.

Therefore, I would recommend working together with the pulp team to get the missing content types done and then moving from pulp 2 to pulp 3 in one step. This should be an easier migration than doing this step multiple times. Additionally, you can drop a lot of katello functionality and rely on pulp 3. I guess having both pulp versions (2 and 3) running together might need some bad workarounds / hacks, which are error-prone.

Another possibility would be that pulp 2 and pulp 3 are installed side by side and there is a foreman-installer option to switch between them. In this case, only the content types that exist on the selected pulp version can be used.

Regarding replacement of gofer / katello-agent, I guess remote execution with SSH is good enough. Right?

Best regards,
Bernhard

Bernhard_Suttner https://community.theforeman.org/u/bernhard_suttner Katello
July 22

Hi,

Thanks Justin for the update. Let me put my 2 cents in.

Hey Bernhard, Thanks for the reply! Let me try to address some of your
concerns.

Unfortunately, I don’t really see why we should change to pulp 3 now.
It looks like no new content type of pulp 3 must be supported by
katello asap, so there is no pressure.

Pulp 3 has some really interesting concepts like repository versions
and direct lifecycle management (see
https://docs.pulpproject.org/en/3.0/nightly/overview/from-pulp-2.html).
This would move some katello features back to pulp - which I really like.

Yes, the plan is to utilize some of these features under the hood (for
example, a repository version will be used in our Content View
Versions). We also plan on exposing the versions on the main library
repository. We will use their distributions as our lifecycle
management. Are there other ways you’d want to utilize these
features? I’m very interested in any new ideas :)
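
To make the two concepts mentioned here a bit more concrete, this is a toy Python model of repository versions and distributions. It is only a sketch: the class names, fields, and numbering are assumptions made for illustration and are not the Pulp 3 data model or API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RepositoryVersion:
    """Immutable snapshot of a repository's content (hypothetical model)."""
    number: int
    content: frozenset


@dataclass
class Repository:
    name: str
    versions: list = field(default_factory=list)

    def new_version(self, content) -> RepositoryVersion:
        # Each change to the content produces a new snapshot.
        version = RepositoryVersion(len(self.versions) + 1, frozenset(content))
        self.versions.append(version)
        return version


@dataclass
class Distribution:
    """Serves one repository version at a path, e.g. one per lifecycle environment."""
    base_path: str
    version: RepositoryVersion


repo = Repository("isos")
v1 = repo.new_version({"a.iso"})
v2 = repo.new_version({"a.iso", "b.iso"})

# In this model, "promotion" only re-points a distribution; no content is copied.
dev = Distribution("dev/isos", v2)
prod = Distribution("prod/isos", v1)
prod.version = v2
print(prod.base_path, "serves version", prod.version.number)
```

In this model a Content View Version would map to a `RepositoryVersion`, and each lifecycle environment to a `Distribution`.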

Therefore, I would recommend working together with the pulp team to
get the missing content types done and then moving from pulp 2 to pulp 3
in one step. This should be an easier migration than doing this step
multiple times. Additionally, you can drop a lot of katello
functionality and rely on pulp 3. I guess having both pulp versions (2
and 3) running together might need some bad workarounds / hacks, which
are error-prone.

The prolonged migration is largely due to software development
processes. When we did the pulp 1 to pulp 2 migration we tried to do
it all in one large migration. This involved keeping a branch open
with a large amount of refactoring for many months. Rebasing became
hard since other work continued. It was not fun. This migration plan
is intentionally incremental, so that we avoid the pains of the past.
I don’t think that having both pulp 2 and pulp 3 running at the same
time will require anything close to hacks. If you look through the
technical document, I’ve tried to write up a ‘guide’ for how to
accomplish all this. The goal is to completely avoid a large number
of if pulp2 else pulp3 statements littered through the code. They
should be isolated, rare, and obvious.
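
One way to keep the if pulp2 else pulp3 checks isolated is to choose the backend once per content type and hide it behind a common interface. Below is a minimal Python sketch of that idea (Katello itself is Ruby). Every name in it (`PULP3_TYPES`, `Pulp2Backend`, `Pulp3Backend`, `backend_for`) is hypothetical and is not taken from the Katello code base or the technical document.

```python
from dataclasses import dataclass

# Hypothetical: the content types already migrated to Pulp 3.
# Per the plan, "file" goes first and others follow as they become available.
PULP3_TYPES = {"file"}


@dataclass
class Pulp2Backend:
    content_type: str

    def sync(self, repo_name: str) -> str:
        # A real implementation would call the Pulp 2 API here.
        return f"pulp2: sync {self.content_type} repo {repo_name}"


@dataclass
class Pulp3Backend:
    content_type: str

    def sync(self, repo_name: str) -> str:
        # A real implementation would call the Pulp 3 API here.
        return f"pulp3: sync {self.content_type} repo {repo_name}"


def backend_for(content_type: str):
    """The single place where the pulp2/pulp3 decision is made."""
    if content_type in PULP3_TYPES:
        return Pulp3Backend(content_type)
    return Pulp2Backend(content_type)


# Callers never branch on the Pulp version themselves.
print(backend_for("file").sync("isos"))
print(backend_for("yum").sync("centos7"))
```

Migrating another content type would then mean adding it to `PULP3_TYPES` and implementing the Pulp 3 side, and dropping Pulp 2 would mean deleting one class and one branch.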

I imagine if you look at the code base now, you won’t see how this is
all possible, but we plan on refactoring the code base first to make it
more manageable for a change of this magnitude. I think these
refactorings are worthwhile regardless, since they improve the overall
quality.

Another possibility would be that pulp 2 and pulp 3 are installed
side by side and there is a foreman-installer option to switch between
them. In this case, only the content types that exist on the selected
pulp version can be used.

I’m not sure this is all that viable. What if a user is using two
content types and wants to upgrade? They would have to live with some
subset of content being disabled until some future version?

Regarding replacement of gofer / katello-agent, I guess remote
execution with SSH is good enough. Right?

Some users are not able to use SSH due to the nature of it. The goal is
to provide some alternative REX provider that originates communication
from the clients instead of from the server. We are still exploring
this to see how much it’s actually needed, but we have many users that
have reported being unable to use ssh-based REX.

Justin

Does this mean it has no yum support yet?

You just replied to my question in regards to new installations (configuration files will be automatically created) but I’m looking for the plan when we decide to drop Pulp 2 completely. I assume that the installer will also handle configuration files (removing Pulp 2’s and updating Pulp 3’s?) and migration of schemas. Though this may be an obvious path, is it clearly stated in the plans for this initiative?

By the way, you didn’t reply to my other question: “will there be 2 different configurations for APIs that call Pulp 2 x Pulp 3?”

lzap https://community.theforeman.org/u/lzap Discovery
July 23

Justin_Sherrill:

supports the following content types
…
Currently has no rpm packagi

Does this mean it has no yum support yet?

It doesn’t have yum support yet, but that was actually referring to how
you install pulp 3. It currently can only be installed via pip, but
eventually there should be rpms available.

Og_Maciel https://community.theforeman.org/u/og_maciel
July 23

Justin_Sherrill:

Could you explain a bit what you mean here? the installer (in some way
or another) would write the pulp 2 and pulp 3 configuration files and
handle any database or content migrations.

You just replied to my question in regards to new installations
(configuration files will be automatically created) but I’m looking
for the plan when we decide to drop Pulp 2 completely. I assume that
the installer will also handle configuration files (removing Pulp 2’s
and updating Pulp 3’s?) and migration of schemas. Though this may be
an obvious path, is it clearly stated in the plans for this initiative?

I’m somewhat hesitant to spell this out because it’s still undecided
whether we will go down the containerization route or not. The
deployment is still up in the air, so I don’t want to put too much
detail into it at this point.

By the way, you didn’t reply to my other question: “will there be 2
different configurations for APIs that call Pulp 2 x Pulp 3?”

By configurations, do you mean ‘will katello know how to talk to both
apis’? If so, yes :)

I’m still of the opinion that the installer shouldn’t care about managing every resource. Ideally Pulp is just an API endpoint. Whether we deploy to containers, install it on a separate host or on the same host, the installer should only care about configuring katello to know where the API lives.

This is how we’ve always treated services in the vanilla Foreman installer. For historical reasons this hasn’t been possible for Pulp so far. Pulp 3 is a chance to revisit this coupling.

This throws me a bit because something has to manage those resources. And if not the installer, then I don’t know what it would have been. A separate tool that the installer would inevitably have called? We are likely getting further into the weeds here than Justin intends for this part of the discussion.

Eh, I’m okay with it, but maybe it should be split out into another thread since it could be quite the discussion.

But yes, echoing what Eric is saying: if you want to support the idea of installing katello and pointing it at an existing pulp, I think that is a good goal. But I don’t think that should be a requirement, as I think most users would want a push-button deployment.

I’d model it like external database support. We have a manage_db flag which pulls in the database installation; otherwise Foreman is configured with only the database URL. We’d probably still default to true, but there’s a choice to run on a separate system.
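
A rough Python sketch of what such a flag could mean for Pulp, written as a function that lists the steps an installer would take. The `manage_pulp` and `pulp_url` option names and the step descriptions are made up for illustration; they are not existing installer options.

```python
def plan_pulp_setup(pulp_url: str, manage_pulp: bool = True) -> list:
    """Return the (hypothetical) installer steps for the given options."""
    steps = []
    if manage_pulp:
        # Default: push-button deployment, the installer sets up Pulp itself.
        steps += ["install pulp", "configure pulp", "start pulp services"]
    # In both cases Katello only needs to know where the API lives.
    steps.append(f"configure katello with pulp url {pulp_url}")
    return steps


print(plan_pulp_setup("https://pulp.example.com/"))
print(plan_pulp_setup("https://pulp.example.com/", manage_pulp=False))
```

With `manage_pulp=False` the only remaining step is pointing Katello at the URL, which mirrors how an external database is handled.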

If we really go off-topic: how feasible would it be to share a Pulp instance with multiple Foremans? One use case could be a testing setup where you always connect to an existing Pulp to speed up installation time.

Given Pulp has no way to isolate content via something like a namespace, it would be difficult to ensure functionality works across multiple Foremans. If two Foremans both had a “Default Organization”, things could get weird on the Pulp server.

Is there a good read about the motivation for this change? I can only assume that MongoDB was a pain to maintain for Katello users, but this is a huge change and I am interested in whether there were more reasons: performance, scaling, stability, reliability. There must be a huge blog post about this, but I was only able to find:

https://www.redhat.com/archives/pulp-list/2016-May/msg00042.html

I can read something about introducing Elasticsearch back into the stack; is this something that is really happening? What does Pulp actually need full-text search for? Package names don’t need this, so probably descriptions and errata texts? Hopefully the search engine can be replaced out of the box; I think ES might be overkill.

A motivation for us to move to pulp 3? or a motivation for the pulp team to work on pulp 3?

I would say largely the answer to both is the database. Mongo (surprise surprise) did not scale well when used with heavily relational data which pulp used a lot of. It required ‘joins’ to be done in memory and also made our deployment more complicated.
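
As a small illustration of the difference, the Python sketch below computes the same join twice: once in application memory, the way it has to be done when the store cannot join for you, and once inside a relational database. It uses the standard library `sqlite3` module as a stand-in rather than MongoDB or PostgreSQL, and the table names and data are made up.

```python
import sqlite3

# Made-up data: repositories and the units they contain.
repos = [{"id": 1, "name": "isos"}, {"id": 2, "name": "docs"}]
units = [
    {"repo_id": 1, "name": "a.iso"},
    {"repo_id": 1, "name": "b.iso"},
    {"repo_id": 2, "name": "readme.txt"},
]

# Document-store style: fetch both collections, then join in application memory.
by_id = {r["id"]: r["name"] for r in repos}
in_memory = sorted((by_id[u["repo_id"]], u["name"]) for u in units)

# Relational style: the database performs the join.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE repos (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("CREATE TABLE units (repo_id INTEGER REFERENCES repos(id), name TEXT)")
db.executemany("INSERT INTO repos VALUES (:id, :name)", repos)
db.executemany("INSERT INTO units VALUES (:repo_id, :name)", units)
in_database = db.execute(
    "SELECT repos.name, units.name FROM units "
    "JOIN repos ON repos.id = units.repo_id "
    "ORDER BY repos.name, units.name"
).fetchall()

# Both approaches produce the same rows.
print(in_memory == in_database)
```

With three rows the difference is invisible; the in-memory variant becomes a problem when both collections are large, because all of the data has to be pulled into the application first.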

As part of ‘porting’ to a relational database, they took the opportunity to rethink a lot of aspects of pulp and thus the end result is quite different.

That was a discussion that was going on back then but really didn’t go anywhere. There is no Elasticsearch in pulp 3. Pulp 3 consists of a django application, postgresql (or another relational db), RQ (a queuing system) and redis (the backend of the queuing system).
