@Chris_Roberts, @ehelms and I are working on improving the content import/export process. We’d like to describe our plans in order to get feedback.
How does it work today?
In Katello 3.7, you can use hammer to export a specific yum repository, or all of the yum repositories in a content view. This is documented in the Katello manual.
This gets the content from point A to point B, but we received user feedback that the content views and CV versions were important to keep. Right now, you have to redefine the CV by hand, and hope that when you publish that the CV version has the content you expect.
Proposed improvement
We are planning on doing two things: enhance APIs to make it easier to redefine content views and versions, and add automation to make the export/import process less painful.
If you wanted to redefine a content view version today, it is difficult. You would have to perform the existing steps to get the content into Library, then set up a content view on the importing Katello by hand that has include filters which match what content you expect to exist, and then publish and confirm the CV version was created with the correct units in the repository.
As part of this effort, we will allow publishing of a content view with a specific set units from Library, which will override any filter definitions. The unit list would be defined via whatever the unique identifier exists for the unit (docker image manifest hash, erratum ID, RPM NEVRA, etc). The newly created CV version can then be promoted to the correct lifecycle environment.
For example, a CV version publish might receive JSON that looks like this:
{
"repos": [
{ "name": "zoo repo", "type": "yum", "rpm_filenames": ["bear-4.1-1.noarch.rpm", "zebra-0.1-2.noarch.rpm"], "errata_ids": ["RHEA-2012:0002", "RHEA-2012:0004"] }
]
}
This would mean "create a new CV version, and take the errata_ids
and rpm_filenames
in zoo repo
in Library to use those in the CV version’s zoo repo
.
We will also allow for setting the CV version number during publish. The version number will need to be in today’s X.Y
format to not break auto-increment or other areas that expect the version in a specific format.
We will also be contributing to the Foreman Ansible modules repository to add automation around content view import/export. We decided to do it this way in order to keep the process as a set of discrete steps that can be recomposed if users need to alter the import/export process in a way that’s specific to their needs.
The overall workflow will look something like this:
EXPORT (done via Ansible)
-
Katello health check to ensure everything is operational
-
Obtain CV version metadata (name + version, list of repos, list of units in each repo, etc)
-
Perform CV export
-
Put data together into tarball
IMPORT
-
Katello health check to ensure everything is operational
-
untar tarball
-
create/enable products if they don’t already exist
-
create/enable repos on products if they don’t already exist
-
sync enabled repos in Library, using sync override URL to point to exported repos from tarball
-
create content views and attach repos if they don’t already exist
-
publish content view using version info list of units from export
At this point, the content view version can be promoted to the correct lifecycle environment.
Next steps
-
Put in pull requests that allow setting version string and unit list for each repo during CV publish. The first cut will only be for RPM NEVRAs and erratum IDs, but future updates will support additional content types.
-
add support to content view Ansible module for pulling and setting needed data on content views, and to support repo export
-
add support to repo Ansible module to override sync URL and allow sync
-
create Ansible scripts that use new modules, one for export and another for import. The first version will be for a totally disconnected setup, but we may add in support for scenarios where the importing Katello is able to talk to the exporting Katello.
At the end of the next steps, we should have something demoable and can then iterate on improvements.
Detailed Content View Import/Export Workflow
This guide is meant to be a sketch of how the import/export process will work. Details may change during implementation.
On the Exporting Katello
Definitions + Daily operations
The user defines their products, repos and CVs with Ansible. We can provide a template but the user would be responsible for maintaining a cv-definitions.yml
that would be invoked to add or update their definitions:
ansible-playbook playbooks/cv-definitions.yml
After the definitions are created, the user syncs and publishes periodically. This can be done via Ansible, hammer, or the web UI. We recommend not performing syncing or content view publishing as part of the cv-definitions.yml
since we’ll be re-using that file later.
Exporting
This would be done via export.yml
. It would look something like:
ansible-playbook playbooks/export.yml --extra-vars "content_view_version=5 include_repo_contents=true"
NOTE: On a production installation you would likely use a vars file that’s managed with source control instead of passing in extra_vars
. The extra_vars
here is just added to make the example clearer, and is not considered a best practice.
We would fetch the list of repositories for the content view version, each repo’s relative_path
, and list of rpm filenames and erratum IDs. If include_repo_contents
is set, we would then make a big tarball of all the relative_paths
from /var/lib/pulp/published
, making sure to dereference symlinks.
Example format of tarball
-
export.tar
-
repos/
<relative_path>/repo_one
<relative_path>/repo_two
-
export.json
(format of JSON defined below)
-
At the end of the process, we have a few pieces of info:
- the content view name for the CV version
- the content view version’s version number (example:
45.0
, or we can split it intomajor
andminor
) - the list of repos in the content view, along with the list of units in each repo
- a
relative_path
andfull_path
. These will be used for disconnected and connected sync (connected sync is not supported in the initial version, but can be added later) - a debug certificate and CA certificate to be used for the connected sync (again, not needed for first iteration)
- a big tarball of the repo directories with dereferenced symlinks, assuming
include_repo_contents
was set
The list should be in the following format in order to be compatible with the import step. This example is for the zoo repo
with a filter that only pulls in bear
and zebra
packages (NB: we don’t capture the filters right now, I mention them only because it explains why there’s only two units in the repo) . The errata_ids
are extraneous for this particular example but I added them in for clarity.
This JSON would be generated by export.yml
, and is used as the input for later steps. It is not generated directly from Katello, and cannot be fed directly into Katello.
{
"ca_cert": "<large PEM>",
"content_view_name": "Animals CV",
"content_view_version_major": "45",
"content_view_version_minor": "0",
"debug_cert": "<large PEM>",
"repos": [
{
"errata_ids": [
"RHSA1",
"RHBA2"
],
"full_path": "http://exporting-katello.com/pulp/repos/Default_Organization/content_views/Animals_Content_View/2.0/custom/Zoo_Product/Zoo_Repository/",
"name": "zoo repo",
"relative_path": "Default_Organization/content_views/Animals_Content_View/2.0/custom/Zoo_Product/Zoo_Repository",
"rpm_filenames": [
"bear-4.1-1.noarch.rpm",
"zebra-0.1-2.noarch.rpm"
],
"type": "yum"
}
]
}
On the Importing Katello
Definitions
The user can re-run their cv-definitions.yml
to populate the importing Katello. The importing Katello is typically not able to sync from the Red Hat CDN or other sources directly, this is handled in the next steps.
Importing
Getting the content onto the importing Katello
This step would be done via a newly defined playbook called cv-import-load.yml
. It would be invoked like this:
ansible-playbook playbooks/cv-import-load.yml --extra-vars "import_definitions=export-from-other-katello.json import_repo_contents_from_tgz=true"
When invoked, this would find the Library version of each repo in the content view and sync. There are two ways the sync can work, depending on the value of import_repo_contents_from_tgz
. If it’s set to true
, then the override source_url
is set to file:///path/to/tgz/contents/relative_path
. If it’s set to false
, then source_url
is set to full_path
. Note that we may need to add additional overrides to the sync API to add the debug certificate and CA certificate. Connected import is an optimization that will not be supported in the initial version, but can be added later.
Redefining the Content View version
After cv-import-load.yml
is complete, all of the data is on the importing Katello and we just need to redefine the content view version. To do that, we will use cv-recreate-version.yml
:
ansible-playbook playbooks/cv-recreate-version.yml --extra-vars "import_definitions=export.json"
This will publish a content view version with the parameters given in export.json
. The recreated version will have the same content view version number, and each repo will have the same contents that were defined via the export JSON.
That’s it! We will be adding support for other content types and unit types in the future, but the workflow will be the same. Any content type that can be uniquely identified as the same unit on two servers and be copied in Pulp can be supported.
Breakdown of each proposed Ansible script
This section gives a summary of the steps done by each proposed Ansible script. Again, this is just a sketch and not a hard and fast requirement.
cv-definitions.yml
This file is only responsible for creating or enabling products, repos, and content views. It will be run as the first step on both the exporting and importing Katello servers.
The user is responsible for syncing and publishing. They can use another Ansible script, a scheduled sync, hammer, or the web UI. Note that the download_policy
is immediate
. We need the units downloaded onto the Katello server in order to export them later.
I think all of the features for this are already available in foreman-ansible-modules
.
- hosts: localhost
become: true
tasks:
- block:
- name: 'zoo product'
katello_product:
username: "admin"
password: "changeme"
server_url: "https://localhost/"
organization: "Default Organization"
name: "Zoo Product"
verify_ssl: false
- name: "Create zoo repository"
katello_repository:
username: "admin"
password: "changeme"
server_url: "https://localhost/"
name: "Zoo Repository"
state: present
content_type: "yum"
product: "Zoo Product"
organization: "Default Organization"
url: "https://repos.fedorapeople.org/repos/pulp/pulp/demo_repos/zoo/"
download_policy: immediate
verify_ssl: false
- name: "Create Animals CV"
katello_content_view:
username: "admin"
password: "changeme"
server_url: "https://localhost/"
name: "Animals Content View"
organization: "Default Organization"
repositories:
- name: 'Zoo Repository'
product: 'Zoo Product'
verify_ssl: false
export.yml
This script is responsible for assembling the export tarball. This includes creating the export.json
file, and either using the Katello CV export feature, or simply including all of the repos by their relative_path
into the tarball. The latter may be more expedient, but requires verifying that all units are indeed available on-disk (i.e., no broken symlinks in /var/lib/pulp/published/<relative_path>
). We will also need to ensure all units are on-disk at this point by checking that all symlinks point to files. This is preferred vs checking for immediate
repo disposition since it’s possible to set a repo to on_demand
, flip it to immediate
, but not sync it and still have broken symlinks.
This will require a few new features in foreman-ansible-modules
. We will probably need to create a new katello_content_view_export
module uses the nailgun library to create export.json
. The tarball should not be too difficult to make since we’ll already have relative_path
in hand. It is OK to ignore the CA and debug certs for the first revision.
cv-import-load.yml
This script is meant to be run after cv-definitions.yml
runs on the importing Katello server. The job of this script is untar the tarball, find each repository that exists in the CV (as read from export.json
, and then run a sync on that repo. The sync will need to have source_url
defined so it can point to a file:///
URLs. This is not possible today with the katello-repository
module and will need to be added in to the existing module.
Additionally, we’ll likely need a katello_content_view_import
Ansible module that can crack open the tarball and read the contents.
cv-recreate-version.yml
This script redefines the content view versions. It makes use of two patches: one to set the major/minor version on a CV version when publishing, and another to set a list of units on each repo when publishing. These patches are both still being worked on. After they are merged, nailgun will need to be made aware of these APIs, and then the katello_content_view_publish
module will need to take advantage of these. cv_recreate-version.yml
will also be reading export.json
, similar to cv-import-load.yml
.