[katello] design document for inter-server sync

Katello devs,

A design document for inter-server sync is available at
http://projects.theforeman.org/projects/katello/wiki/InterServerSync.

The short version is that we would like to implement a way to sync data
between katello instances that will solve the same problem that's solved
by Spacewalk's Inter-Spacewalk Sync. Additionally, we would like to use
this new mechanism to replace functionality previously provided by
katello-disconnected, so there is no need for a separate script.

If you have any questions or would like additional detail, feel free to
email or ask via IRC. Thanks!

Before I comment, are you planning a deep dive on the design? I will save
questions for that if you are, otherwise I'll reply here after reading
through the design.

Eric

··· On Thu, Oct 29, 2015 at 3:35 PM, Chris Duryee wrote:


A couple of things here.

  • I think the long term goal should be to export an entire org, the
    hostgroups, content views, templates, scap policies, etc. This design
    reads like content only. I think this is fine for now, but I wonder if
    you should hold off on content views until a broader data transport
    design is made.

  • It also seems like the notion of live connect (think manifest
    chaining) is not in the cards. Is that correct?

Looking at The phase 1 Hammer Design

A) The pattern seems to be to pull out repos and products via the CLI.
How much harder would it be to specify a root ENV and Content View to
scope the calls? I ask because that would make it much more powerful in
phase 1.

B) What is the output format of hammer repository export --id <id>. If I
am air gapped, how am I walking it across the network boundaries?

C) With your step entitled (# replace URL with on-disk location for
place to sync from on destination katello), I assume this will work for
Red Hat Content only? How do you envision a model where a disconnected
satellite gets Red Hat and some custom content from a different
satellite, and some is mirrored from other internal repos?

D) Why the limitation on rpm only content?

– bk

··· On 10/29/2015 03:35 PM, Chris Duryee wrote:

BK covered a few of my questions, but I do have some additional ones.

  1. If you are going to replace Phase 1 hammer work with the APIs to do this
    properly, why not start with the APIs (since hammer and web UI would use
    it)?

  2. Is UI support more of a priority for our users than "connected" support
    and other content types? I could see working through those features from an
    API stand-point helping to refine the overall workflow which would
    influence the UI.

  3. Have you considered raising this on the users list for feedback? Or do
    existing Redmine issues serve as a backdrop? I think some groups have
    found that surveys work better than throwing out a design when gathering
    feedback around certain decision points – just something to consider.

Eric

··· On Thu, Oct 29, 2015 at 3:35 PM, Chris Duryee wrote:


> Before I comment, are you planning a deep dive on the design? I will save
> questions for that if you are, otherwise I'll reply here after reading
> through the design.
>

Let's stick with email for the time being, but if there is interest in a
deep dive I'm happy to schedule one.

··· On 10/29/2015 03:40 PM, Eric D Helms wrote:


> BK covered a few of my questions, but I do have some additional ones.
>
> 1) If you are going to replace Phase 1 hammer work with the APIs to do this
> properly, why not start with the APIs (since hammer and web UI would use
> it)?
>

API is definitely the first to come since, as you said, UI and CLI rely on it.

> 2) Is UI support more of a priority for our users than "connected" support
> and other content types? I could see working through those features from an
> API stand-point helping to refine the overall workflow which would
> influence the UI.

Absolutely agree. I think there is existing redesign work/wireframes around the subscriptions, repositories, and product pages. Any UI work we do here I'd like to see as incremental work towards making those pages better.

··· ----- Original Message -----

> A couple of things here.
>
> * I think the long term goal should be to export an entire org, the
> hostgroups, content views, templates, scap policies, etc. This design
> reads like content only. I think this is fine for now, but I wonder if
> you should hold off on content views until a broader data transport
> design is made.

We are hoping that hammer-cli-csv can be used for this process. Although not in the requirements for phase one, exporting and then importing an entire organization (and in fact an entire server) is what we are aiming for long term.

>
> * It also seems like the notion of live connect (think manifest
> chaining) is not in the cards. Is that correct?

It's our thinking that the disconnected implementation will help us work through refinements of how best to accomplish things. Understanding all the moving parts will then let us think about live connect. Defining what "live" means is a task in and of itself.

>
> Looking at The phase 1 Hammer Design
>
> A) The pattern seems to be to pull out repos and products via the CLI.
> How much harder would it be to specify a root ENV and Content View to
> scope the calls? I ask because that would make it much more powerful in
> phase 1.

We are limiting this to Library/Default_Organization_View on both the export and import ends at the moment.

Part of this has to do with what exporting a content view means: Will it just be the latest version? Since CV filters are not versioned, what does this imply for exported content that no longer matches the filters? And so on. Again, this will probably fall under the "live" discussion.

It is possible now, using hammer-cli-csv, to export/import content view definitions (including filters). This would allow a user, manually or via script, to reproduce CV content on another server, but they would be responsible for making sure the same publishes and promotes happened.

I don't see any reason we would prevent export of a specific CV (instead of Library) but I don't think we should give the impression that it is mirrored content (a la capsules).

>
> B) What is the output format of hammer repository export --id <id>. If I
> am air gapped, how am I walking it across the network boundaries?

Output will be iso or directory. Perhaps tar.gz too?
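For example, if the export lands as a directory on the source server, wrapping
it up for the air gap could be as simple as the following (the paths are purely
illustrative, nothing here is a decided layout):

% tar czf rhel-6-export.tar.gz -C /var/lib/katello/export rhel-6
# or, if the transfer policy wants optical media / a single ISO:
% genisoimage -r -J -o rhel-6-export.iso /var/lib/katello/export/rhel-6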

>
> C) With your step entitled (# replace URL with on-disk location for
> place to sync from on destination katello), I assume this will work for
> Red Hat Content only? How do you envision a model where a disconnected
> satellite gets Red Hat and some custom content from a different
> satellite, and some is mirrored from other internal repos?

We are expecting to allow a per-product (or per-repo) sync URL that can be overridden at sync time. This should be the same for custom and Red Hat products. The difference for Red Hat products will be that we'll restrict syncing to repos enabled via subscriptions.

>
> D) Why the limitation on rpm only content?

Limiting scope of initial work. We'll add the known issues and limitations associated with the other formats to the design doc.

··· ----- Original Message -----


> A couple of things here.
>
> * I think the long term goal should be to export an entire org, the
> hostgroups, content views, templates, scap policies, etc. This design
> reads like content only. I think this is fine for now, but I wonder if
> you should hold off on content views until a broader data transport
> design is made.

Agreed, being able to export 100% of the contents of an org is the long
term goal.

Just to give some more background on the data format: for offline use
(i.e., all use until Phase 3), the format will be CSV for data without a
binary component (basically using hammer-cli-csv for Phase 1, and then
building off of that in future iterations), plus exported tgz for repos
and such. This all gets wrapped up into a bigger tgz, possibly with an
additional bit of metadata. The reason for CSV instead of JSON is that
it's easier for someone to hand-inspect to see what's being loaded.

Some users may want to export in ISO format, but I think manual use of
mkisofs or genisoimage after the export is created should be ok.
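As a rough sketch (the file names and layout are just an illustration, not
the final format), inspecting such a bundle before it crosses the boundary
might look like:

% tar tzf org-export.tar.gz
export/products.csv
export/content-views.csv
export/repos/rhel-6-server-rpms.tar.gz
# unpack and eyeball the CSVs before letting anything onto the internal network
% tar xzf org-export.tar.gz
% less export/products.csv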

The format for online xfer (phase 3) is not decided on yet. It may be a
dynflow task that does a bit of discovery via API calls or local calls
and then makes a number of API calls to the other Katello after
creating a set of tasks for what needs to happen. I have some learning
to do in this area so the doc is a bit vague :)

For Phase 3, I would like to ensure that APIs for manipulating data all
work for doing "inverse" operations (i.e., for a given call, the output
of a GET works as input for the POST or PUT on a different server). IMO,
having to massage the data after it's pulled out adds brittleness. I
don't know how feasible this goal is but I would like to try. There's no
need for hand-inspection of data here, so we can just use JSON without
conversion to CSV.
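A minimal sketch of the property I mean, using curl (the endpoint, hostnames
and credentials are placeholders, and the exact payload handling is an open
question):

# pull a record from the source server...
% curl -su admin:changeme https://katello-a.example.com/katello/api/products/7 > product.json
# ...and ideally feed the same document, untouched, to the destination
% curl -su admin:changeme -H "Content-Type: application/json" -X POST -d @product.json https://katello-b.example.com/katello/api/products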

I'll update the doc with this info today.

>
> * It also seems like the notion of live connect (think manifest
> chaining) is not in the cards. Is that correct?

The plan we have currently is that inter-server-sync does not deal with
manifests; the user would need to import them as they normally would.

>
> Looking at The phase 1 Hammer Design
>
> A) The pattern seems to be to pull out repos and products via the CLI.
> How much harder would it be to specify a root ENV and Content View to
> scope the calls? I ask because that would make it much more powerful in
> phase 1.

The main snag we hit with export/import of content views outside of the
default view in library is the following:

  • user exports CV1 at version 1 from Katello A
  • user imports CV1 at version 1 to Katello B
  • user updates CV1 (now at version 2) on Katello A
  • user updates CV1 (also now at version 2) on Katello B

At this point, Katello A and B have CV1 version 2, but they are
different. When exporting CV1 version 2 from A and importing to B, what
would be the best behavior?

One way could be to make imported CVs "locked" somehow so they could
only be updated via import. If someone wanted to modify it for local
use, they could create a CCV using the locked CV.

>
> B) What is the output format of hammer repository export --id <id>. If I
> am air gapped, how am I walking it across the network boundaries?

The export would be tgz. Users would need to scp it or have some other
means to obtain it locally before walking it over. We could add "export
and then download it locally" to hammer as a feature but it's currently
not on the doc.
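Roughly, until something like that exists (hostnames and paths are
illustrative):

# copy the export off the source Katello to the machine that crosses the gap
% scp katello-a.example.com:/var/lib/katello/export/repo-12.tar.gz .
% sha256sum repo-12.tar.gz > repo-12.tar.gz.sha256
# walk both files across, then verify nothing was corrupted in transit
% sha256sum -c repo-12.tar.gz.sha256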

>
> C) With your step entitled (# replace URL with on-disk location for
> place to sync from on destination katello), I assume this will work for
> Red Hat Content only? How do you envision a model where a disconnected
> satellite gets Red Hat and some custom content from a different
> satellite, and some is mirrored from other internal repos?

I would like for the repo URL swap to go away ASAP :) I believe it
should be constrained to just the products being exported though, so it
should not affect other repos.

Having said that though, we may want to investigate uploading content
into the destination instead of syncing from filesystem. This would let
the repo description (sync url and such) in Katello and Pulp on the
destination be 1:1 with the source. IMO, sync instead of upload would be
a Phase 3 optimization for online cases.
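For what it's worth, hammer already has an upload path for custom repos, so an
upload-based import might look roughly like this (the repo id and paths are
placeholders, and whether upload-content would really be the mechanism here is
an open question):

% mkdir -p /tmp/repo-12 && tar xzf repo-12.tar.gz -C /tmp/repo-12
% for rpm in /tmp/repo-12/Packages/*.rpm; do hammer repository upload-content --id 12 --path "$rpm"; done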

>
> D) Why the limitation on rpm only content?

Only because rpm + errata seems to be the most requested, and I would
rather get things nailed down with this before moving on to other types.

Some content types (docker, possibly ostree but I don't remember for
sure) have additional pieces of metadata that are used during sync but
not published. We would need to ensure that for any content type, a
publish/sync or export/import is lossless for data Katello and Pulp care
about.

··· On 10/30/2015 08:34 AM, Bryan Kearney wrote:

> The format for online xfer (phase 3) is not decided on yet. It may be a
> dynflow task that does a bit of discovery via API calls or local calls
> and then makes a number of API calls to the other Katello after
> creating a set of tasks for what needs to happen. I have some learning
> to do in this area so the doc is a bit vague :)
>
> For Phase 3, I would like to ensure that APIs for manipulating data all
> work for doing "inverse" operations (i.e., for a given call, the output
> of a GET works as input for the POST or PUT on a different server). IMO,
> having to massage the data after it's pulled out adds brittleness. I
> don't know how feasible this goal is but I would like to try. There's no
> need for hand-inspection of data here, so we can just use JSON without
> conversion to CSV.

Yeah, live connection is an entirely different design discussion outside the initial scope.

··· ----- Original Message -----


Reading this, I wonder whether content isn't a special case. I see from the
design that you are trying to define a standard format (CSV) for both content
and database data. Perhaps we should embrace the fact that content is
different and just tackle the content export?

–bk

··· On 10/30/2015 10:55 AM, Chris Duryee wrote:

> Just to give some more background on the data format: for offline use
> (i.e., all use until Phase 3), the format will be CSV for data without a
> binary component (basically using hammer-cli-csv for Phase 1, and then
> building off of that in future iterations), plus exported tgz for repos
> and such. This all gets wrapped up into a bigger tgz, possibly with an
> additional bit of metadata. The reason for CSV instead of JSON is that
> it's easier for someone to hand-inspect to see what's being loaded.
>

Are you claiming it's easier to hand inspect because you can load it into
tools like spreadsheets? Or do you mean to actually look at the raw CSV
file and understand it? If the latter, I'd argue that that is not true and
that YAML is much cleaner to read.

Eric

··· On Fri, Oct 30, 2015 at 10:55 AM, Chris Duryee wrote:


Deep dives generally get stuck into the code, or similar levels of
detail for other things. So it's probably best to wait until there is
some level of agreement on the direction of the work, and maybe even
a bit of pseudo-code here and there to demonstrate concepts.

That said, I'm happy to assist in setting up and promoting a deep dive
whenever you guys are ready to have one :)

··· On 29 October 2015 at 20:03, Chris Duryee wrote:


Greg
IRC: gwmngilfen

It is the former, not the latter :) The main use case is a human needing
to inspect the export data before it is allowed to be brought onto the
internal network.

··· On 10/30/2015 12:37 PM, Eric D Helms wrote:

I expect that there will be several avenues for exporting and importing content.

export

% hammer repository export --product-id 456 --id 12 --dir /var/www/html/pub/rhel-6

import

% hammer repository synchronize --cdn-url http://master-server/pub/rhel-6/rhel/server/6/6Server/x86_64/os --product "Red Hat Enterprise Linux Server" --id 12
% hammer product synchronize --cdn-url http://master-server/pub/rhel-6/rhel/server/6/6Server/x86_64/os --name "Red Hat Enterprise Linux Server"

Or with hammer-cli-csv files it may look like this:

export

% hammer csv export --products products.csv --organization ACME
% cat products.csv [1]
Name,Label,Organization,Repository,Repository Type,Repository Url
Red Hat Enterprise Linux Server,Red_Hat_Enterprise_Linux_Server,Mega Corporation,Red Hat Enterprise Linux 6 Server x86_64 6Server,Red Hat Yum,http://master-server/pub/rhel-6/rhel/server/6/6Server/x86_64/os

import

% hammer csv import --products products.csv --sync

…this enables the Red Hat repos from the subscription, creates the Red Hat product, and syncs the content.

The CSV simply references the output location, which the server admin will need to define since we are not streaming the iso/tgz back to wherever hammer was run. I view the CSV as a convenience format that is readily consumable (and alterable). For example, we've heard that export organizations may sometimes be named differently than on the import server.
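As a trivial example of that alterability (the org names are taken from the
sample CSV above, and the rename itself is hypothetical):

# point the rows at the organization that actually exists on the import server
% sed -i 's/Mega Corporation/ACME/g' products.csv
% hammer csv import --products products.csv --sync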

That make sense?

[1] https://github.com/Katello/hammer-cli-csv/blob/master/test/data/products.csv#L15

··· ----- Original Message -----

Chris, could you clarify/confirm: this is only to replicate content
and content-like data; To generalize, the flow of replicated data is
always going to be uni-directional (master to “replicas") and we are
never going to have more than master.

-d

··· On Fri, Oct 30, 2015 at 3:03 PM, Tom McKay wrote:

> Chris, could you clarify/confirm: this is only to replicate content
> and content-like data; To generalize, the flow of replicated data is
> always going to be uni-directional (master to “replicas") and we are
> never going to have more than master.
>
more than one master.
-d

··· On Fri, Oct 30, 2015 at 3:11 PM, Dmitri Dolguikh wrote:


The initial scope is for content and associated data, but I could see
other things being added later, like org information or environments.
Actual host info would never be copied over since a host can only be
registered to one place at a time.

I could see a case where you'd have a single Katello server with two
orgs, and each org wanted to import from different export files. Ideally
this would be doable without having to manipulate the export data.
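Concretely, since the organization is just a column in the CSV data, that
could be two plain imports with different files (the file names are
illustrative):

% hammer csv import --products org1-products.csv --sync
% hammer csv import --products org2-products.csv --sync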

It's technically possible someone would want to export some data from
Katello A to B and other data from B to A, but I think the burden would
be on them to make sure they are not breaking anything. I don't think
this is a use case we'd need to support.

··· On 10/30/2015 11:36 AM, Dmitri Dolguikh wrote: