>> Katello devs,
>>
>> A design document for inter-server sync is available at
>> InterServerSync - Katello - Foreman.
>>
>> The short version is that we would like to implement a way to sync data
>> between katello instances that will solve the same problem that's solved
>> by Spacewalk's Inter-Spacewalk Sync. Additionally, we would like to use
>> this new mechanism to replace functionality previously provided by
>> katello-disconnected, so there is no need for a separate script.
>>
>> If you have any questions or would like additional detail, feel free to
>> email or ask via IRC. Thanks!
>>
>
> A couple of things here.
>
> * I think the long term goal should be to export an entire org, the
> hostgroups, content views, templates, scap policies, etc. This design
> reads like content only. I think this is fine for now, but I wonder if
> you should hold off on content views until a broader data transport
> design is made.
Agreed, being able to export 100% of the contents of an org is the long
term goal.
Just to give some more background on the data format: for offline use
(i.e., all use until Phase 3), data without a binary component will be
exported as CSV (basically using katello-cli-csv for Phase 1, then
building off of that in future iterations), while repos and the like will
be exported as tgz. This all gets wrapped up into a bigger tgz, possibly
with an additional bit of metadata. The reason for CSV instead of JSON is
that it's easier for someone to hand-inspect to see what's being loaded.
Some users may want to export in ISO format, but I think manual use of
mkisofs or genisoimage after the export is created should be ok.
The format for online transfer (Phase 3) is not decided yet. It may be a
Dynflow task that does some discovery via API or local calls, creates a
set of tasks for what needs to happen, and then makes a series of API
calls against the other Katello. I have some learning to do in this area,
so the doc is a bit vague.
For Phase 3, I would like to ensure that APIs for manipulating data all
work for doing "inverse" operations (i.e., for a given call, the output
of a GET works as input for the POST or PUT on a different server). IMO,
having to massage the data after it's pulled out adds brittleness. I
don't know how feasible this goal is but I would like to try. There's no
need for hand-inspection of data here, so we can just use JSON without
conversion to CSV.
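To make that concrete, a rough sketch with curl (hostnames, credentials,
and IDs are placeholders, and whether the GET body is accepted as-is by
the POST is exactly the open question):

  # fetch a product from the source Katello and feed the JSON straight
  # into the destination, unmodified
  curl -s -u admin:changeme \
      https://katello-a.example.com/katello/api/v2/products/5 |
    curl -s -u admin:changeme -H 'Content-Type: application/json' \
      -X POST -d @- \
      https://katello-b.example.com/katello/api/v2/products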
I'll update the doc with this info today.
>
> * It also seems like the notion of live connect (think manifest
> chaining) is not in the cards. Is that correct?
The plan we have currently is that inter-server-sync does not deal with
manifests; the user would need to import them as they normally would.
>
> Looking at the Phase 1 Hammer design
>
> A) The pattern seems to be to pull out repos and products via the CLI.
> How much harder would it be to specify a root ENV and Content View to
> scope the calls? I ask because that would make it much more powerful in
> phase 1.
The main snag we hit with export/import of content views outside of the
default view in library is the following:
- user exports CV1 at version 1 from Katello A
- user imports CV1 at version 1 to Katello B
- user updates CV1 (now at version 2) on Katello A
- user updates CV1 (also now at version 2) on Katello B
At this point, Katello A and B have CV1 version 2, but they are
different. When exporting CV1 version 2 from A and importing to B, what
would be the best behavior?
One way could be to make imported CVs "locked" somehow, so they could
only be updated via import. If someone wanted to modify an imported CV
for local use, they could create a CCV (composite content view) that
includes the locked CV.
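As a sketch of that workflow (names and options here are illustrative,
not settled):

  # CV1 arrives via import and stays locked; local changes go into a
  # composite view that includes the locked CV
  hammer content-view create --name local-cv1 --composite \
      --organization MyOrg
  # then add the imported CV1 version as a component of local-cv1 and
  # publish local-cv1 as usual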
>
> B) What is the output format of hammer repository export --id <id>? If I
> am air gapped, how am I walking it across the network boundaries?
The export would be tgz. Users would need to scp it or have some other
means to obtain it locally before walking it over. We could add "export
and then download it locally" to hammer as a feature, but it's currently
not in the doc.
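So, roughly (the export path below is invented for illustration; the
actual output location isn't pinned down in the doc yet):

  # on the connected Katello: produce the export tgz
  hammer repository export --id 42
  # copy the resulting archive to removable media or a transfer host
  scp /var/lib/katello/export/repo-42.tar.gz user@transfer-host:/mnt/usb/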
>
> C) With your step entitled (# replace URL with on-disk location for
> place to sync from on destination katello), I assume this will work for
> Red Hat Content only? How do you envision a model where a disconnected
> satellite gets Red Hat and some custom content from a different
> satellite, and some is mirrored from other internal repos?
I would like for the repo URL swap to go away ASAP. I believe it should
be constrained to just the products being exported, though, so it should
not affect other repos.
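To illustrate what the swap looks like on the destination today (repo ID
and path are placeholders, and the exact hammer options are an assumption
on my part):

  # point the destination repo's feed at the unpacked export on local
  # disk, then sync from the filesystem instead of the CDN
  hammer repository update --id 42 \
      --url file:///var/lib/katello/imports/my-repo/
  hammer repository synchronize --id 42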
Having said that, we may want to investigate uploading content into the
destination instead of syncing from the filesystem. This would let the
repo description (sync URL and such) in Katello and Pulp on the
destination be 1:1 with the source. IMO, sync instead of upload would be
a Phase 3 optimization for online cases.
>
> D) Why the limitation on rpm only content?
Only because rpm + errata seems to be the most requested, and I would
rather get things nailed down with this before moving on to other types.
Some content types (docker, possibly ostree but I don't remember for
sure) have additional pieces of metadata that are used during sync but
not published. We would need to ensure that for any content type, a
publish/sync or export/import is lossless for data Katello and Pulp care
about.
On 10/30/2015 08:34 AM, Bryan Kearney wrote:
> On 10/29/2015 03:35 PM, Chris Duryee wrote: