Katello 3.4 sync time

Hello,

with the new release I performed some tests, and it looks like a RHEL
7.2 RPM repository (about 11.5k packages) syncs with the on-demand
policy in around 40 minutes (the download only took a few seconds; the
rest was mostly Celery, and about 15% was Katello reindexing). This is
Xeon v3 bare metal with 10 GB RAM. I haven't compared with 3.3, but
this is sub-par performance. Is this something we will be tackling in
the next release?
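
The timing above is just wall-clock time around the sync task. A
sketch of one way to reproduce the measurement (host, credentials and
repository ID 42 are placeholders for your own setup):

    import time
    import requests

    KATELLO = "https://katello.example.com"  # placeholder host
    AUTH = ("admin", "changeme")             # placeholder credentials

    start = time.time()
    # POST /katello/api/repositories/:id/sync starts an async sync task
    resp = requests.post(KATELLO + "/katello/api/repositories/42/sync",
                         auth=AUTH, verify=False)
    resp.raise_for_status()
    task = resp.json()
    # Poll the foreman task until it finishes, then report elapsed time
    while task["state"] not in ("stopped", "paused"):
        time.sleep(10)
        task = requests.get(KATELLO + "/foreman_tasks/api/tasks/" + task["id"],
                            auth=AUTH, verify=False).json()
    print("sync took %.0fs, result: %s" % (time.time() - start, task["result"]))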

Additional syncs take only about a minute, so it's really the initial
sync which takes so much time. It is possible that there was no change
in the metadata on the CDN, so everything was simply skipped. I will
try tomorrow.
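
The quick way to rule that out is to compare the <revision> stamp in
repomd.xml between runs; a rough sketch (the URL is a placeholder, and
the real CDN needs entitlement certificates):

    import re
    import requests

    # Placeholder URL; the real CDN requires entitlement certificates,
    # e.g. requests.get(url, cert=("client.crt", "client.key")).
    REPOMD = "https://cdn.example.com/rhel/7/x86_64/os/repodata/repomd.xml"

    xml = requests.get(REPOMD).text
    revision = re.search(r"<revision>(\d+)</revision>", xml).group(1)
    # An unchanged revision means the sync had nothing new to process.
    print("repomd revision:", revision)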

--
Later,
Lukas @lzap Zapletal

How many GB is that repo, and what would comparable times be with curl
and/or mrepo?

– bk

On 05/31/2017 09:23 AM, Lukas Zapletal wrote:

> Hello,
>
> with the new release I performed some tests, and it looks like a RHEL
> 7.2 RPM repository (about 11.5k packages) syncs with the on-demand
> policy in around 40 minutes (the download only took a few seconds;
> the rest was mostly Celery, and about 15% was Katello reindexing).
> This is Xeon v3 bare metal with 10 GB RAM. I haven't compared with
> 3.3, but this is sub-par performance. Is this something we will be
> tackling in the next release?
>

Why is this sub-par performance? How long do you think it should take?
Is this an increase over previous releases? Initial processing time,
whether on-demand or not, is typically the longest sync you'll
encounter for a repository, given all of the data that has to be
downloaded and processed.

Eric



Eric D. Helms
Red Hat Engineering

> Why is this sub-par performance? How long do you think it should take?
> Is this an increase over previous releases? Initial processing time,
> whether on-demand or not, is typically the longest sync you'll
> encounter for a repository, given all of the data that has to be
> downloaded and processed.

We are taking metadata from the CDN/upstream repo and simply "copying"
it into Pulp; there is no filtering applied or anything. It looks like
Pulp processes this "from scratch" when it could simply "trust" the
upstream metadata and copy it over. Then I'd expect minutes, not
forty.
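
To illustrate what I mean by "trusting" the metadata (a conceptual
sketch only, not Pulp's actual code): parse primary.xml.gz once and
copy each package entry straight into the local database, instead of
re-processing every unit individually:

    import gzip
    import xml.etree.ElementTree as ET

    NS = {"c": "http://linux.duke.edu/metadata/common"}

    def copy_packages(primary_xml_gz):
        """Yield (name, location) for every package in primary.xml.gz."""
        with gzip.open(primary_xml_gz) as f:
            tree = ET.parse(f)
        for pkg in tree.findall("c:package", NS):
            name = pkg.findtext("c:name", namespaces=NS)
            href = pkg.find("c:location", NS).get("href")
            # In the "trusting" model this tuple would go straight
            # into the database with no per-unit re-validation.
            yield name, href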

I understand there is some design in Pulp and some processing going on
there for the initial sync, but I expected the on-demand policy to be
even faster. I wonder if there are any plans to improve the initial
sync in Pulp 3.

BK: I haven't compared with mrepo/createrepo or createrepo_c.

--
Later,
Lukas @lzap Zapletal

> > Why is this sub-par performance? How long do you think it should
> > take? Is this an increase over previous releases? Initial processing
> > time, whether on-demand or not, is typically the longest sync you'll
> > encounter for a repository, given all of the data that has to be
> > downloaded and processed.
>
> We are taking metadata from the CDN/upstream repo and simply "copying"
> it into Pulp; there is no filtering applied or anything. It looks like
> Pulp processes this "from scratch" when it could simply "trust" the
> upstream metadata and copy it over. Then I'd expect minutes, not
> forty.
>
>

This is getting more into a discussion that should be had with the Pulp
project. But the "from scratch" processing also pulls information into
their database to make it available for querying, for viewing what's in
a repository, and for the streamer service, so that the repository
appears like a normal repository without the bits being on disk until
they are requested.
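
In other words (a conceptual sketch, not Pulp's real schema or code):
with the on-demand policy the database still holds each unit's full
metadata plus its upstream location, and the streamer fetches the
actual bits only on the first client request:

    import os
    import requests

    class LazyUnit:
        """Toy model of an on-demand content unit."""
        def __init__(self, nevra, upstream_url, local_path):
            self.nevra = nevra                # queryable metadata in the DB
            self.upstream_url = upstream_url  # where the bits really live
            self.local_path = local_path      # populated lazily

        def fetch(self):
            # Download on first access; later requests hit the local copy.
            if not os.path.exists(self.local_path):
                data = requests.get(self.upstream_url).content
                with open(self.local_path, "wb") as f:
                    f.write(data)
            return self.local_path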

Eric



Eric D. Helms
Red Hat Engineering