Hello All,
As part of moving more Katello entities to scoped search, I did some
performance comparisons between a nightly install, which uses scoped
search for errata on PostgreSQL, and Katello 2.0, which uses Elasticsearch.
Both systems were VMs with 6 GB of RAM and 2 CPUs each, running on
the same host. I loaded 200,000 fake errata into the database, each with a
bz, a cve, and three packages, and created 5 repositories with 20,000
errata each. I then wrote a few different queries that users would
likely perform, each in two forms (one returning 20 items, one returning
the total count), executed each of them 10 times, and averaged the
results (time in seconds):
Query                      Scoped Search          Elasticsearch
Errata                     0.0041088655           0.07552989400000001
Errata count               0.1569917291           0.06409571280000001
Errata type filter         0.004434209000000001   0.0291230768
Errata type filter count   0.0766710245           0.06429808210000001
Errata type search         0.0042759371           0.0739272778
Errata type search count   0.07862921749999999    0.018988432399999998
Package name               0.39945394970000003    0.1048855049
Package name count         0.38841846230000004    0.0442273977
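To make the comparison easier to read, here is a small sketch that turns the averaged timings into Elasticsearch-to-scoped-search ratios. The numbers are copied from the results in this mail; the constant `TIMINGS` and the helper `es_to_scoped_ratios` are names made up for this illustration, and the package-name rows use one shared label for both backends.

```ruby
# Averaged timings in seconds, copied from the results in this mail:
# label => [scoped search, elasticsearch]
TIMINGS = {
  "Errata"                   => [0.0041088655,         0.07552989400000001],
  "Errata count"             => [0.1569917291,         0.06409571280000001],
  "Errata type filter"       => [0.004434209000000001, 0.0291230768],
  "Errata type filter count" => [0.0766710245,         0.06429808210000001],
  "Errata type search"       => [0.0042759371,         0.0739272778],
  "Errata type search count" => [0.07862921749999999,  0.018988432399999998],
  "Package name"             => [0.39945394970000003,  0.1048855049],
  "Package name count"       => [0.38841846230000004,  0.0442273977]
}

# Hypothetical helper: ratio > 1 means Elasticsearch was slower,
# ratio < 1 means scoped search was slower.
def es_to_scoped_ratios(timings)
  timings.each_with_object({}) do |(label, (scoped, es)), out|
    out[label] = es / scoped
  end
end

es_to_scoped_ratios(TIMINGS).each do |label, ratio|
  puts format("%-26s %6.2fx", label, ratio)
end
```

Read this way, scoped search is faster on the three 20-item queries that do not touch package names, while Elasticsearch is faster on all four count queries and on both package-name queries.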
Note that the scoped search queries were initially much slower and
required a good deal of optimization, mainly adding various indexes. Prior
to the optimization, most of the scoped search queries were in the 0.5s to
1s range. Indexes already existed but were not adequate to achieve these
final performance numbers. Because scoped search allows the
user to search on many different columns, the number of indexes required
will also increase, and it may be difficult or impossible to provide
this level of performance for every query a user can write. I don't think
this is too terrible, as the default queries used throughout the app should
be able to be optimized, and detailed user queries being a bit slower would
be acceptable. Also note that very little PostgreSQL server tuning
was done other than bumping shared_buffers and effective_cache_size in
postgresql.conf to around 200MB. These could be increased further and
other optimizations could be performed.
No optimizations were done to Elasticsearch.
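For reference, the PostgreSQL change described above would look roughly like the fragment below. The exact values used on the test system are not given beyond "around 200MB", so treat these as an approximation rather than the actual configuration.

```conf
# postgresql.conf (approximate values, assumed from the description above)
shared_buffers = 200MB          # memory PostgreSQL uses for its own page cache
effective_cache_size = 200MB    # planner estimate of cache available; allocates nothing
```

Changing shared_buffers only takes effect after a server restart.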
Conclusion:
Scoped search is sufficient for our needs today, both for Katello entities
and for entities that live purely in backend systems (such as packages and
errata). Looking to the future, if we aim to scale to a million or more
hosts (for example), we would likely want to consider integrating
Elasticsearch more loosely, as an optional component used only for the
entities that need it, if PostgreSQL fails to perform well enough.
Let me know if you have any questions.
-Justin
PS
The code for the queries themselves is:
def scope_tests
  [
    time("Errata") {
      Katello::Erratum.in_repositories(Katello::Repository.all[0...3])
        .order('updated').limit(20).all
    },
    time("Errata count") {
      Katello::Erratum.in_repositories(Katello::Repository.all[0...3])
        .count(:distinct => true)
    },
    time("Errata type filter") {
      Katello::Erratum.in_repositories(Katello::Repository.all[0...3])
        .where(:errata_type => :security).order('updated').limit(20).all
    },
    time("Errata type filter count") {
      Katello::Erratum.in_repositories(Katello::Repository.all[0...3])
        .where(:errata_type => :security).count(:distinct => true)
    },
    time("Errata type search") {
      Katello::Erratum.in_repositories(Katello::Repository.all[0...3])
        .search_for("type = security").order('updated').limit(20).all
    },
    time("Errata type search count") {
      Katello::Erratum.in_repositories(Katello::Repository.all[0...3])
        .search_for("type = security").count(:distinct => true)
    },
    time("Package name") {
      Katello::Erratum.in_repositories(Katello::Repository.all[0...3])
        .search_for("package_name ~ a*").order('updated').limit(20).all
    },
    time("Package name count") {
      Katello::Erratum.in_repositories(Katello::Repository.all[0...3])
        .search_for("package_name ~ a*").count(:distinct => true)
    }
  ]
end
def es_tests
  [
    time("Errata") {
      Katello::Errata.search {
        size 20; query { all }
        filter :and, [{:terms => {:repoids => Katello::Repository.pluck(:pulp_id)[0...2]}}]
      }
    },
    time("Errata count") {
      Katello::Errata.search {
        query { all }
        filter :and, [{:terms => {:repoids => Katello::Repository.pluck(:pulp_id)[0...2]}}]
      }.total
    },
    time("Errata type filter") {
      Katello::Errata.search {
        size 20; query { all }
        filter :and, [{:terms => {:type => [:security]}},
                      {:terms => {:repoids => Katello::Repository.pluck(:pulp_id)[0...2]}}]
      }
    },
    time("Errata type filter count") {
      Katello::Errata.search {
        query { all }
        filter :and, [{:terms => {:type => [:security]}},
                      {:terms => {:repoids => Katello::Repository.pluck(:pulp_id)[0...2]}}]
      }.total
    },
    time("Errata type search") {
      Katello::Errata.search {
        size 20; query { string 'type:security' }
        filter :terms, {:repoids => Katello::Repository.pluck(:pulp_id)[0...2]}
      }
    },
    time("Errata type search count") {
      Katello::Errata.search {
        query { string 'type:security' }
        filter :terms, {:repoids => Katello::Repository.pluck(:pulp_id)[0...2]}
      }.total
    },
    time("Package name") {
      Katello::Errata.search {
        size 20; query { string 'pkglist.packages.name:a' }
        filter :terms, {:repoids => Katello::Repository.pluck(:pulp_id)[0...2]}
      }
    },
    time("Package name count") {
      Katello::Errata.search {
        query { string 'pkglist.packages.name:a' }
        filter :terms, {:repoids => Katello::Repository.pluck(:pulp_id)[0...2]}
      }.total
    }
  ]
end
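
The `time` helper that both methods call is not shown in this mail. Below is a minimal sketch of what it and the 10-run averaging could look like, assuming `time` returns a label/seconds pair. The body of `time` and the `average_runs` method are guesses written for illustration, not the code that was actually used.

```ruby
require 'benchmark'

# Assumed shape of the helper: run the block once and return
# [label, elapsed wall-clock seconds].
def time(label)
  elapsed = Benchmark.realtime { yield }
  [label, elapsed]
end

# Hypothetical driver: call the given suite `runs` times and return
# a hash of label => average seconds across all runs.
def average_runs(runs = 10)
  totals = Hash.new(0.0)
  runs.times do
    yield.each { |label, seconds| totals[label] += seconds }
  end
  totals.each_with_object({}) do |(label, total), averages|
    averages[label] = total / runs
  end
end
```

With these in place, something like `average_runs(10) { scope_tests }.each { |label, avg| puts "#{label}: #{avg}" }` would print one averaged line per query, in the same shape as the results listed earlier.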