Proxy sync 10x slower with Foreman 3.8/Katello 4.10

Here is the Dynflow attached as a log.
5.Actions.Pulp3.Orchestration.Repository.RefreshRepos.log (84.7 KB)

Looking back at a similar sync task in 3.7.1/4.9.2, there is only one “pulp_href” in the pulp task, compared to 156 of them in 3.8/4.10.
Seems to match the number of repositories I have (156).

Ex.
5: Actions::Pulp3::Orchestration::Repository::RefreshRepos

5: Actions::Pulp3::Orchestration::Repository::RefreshRepos (success) [ 13.40s / 13.40s ]
Queue: default

Started at: 2023-11-09 13:22:27 UTC

Ended at: 2023-11-09 13:22:40 UTC

Real time: 13.40s

Execution time (excluding suspended state): 13.40s

Input:

---
smart_proxy_id: 7
environment_id: 
content_view_id: 
repository_id: 33
remote_user: admin
remote_cp_user: admin
current_request_id: ba96c9e5-3a7f-4a6b-8fb8-fa883cf1448f
current_timezone: UTC
current_organization_id: 
current_location_id: 
current_user_id: 6
Output:

---
pulp_tasks:
- pulp_href: "/pulp/api/v3/tasks/fe333043-8ab1-4f80-b8bb-6879540b19a6/"
  pulp_created: '2023-11-09T13:22:39.062+00:00'
  state: completed
  name: pulpcore.app.tasks.base.general_update
  logging_cid: ba96c9e5-3a7f-4a6b-8fb8-fa883cf1448f
  started_at: '2023-11-09T13:22:39.110+00:00'
  finished_at: '2023-11-09T13:22:39.133+00:00'
  worker: "/pulp/api/v3/workers/e486bd25-9a8c-408b-b278-f1985a761625/"
  child_tasks: []
  progress_reports: []
  created_resources: []
  reserved_resources_record:
  - "/pulp/api/v3/remotes/rpm/rpm/fec7c69c-8bbd-4b52-9ab9-df39c16cbbfa/"
task_groups: []
Chunked output:

--- []
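For anyone wanting to quantify this on their own task output, a quick way is to count the pulp_href entries in the copied Dynflow "Output:" text. This is just a plain-text heuristic over the pasted log, not a Pulp/Katello API call:

```python
# Count how many Pulp tasks a RefreshRepos step spawned by counting
# "- pulp_href:" entries in the raw Dynflow "Output:" text. This is a
# plain-text heuristic over the copied log, not a Pulp/Katello API call.

def count_pulp_hrefs(dynflow_output: str) -> int:
    """Count task entries, i.e. lines starting with '- pulp_href:'."""
    return sum(
        1
        for line in dynflow_output.splitlines()
        if line.strip().startswith("- pulp_href:")
    )

sample = """\
pulp_tasks:
- pulp_href: "/pulp/api/v3/tasks/fe333043-8ab1-4f80-b8bb-6879540b19a6/"
  state: completed
- pulp_href: "/pulp/api/v3/tasks/aaaaaaaa-0000-4000-8000-000000000000/"
  state: completed
task_groups: []
"""

# A single-repo sync should report 1; this two-task sample reports 2.
print(count_pulp_hrefs(sample))
```

On the 3.8/4.10 task above this reports 156, on 3.7.1/4.9.2 it reports 1.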

I also want to add that all proxies are configured to sync only the “Library” lifecycle environment.

Seeing this as well. The question is: does the updated Pulp 3.39 that ships in Foreman 3.9/Katello 4.11 fix this issue?

Unsure if this issue is really known at this point. I don’t really have the right skillset to debug this on my own. I can deliver any logs or perform tests, but I don’t have any “test” system to validate whether the new version would fix anything. Do the release notes for Pulp 3.39 say anything related?

If pulpcore 3.39 did fix this issue, then that would be a “happy accident”. AFAIK the Pulp team is not currently aware of any performance regressions in the versions that ship with Katello 4.10 (other than having been told about the existence of this thread).

The last big performance issue from the other thread should be fixed in the Pulp versions that ship with Katello 4.10. As we learned from that issue, performance problems are sometimes dependent on the PostgreSQL version used, and the Pulp team generally tests with a newer PostgreSQL version than the one shipped on EL8 (and by extension what is used by Foreman/Katello). As a result, it is entirely possible that you have hit a performance issue that the Pulp team is not aware of from their own testing.

If you did want to dig through the Pulp release notes, see: Changelog — Pulp Project 3.44.0.dev documentation
Note that “Resolved a sync-time performance regression. #4591” is the fix that is already included with Katello 4.10.

Syncing one repository really shouldn’t cause all repositories on the system to be refreshed, so this might be something to look into in Katello rather than Pulp.

If you sync a repo, for example, Katello is supposed to interact only with that one repo rather than all of them.

I noticed this in your Dynflow output:

environment_id: 
content_view_id: 
repository_id: 

What sort of smart proxy sync was this? Was it a normal smart proxy sync, or was it triggered by a content view publish or a sync?


Also, I’ve created a bug so we don’t lose this: Bug #36990: Investigation: RefreshRepos from a repository sync causes updates of all repositories on smart proxy - Katello - Foreman

Basically it is triggered by a repo sync. The way we have it setup is.

  1. Devs push the RPM to a local repository.
  2. The repository server sees a new package arrive and uses the Foreman API to trigger a repo sync of the local repo.
  3. Foreman/Katello starts the repo sync, finds the new RPM in our local repo, and then triggers a “Sync Repository on Smart Proxy(ies)” task that syncs the new metadata to all proxies, since Library is configured in the lifecycle environment settings for all proxies.

The same flow is triggered also when we release new docker images in our local registry.
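For reference, step 2 above boils down to one REST call against the standard Katello repository-sync endpoint. A minimal sketch; the hostname, credentials, and repository ID below are placeholders, not our real setup:

```python
# Sketch of step 2: build the Katello repository-sync endpoint for one repo.
# The hostname and repo ID are placeholders. Assumes the standard
# POST /katello/api/repositories/:id/sync API route.
from urllib.parse import urljoin

def sync_url(base_url: str, repository_id: int) -> str:
    """Return the Katello API endpoint that triggers a sync of one repo."""
    return urljoin(base_url, f"/katello/api/repositories/{repository_id}/sync")

url = sync_url("https://foreman.example.com", 33)
print(url)

# The actual call would then be something like (with the requests library,
# credentials and CA path being placeholders):
#   requests.post(url, auth=("admin", "changeme"),
#                 verify="/etc/pki/katello/certs/katello-server-ca.crt")
```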

Also note that the API call to sync the local repo initially failed after upgrading to 3.8, and it only started working again after adding the “Smart proxy: manage_capsule_content” permission to the role. Seen in this thread → No sync to Proxies after docker repo updated - #2 by lfu
That issue could be unrelated to this one, though.


In the Dynflow output for the log that you posted in (Proxy sync 10x slower with Foreman 3.8/Katello 4.10 - #12 by tedevil), you should see something like this:

Where the Input for Actions::Katello::CapsuleContent::Sync has repository_id: <some id>

For me, that repository ID is propagated to RefreshRepos.

Can you confirm that you see the repository_id in the input for Actions::Katello::CapsuleContent::Sync?
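When comparing, here is a tiny hypothetical helper (not part of Katello, just for eyeballing pasted logs) for checking whether a copied Dynflow Input block actually carries a repository_id; an empty value is what makes the sync “unscoped”:

```python
# Hypothetical helper: pull a key's value out of a copied Dynflow "Input:"
# block. An empty value (e.g. a bare "repository_id:" line) comes back as
# None, which corresponds to an "unscoped" smart proxy sync.

def input_value(dynflow_input: str, key: str):
    for line in dynflow_input.splitlines():
        stripped = line.strip()
        if stripped.startswith(key + ":"):
            value = stripped.split(":", 1)[1].strip()
            return value or None  # empty string -> None
    return None

sample = """\
---
smart_proxy_id: 7
environment_id:
content_view_id:
repository_id: 33
"""

print(input_value(sample, "repository_id"))   # '33'
print(input_value(sample, "environment_id"))  # None (unscoped)
```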


Also I’ll mention that I tested this so far on our freshest code and it’s not immediately reproducible. Taking a look at 4.10 now.

Edit: I’m not reproducing this issue on 4.10 – after uploading a file to my repository, my smart proxy (that is syncing Library) only refreshes the single repository that got the RPM upload.


We did have a bug fixed in 4.11 related to RefreshRepos for smart proxy syncs that have no repository ID, environment ID, or content view ID: Fixes #36926 - RefreshRepos called for relevant repos only by pmoravec · Pull Request #10803 · Katello/katello · GitHub

If the logs from above were for an “unscoped” smart proxy sync, this PR could help.

@nixfu, as I asked @tedevil above, can you use Dynflow to see what is taking up the extra time?

Also if anyone feels like trying out a patch: https://github.com/Katello/katello/pull/10803.patch

I’m going to triage this to Katello 4.10 so the next 4.10 release receives this fix at least.

I see no repository ID as input (nor organization and location):
[screenshot]

Looking back at sync tasks in 3.7/4.9, however, the input for “Actions::Katello::CapsuleContent::Sync” looks the same.

However, if I look at “Actions::Pulp3::Orchestration::Repository::RefreshRepos”, the repository_id is missing on 3.8/4.10:
3.7/4.9:
[screenshot]

3.8/4.10:
[screenshot]


Appreciate the info, there’s one more comparison I’d like to see:

In Dynflow there will also be an Actions::Pulp3::CapsuleContent::Sync step. Mind showing whether that receives a repository_id for each Katello version?

repository_id is set for both versions, so there is no issue with “Actions::Pulp3::CapsuleContent::Sync”.


I made an odd discovery with the issue we were having, where smart proxy syncs were taking 5-7 hours. All our smart proxies are on-demand only.

We had previously had the “restrict composite content view promotion” set to True in our settings.

That means that our individual content views, AS WELL as our composite content views, had to be promoted to the same level, such as “prod”.

Our smartproxies are set to sync everything in a promotion level, such as “prod”.

I disabled that restriction setting, and I removed all individual content views from the promotion level, so they are all just at Library now, and only the composite content views we have are now promoted through the promotion process and sent out to the smartproxies.

Needless to say this has reduced the amount of repos synced to the smartproxies quite a bit.

We went from probably 50 repos listed on each smart proxy down to fewer than 10 top-level repos, which make up the much smaller number of composite content views we use to organize different types of systems.

Now the smart proxy syncs take about 15 minutes with a “complete sync”, where they previously took 5-7 hours.

So either it was something to do with a difference between syncing content views vs. syncing composite views, or just the fact that we had so many views being synced out to the smart proxies.

So to summarize this so far, would you say this patch will solve the issue? How would one apply it? Is it applied only on the Foreman server itself, or also on the proxies?

That patch should help, but we still have yet to figure out in the code why your “refresh repos” step is not getting a repository_id, so it would be a workaround at best. It is applied on the Foreman server only.

You’d need to cd into /usr/share/gems/gems/katello-4.10.0/ and then run patch -p1 < 10803.patch. You can skip any patches that fail to find files; they’re likely test-related.

Example:

[root@centos8-stream-katello-4-10 katello-4.10.0]# patch -p1 < 10803.patch 
patching file app/lib/actions/katello/capsule_content/refresh_repos.rb
patching file app/lib/actions/katello/capsule_content/sync.rb
Hunk #1 succeeded at 36 (offset 1 line).
patching file app/lib/actions/katello/capsule_content/sync_capsule.rb
Hunk #1 succeeded at 14 (offset -1 lines).
can't find file to patch at input line 79
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|diff --git a/test/actions/katello/capsule_content_test.rb b/test/actions/katello/capsule_content_test.rb
|index 59fb7690a45..7202cab2258 100644
|--- a/test/actions/katello/capsule_content_test.rb
|+++ b/test/actions/katello/capsule_content_test.rb
--------------------------
File to patch: 
Skip this patch? [y] y
Skipping patch.
7 out of 7 hunks ignored

After you patch it, run foreman-maintain service restart.

If you need to revert the patch, you can just reinstall the rubygem-katello RPM via dnf.

Alternatively, you can wait for Katello 4.10.1 which is slated to receive this fix.

That’s a good point you bring up, @viwon. I can imagine some folks do need to consume from both the “component” and composite content view versions, but if you don’t, it would certainly save time and space.

Did you notice this slowdown after a specific upgrade? There was a separate Pulp issue (mentioned above I believe) that did cause a real slowdown for the syncing portion of smart proxy syncs, so that could be part of it. But here it seems we’re also having an issue where more smart proxies are being updated than there should be.

I performed an OS update on my Foreman server (AlmaLinux 8.8 to 8.9) during the holidays, and the issue is now solved. I can see that some new Foreman-related packages were upgraded/installed:

Old:
pulpcore-selinux-1.3.3-1.el8.x86_64
rubygem-foreman-tasks-8.2.0-1.fm3_8.el8.noarch
rubygem-foreman_remote_execution-11.1.0-1.fm3_8.el8.noarch
rubygem-smart_proxy_remote_execution_ssh-0.10.1-1.fm3_6.el8.noarch
puppet7-release-7.0.0-14.el8.noarch

New:
pulpcore-selinux-2.0.0-1.el8.x86_64
rubygem-foreman-tasks-8.3.3-1.fm3_8.el8.noarch
rubygem-foreman_remote_execution-11.1.1-1.fm3_8.el8.noarch
rubygem-smart_proxy_remote_execution_ssh-0.10.3-1.fm3_8.el8.noarch
puppet7-release-7.0.0-15.el8.noarch

A complete global sync to all my proxies (one package added to a repo) is now down to ~55 seconds.


Appreciate the report back, @tedevil. I can only wonder, then, whether you were hitting an issue with foreman-tasks itself. Perhaps an issue with serialization of inputs? I can only guess. This will be good information for anyone who hits this issue in the future.