I have definitely seen that thread, but I am not sure it is the same issue. I also assumed it was patched in 3.28.19, which is what I am running. Or is there still an unpatched issue?
I can confirm that the fix meant to resolve the issue from that thread is in pulpcore 3.28.19, which is what you have, so that should not be it (unless the fix does not always work).
Before the fix was ready, there was an alternative workaround that consisted of reverting a certain commit. (The fix in pulpcore 3.28.19 does not do this.) It might be worth applying that workaround one more time, just to be sure this is not still the same issue.
The other thing that might be worth testing is if a larger smart proxy sync is also much slower now.
A small repo going from 60 seconds to 8 minutes could be a case of something like: “There is now an extra task that adds a fixed extra 7 minutes to this action”. This would also not be great, but it would be different from: “All syncs now take 10 times as long” (as in, a 1 hour sync now takes 10 hours).
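One way to tell these two cases apart is to time a small and a large sync before and after the upgrade and compare them. A minimal sketch of that comparison: only the 60 second and 8 minute figures come from this thread, while the large-sync timings, the function name and the tolerance are made up for illustration.

```python
def classify_slowdown(small_before, small_after, large_before, large_after, tolerance=0.25):
    """Guess whether a slowdown looks like a fixed overhead or a proportional one.

    All durations are in seconds. `tolerance` is the allowed relative
    mismatch before the result is reported as "unclear".
    """
    added_small = small_after - small_before
    added_large = large_after - large_before
    ratio_small = small_after / small_before
    ratio_large = large_after / large_before

    # Fixed overhead: both syncs gain roughly the same number of seconds.
    if abs(added_large - added_small) <= tolerance * max(added_small, added_large):
        return "fixed"
    # Proportional: both syncs slow down by roughly the same factor.
    if abs(ratio_large - ratio_small) <= tolerance * max(ratio_small, ratio_large):
        return "proportional"
    return "unclear"


# 60 s -> 480 s is from the thread; the 1 hour sync timings are hypothetical.
print(classify_slowdown(60, 480, 3600, 4020))   # 1 hour sync gains about 7 minutes
print(classify_slowdown(60, 480, 3600, 28800))  # 1 hour sync becomes 8 hours
```

If the large sync only gains a few minutes, the problem is more likely an extra fixed step than a general slowdown of every sync.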
I ran some of the suggested tests, but they did not have any effect.
I had backups of Foreman and 5 of my proxies with 3.71/4.9.2, so I rolled back the machines, synced all repos, and then performed a simple test: I added one package to one of our production repos and measured how quickly the metadata is synced to my proxies all over the world.
se = Sweden, where the Foreman server is located, latency ~3 ms
us = USA, latency ~111 ms
ca = Canada, latency ~177 ms
cl = Chile, latency ~276 ms
au = Australia, latency ~420 ms
Synchronize smart proxy 'seproxy' stopped success December 09, 2023 at 09:51:08 PM 5 seconds
Synchronize smart proxy 'usproxy' stopped success December 09, 2023 at 09:51:08 PM 17 seconds
Synchronize smart proxy 'caproxy' stopped success December 09, 2023 at 09:51:06 PM 25 seconds
Synchronize smart proxy 'clproxy' stopped success December 09, 2023 at 09:51:07 PM 36 seconds
Synchronize smart proxy 'auproxy' stopped success December 09, 2023 at 09:51:09 PM 50 seconds
As you can see, the performance for 3.8/4.10 is horrific. Each time a package is released in our production repos, the sync is extremely slow, and syncs stack up when several package releases and syncs run in quick succession. Sadly, I did not have a full backup of all proxies, so to go back to 3.71/4.9.2 on all proxies I guess I need to redeploy 50% of my proxies, unless there is a good way to downgrade a proxy…
Installed packages on the 3.71/4.9.2 Foreman server.
@tedevil can you show us what actions in the Dynflow console for your smart proxy sync task are taking all of the time? I’m assuming it’s the actual Pulp 3 content sync but we need to verify that first.
It’s very interesting that RefreshRepos is taking all the time and it’s not even the sync. All it really does is create repositories and remotes.
@tedevil are you able to share the raw output in Dynflow for Actions::Pulp3::Orchestration::Repository::RefreshRepos?
That would show us how long the specific Pulp tasks are taking.
If the tasks themselves aren’t taking up the time, then perhaps the amount of API queries increased.
If it’s not a Pulp task taking up all the time, we might need to dig into /var/log/foreman/production.log and /var/log/messages on the Foreman and /var/log/messages on the Smart Proxy to see what specifically is taking up so much time.
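As a starting point for that log digging, a small script can pull the slowest requests out of production.log. This is only a sketch: it assumes the standard Rails `Completed <status> ... in <N>ms` lines are present in the log, and the function name, the threshold and the sample lines are made up for illustration.

```python
import re

# Standard Rails request-completion line, e.g. "... Completed 200 OK in 1234ms (...)".
COMPLETED = re.compile(r"Completed (\d{3}) .*? in (\d+)ms")


def slow_requests(lines, threshold_ms=1000):
    """Return (duration_ms, line) pairs for requests taking at least threshold_ms, slowest first."""
    hits = []
    for line in lines:
        match = COMPLETED.search(line)
        if match and int(match.group(2)) >= threshold_ms:
            hits.append((int(match.group(2)), line.rstrip()))
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


# Hypothetical log lines; real entries will differ in their details.
sample_lines = [
    "2023-12-09T21:51:08 [I|app|1a2b3c4d] Completed 200 OK in 35ms (Views: 1.2ms | ActiveRecord: 5.0ms)",
    "2023-12-09T21:51:09 [I|app|5e6f7a8b] Completed 202 Accepted in 4321ms (Views: 0.5ms | ActiveRecord: 3900.1ms)",
    "2023-12-09T21:51:10 [I|app|9c0d1e2f] Started GET /katello/api/repositories",
]
for duration, line in slow_requests(sample_lines):
    print(duration, line)
```

To run it against the real log, pass an open file instead of `sample_lines`, for example `slow_requests(open("/var/log/foreman/production.log", errors="replace"))`.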
Looking back at a similar sync task in 3.71/4.9.2, there is only one “pulp_href” in the Pulp task, compared to 156 of them in 3.8/4.10.
Seems to match the number of repositories I have (156).
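A quick way to check a count like this is to paste the raw Dynflow output for the action into a text file and count the distinct hrefs. A sketch, assuming the output contains `pulp_href` keys followed by a `/pulp/api/v3/...` path; the function name and the sample text are made up.

```python
import re

# Matches a pulp_href key followed by its path, in either YAML-like or JSON-like dumps.
HREF = re.compile(r"pulp_href\W+(/pulp/api/v3/[^\s\"',]+)")


def distinct_hrefs(raw_output):
    """Return the set of distinct pulp_href paths found in a raw Dynflow dump."""
    return set(HREF.findall(raw_output))


# Hypothetical excerpt; real output will contain different hrefs.
sample = """
pulp_tasks:
- pulp_href: /pulp/api/v3/tasks/aaaa/
  state: completed
- pulp_href: /pulp/api/v3/tasks/bbbb/
  state: completed
"""
print(len(distinct_hrefs(sample)))
```

Comparing that number with the repository count on the Foreman server shows whether the refresh touched one repository or all of them.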
I am unsure whether this issue is really known at this point. I do not really have the right skill set to debug this on my own, but I can deliver any logs or perform tests. I do not have a “test” system either to validate whether the new version would fix anything. Do the release notes for Pulp 3.39 say anything related?
If pulpcore 3.39 did fix this issue, then that would be a “happy accident”. AFAIK the Pulp team is not currently aware of any performance regressions in the versions that ship with Katello 4.10 (other than having been told about the existence of this thread).
The last big performance issue from the other thread should be fixed in the Pulp versions that ship with Katello 4.10. As we learned from that issue, performance issues sometimes depend on the PostgreSQL version used, and the Pulp team generally tests with a newer PostgreSQL version than the one shipped on EL8 (and by extension the one used by Foreman/Katello). As a result, it is entirely possible that you have hit a performance issue that the Pulp team is not aware of from their own testing.
If you did want to dig through the Pulp release notes, see: Changelog — Pulp Project 3.44.0.dev documentation
Note that “Resolved a sync-time performance regression. #4591” is the fix that is already included with Katello 4.10.
Syncing one repository really shouldn’t be causing all repositories on the system to be refreshed, so this might be something to look into in Katello rather than Pulp.
If you sync a repo, for example, Katello is supposed to only interact with that one repo rather than all of them.
I noticed this in your Dynflow output:
environment_id:
content_view_id:
repository_id:
What sort of smart proxy sync was this? Was it a normal smart proxy sync, or was it triggered by a content view publish or a sync?
Basically, it is triggered by a repo sync. The way we have it set up is:
Devs are pushing the rpm to a local repository.
The repository sees a new package arrive and uses the Foreman API to trigger a repo sync of the local repo.
Foreman/Katello starts the repo sync, finds the new RPM in our local repo, and then triggers a “Sync Repository on Smart Proxy(ies)” task that syncs the new metadata to all proxies, since Library is configured in the Lifecycle Environment configuration for all proxies.
The same flow is also triggered when we release new Docker images in our local registry.
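For anyone trying to reproduce this trigger, the API step in the flow above can be sketched roughly as follows. It builds a POST to Katello's repository sync endpoint (`/katello/api/repositories/:id/sync`) without sending it; the host name, repository ID, credentials and function name are all hypothetical.

```python
import base64
import json
import urllib.request


def build_sync_request(foreman_url, repository_id, username, password):
    """Build (but do not send) a POST to Katello's repository sync endpoint."""
    url = f"{foreman_url.rstrip('/')}/katello/api/repositories/{repository_id}/sync"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps({}).encode(),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Hypothetical host, repository ID and credentials.
request = build_sync_request("https://foreman.example.com", 42, "apiuser", "secret")
print(request.get_method(), request.full_url)
# To actually trigger the sync: urllib.request.urlopen(request)
```

Sending this request for a single repository makes it easy to check in Dynflow whether the resulting smart proxy sync refreshes one repository or all of them.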
Also note that the API call to sync the local repo initially failed after upgrading to 3.8, and it only started working again after adding the resource “Smart proxy: manage_capsule_content” to the role. Seen in this thread → No sync to Proxies after docker repo updated - #2 by lfu
That issue could be unrelated to this one though.
Where the Input for Actions::Katello::CapsuleContent::Sync has repository_id: <some id>
For me, that repository ID is propagated to RefreshRepos.
Can you confirm that you see the repository_id in the input for Actions::Katello::CapsuleContent::Sync?
Also I’ll mention that I tested this so far on our freshest code and it’s not immediately reproducible. Taking a look at 4.10 now.
Edit: I’m not reproducing this issue on 4.10 – after uploading a file to my repository, my smart proxy (that is syncing Library) only refreshes the single repository that got the RPM upload.