Syncing of large repos failing on initial build

Problem:
I’m having problems performing the initial syncing of large repos. This is a new build (not an upgrade). I am trying to sync centos 6, 7 and 8 repos. The smaller centos 6/7 repos seem to sync without issues. The larger ones fail repeatedly, regardless of time of day. Interestingly, I had only had to repeat the syncing procedure once with a large centos 8 repos. These errors occur regardless of the ipaddress mirror.centos.org points to and regardless of time-of-day.

There is no resource contention that I can find. During the failed syncs, I can see network traffic and the filesystem growing. When the syncing fails, it appears that whatever has been downloaded is deleted.

I am able to connect to mirror.centos.org:80 during syncing via telnet without issues.

Expected outcome:

Repository syncing to complete successfully.

Foreman and Proxy versions:

FOREMAN: 2.1.0
RUBY: ruby 2.0.0p648 (2015-12-16) [x86_64-linux]
PUPPET: 6.17.0
pulp-katello-1.0.3-1.el7.noarch
katello-3.16.0-0.4.rc4.1.el7.noarch

Foreman and Proxy plugin versions:

Distribution and version:

platform: VM
16 gigs ram
centos 7.8
kernel: 3.10.0-1127.13.1.el7.x86_64

Other relevant data:

I see these errors in /var/log/messages:

Jul 16 13:43:05 foreman-01 pulpcore-worker-1: pulp: rq.worker:ERROR: aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host mirror.centos.org:80 ssl:default

Jul 16 13:43:06 foreman-01 pulpcore-worker-1: aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host mirror.centos.org:80 ssl:default [Connect call failed (‘204.15.127.248’, 80)]

Jul 16 13:43:06 foreman-01 pulpcore-worker-1: TimeoutError: [Errno 110] Connect call failed (‘204.15.127.248’, 80)

Thanks for the help.

@montys Sorry for the delay, thanks for the detailed report!

Which repos in particular are you syncing? I just tried http://mirror.centos.org/centos-8/8/extras/x86_64/os/ as a quick test without issue.

@iballou @Justin_Sherrill any known issues here in 3.16?

John,

I’ve been trying to sync these (I’ve formatted them to not be links):

http-mirror-centos-org/centos/6/updates/x86_64/
http-mirror-centos-org/centos/7/os/x86_64/
http-mirror-centos-org/centos/7/opstools/x86_64/
http-mirror-centos-org/centos/7/updates/x86_64/

As a means of troubleshooting, I downloaded centos/6/updates/x86_64/ to an internal webserver then pointed foreman to it. The sync failed with this setup as well. The sync process ran for ~12 minutes before it died. As before, I observed network traffic and storage increase.

I also noticed upwards of 18 connections from the foreman server to the webserver.


Jul 17 13:41:08 foreman-01 dynflow-sidekiq@orchestrator: 2020-07-17T18:41:08.313Z 1030 TID-2ya2lq Dynflow::Executors::Sidekiq::OrchestratorJobs::WorkerDone JID-ca8bac8dd19753502afffa35 INFO: start
Jul 17 13:41:08 foreman-01 dynflow-sidekiq@orchestrator: 2020-07-17T18:41:08.315Z 1030 TID-2ya2lq Dynflow::Executors::Sidekiq::OrchestratorJobs::WorkerDone JID-ca8bac8dd19753502afffa35 INFO: done: 0.001 sec
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: pulp: rq.worker:ERROR: aiohttp.client_exceptions.ClientOSError: [Errno 32] Broken pipe
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: Traceback (most recent call last):
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib/python3.6/site-packages/rq/worker.py”, line 883, in perform_job
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: rv = job.perform()
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib/python3.6/site-packages/rq/job.py”, line 645, in perform
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: self._result = self._execute()
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib/python3.6/site-packages/rq/job.py”, line 651, in _execute
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: return self.func(*self.args, **self.kwargs)
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib/python3.6/site-packages/pulp_rpm/app/tasks/synchronizing.py”, line 167, in synchronize
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: dv.create()
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib/python3.6/site-packages/pulpcore/plugin/stages/declarative_version.py”, line 148, in create
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: loop.run_until_complete(pipeline)
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib64/python3.6/asyncio/base_events.py”, line 484, in run_until_complete
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: return future.result()
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib/python3.6/site-packages/pulpcore/plugin/stages/api.py”, line 225, in create_pipeline
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: await asyncio.gather(*futures)
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib/python3.6/site-packages/pulpcore/plugin/stages/api.py”, line 43, in call
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: await self.run()
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib/python3.6/site-packages/pulpcore/plugin/stages/artifact_stages.py”, line 133, in run
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: pb.done += task.result() # download_count
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib/python3.6/site-packages/pulpcore/plugin/stages/artifact_stages.py”, line 159, in _handle_content_unit
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: await asyncio.gather(*downloaders_for_content)
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib/python3.6/site-packages/pulpcore/plugin/stages/models.py”, line 88, in download
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: download_result = await downloader.run(extra_data=self.extra_data)
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib/python3.6/site-packages/pulpcore/download/base.py”, line 227, in run
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: return await self._run(extra_data=extra_data)
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib/python3.6/site-packages/backoff/_async.py”, line 133, in retry
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: ret = await target(*args, **kwargs)
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib/python3.6/site-packages/pulpcore/download/http.py”, line 197, in _run
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: async with self.session.get(self.url, proxy=self.proxy, auth=self.auth) as response:
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib64/python3.6/site-packages/aiohttp/client.py”, line 1012, in aenter
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: self._resp = await self._coro
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib64/python3.6/site-packages/aiohttp/client.py”, line 504, in _request
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: await resp.start(conn)
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib64/python3.6/site-packages/aiohttp/client_reqrep.py”, line 847, in start
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: message, payload = await self._protocol.read() # type: ignore # noqa
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib64/python3.6/site-packages/aiohttp/streams.py”, line 591, in read
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: await self._waiter
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: aiohttp.client_exceptions.ClientOSError: [Errno 32] Broken pipe
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: Traceback (most recent call last):
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib/python3.6/site-packages/rq/worker.py”, line 883, in perform_job
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: rv = job.perform()
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib/python3.6/site-packages/rq/job.py”, line 645, in perform
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: self._result = self._execute()
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib/python3.6/site-packages/rq/job.py”, line 651, in _execute
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: return self.func(*self.args, **self.kwargs)
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib/python3.6/site-packages/pulp_rpm/app/tasks/synchronizing.py”, line 167, in synchronize
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: dv.create()
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib/python3.6/site-packages/pulpcore/plugin/stages/declarative_version.py”, line 148, in create
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: loop.run_until_complete(pipeline)
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib64/python3.6/asyncio/base_events.py”, line 484, in run_until_complete
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: return future.result()
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib/python3.6/site-packages/pulpcore/plugin/stages/api.py”, line 225, in create_pipeline
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: await asyncio.gather(*futures)
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib/python3.6/site-packages/pulpcore/plugin/stages/api.py”, line 43, in call
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: await self.run()
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib/python3.6/site-packages/pulpcore/plugin/stages/artifact_stages.py”, line 133, in run
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: pb.done += task.result() # download_count
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib/python3.6/site-packages/pulpcore/plugin/stages/artifact_stages.py”, line 159, in _handle_content_unit
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: await asyncio.gather(*downloaders_for_content)
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib/python3.6/site-packages/pulpcore/plugin/stages/models.py”, line 88, in download
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: download_result = await downloader.run(extra_data=self.extra_data)
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib/python3.6/site-packages/pulpcore/download/base.py”, line 227, in run
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: return await self._run(extra_data=extra_data)
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib/python3.6/site-packages/backoff/_async.py”, line 133, in retry
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: ret = await target(*args, **kwargs)
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib/python3.6/site-packages/pulpcore/download/http.py”, line 197, in _run
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: async with self.session.get(self.url, proxy=self.proxy, auth=self.auth) as response:
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib64/python3.6/site-packages/aiohttp/client.py”, line 1012, in aenter
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: self._resp = await self._coro
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib64/python3.6/site-packages/aiohttp/client.py”, line 504, in _request
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: await resp.start(conn)
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib64/python3.6/site-packages/aiohttp/client_reqrep.py”, line 847, in start
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: message, payload = await self._protocol.read() # type: ignore # noqa
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: File “/usr/lib64/python3.6/site-packages/aiohttp/streams.py”, line 591, in read
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: await self._waiter
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: aiohttp.client_exceptions.ClientOSError: [Errno 32] Broken pipe
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: pulp: rq.worker:INFO: Cleaning registries for queue: 1004@foreman-01.nedisys.local
Jul 17 13:41:20 foreman-01 pulpcore-worker-2: pulp: rq.worker:INFO: 1004@foreman-01.nedisys.local: a0b027f7-38ed-462a-b82c-41d64610619a
Jul 17 13:41:21 foreman-01 pulpcore-worker-2: pulp: rq.worker:INFO: 1004@foreman-01.nedisys.local: Job OK (a0b027f7-38ed-462a-b82c-41d64610619a)
Jul 17 13:41:24 foreman-01 dynflow-sidekiq@worker: 2020-07-17T18:41:24.304Z 1010 TID-2y16sa Dynflow::Executors::Sidekiq::WorkerJobs::PerformWork JID-be22f61c263897cb8e1fb7e6 INFO: start
Jul 17 13:41:24 foreman-01 pulpcore-api: 127.0.0.1 - - [17/Jul/2020:18:41:24 +0000] “GET /pulp/api/v3/tasks/4fc9f8a3-6043-4545-90df-15e2e9aa82de/ HTTP/1.1” 200 3747 “-” “OpenAPI-Generator/3.4.1/ruby”

Thanks for your help.

I tried this on the CentOS 6 updates repo and saw a different error: Katello::Errors::Pulp3Error: Response payload is not completed

@montys what download policy are you using? Immediate?

Yes, download policy is set to immediate.

Regards,

Monty

I created a Pulp 3 bug so that this can be investigated: https://pulp.plan.io/issues/7186

I appreciate it.

Regards,

Monty

Hi, guys. Has this issue been resolved? I am experiencing this issue as well using ‘immediate’. However, if i choose ‘on-demand’, sync works fine.

Hello Team,

I am facing the same issue, is there any workaround available to get this done.

Just fresh installed Katello 3.18 on my VM. Synching CentOS7, EPEL and Foreman Client repos throws this error on 2 syncs so far:

Product Sync Error
CentOS 7 Base Incomplete Response payload is not completed
CentOS 7 Updates Incomplete Cannot connect to host www.mirrorservice.org:443 ssl:default [Network is unreachable]
CentOS 7 Extras Complete
CentOS 7 SCLo sclo
CentOS 7 SCLo rh Incomplete Response payload is not completed
EPEL 7
Foreman Client Complete

I looked at the Pulp ticket and it looks like they consider the duplicate issue closed. My VM isn’t huge but has plenty disk, but should this be happening on a fresh Katelli 3.18 install?

That looks a lot like the ‘Too many open files …’ thing I’m getting for the large repos.

Try this:
Go to products, click the CentOS 7 product and click the base repo, From the ‘Select Acton’ dropdown, select ‘Verify Content Checksum’ - that initiates a repo repair task. When the task fails and goes into the paused state, click task details and resume the task. If the next error you get in the tasks window is something like ‘Too may open files …’, you became a new member of the club …

Don’t bother about increasing the open files limits - it won’t help

1 Like

Same here!

Just synced OL7 repo and got:

[E|bac|] Response payload is not completed (Katello::Errors::Pulp3Error)

Ran verify checksum and got:

[Errno 24] Too many open files: ‘/var/lib/pulp/media/artifact/61/4e19996e65223e05f144ae6716db2e2d04e22cd5539a17a3b70b7872793436’

Any idea if this is fixed in Katello 4.0?

Hi,

I had the same issue on a clean install of Foreman 2.4 / Katello 4.0 on a brand new system running CentOS 8.
When syncing all repos for CentOS 7 or 8 at the same time, some of them resulted in “Warning”.
But when syncing the failed repos again one by one, they all went well.

However, I just tried to upload the CentOS 7 ISO (repo-type “file”), and it also failed with “Too many open files”

[root@foreman_server ISO]# hammer repository upload-content --path CentOS-7-x86_64-DVD-2009.iso --id 40 --organization “My_Organization”

Could not upload the content:
Task 78af15cd-c176-4da4-8edf-e33d4d0c7360: Katello::Errors::Pulp3Error: [Errno 24] Too many open files: ‘/var/lib/pulp/media/upload/1ac84aff-e0fd-45ae-88e5-76e87c6f93cc’

Foreman/katello version:
foreman-2.4.0-1.el8.noarch
katello-4.0.0-1.el8.noarch

OS:
CentOS Linux release 8.3.2011

Hi,

After applying following prlimit changes I succeeded in uploading the CentOS 7 ISO:

for PID in ps -u pulp | grep -v PID | awk '{print $1}'; do prlimit -n8192 -p $PID; done

Then started the ISO upload and followed up the # open files for all the processes for the user pulp:

for PID in ps -u pulp | grep -v PID | awk '{print $1}'; do echo “PID # ${PID} open files count : $(sudo ls -l /proc/${PID}/fd | wc -l)”; done

Once the iso was completely uploaded a new process was started which had 1900 open files:

PID # 254390 open files count : 1900

Hope this this might solve also the problem for syncing large repos as well the devs to apply fixes in the product to solve this issue.

Sadly this didn’t help when syncing the kickstart repos :frowning:

In my case I had to repeat syncing several times. The repo was using immediate download policy and it was rhel 7 kickstart. Good thing is, it uses everything that was synced before so retrying ~5 times worked. It was also suggested to increase the Sync Connection Timeout to a bigger number (300 -> 600) and trying to enable concurent download, e.g. by hammer repository update --id=26 --download-concurrency=5

I only bumped the timeout but that didn’t help much. I had to retry several times.

Getting this too, with a fresh Katello 4.0.1 installation. :frowning:

It seems Pulp3 does not retry on failure and therefore Katello task fails. Perhaps Katello should add some retry mechanism unless it can be added to Pulp3 directly.

setting a download concurrency of 1 seemed to solve the issue… if only 1 sync was running at a time. This is not reasonable and i’ve filed Issue #8867: downloads from cdn.redhat fail with 'Cannot connect to host cdn.redhat.com:443 ssl:default [[SSL: SSLV3_ALERT_UNEXPECTED_MESSAGE] sslv3 alert unexpected message (_ssl.c:877)]' - Pulp

I know they are looking into on the redhat cdn side, but ive seen so many different types of download issues that potentially could be resolved by having pulp do some sort of retrying that I’m planning on raising this as a bigger issue.

Same issue for me, with foreman 2.4 / katello 4.0.1 fresh install.

Download concurrency of 1 also helped for me as well. Repo sync got halfway through Red Hat EL8 AppStream repo, and then just would not go any further, failing again and again with response payload error. Lowering download concurrency to 1 got it progressing again, although it did fail again later and needed to kick off sync one more time to get it to finish.

Definitely looks like pulp needs to lower concurrency, and could be more robust - it should make some attempts to retry when it hits an error.