Katello Logfiles for synced packages

Knuppes · November 25, 2020, 9:02am

**Problem:** Hi, I’m missing something like the output of spacewalk-repo-sync, which listed all the packages that it needed to download. In Spacewalk there was also a directory somewhere in /var/log where all channel syncs were logged.

Is there any way to get this information out of Katello? I’ve been searching for this for hours…

**Expected outcome:** Logfiles for package syncs

Foreman and Proxy versions:

Foreman and Proxy plugin versions:

Distribution and version:

Other relevant data:

sajha · November 25, 2020, 6:26pm

I am not entirely sure of the log entries here, but the Foreman sync task has some details about the total packages downloaded, and you can look for packages under the repository in the UI. You might also have details logged in /var/log/messages, but to get the exact packages it’s easier to go through the Katello UI or console.

KolonelKernel · June 3, 2021, 4:15am

Sajha, I think you miss the point.

It is NOT easier to look through the UI or console.

Can I grep the UI to see if a particular package has been downloaded?
Can I grep a UI for success or error messages?

Please, do not insult the intelligence of someone who has taken the time to ask a good question, by effectively saying “your question is stupid, piss off”.

Knuppes’ question deserves an answer.

So, will someone please answer his question.

Where are the sync logs?
And if there are no sync logs, THERE SHOULD BE.

gvde · June 3, 2021, 4:39am

How do you “grep” without a console?

And I suppose anything you can see in the UI you can also access by API. It’s not uncommon these days that a lot of information isn’t available in simple text files anymore…

Keep your cool. If you get aggressive you won’t get anywhere in a community forum except maybe starting a shitstorm. SCREAMING doesn’t help either. No one said or meant anything about that question being stupid.

Beyond that, the question was answered. The information is in /var/log/messages. I have just checked. It says exactly which RPMs and other repository metadata have been downloaded…

If that’s not enough, again, this is a community: you are free to create a pull request on GitHub to add what you want. Screaming and yelling at people most likely won’t make them do what you want.

sajha · June 3, 2021, 7:02pm

Hi Kolonel,

Sorry if my earlier reply seemed unhelpful. It might be the case that I don’t understand the Spacewalk reference to sync logs in the post the way more experienced members of the community would.

In Katello, there are currently no direct logs of syncs and packages downloaded during the sync. In the absence of these sync logs, there’s some information captured in the foreman task output of a repository sync task. Pulp also directly logs some more detailed information in /var/log/messages about the sync.

With Pulp 3 in newer Katello versions (3.18 and up), there’s also a more helpful way to see what changed between 2 syncs. With Pulp 3, a sync task creates a new repository version. The Pulp API endpoint for reading a repository version, documented for the RPM plugin (repositories_rpm_rpm_versions_read - REST API for Pulp 3 RPM Plugin), has a content_summary node which reports the packages added/removed during a sync.

The endpoint should be accessible on the newer boxes with the certs installed:
curl --cert /etc/pki/katello/certs/pulp-client.crt --key /etc/pki/katello/private/pulp-client.key https://hostname/pulp/api/v3/repositories/rpm/rpm | python -m json.tool
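As a rough illustration of reading that content_summary node, here is a minimal Python sketch. It assumes the JSON for one repository version has already been fetched (for example with curl as above) and that content_summary holds "added" and "removed" maps keyed by content type, each entry carrying a "count". The sample payload, its numbers, and the function name summarize_sync are made up for illustration and are not taken from a real sync.

```python
import json


def summarize_sync(version: dict) -> dict:
    """Return per-content-type counts of what one repository version added/removed."""
    summary = version.get("content_summary", {})
    result = {}
    for change in ("added", "removed"):
        per_type = summary.get(change, {})
        # Each entry is assumed to look like {"count": N, "href": "..."}.
        result[change] = {ctype: info.get("count", 0) for ctype, info in per_type.items()}
    return result


# Hypothetical payload, shaped like the assumption described above.
SAMPLE = json.loads("""
{
  "number": 2,
  "content_summary": {
    "added":   {"rpm.package": {"count": 12, "href": "<href>"},
                "rpm.advisory": {"count": 1, "href": "<href>"}},
    "removed": {"rpm.package": {"count": 3, "href": "<href>"}},
    "present": {"rpm.package": {"count": 500, "href": "<href>"}}
  }
}
""")

print(summarize_sync(SAMPLE))
```

If the assumption about the layout holds, the href next to each count is where the actual package list for that change would have to be fetched from; the summary itself only gives totals.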

There’s also a CLI for Pulp 3 being actively worked on (GitHub - pulp/pulp-cli), which could be used to get this information.

Katello however does not store this information in its database and there is no tooling inside katello to get this information currently. I have filed a request here to add this to the radar: Feature #32717: Log details of packages downloaded/removed during a sync - Katello - Foreman

KolonelKernel · June 4, 2021, 8:28am

Ok, well, I apologise, my post was a bit confrontational. But I am sick of these sorts of difficulties, so it is difficult to contain the frustration sometimes.

To address a few things you’ve claimed in your reply though:

You say “anything you can see in the UI, you can also access by API”.
First, I am trying to look in a logfile, because the info is not in the UI.
Second… come on. I’m trying to get Foreman to work, because I need to accomplish something with it. If Foreman can’t do the simplest thing like sync a Red Hat repository, where am I going to find the time to figure out how to use the API to debug a basic issue?
The information is not in my /var/log/messages. I’ve synced CentOS 7 repos successfully, and Red Hat 8 BaseOS repos successfully, and not a single downloaded rpm is listed in /var/log/messages.

Further, /var/log/messages is full of SELinux spam, so Foreman/Katello devs need to address that also. I don’t have time to tweak 1000 SELinux things to silence those warnings, and yet the docs say Foreman/Katello needs SELinux ON, so I supposedly cannot just turn it off…

I am running foreman 2.4 / katello 4, on el8, because my old 2.3/3.16 installation failed miserably on the upgrade to katello 3.18. The pulp3 was completely broken, and I gave up on it.

If it works on EL7, but there are still a few issues with running foreman/katello on el8, I understand. But it largely seems to be working, so it must be very close to working reasonably well.

But it is still very frustrating that I can’t even find things like repository sync being logged properly, so that makes it difficult for me to even file a bug and provide useful info to developers.

Additionally, I’ve just discovered all the rpms now seem to be stored in “artifact” files with nonsense names, like:
/var/lib/pulp/media/artifact/ff/76d0dc9beb8d6ff7d91af99040f25397b1da13b83b553394757075f25662db

I mean, really? That seriously does not help. Many times I have wanted to find an rpm or see what versions of an rpm are available. When they’re stored as files, all I need to do is “locate blah | grep rpm” and I can see exactly what is there. Then I can scp the file directly to a client, in about 2 seconds. Now I can’t do that.
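For what it is worth, the artifact names do not look random. The example path above is 2 + 62 = 64 hex characters, which is the length of a SHA-256 digest. The following Python sketch rests on the assumption that each artifact is stored under the SHA-256 of its content, split after the first two characters. That layout is an assumption inferred from the path, and the function names are hypothetical. Under that assumption, anyone holding a copy of an rpm can at least compute where Pulp would keep it.

```python
import hashlib
from pathlib import PurePosixPath

# Directory taken from the path quoted above.
ARTIFACT_ROOT = PurePosixPath("/var/lib/pulp/media/artifact")


def artifact_path_for_digest(sha256_hex: str) -> PurePosixPath:
    """Map a SHA-256 hex digest to the assumed artifact location (<2 chars>/<62 chars>)."""
    digest = sha256_hex.strip().lower()
    if len(digest) != 64 or any(c not in "0123456789abcdef" for c in digest):
        raise ValueError("expected a 64-character hex SHA-256 digest")
    return ARTIFACT_ROOT / digest[:2] / digest[2:]


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a local file (for example an rpm you already have) in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Usage (hypothetical file name):
#   print(artifact_path_for_digest(sha256_of_file("some-package.rpm")))
```

This only goes from a known file to a path. Going the other way, from a package name to its artifact, still needs the API, because the file name itself carries no package metadata.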

Further, the other day I synced CentOS 7 repos, and I had Pulp using 36 GB. I ended up deleting those repos because I wanted to structure the product hierarchy differently, and after deleting the products in Foreman, they still used 36 GB. I then tried a foreman-rake command to delete orphaned content, and it did nothing, still 36 GB. Now what do I do? Do I have to query the API to try to work out what Foreman has lost track of? If the rpms were stored as rpms, I could just delete them, and tell Foreman to clean itself up. Now what do I do, if the orphan cleanup is failing to free space?

Bah. I just don’t like decisions that make products harder to use and harder to debug, when things go wrong, because things ALWAYS go wrong. Or things seem to go wrong, and you need to be able to check and verify things, in order to work out that no, things have not gone wrong.

So, developers, please, for God’s sake, please, just bless your users with the gift of logfiles, and a generally transparently-operating product. Not an opaque product that is just a broken black box when it fails. Logfiles are good for everyone.

Thank you.

KolonelKernel · June 4, 2021, 9:18am

Thanks for the info Sajha.
Sorry if I was a bit rude there. I’m sure you were probably just busy, and not trying to be unhelpful.

Re the logs, I am not seeing much at all re packages being downloaded in /var/log/messages. It certainly is not listing the rpms downloaded. I can see a backtrace in there when it fails, but it is not very informative about what actually went wrong. And I’m not sure how to view the output of the Foreman task to do repository sync, unless you mean what is displayed in the UI.

It’s just that basic things like the sync process including the download of each and every package should be logged somewhere easily accessible, as it is happening.

There are so many problems with syncing - when setting up a new server and configuring your repos, with proxies, with Red Hat certificates, with licensing, with network issues, with the repo metadata or rpm content sometimes. Every single step in this process needs to be logged and visible as it happens, so people can see what is going wrong, and fix it - until they can see things are working. It’s just so painful otherwise. And this is more important than ever in an upstream product like Foreman, where there can be bugs in Foreman that could otherwise have users going round and round in circles for hours, or days, trying to figure out what they’ve done wrong - when they haven’t done anything wrong, it’s a Foreman bug. If you think what I’m talking about would create too much logging, well, I would doubt that. Logs can be zipped, or deleted, or the verbosity can be turned down when things are running smoothly.

I’ll take a look at the API, but currently it is refusing to give me https://hostname/pulp/api/v3/repositories/rpm/rpm/, with HTTP 403 “Authentication credentials were not provided”, even though I was logged into the UI in the same tab before trying to load that URL. Further, the 403 response page contains other links, including a login link to https://hostname/auth/login/?next=/pulp/api/v3/repositories/rpm/rpm/ but that link is broken: it doesn’t bring up a login dialog, only a “The page you were looking for doesn’t exist.” page. Adjusting the URL manually, removing the extra slash after login, before the ?, does not help either. Still not found. So I’ll have to play around and adapt some other REST API scripts I’ve done, to do the basic auth that way. It’s just painful, when all I want to do is view a concise sync log, and work out why I can sync the EL8 BaseOS repo, but not EL8 AppStream.

Every time I sync EL8 AppStream, I get “Response payload is not completed” and a backtrace, and little else. I’ve logged a bug report:
https://projects.theforeman.org/issues/32701
So I’ll just have to wait until that is looked at.

Googling “Response payload is not completed” shows up some Pulp bugs:
https://pulp.plan.io/issues/7186
and Issue #7212: Adjust download concurrency - Pulp
which don’t seem to have any conclusive or relevant answer, unless the problem is the download concurrency, as they suggest. But I don’t believe that is the issue here, because I can sync BaseOS reliably, but not AppStream.

But still, without a simple and concise log of which packages are being downloaded, I cannot even see if my sync is making some progress each time. If I could see it was managing to download 1000 packages or so on each sync, before hitting the error, then I could just sync 10 or 12 more times, and I’d have the 18000 packages in the repo, and from then on, I could expect syncs to work, until at least the next point release.

It’s just painful, without proper logging.

sajha · June 4, 2021, 2:39pm

Hey Kolonel,

+1 to that.

Currently, if a sync succeeds, Katello imports the package information into its database. After a successful sync, we can see the content that was downloaded for the repo in several places in Katello: UI, Hammer, API, etc. With Hammer, hammer package list (documented in the Hammer CLI Guide for Red Hat Satellite 6.6 on the Red Hat Customer Portal) should show the packages after a successful sync.

But if the sync is unsuccessful, katello doesn’t import any of the content. At that point, only pulp knows about that data.

Katello integrated the fix for the download concurrency bug you found by adding that field to the API, but we don’t expose it in the UI. The default value is 10, which works in most cases. If you do want to lower it, it can be changed on the failing repository via hammer or the console.

You could try the below in the foreman-rake console:

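repo = Katello::Repository.find(id)  # id: numeric ID of the failing repository
repo.root.update!(:download_concurrency => 5)  # lower the parallel downloads (default is 10)
smart_proxy = SmartProxy.pulp_primary
repo.backend_service(smart_proxy).update_remote  # push the new setting to the Pulp remote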

You could give the sync another try after this. If it still fails, it would be helpful to add the backtrace from /var/log/messages on the issue you filed.

The orphan cleanup task should delete unused content from the file system. Could you confirm the task succeeded? Katello triggers an asynchronous task on Pulp to do this and, after triggering it, exits with the message “Orphaned content deletion started in background.” If there’s a failure in the async task, it is captured in the Foreman task you can see in the UI under the Tasks page.
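To check whether the cleanup actually freed anything, one option is to measure the Pulp storage directory before triggering the task and again once the background task has finished. The sketch below is a plain directory walk in Python and does not use any Katello or Pulp tooling. The helper names are hypothetical, /var/lib/pulp/media is the directory from the artifact path quoted earlier in the thread, and the result can differ somewhat from what du reports (hard links and block sizes are not accounted for).

```python
import os


def dir_size_bytes(root: str) -> int:
    """Sum the apparent size of all regular files below root (symlinks are not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                total += os.lstat(full).st_size
            except OSError:
                # File vanished or is unreadable; skip it rather than abort.
                continue
    return total


def freed_gib(before_bytes: int, after_bytes: int) -> float:
    """Space released between two measurements, in GiB."""
    return (before_bytes - after_bytes) / 2**30


# Usage, run as a user that can read the directory:
#   before = dir_size_bytes("/var/lib/pulp/media")
#   ... trigger orphan cleanup and wait for the task to finish ...
#   after = dir_size_bytes("/var/lib/pulp/media")
#   print(f"freed {freed_gib(before, after):.1f} GiB")
```

Because the deletion runs in the background, measuring immediately after the rake command returns would be expected to show no change even when the cleanup works.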

I had a rough time the last time I tried getting to the Pulp API through a browser, because of routing conflicts among other things. I find it easier to curl with the certs installed on the Katello boxes to authenticate.

curl --cert /etc/pki/katello/certs/pulp-client.crt --key /etc/pki/katello/private/pulp-client.key https://hostname/pulp/api/v3/repositories/rpm/rpm | python -m json.tool
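For scripting, the same client-certificate request can be sketched in Python with only the standard library. The certificate and key paths are the ones from the curl command; hostname is a placeholder, and the function names and the optional cafile argument are assumptions added for illustration (a CA file may be needed if the system trust store does not already trust the server certificate).

```python
import json
import ssl
import urllib.request

# Paths taken from the curl command above.
CERT = "/etc/pki/katello/certs/pulp-client.crt"
KEY = "/etc/pki/katello/private/pulp-client.key"


def pulp_url(hostname: str, path: str = "/pulp/api/v3/repositories/rpm/rpm/") -> str:
    """Build the API URL without stray spaces or doubled slashes."""
    return "https://" + hostname.strip().rstrip("/") + "/" + path.strip().lstrip("/")


def fetch_json(url: str, cert: str = CERT, key: str = KEY, cafile=None):
    """GET a Pulp API URL, authenticating with the client certificate, and parse the JSON."""
    ctx = ssl.create_default_context(cafile=cafile)  # cafile=None uses the system trust store
    ctx.load_cert_chain(certfile=cert, keyfile=key)
    with urllib.request.urlopen(url, context=ctx) as resp:
        return json.load(resp)


# Usage on the Katello box (replace "hostname" with the server's FQDN):
#   data = fetch_json(pulp_url("hostname"))
#   print(json.dumps(data, indent=2))
```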

You shouldn’t have to dig into the pulp data though outside of general curiosity. I’m hoping that reducing the download concurrency will help with the sync issue.

sajha · June 4, 2021, 3:05pm

Via hammer:

hammer repository update --id=repo_id --download-concurrency=5
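Since the change is applied per repository ID, it may be worth confirming that the ID points at the intended repository before updating it. The Python sketch below only builds the hammer command lines. The update command is the one shown above; using hammer repository info --id to inspect the repository first is a suggestion, and the function names are hypothetical.

```python
import subprocess


def hammer_repo_info(repo_id: int) -> list:
    """Command to show a repository's details, so the name can be checked by eye."""
    if repo_id <= 0:
        raise ValueError("repo_id must be a positive integer")
    return ["hammer", "repository", "info", f"--id={repo_id}"]


def hammer_update_concurrency(repo_id: int, concurrency: int) -> list:
    """Command to set the download concurrency on one repository."""
    if repo_id <= 0:
        raise ValueError("repo_id must be a positive integer")
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    return ["hammer", "repository", "update",
            f"--id={repo_id}", f"--download-concurrency={concurrency}"]


# Usage on a box with hammer configured (42 is a made-up ID):
#   subprocess.run(hammer_repo_info(42), check=True)              # verify it is the right repo
#   subprocess.run(hammer_update_concurrency(42, 5), check=True)  # then lower the concurrency
```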

KolonelKernel · June 7, 2021, 8:42am

TY Sajha.

I have managed to get past the “Response payload is not completed”… and sync Red Hat EL8 Appstream.

Unfortunately, I started by setting the download concurrency to 5, and then 1… on the wrong repo ID.

So I then went and found the pulpcore 3.11 repo, and updated my python3-aiohttp from python3-aiohttp-3.7.2-1.el8.x86_64 (from pulpcore 3.9) to python3-aiohttp-3.7.4-1.el8.x86_64 (from pulpcore 3.11).

And that didn’t seem to help, so then I went to the cntlm proxy that my Foreman needs to go through (as the proxy libs within Foreman still, inexplicably, are too stupid to handle proxies requiring simple NTLM basic authentication). And I changed the cntlm proxy so it was running serialized. And this did not seem to help.

So then I went and double-checked things and noticed I’d reduced the concurrency in Foreman/Katello on the wrong repo ID.

When I reduced the concurrency on the correct repo ID, then the repo sync started to work better.

Even with this configuration, the sync made some progress but still failed with “Server disconnected”. Then the next sync managed to finish downloading all the remaining packages, but somehow failed at the very end with the “Response payload” error again.

And then one more final sync was fully successful at last.

This would not have been successful, without reducing the concurrency level, because, before reducing that concurrency, I tried syncing, again, and again, stopping foreman, restarting, syncing again. I tried everything, but hit a wall at 15,000 packages downloaded. Repeated sync attempts would make no progress past this point, until I reduced concurrency to 1.

At this stage though, I can’t rule out that the python3-aiohttp rpm and cntlm serialization may have helped also, but I can confirm that reducing the concurrency absolutely was necessary.

Many TY Sajha for suggestion on how to quickly adjust that!

It does look like the current default concurrency setting is problematic, and I think it should be exposed in the UI and documented better. At this stage, I’m not sure if the concurrency is not playing well just with Red Hat, or if it is not playing well with my cntlm proxy (or a combination). Either way, this is absolutely a scenario that is going to wreak havoc with corporate users, as many corporate users will be having Foreman/Katello use a cntlm proxy, due to the inability of Katello to get through authenticating corporate proxies.

KolonelKernel · June 7, 2021, 9:03am

Hmmmm, re the orphan cleanup: I checked the task list and yeah, there was a failed task there for orphan cleanup:

Failed with error:
RuntimeError: There was an issue with the backend service pulp: 404 Not Found

Groan. I’m pretty sure my Pulp 3 is basically working, as I’ve been able to sync repos, eventually.
I wonder, is this command:
foreman-rake katello:delete_orphaned_content RAILS_ENV=production
no longer correct for pulp3?

gvde · June 7, 2021, 1:26pm

I have noticed the same. See Katello 4.0 katello:delete_orphaned_content - #4 by gvde

There are still some pulp2 api calls in the code which break if pulp2 has been removed…

sajha · June 7, 2021, 2:17pm

We have seen similar issues with syncing large repos from more users. Thank you for your patience on this one. Did you have any problems earlier with the cntlm proxy? It would be helpful to file an issue here: https://pulp.plan.io/projects/pulp_rpm/issues/new with details of what you are seeing.

I am able to reproduce the orphan cleanup failure as well on Pulp 3-only boxes. We need to drop some Pulp 2 calls from Katello in the task on new boxes. I filed this for it: Bug #32740: Katello:delete_orphaned_content still checking for pulp2 services and failing with 404s. - Katello - Foreman, and we should have a fix soon.

sajha · June 8, 2021, 2:20pm

Could you confirm which katello version you’re on?

KolonelKernel · June 9, 2021, 1:33am

Hi Sajha,
I’m on foreman-2.4.0-1.el8 and katello-4.0.1-1.el8
And it was a fresh install.

Also noticed someone else with same versions having response payload error here, and also fixed by changing concurrency to 1:

And similar issues here:

sajha · June 9, 2021, 1:43am

I saw some issues filed as well. Should have some updates tomorrow after katello/pulp triage.