Foreman 2.5.4 locks up if libvirt compute_resource no longer exists and you sync or edit compute_resource

cody_c · October 24, 2021, 2:17am

Problem

We regularly call the sync VMs against all the computes in the compute list, as we move VMs around quite a bit. In the previous version we ran (2.2), if a compute in the compute list happened to no longer exist (but its DNS record still existed for whatever reason), this would, at worst, throw some errors in the log and the whole application would move on with its day. We would call the API with about 15 threads at a time and chug through the whole list of hundreds of computes in about 10 to 20 seconds, regardless of how many dead records we may have.
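For illustration, the fan-out described above could look roughly like this Python sketch. The `sync_one` callable is a hypothetical stand-in for whatever request triggers the sync for a single compute resource; it is not a Foreman API name, and the default of 15 workers only mirrors the thread count mentioned above. The point is that a failure on one dead compute is recorded and the rest of the list still gets processed.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def sync_all(compute_ids, sync_one, workers=15):
    """Run sync_one(compute_id) for every id on a thread pool.

    sync_one is supplied by the caller (hypothetical here); it should raise
    on failure. Returns (succeeded_ids, failures), where failures maps each
    failed id to the error text.
    """
    succeeded, failures = [], {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map each future back to the compute id it belongs to.
        futures = {pool.submit(sync_one, cid): cid for cid in compute_ids}
        for fut in as_completed(futures):
            cid = futures[fut]
            try:
                fut.result()
                succeeded.append(cid)
            except Exception as exc:
                # One bad compute is logged here; the others keep going.
                failures[cid] = str(exc)
    return sorted(succeeded), failures
```

If `sync_one` makes an HTTP request, giving that request its own client-side timeout is probably worthwhile, so that a hung server ties up one worker rather than the whole run.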

After updating to 2.5.3 and later 2.5.4, when the sync VM is called on just one bad compute record, the entire application locks up and no further threads can run. In netstat I see a lot of CLOSE_WAIT, and any additional requests (from the UI or API) just hang and eventually time out, but the app never really recovers. The same can happen if I try to edit the compute_resource as well.
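To put a number on the CLOSE_WAIT build-up, a small helper along these lines could tally TCP socket states. This is only a sketch: it assumes the usual Linux `netstat -tan` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State), and the function name is made up for this example.

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Count TCP connection states in the text output of `netstat -tan`.

    Assumes the Linux column layout, where the state is the sixth field.
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP row.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            counts[fields[5]] += 1
    return counts
```

Feeding it the captured output before and after triggering the sync, and comparing the `CLOSE_WAIT` count, should show whether connections are piling up.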

Expected outcome:
Like 2.2, if just a single record happens to be bad, the entire application shouldn’t lock up and prevent the processing of any other requests.

Foreman and Proxy versions:
2.5.4

Foreman and Proxy plugin versions:

Distribution and version:

Other relevant data:
The symptoms seem very similar to another thread I started, where Foreman shows the same locked-up behavior, but in that case it happens after some amount of Puppet reports come in: the whole application locks up, is unusable, and I have to restart all the processes to get it to recover. This is just a very easy way to reproduce it. I can reproduce by just clicking the sync VM button in the UI; I don’t even have to go through the API to get it to lock up.

tbrisker · October 25, 2021, 7:24am

This looks like it may be related to Bug #14854: Libvirt connection leaks - Foreman which was recently fixed. I’m not certain if the patch applies cleanly to 2.5.4 but you could try giving it a spin and see if this resolves the issue for you.

lzap · October 25, 2021, 12:46pm

That was my thought as well, give the patch a try and get back to us. It should apply either cleanly or with little effort.

cody_c · October 25, 2021, 6:30pm

I’m not terribly committed to 2.5. I tried 3.x and it didn’t go well. The settings couldn’t be updated or were lost, and trying to remove a host threw foreign key constraint errors, so we rolled that back as well. If 3.x has been fixed up in that regard, I’d be happy to retry it and see if it also fixes this libvirt error. Other than replacing the files directly from that patch, I’m not sure exactly how to apply it.