Problem:
Pulpcore dumps core every time foreman-maintain service stop is executed, and critical messages are written to /var/log/messages.
This happens every day in our environment when the offline backup starts.
Jan 16 04:00:19 hostname systemd-coredump[297768]: Process 210742 (pulpcore-api) of user 991 dumped core.
Jan 16 04:00:19 hostname systemd-coredump[297770]: Process 210746 (pulpcore-api) of user 991 dumped core.
Jan 16 04:00:19 hostname systemd-coredump[297769]: Process 210722 (pulpcore-api) of user 991 dumped core.
Jan 16 04:00:19 hostname systemd-coredump[297767]: Process 210770 (pulpcore-api) of user 991 dumped core.
Jan 16 04:00:19 hostname systemd-coredump[297794]: Process 210720 (pulpcore-conten) of user 991 dumped core.
Jan 16 04:00:19 hostname systemd-coredump[297795]: Process 210787 (pulpcore-conten) of user 991 dumped core.
Jan 16 04:00:19 hostname systemd-coredump[297797]: Process 210762 (pulpcore-conten) of user 991 dumped core.
Jan 16 04:00:19 hostname systemd-coredump[297815]: Process 210908 (pulpcore-conten) of user 991 dumped core.
Jan 16 04:00:19 hostname systemd-coredump[297817]: Process 210896 (pulpcore-conten) of user 991 dumped core.
Jan 16 04:00:19 hostname systemd-coredump[297819]: Process 210923 (pulpcore-conten) of user 991 dumped core.
Jan 16 04:00:19 hostname systemd-coredump[297826]: Process 210731 (pulpcore-conten) of user 991 dumped core.
Jan 16 04:00:19 hostname systemd-coredump[297833]: Process 210743 (pulpcore-conten) of user 991 dumped core.
Jan 16 04:00:20 hostname systemd-coredump[297851]: Process 210696 (pulpcore-conten) of user 991 dumped core.
Expected outcome:
Foreman services will be gracefully stopped.
Foreman and Proxy versions:
foreman-3.9.1-1.el8.noarch
katello-4.11.0-1.el8.noarch
Before the core dump in your messages log, Pulp is indicating that it’s unable to connect to the database. The offline backup you mentioned is the likely cause: it closes the database endpoints before Pulp closes its connections. A workaround would be to stop the Foreman services first and then run the offline backup.
This is a small error on the foreman-maintain side. I’ll go ahead and submit a report for it on Monday. Thanks for letting us know.
“Wants” defines a weak relation that affects only the startup sequence, and the problem described here happens while the services are being stopped. “Requires” would be more appropriate:
Unfortunately, no success with “Requires” either – the core dumps still happen. Moreover, a core dump also occurs when I stop only the pulpcore-api service while postgresql keeps running. Exact command:
But @hao-yu said in a private discussion that it’s related to our psycopg package – and usually, when Hao says something, he’s right (spoiler: also in this case!).
According to him, the issue is fixed by using psycopg[binary] – but we can’t use that; we need to compile things from source.
Let’s dig deeper into how psycopg uses libpq (the PostgreSQL client library). Everybody who wants to read some Python: head over to psycopg/psycopg/psycopg/pq/__init__.py at master · psycopg/psycopg · GitHub; everybody else: trust me that it has three implementations – “C”, “binary” and “Python”. “C” and “binary” are actually the same (C/Cython) code, just built and distributed differently; both contain a Python extension that is linked against libpq. “Python” is different – it also uses libpq, but via ctypes (Python’s FFI interface), not by linking against it!
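If you want to check which implementation a given installation actually picked, psycopg exposes this at runtime. A minimal sketch (assuming psycopg 3, run in the same Python environment as pulpcore):

    import psycopg

    # one of "c", "binary" or "python" – on the affected systems this prints "python"
    print(psycopg.pq.__impl__)

    # integer version of the libpq that was loaded, e.g. 130011 for 13.11
    print(psycopg.pq.version())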
Now, if we look at the stack trace again, we see it’s actually hitting _ctypes.cpython-311-x86_64-linux-gnu.so while crashing! That’s because we’re using the “Python” (ctypes) implementation, while installing psycopg[binary] moves us over to the “binary” one.
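For illustration only – this is not psycopg’s actual code, just a sketch of the FFI mechanism the “Python” implementation relies on (the library name libpq.so.5 is an assumption for a typical EL8 box):

    import ctypes
    import ctypes.util

    # locate and load the PostgreSQL client library at runtime – no linking involved
    libname = ctypes.util.find_library("pq") or "libpq.so.5"
    libpq = ctypes.cdll.LoadLibrary(libname)

    # call a libpq function through ctypes; this kind of call path is exactly
    # what shows up as _ctypes.cpython-311-x86_64-linux-gnu.so in the stack trace
    libpq.PQlibVersion.restype = ctypes.c_int
    print("libpq version:", libpq.PQlibVersion())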
As noted above, “C” and “binary” are (should be?) technically identical – and there is psycopg-c · PyPI, which we can build (it ships the sources!) and install. I’ve built it locally and it fixes the segfaults without pulling in the forbidden “binary” package.
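If you want to be sure the locally built psycopg-c is really what gets loaded (and fail fast if it isn’t), psycopg honours the PSYCOPG_IMPL environment variable – a rough sketch, assuming the variable is set before psycopg is first imported (e.g. exported in the service’s environment):

    import os

    # must be set before the first "import psycopg"; with PSYCOPG_IMPL set,
    # the import fails if the requested implementation is not available
    os.environ.setdefault("PSYCOPG_IMPL", "c")

    import psycopg

    if psycopg.pq.__impl__ != "c":
        raise RuntimeError("unexpected psycopg implementation: %s" % psycopg.pq.__impl__)
    print("using the compiled C wrapper, libpq", psycopg.pq.version())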