SIGSEGV/SIGABRT in PyEval_RestoreThread #127893

Open
absurdfarce opened this issue Dec 12, 2024 · 4 comments
Labels: interpreter-core (Objects, Python, Grammar, and Parser dirs); type-bug (An unexpected behavior, bug, or error)

Comments

@absurdfarce

absurdfarce commented Dec 12, 2024

Bug report

Bug description:

Reported by a customer; we haven't reproduced this yet internally but we're trying to do so.

The customer has an app that frequently invokes Apache Cassandra's cqlsh to update data. The update generally succeeds, but on rare (and apparently random) occasions the operation fails with a SIGSEGV or SIGABRT in Python. To begin, a few critical stats:

  • Python version is 3.11.2
  • Python driver version (for Cassandra) is 3.25.0
    • We've seen very similar crashes when using either libev or asyncore

The SIGSEGV case usually looks something like the following:

[New LWP 112742]
[New LWP 112723]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `python3 /usr/share/dse/resources/cassandra/bin/cqlsh.py --ssl --cqlshrc /etc/ca'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  take_gil (tstate=tstate@entry=0x16f80b0) at ../Python/ceval_gil.h:231
231	../Python/ceval_gil.h: No such file or directory.
[Current thread is 1 (Thread 0x7f9a497d96c0 (LWP 112742))]
#0  take_gil (tstate=tstate@entry=0x16f80b0) at ../Python/ceval_gil.h:231
#1  0x0000000000577a68 in PyEval_RestoreThread (tstate=tstate@entry=0x16f80b0) at ../Python/ceval.c:535
#2  0x00000000006298ae in select_poll_poll_impl (self=self@entry=0x7f9a4b293e70, timeout_obj=<optimized out>) at ../Modules/selectmodule.c:655
#3  0x0000000000629a9b in select_poll_poll (self=0x7f9a4b293e70, args=0x7f9a48fd53b0, nargs=1) at ../Modules/clinic/selectmodule.c.h:223
#4  0x000000000058859b in _PyEval_EvalFrameDefault (tstate=0x16f80b0, frame=0x7f9a48fd5320, throwflag=<optimized out>) at ../Python/ceval.c:5328
#5  0x000000000058a07b in _PyEval_EvalFrame (tstate=tstate@entry=0x16f80b0, frame=frame@entry=0x7f9a48fd5188, throwflag=throwflag@entry=0) at ../Include/internal/pycore_ceval.h:73
#6  0x000000000058a17c in _PyEval_Vector (tstate=0x16f80b0, func=0x7f9a4aa785d0, locals=locals@entry=0x0, args=0x7f9a497d8a38, argcount=<optimized out>, kwnames=0x0) at ../Python/ceval.c:6435
#7  0x00000000004a9be2 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:393
#8  0x00000000004abfe0 in _PyObject_VectorcallTstate (tstate=tstate@entry=0x16f80b0, callable=callable@entry=<function at remote 0x7f9a4aa785d0>, args=args@entry=0x7f9a497d8a38, nargsf=nargsf@entry=1, kwnames=kwnames@entry=0x0) at ../Include/internal/pycore_call.h:92
#9  0x00000000004ac197 in method_vectorcall (method=<optimized out>, args=0xaa4f50 <_PyRuntime+58928>, nargsf=<optimized out>, kwnames=0x0) at ../Objects/classobject.c:67
#10 0x00000000004a97e2 in _PyVectorcall_Call (tstate=tstate@entry=0x16f80b0, func=0x4ac056 <method_vectorcall>, callable=callable@entry=<method at remote 0x7f9a4a0034d0>, tuple=tuple@entry=(), kwargs=kwargs@entry={}) at ../Objects/call.c:245
#11 0x00000000004a9b2c in _PyObject_Call (tstate=0x16f80b0, callable=callable@entry=<method at remote 0x7f9a4a0034d0>, args=args@entry=(), kwargs=kwargs@entry={}) at ../Objects/call.c:328
#12 0x00000000004a9b6f in PyObject_Call (callable=callable@entry=<method at remote 0x7f9a4a0034d0>, args=args@entry=(), kwargs=kwargs@entry={}) at ../Objects/call.c:355
#13 0x000000000057835b in do_call_core (tstate=tstate@entry=0x16f80b0, func=func@entry=<method at remote 0x7f9a4a0034d0>, callargs=callargs@entry=(), kwdict=kwdict@entry={}, use_tracing=0) at ../Python/ceval.c:7353
#14 0x0000000000588945 in _PyEval_EvalFrameDefault (tstate=0x16f80b0, frame=0x7f9a48fd5110, throwflag=<optimized out>) at ../Python/ceval.c:5379
#15 0x000000000058a07b in _PyEval_EvalFrame (tstate=tstate@entry=0x16f80b0, frame=frame@entry=0x7f9a48fd5020, throwflag=throwflag@entry=0) at ../Include/internal/pycore_ceval.h:73
#16 0x000000000058a17c in _PyEval_Vector (tstate=0x16f80b0, func=0x7f9a4b9ed650, locals=locals@entry=0x0, args=0x7f9a497d8d88, argcount=<optimized out>, kwnames=0x0) at ../Python/ceval.c:6435
#17 0x00000000004a9be2 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:393
#18 0x00000000004abfe0 in _PyObject_VectorcallTstate (tstate=tstate@entry=0x16f80b0, callable=callable@entry=<function at remote 0x7f9a4b9ed650>, args=args@entry=0x7f9a497d8d88, nargsf=nargsf@entry=1, kwnames=kwnames@entry=0x0) at ../Include/internal/pycore_call.h:92
#19 0x00000000004ac197 in method_vectorcall (method=<optimized out>, args=0xaa4f50 <_PyRuntime+58928>, nargsf=<optimized out>, kwnames=0x0) at ../Objects/classobject.c:67
#20 0x00000000004a97e2 in _PyVectorcall_Call (tstate=tstate@entry=0x16f80b0, func=0x4ac056 <method_vectorcall>, callable=callable@entry=<method at remote 0x7f9a4a003470>, tuple=tuple@entry=(), kwargs=kwargs@entry=0x0) at ../Objects/call.c:245
#21 0x00000000004a9b2c in _PyObject_Call (tstate=0x16f80b0, callable=<method at remote 0x7f9a4a003470>, args=(), kwargs=0x0) at ../Objects/call.c:328
#22 0x00000000004a9b6f in PyObject_Call (callable=<optimized out>, args=<optimized out>, kwargs=<optimized out>) at ../Objects/call.c:355
#23 0x00000000006a117f in thread_run (boot_raw=boot_raw@entry=0x7f9a4aae7290) at ../Modules/_threadmodule.c:1092
#24 0x00000000005db0e4 in pythread_wrapper (arg=<optimized out>) at ../Python/thread_pthread.h:246
#25 0x00007f9a4c2f11c4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#26 0x00007f9a4c37185c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 2 (Thread 0x7f9a4c2672c0 (LWP 112723)):
#0  0x00000000005e9c52 in validate_list (head=head@entry=0xaa5228 <_PyRuntime+59656>, flags=flags@entry=collecting_clear_unreachable_clear) at ../Modules/gcmodule.c:397
#1  0x00000000005eb562 in gc_collect_main (tstate=tstate@entry=0xabf2d8 <_PyRuntime+166328>, generation=generation@entry=2, n_collected=n_collected@entry=0x7ffd41387410, n_uncollectable=n_uncollectable@entry=0x7ffd41387418, nofail=nofail@entry=0) at ../Modules/gcmodule.c:1224
#2  0x00000000005eba7e in gc_collect_with_callback (tstate=tstate@entry=0xabf2d8 <_PyRuntime+166328>, generation=generation@entry=2) at ../Modules/gcmodule.c:1400
#3  0x00000000005ebff9 in PyGC_Collect () at ../Modules/gcmodule.c:2086
#4  0x00000000005c6400 in Py_FinalizeEx () at ../Python/pylifecycle.c:1830
#5  0x00000000005e97ae in Py_RunMain () at ../Modules/main.c:682
#6  0x00000000005e97fe in pymain_main (args=args@entry=0x7ffd41387500) at ../Modules/main.c:710
#7  0x00000000005e9883 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at ../Modules/main.c:734
#8  0x0000000000420fef in main (argc=<optimized out>, argv=<optimized out>) at ../Programs/python.c:15

Thread 1 (Thread 0x7f9a497d96c0 (LWP 112742)):
#0  take_gil (tstate=tstate@entry=0x16f80b0) at ../Python/ceval_gil.h:231
#1  0x0000000000577a68 in PyEval_RestoreThread (tstate=tstate@entry=0x16f80b0) at ../Python/ceval.c:535
#2  0x00000000006298ae in select_poll_poll_impl (self=self@entry=0x7f9a4b293e70, timeout_obj=<optimized out>) at ../Modules/selectmodule.c:655
#3  0x0000000000629a9b in select_poll_poll (self=0x7f9a4b293e70, args=0x7f9a48fd53b0, nargs=1) at ../Modules/clinic/selectmodule.c.h:223
#4  0x000000000058859b in _PyEval_EvalFrameDefault (tstate=0x16f80b0, frame=0x7f9a48fd5320, throwflag=<optimized out>) at ../Python/ceval.c:5328
#5  0x000000000058a07b in _PyEval_EvalFrame (tstate=tstate@entry=0x16f80b0, frame=frame@entry=0x7f9a48fd5188, throwflag=throwflag@entry=0) at ../Include/internal/pycore_ceval.h:73
#6  0x000000000058a17c in _PyEval_Vector (tstate=0x16f80b0, func=0x7f9a4aa785d0, locals=locals@entry=0x0, args=0x7f9a497d8a38, argcount=<optimized out>, kwnames=0x0) at ../Python/ceval.c:6435
#7  0x00000000004a9be2 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:393
#8  0x00000000004abfe0 in _PyObject_VectorcallTstate (tstate=tstate@entry=0x16f80b0, callable=callable@entry=<function at remote 0x7f9a4aa785d0>, args=args@entry=0x7f9a497d8a38, nargsf=nargsf@entry=1, kwnames=kwnames@entry=0x0) at ../Include/internal/pycore_call.h:92
#9  0x00000000004ac197 in method_vectorcall (method=<optimized out>, args=0xaa4f50 <_PyRuntime+58928>, nargsf=<optimized out>, kwnames=0x0) at ../Objects/classobject.c:67
#10 0x00000000004a97e2 in _PyVectorcall_Call (tstate=tstate@entry=0x16f80b0, func=0x4ac056 <method_vectorcall>, callable=callable@entry=<method at remote 0x7f9a4a0034d0>, tuple=tuple@entry=(), kwargs=kwargs@entry={}) at ../Objects/call.c:245
#11 0x00000000004a9b2c in _PyObject_Call (tstate=0x16f80b0, callable=callable@entry=<method at remote 0x7f9a4a0034d0>, args=args@entry=(), kwargs=kwargs@entry={}) at ../Objects/call.c:328
#12 0x00000000004a9b6f in PyObject_Call (callable=callable@entry=<method at remote 0x7f9a4a0034d0>, args=args@entry=(), kwargs=kwargs@entry={}) at ../Objects/call.c:355
#13 0x000000000057835b in do_call_core (tstate=tstate@entry=0x16f80b0, func=func@entry=<method at remote 0x7f9a4a0034d0>, callargs=callargs@entry=(), kwdict=kwdict@entry={}, use_tracing=0) at ../Python/ceval.c:7353
#14 0x0000000000588945 in _PyEval_EvalFrameDefault (tstate=0x16f80b0, frame=0x7f9a48fd5110, throwflag=<optimized out>) at ../Python/ceval.c:5379
#15 0x000000000058a07b in _PyEval_EvalFrame (tstate=tstate@entry=0x16f80b0, frame=frame@entry=0x7f9a48fd5020, throwflag=throwflag@entry=0) at ../Include/internal/pycore_ceval.h:73
#16 0x000000000058a17c in _PyEval_Vector (tstate=0x16f80b0, func=0x7f9a4b9ed650, locals=locals@entry=0x0, args=0x7f9a497d8d88, argcount=<optimized out>, kwnames=0x0) at ../Python/ceval.c:6435
#17 0x00000000004a9be2 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:393
#18 0x00000000004abfe0 in _PyObject_VectorcallTstate (tstate=tstate@entry=0x16f80b0, callable=callable@entry=<function at remote 0x7f9a4b9ed650>, args=args@entry=0x7f9a497d8d88, nargsf=nargsf@entry=1, kwnames=kwnames@entry=0x0) at ../Include/internal/pycore_call.h:92
#19 0x00000000004ac197 in method_vectorcall (method=<optimized out>, args=0xaa4f50 <_PyRuntime+58928>, nargsf=<optimized out>, kwnames=0x0) at ../Objects/classobject.c:67
#20 0x00000000004a97e2 in _PyVectorcall_Call (tstate=tstate@entry=0x16f80b0, func=0x4ac056 <method_vectorcall>, callable=callable@entry=<method at remote 0x7f9a4a003470>, tuple=tuple@entry=(), kwargs=kwargs@entry=0x0) at ../Objects/call.c:245
#21 0x00000000004a9b2c in _PyObject_Call (tstate=0x16f80b0, callable=<method at remote 0x7f9a4a003470>, args=(), kwargs=0x0) at ../Objects/call.c:328
#22 0x00000000004a9b6f in PyObject_Call (callable=<optimized out>, args=<optimized out>, kwargs=<optimized out>) at ../Objects/call.c:355
#23 0x00000000006a117f in thread_run (boot_raw=boot_raw@entry=0x7f9a4aae7290) at ../Modules/_threadmodule.c:1092
#24 0x00000000005db0e4 in pythread_wrapper (arg=<optimized out>) at ../Python/thread_pthread.h:246
#25 0x00007f9a4c2f11c4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#26 0x00007f9a4c37185c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

The SIGABRT case looks something like this:

[New LWP 1810487]
[New LWP 1810400]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `python3 /usr/share/dse/resources/cassandra/bin/cqlsh.py --ssl --cqlshrc /etc/ca'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
44	./nptl/pthread_kill.c: No such file or directory.
[Current thread is 1 (Thread 0x7f0a0b0326c0 (LWP 1810487))]
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1  0x00007f0a0db4bf1f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  0x00007f0a0dafcfb2 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007f0a0dae7472 in __GI_abort () at ./stdlib/abort.c:79
#4  0x00007f0a0dae7395 in __assert_fail_base (fmt=0x7f0a0dc5ba90 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x741bb8 "!_PyMem_IsPtrFreed(tstate->interp)", file=file@entry=0x74045c "../Python/ceval.c", line=line@entry=228, function=function@entry=0x744730 <__PRETTY_FUNCTION__.81> "is_tstate_valid") at ./assert/assert.c:92
#5  0x00007f0a0daf5eb2 in __GI___assert_fail (assertion=assertion@entry=0x741bb8 "!_PyMem_IsPtrFreed(tstate->interp)", file=file@entry=0x74045c "../Python/ceval.c", line=line@entry=228, function=function@entry=0x744730 <__PRETTY_FUNCTION__.81> "is_tstate_valid") at ./assert/assert.c:101
#6  0x0000000000572f33 in is_tstate_valid (tstate=tstate@entry=0x2517b90) at ../Python/ceval.c:228
#7  0x0000000000577406 in take_gil (tstate=tstate@entry=0x2517b90) at ../Python/ceval_gil.h:229
#8  0x0000000000577a68 in PyEval_RestoreThread (tstate=tstate@entry=0x2517b90) at ../Python/ceval.c:535
#9  0x00000000006298ae in select_poll_poll_impl (self=self@entry=0x7f0a0cb0a840, timeout_obj=<optimized out>) at ../Modules/selectmodule.c:655
#10 0x0000000000629a9b in select_poll_poll (self=0x7f0a0cb0a840, args=0x7f0a0a82e3b0, nargs=1) at ../Modules/clinic/selectmodule.c.h:223
#11 0x000000000058859b in _PyEval_EvalFrameDefault (tstate=0x2517b90, frame=0x7f0a0a82e320, throwflag=<optimized out>) at ../Python/ceval.c:5328
#12 0x000000000058a07b in _PyEval_EvalFrame (tstate=tstate@entry=0x2517b90, frame=frame@entry=0x7f0a0a82e188, throwflag=throwflag@entry=0) at ../Include/internal/pycore_ceval.h:73
#13 0x000000000058a17c in _PyEval_Vector (tstate=0x2517b90, func=0x7f0a0c2e0730, locals=locals@entry=0x0, args=0x7f0a0b031a38, argcount=<optimized out>, kwnames=0x0) at ../Python/ceval.c:6435
#14 0x00000000004a9be2 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:393
#15 0x00000000004abfe0 in _PyObject_VectorcallTstate (tstate=tstate@entry=0x2517b90, callable=callable@entry=<function at remote 0x7f0a0c2e0730>, args=args@entry=0x7f0a0b031a38, nargsf=nargsf@entry=1, kwnames=kwnames@entry=0x0) at ../Include/internal/pycore_call.h:92
#16 0x00000000004ac197 in method_vectorcall (method=<optimized out>, args=0xaa4f50 <_PyRuntime+58928>, nargsf=<optimized out>, kwnames=0x0) at ../Objects/classobject.c:67
#17 0x00000000004a97e2 in _PyVectorcall_Call (tstate=tstate@entry=0x2517b90, func=0x4ac056 <method_vectorcall>, callable=callable@entry=<method at remote 0x7f0a0b86b230>, tuple=tuple@entry=(), kwargs=kwargs@entry={}) at ../Objects/call.c:245
#18 0x00000000004a9b2c in _PyObject_Call (tstate=0x2517b90, callable=callable@entry=<method at remote 0x7f0a0b86b230>, args=args@entry=(), kwargs=kwargs@entry={}) at ../Objects/call.c:328
#19 0x00000000004a9b6f in PyObject_Call (callable=callable@entry=<method at remote 0x7f0a0b86b230>, args=args@entry=(), kwargs=kwargs@entry={}) at ../Objects/call.c:355
#20 0x000000000057835b in do_call_core (tstate=tstate@entry=0x2517b90, func=func@entry=<method at remote 0x7f0a0b86b230>, callargs=callargs@entry=(), kwdict=kwdict@entry={}, use_tracing=0) at ../Python/ceval.c:7353
#21 0x0000000000588945 in _PyEval_EvalFrameDefault (tstate=0x2517b90, frame=0x7f0a0a82e110, throwflag=<optimized out>) at ../Python/ceval.c:5379
#22 0x000000000058a07b in _PyEval_EvalFrame (tstate=tstate@entry=0x2517b90, frame=frame@entry=0x7f0a0a82e020, throwflag=throwflag@entry=0) at ../Include/internal/pycore_ceval.h:73
#23 0x000000000058a17c in _PyEval_Vector (tstate=0x2517b90, func=0x7f0a0d2557b0, locals=locals@entry=0x0, args=0x7f0a0b031d88, argcount=<optimized out>, kwnames=0x0) at ../Python/ceval.c:6435
#24 0x00000000004a9be2 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:393
#25 0x00000000004abfe0 in _PyObject_VectorcallTstate (tstate=tstate@entry=0x2517b90, callable=callable@entry=<function at remote 0x7f0a0d2557b0>, args=args@entry=0x7f0a0b031d88, nargsf=nargsf@entry=1, kwnames=kwnames@entry=0x0) at ../Include/internal/pycore_call.h:92
#26 0x00000000004ac197 in method_vectorcall (method=<optimized out>, args=0xaa4f50 <_PyRuntime+58928>, nargsf=<optimized out>, kwnames=0x0) at ../Objects/classobject.c:67
#27 0x00000000004a97e2 in _PyVectorcall_Call (tstate=tstate@entry=0x2517b90, func=0x4ac056 <method_vectorcall>, callable=callable@entry=<method at remote 0x7f0a0b86abd0>, tuple=tuple@entry=(), kwargs=kwargs@entry=0x0) at ../Objects/call.c:245
#28 0x00000000004a9b2c in _PyObject_Call (tstate=0x2517b90, callable=<method at remote 0x7f0a0b86abd0>, args=(), kwargs=0x0) at ../Objects/call.c:328
#29 0x00000000004a9b6f in PyObject_Call (callable=<optimized out>, args=<optimized out>, kwargs=<optimized out>) at ../Objects/call.c:355
#30 0x00000000006a117f in thread_run (boot_raw=boot_raw@entry=0x7f0a0b9a20c0) at ../Modules/_threadmodule.c:1092
#31 0x00000000005db0e4 in pythread_wrapper (arg=<optimized out>) at ../Python/thread_pthread.h:246
#32 0x00007f0a0db4a1c4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#33 0x00007f0a0dbca85c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 2 (Thread 0x7f0a0dac02c0 (LWP 1810400)):
#0  0x00000000005e9c52 in validate_list (head=head@entry=0xaa5228 <_PyRuntime+59656>, flags=flags@entry=collecting_clear_unreachable_clear) at ../Modules/gcmodule.c:397
#1  0x00000000005eb562 in gc_collect_main (tstate=tstate@entry=0xabf2d8 <_PyRuntime+166328>, generation=generation@entry=2, n_collected=n_collected@entry=0x7ffe4e522000, n_uncollectable=n_uncollectable@entry=0x7ffe4e522008, nofail=nofail@entry=0) at ../Modules/gcmodule.c:1224
#2  0x00000000005eba7e in gc_collect_with_callback (tstate=tstate@entry=0xabf2d8 <_PyRuntime+166328>, generation=generation@entry=2) at ../Modules/gcmodule.c:1400
#3  0x00000000005ebff9 in PyGC_Collect () at ../Modules/gcmodule.c:2086
#4  0x00000000005c6400 in Py_FinalizeEx () at ../Python/pylifecycle.c:1830
#5  0x00000000005e97ae in Py_RunMain () at ../Modules/main.c:682
#6  0x00000000005e97fe in pymain_main (args=args@entry=0x7ffe4e5220f0) at ../Modules/main.c:710
#7  0x00000000005e9883 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at ../Modules/main.c:734
#8  0x0000000000420fef in main (argc=<optimized out>, argv=<optimized out>) at ../Programs/python.c:15

Thread 1 (Thread 0x7f0a0b0326c0 (LWP 1810487)):
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1  0x00007f0a0db4bf1f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  0x00007f0a0dafcfb2 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007f0a0dae7472 in __GI_abort () at ./stdlib/abort.c:79
#4  0x00007f0a0dae7395 in __assert_fail_base (fmt=0x7f0a0dc5ba90 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x741bb8 "!_PyMem_IsPtrFreed(tstate->interp)", file=file@entry=0x74045c "../Python/ceval.c", line=line@entry=228, function=function@entry=0x744730 <__PRETTY_FUNCTION__.81> "is_tstate_valid") at ./assert/assert.c:92
#5  0x00007f0a0daf5eb2 in __GI___assert_fail (assertion=assertion@entry=0x741bb8 "!_PyMem_IsPtrFreed(tstate->interp)", file=file@entry=0x74045c "../Python/ceval.c", line=line@entry=228, function=function@entry=0x744730 <__PRETTY_FUNCTION__.81> "is_tstate_valid") at ./assert/assert.c:101
#6  0x0000000000572f33 in is_tstate_valid (tstate=tstate@entry=0x2517b90) at ../Python/ceval.c:228
#7  0x0000000000577406 in take_gil (tstate=tstate@entry=0x2517b90) at ../Python/ceval_gil.h:229
#8  0x0000000000577a68 in PyEval_RestoreThread (tstate=tstate@entry=0x2517b90) at ../Python/ceval.c:535
#9  0x00000000006298ae in select_poll_poll_impl (self=self@entry=0x7f0a0cb0a840, timeout_obj=<optimized out>) at ../Modules/selectmodule.c:655
#10 0x0000000000629a9b in select_poll_poll (self=0x7f0a0cb0a840, args=0x7f0a0a82e3b0, nargs=1) at ../Modules/clinic/selectmodule.c.h:223
#11 0x000000000058859b in _PyEval_EvalFrameDefault (tstate=0x2517b90, frame=0x7f0a0a82e320, throwflag=<optimized out>) at ../Python/ceval.c:5328
#12 0x000000000058a07b in _PyEval_EvalFrame (tstate=tstate@entry=0x2517b90, frame=frame@entry=0x7f0a0a82e188, throwflag=throwflag@entry=0) at ../Include/internal/pycore_ceval.h:73
#13 0x000000000058a17c in _PyEval_Vector (tstate=0x2517b90, func=0x7f0a0c2e0730, locals=locals@entry=0x0, args=0x7f0a0b031a38, argcount=<optimized out>, kwnames=0x0) at ../Python/ceval.c:6435
#14 0x00000000004a9be2 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:393
#15 0x00000000004abfe0 in _PyObject_VectorcallTstate (tstate=tstate@entry=0x2517b90, callable=callable@entry=<function at remote 0x7f0a0c2e0730>, args=args@entry=0x7f0a0b031a38, nargsf=nargsf@entry=1, kwnames=kwnames@entry=0x0) at ../Include/internal/pycore_call.h:92
#16 0x00000000004ac197 in method_vectorcall (method=<optimized out>, args=0xaa4f50 <_PyRuntime+58928>, nargsf=<optimized out>, kwnames=0x0) at ../Objects/classobject.c:67
#17 0x00000000004a97e2 in _PyVectorcall_Call (tstate=tstate@entry=0x2517b90, func=0x4ac056 <method_vectorcall>, callable=callable@entry=<method at remote 0x7f0a0b86b230>, tuple=tuple@entry=(), kwargs=kwargs@entry={}) at ../Objects/call.c:245
#18 0x00000000004a9b2c in _PyObject_Call (tstate=0x2517b90, callable=callable@entry=<method at remote 0x7f0a0b86b230>, args=args@entry=(), kwargs=kwargs@entry={}) at ../Objects/call.c:328
#19 0x00000000004a9b6f in PyObject_Call (callable=callable@entry=<method at remote 0x7f0a0b86b230>, args=args@entry=(), kwargs=kwargs@entry={}) at ../Objects/call.c:355
#20 0x000000000057835b in do_call_core (tstate=tstate@entry=0x2517b90, func=func@entry=<method at remote 0x7f0a0b86b230>, callargs=callargs@entry=(), kwdict=kwdict@entry={}, use_tracing=0) at ../Python/ceval.c:7353
#21 0x0000000000588945 in _PyEval_EvalFrameDefault (tstate=0x2517b90, frame=0x7f0a0a82e110, throwflag=<optimized out>) at ../Python/ceval.c:5379
#22 0x000000000058a07b in _PyEval_EvalFrame (tstate=tstate@entry=0x2517b90, frame=frame@entry=0x7f0a0a82e020, throwflag=throwflag@entry=0) at ../Include/internal/pycore_ceval.h:73
#23 0x000000000058a17c in _PyEval_Vector (tstate=0x2517b90, func=0x7f0a0d2557b0, locals=locals@entry=0x0, args=0x7f0a0b031d88, argcount=<optimized out>, kwnames=0x0) at ../Python/ceval.c:6435
#24 0x00000000004a9be2 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:393
#25 0x00000000004abfe0 in _PyObject_VectorcallTstate (tstate=tstate@entry=0x2517b90, callable=callable@entry=<function at remote 0x7f0a0d2557b0>, args=args@entry=0x7f0a0b031d88, nargsf=nargsf@entry=1, kwnames=kwnames@entry=0x0) at ../Include/internal/pycore_call.h:92
#26 0x00000000004ac197 in method_vectorcall (method=<optimized out>, args=0xaa4f50 <_PyRuntime+58928>, nargsf=<optimized out>, kwnames=0x0) at ../Objects/classobject.c:67
#27 0x00000000004a97e2 in _PyVectorcall_Call (tstate=tstate@entry=0x2517b90, func=0x4ac056 <method_vectorcall>, callable=callable@entry=<method at remote 0x7f0a0b86abd0>, tuple=tuple@entry=(), kwargs=kwargs@entry=0x0) at ../Objects/call.c:245
#28 0x00000000004a9b2c in _PyObject_Call (tstate=0x2517b90, callable=<method at remote 0x7f0a0b86abd0>, args=(), kwargs=0x0) at ../Objects/call.c:328
#29 0x00000000004a9b6f in PyObject_Call (callable=<optimized out>, args=<optimized out>, kwargs=<optimized out>) at ../Objects/call.c:355
#30 0x00000000006a117f in thread_run (boot_raw=boot_raw@entry=0x7f0a0b9a20c0) at ../Modules/_threadmodule.c:1092
#31 0x00000000005db0e4 in pythread_wrapper (arg=<optimized out>) at ../Python/thread_pthread.h:246
#32 0x00007f0a0db4a1c4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#33 0x00007f0a0dbca85c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

In all cases we've seen, there are two threads running at any one time: one is responsible for the call to PyEval_RestoreThread and the other looks to be a GC thread. Since most (all?) of the core dumps appear to involve accesses to members of various structures in ceval_gil.h, I couldn't help but wonder if GC was somehow stomping on thread information prematurely. But that's entirely speculation on my part; I've spent very little time with the interpreter code base, so the GC op could be completely unrelated.
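
The gdb backtraces above show only C frames. As a diagnostic aid while trying to reproduce this, one option might be to launch the child interpreter with the standard `PYTHONFAULTHANDLER=1` environment variable, which asks CPython to try printing a Python-level traceback to stderr when it receives SIGSEGV or SIGABRT. The sketch below is not part of the original report: `run_with_faulthandler` is a hypothetical helper, and the `os.abort()` child merely stands in for the real cqlsh invocation. Because the crash here involves a freed thread state, the traceback may be incomplete or missing.

```python
import os
import signal
import subprocess
import sys


def run_with_faulthandler(cmd):
    """Run cmd with faulthandler enabled; return (signal or None, stderr)."""
    # PYTHONFAULTHANDLER=1 is the documented way to enable faulthandler
    # in a child interpreter without editing its source.
    env = dict(os.environ, PYTHONFAULTHANDLER="1")
    proc = subprocess.run(cmd, env=env, capture_output=True, text=True)
    # On POSIX, a negative return code means the child died from a signal.
    killed_by = signal.Signals(-proc.returncode) if proc.returncode < 0 else None
    return killed_by, proc.stderr


if __name__ == "__main__":
    # Hypothetical stand-in for the cqlsh command: a child that aborts on purpose.
    killed_by, stderr = run_with_faulthandler(
        [sys.executable, "-c", "import os; os.abort()"]
    )
    print(killed_by)
    print(stderr)
```

In the calling app, the returned signal could be logged alongside the captured stderr so that rare failures leave more than a core file behind.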

CPython versions tested on:

3.11

Operating systems tested on:

Linux

absurdfarce added the type-bug (An unexpected behavior, bug, or error) label Dec 12, 2024
@colesbury
Contributor

The GC is a red herring. The problem is that shutting down Python while daemon threads are still running isn't safe. (GC is called during the shutdown process, but the GC itself isn't the problem.) The relevant line in the stack trace is:

Py_FinalizeEx () at ../Python/pylifecycle.c:1830

Basically, what's happening is that the main thread, as part of the shutdown process, deletes the PyThreadState structures out from underneath every other remaining (daemon) thread. There's a check in PyEval_RestoreThread that determines whether Python is shutting down, but if you are unlucky, that check happens before shutdown starts while the access to tstate->interp happens after shutdown starts (and tstate has been deleted).
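
Given that explanation, one application-level way to sidestep the race is to make sure no thread is still blocked in `poll()` when the main thread reaches finalization. The following is a minimal sketch under that assumption, not the Cassandra driver's actual event loop: `PollWorker` and its methods are hypothetical names. The worker is a non-daemon thread that polls with a bounded timeout and is explicitly stopped and joined.

```python
import os
import select
import threading


class PollWorker:
    """Event-loop thread that is stopped and joined before interpreter exit."""

    def __init__(self):
        self._stop = threading.Event()
        # Self-pipe used only to wake poll() promptly when stopping.
        self._wake_r, self._wake_w = os.pipe()
        self._poller = select.poll()
        self._poller.register(self._wake_r, select.POLLIN)
        # Non-daemon, so interpreter shutdown waits for it instead of
        # tearing down its thread state while it is still running.
        self._thread = threading.Thread(target=self._run, daemon=False)

    def start(self):
        self._thread.start()

    def _run(self):
        while not self._stop.is_set():
            # Bounded timeout (milliseconds) so the stop flag is re-checked.
            self._poller.poll(100)

    def stop(self):
        self._stop.set()
        os.write(self._wake_w, b"x")  # make poll() return right away
        self._thread.join()
        os.close(self._wake_r)
        os.close(self._wake_w)

    def is_alive(self):
        return self._thread.is_alive()
```

The trade-off is that `stop()` must be called on every exit path (for example in a `try`/`finally` around the main program), because a non-daemon thread that is never stopped keeps the process from exiting.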

See also:

@absurdfarce
Author

Thanks for the quick response @colesbury! Your description was very clear (and it makes a great deal of sense).

I read through some of the linked tickets referenced in #124878... seems like this issue (or some variant of it) has been around for a while. Is there a particular range of CPython versions you'd expect to be subject to this issue (or, inversely, is there a set of releases which you believe would not have this problem)? Based on the references here it seems like the answer is "no, something like this has existed for a while", but it seemed worth confirming.

@colesbury
Contributor

Is there a particular range of cpython versions you'd expect to be subject to this issue

This has been around since at least Python 3.9. Older Python versions may have different shutdown-related crashes.

or, inversely, is there a set of releases which you believe would not have this problem?

I'd like to see it fixed in the upcoming 3.14 release.
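
To get a rough idea of whether a particular interpreter build is affected, a stress harness along the following lines might help. It is only a sketch: `CHILD` and `stress` are hypothetical names, the child program is an assumed approximation of the pattern in the backtraces (a daemon thread sitting in `poll()` while the main thread exits), and because the race window is narrow, a clean run does not show that a build is safe.

```python
import subprocess
import sys

# Child program: a daemon thread loops in poll() while the main thread
# falls off the end of the script and starts interpreter shutdown.
CHILD = """
import select, threading

def loop():
    p = select.poll()
    while True:
        p.poll(1)

threading.Thread(target=loop, daemon=True).start()
"""


def stress(python=sys.executable, runs=200):
    """Run CHILD repeatedly under `python`; return {returncode: count}."""
    outcomes = {}
    for _ in range(runs):
        rc = subprocess.run(
            [python, "-c", CHILD], stderr=subprocess.DEVNULL
        ).returncode
        outcomes[rc] = outcomes.get(rc, 0) + 1
    return outcomes


if __name__ == "__main__":
    # Negative keys mean the child died from a signal
    # (on Linux, -11 is SIGSEGV and -6 is SIGABRT).
    print(stress())
```

Passing a different interpreter path as `python` allows the same child to be run against several installed versions.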

picnixz added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Dec 13, 2024
@absurdfarce
Author

Thanks again @colesbury!

I'll very much leave this up to you and the other core Python folks as to whether this ticket stays open. To my eyes this seems to describe a condition that's already pretty well-documented without contributing a whole lot of new context. If you agree I'm perfectly fine with closing this ticket out with the understanding that this will be fixed when something is put in place for the other issues. But if this offers something new I have no objection to leaving it open. It's entirely up to you. :)
