How a Four‑Year Hunt Fixed a Hidden Python GIL Race Condition
The article recounts a four‑year investigation that uncovered and repaired a subtle race‑condition bug in Python's Global Interpreter Lock, detailing the bug's origin, the implemented fixes, performance testing, and the decision to make the GIL creation unconditional in Python 3.7.
Fatal error caused by C thread and GIL
In March 2014, a bug (bpo‑20891) was reported: calling PyGILState_Ensure() from a non‑Python thread (a thread created by C code rather than by the interpreter) without first invoking PyEval_InitThreads() aborted the process with the fatal error take_gil: NULL tstate.
Fixing PyGILState_Ensure()
After reproducing the issue on Linux in 2016, the author wrote a fix for PyGILState_Ensure() and added a unit test test_embed.test_bpo20891(). The fix was merged into Python 2.7, 3.6 and the master branch.
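The ordering problem that this fix addresses, described in the commit message that follows, can be illustrated with a small single-threaded toy model. This is only a sketch: every toy_* name is hypothetical, and none of the code is taken from CPython. It assumes only what the article states, namely that initializing the GIL needs a valid thread state, so the thread state must exist before the GIL is initialized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical stand-ins, not CPython's implementation. */
typedef struct { int id; } toy_tstate;

typedef struct {
    bool gil_created;     /* has the lock been created yet? */
    toy_tstate *current;  /* thread state of the calling thread, or NULL */
    toy_tstate storage;   /* backing storage for a newly created state */
} toy_runtime;

/* Taking the lock needs a thread state. The real interpreter aborts with
 * "take_gil: NULL tstate"; the model only reports failure. */
bool toy_take_gil(const toy_tstate *ts)
{
    return ts != NULL;
}

/* Models PyEval_InitThreads(): create the lock on first use, then take it
 * on behalf of the current thread state. */
bool toy_init_threads(toy_runtime *rt)
{
    if (rt->gil_created)
        return true;
    rt->gil_created = true;
    return toy_take_gil(rt->current);
}

/* Models PyThreadState_New() and making the new state current. */
void toy_new_tstate(toy_runtime *rt)
{
    rt->storage.id = 1;
    rt->current = &rt->storage;
}

/* Ordering before the fix: lock first, thread state second. */
bool toy_ensure_buggy(toy_runtime *rt)
{
    assert(rt != NULL);
    bool ok = toy_init_threads(rt);  /* current is still NULL here */
    if (rt->current == NULL)
        toy_new_tstate(rt);
    return ok;
}

/* Ordering after the fix: thread state first, lock second. */
bool toy_ensure_fixed(toy_runtime *rt)
{
    assert(rt != NULL);
    if (rt->current == NULL)
        toy_new_tstate(rt);
    return toy_init_threads(rt);
}
```

Starting from a runtime with no lock and no thread state, toy_ensure_buggy() reports failure while toy_ensure_fixed() succeeds. The only difference between the two is the order of the same two steps.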
bpo-20891: Fix PyGILState_Ensure() (#4650)
When PyGILState_Ensure() is called in a non‑Python thread before
PyEval_InitThreads(), only call PyEval_InitThreads() after
PyThreadState_New() to fix a crash.
Add an unit test in test_embed.
Random crashes on macOS
Running the new test on macOS buildbots revealed a race condition in GIL creation, leading to crashes such as:
Fatal Python error: PyEval_SaveThread: NULL tstate
Current thread 0x00007fffa5dff3c0 (most recent call first):
Abort: 6
The author attempted a partial fix by calling PyEval_InitThreads() inside PyThread_start_new_thread(), but this does not solve the problem for threads not created by Python.
Why not always create the GIL?
Why not call PyEval_InitThreads() at interpreter initialization? Any downside?
Guido van Rossum explained that the original design avoided always creating the GIL to reduce overhead for programs that never spawn additional Python threads.
Second fix: always create the GIL
The author proposed a second patch that calls PyEval_InitThreads() unconditionally during Py_Initialize(), ensuring the GIL is always present and eliminating the race condition.
/* Create the GIL */
PyEval_InitThreads();
Performance benchmarks using pyperformance showed a modest slowdown (around 5% on some tests) but no significant impact overall.
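The idea behind creating the lock unconditionally can be sketched with plain POSIX threads. This is a toy model under stated assumptions, not CPython code: the toy_* names are hypothetical, and a pthread mutex stands in for the GIL. Because the lock is created once in the main thread before any other thread exists, threads that the "interpreter" did not start can take it with no lazy-creation step left to race on.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not CPython's API. */
typedef struct {
    pthread_mutex_t gil;  /* stands in for the GIL */
    long counter;         /* shared state protected by the lock */
} toy_interp;

typedef struct {
    toy_interp *interp;
    int iterations;
} toy_job;

/* Models initialization after the second fix: the lock is created here,
 * in the main thread, before any other thread can exist. */
int toy_initialize(toy_interp *it)
{
    it->counter = 0;
    return pthread_mutex_init(&it->gil, NULL);
}

/* Body of a thread that the toy interpreter did not create itself. */
static void *toy_foreign_thread(void *arg)
{
    toy_job *job = arg;
    for (int i = 0; i < job->iterations; i++) {
        pthread_mutex_lock(&job->interp->gil);   /* lock already exists */
        job->interp->counter++;
        pthread_mutex_unlock(&job->interp->gil);
    }
    return NULL;
}

/* Start up to 8 foreign threads, wait for them, return the final counter. */
long toy_run_foreign_threads(int nthreads, int iterations)
{
    enum { MAX_THREADS = 8 };
    toy_interp it;
    pthread_t threads[MAX_THREADS];
    toy_job job = { &it, iterations };
    int started = 0;

    assert(nthreads >= 0 && nthreads <= MAX_THREADS);
    if (toy_initialize(&it) != 0)
        return -1;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&threads[i], NULL, toy_foreign_thread, &job) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    pthread_mutex_destroy(&it.gil);
    return it.counter;
}
```

With four threads and 1000 iterations each, toy_run_foreign_threads(4, 1000) returns 4000 as long as all threads start, since every increment happens under the lock. The cost of this design is one lock creation at startup even for single-threaded programs, which is the overhead the on-demand design was meant to avoid.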
Backport decisions
The second fix was not back‑ported to Python 2.7 or 3.6 to avoid potential regressions; only the first PyGILState_Ensure() fix was applied to those versions.
Conclusion
After extensive debugging and testing, the Python core team agreed to make the GIL creation unconditional in Python 3.7, with negligible performance impact, while retaining the on‑demand behavior for older versions.
