
Why Python Threads Lag Behind Single Threading and How Locks Fix It

This article explains why Python's OS‑level threads can be slower for CPU‑bound tasks due to the Global Interpreter Lock, describes race conditions caused by reference counting, and shows how using threading locks (including context‑manager syntax) ensures correct concurrent execution.

Python Programming Learning Circle

Python threads are thin wrappers around OS threads (pthreads on Linux, Windows threads on Windows) and are scheduled entirely by the operating system. Even so, for CPU-bound work a multithreaded CPython program can run slower than a single-threaded one.
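A rough way to see the effect is to time the same pure-Python loop run twice in sequence and then split across two threads. This is only a sketch: `count_down`, `time_sequential`, `time_threaded`, and the workload size `N` are names and values chosen for illustration, and the measured times depend on the machine, the CPython version, and whether the build has the GIL enabled.

```python
import threading
import time


def count_down(n):
    """Pure-Python CPU-bound loop; no I/O, so the thread never waits voluntarily."""
    while n > 0:
        n -= 1


def time_sequential(n, repeats=2):
    """Run the loop `repeats` times back to back in the calling thread."""
    start = time.perf_counter()
    for _ in range(repeats):
        count_down(n)
    return time.perf_counter() - start


def time_threaded(n, repeats=2):
    """Run the same total work, one loop per thread."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(repeats)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


N = 1_000_000  # arbitrary workload size, chosen for illustration
print(f"sequential: {time_sequential(N):.3f}s")
print(f"threaded:   {time_threaded(N):.3f}s")
```

On a standard GIL-enabled build the threaded figure is typically no better than the sequential one, and it can be worse because of the cost of handing the lock back and forth.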

GIL in CPython

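In CPython, a thread must acquire the Global Interpreter Lock (GIL) before it executes bytecode, and it releases the lock after a short interval so that another thread can take a turn. Only one thread runs Python bytecode at any moment, so CPU-bound threads take turns instead of running in parallel, and the handoffs add overhead. The design protects CPython's memory management, which relies on reference counting, from race conditions.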
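The "short period" is exposed by the interpreter as the switch interval. A minimal sketch using the standard `sys` module follows; the variable name `interval` is just for illustration.

```python
import sys

# How long a thread may keep the GIL before the interpreter asks it to yield.
interval = sys.getswitchinterval()
print(f"switch interval: {interval} seconds")

# The value can be tuned with sys.setswitchinterval(seconds); here it is
# set back to the value just read, so nothing actually changes.
sys.setswitchinterval(interval)
```

CPython documents the default as 5 milliseconds. The interval is a request, not a guarantee: the running thread gives up the lock at the next safe point after the interval expires.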

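Without the GIL, two threads that update the same object's reference count at the same time could race. A lost update can leave the count too high, so the object is never freed (a memory leak), or too low, so the object is freed while still in use (which can crash the interpreter).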

Bypassing the GIL

High-performance libraries such as NumPy implement their critical code in C/C++ and can release the GIL while that native code runs, so other threads make progress in the meantime. Developers can likewise write C extensions for performance-critical sections.
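NumPy is a third-party package, so the sketch below swaps in the standard library's `hashlib` to show the same idea: its C implementation is documented to release the GIL when hashing data larger than 2047 bytes, which lets several threads hash at once. The helper name `digest`, the buffer sizes, and the worker count are assumptions made for illustration.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(data: bytes) -> str:
    """Hash one buffer; the heavy lifting happens in C, outside the GIL."""
    return hashlib.sha256(data).hexdigest()


# Four 8 MiB buffers with different contents (sizes chosen arbitrarily).
chunks = [bytes([i]) * (8 * 1024 * 1024) for i in range(4)]

# Hash the buffers on a small thread pool.
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(digest, chunks))

# Same work in a single thread, to confirm the results agree.
sequential = [digest(chunk) for chunk in chunks]
assert threaded == sequential
```

Whether the threaded version is actually faster depends on core count and the hash implementation in use; the point is only that native code which drops the GIL is not limited to one thread at a time.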

Using Locks

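Even with the GIL, explicit locks are needed to protect shared resources: the GIL keeps the interpreter's internals consistent, but it does not make a multi-step operation in your own code atomic. The typical pattern uses threading.Lock() with acquire() and release(), or the context-manager form with lock:.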

<code>import threading

mutex = threading.Lock()  # create the lock
mutex.acquire()           # block until this thread holds the lock
# critical section
mutex.release()           # let a waiting thread proceed</code>

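The following example has no lock and increments a global counter from two threads. The result is nondeterministic and can fall well short of the expected 2,000,000. How often updates are actually lost depends on the CPython version and on timing, but the code is unsafe either way.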

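<code>import threading
g_count = 0
def func(str_val):
    global g_count
    for i in range(1000000):
        g_count += 1  # unsynchronized read-modify-write
    print(f"{str_val}:g_count={g_count}")
threads = [threading.Thread(target=func, args=(name,)) for name in ("t1", "t2")]
for t in threads:
    t.start()
for t in threads:
    t.join()</code>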
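The reason the increment above is unsafe is that `g_count += 1` is not a single step. A minimal sketch with the standard `dis` module shows the bytecode behind it; the function name `increment` is made up for the example, and the exact opcode names differ between CPython versions.

```python
import dis

g_count = 0


def increment():
    global g_count
    g_count += 1


# Collect the opcode names the statement compiles to.
ops = [ins.opname for ins in dis.get_instructions(increment)]
print(ops)

# The global is loaded first and stored back later, with the addition in between.
assert ops.index("LOAD_GLOBAL") < ops.index("STORE_GLOBAL")
```

If the interpreter switches threads after one thread has loaded the old value but before it stores the new one, the other thread's update is overwritten and one increment is lost.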

Adding a lock around the increment yields the correct final count:

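<code>import threading

g_count = 0
lock = threading.Lock()

def func(str_val):
    global g_count
    for i in range(1000000):
        lock.acquire()
        g_count += 1  # only one thread at a time runs this line
        lock.release()
    print(f"{str_val}:g_count={g_count}")
threads = [threading.Thread(target=func, args=(name,)) for name in ("t1", "t2")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(g_count)  # 2000000</code>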

Using the with lock: syntax simplifies the pattern:

<code>import threading

lock = threading.Lock()
with lock:  # acquired on entry, released on exit
    # critical section
    print('Critical section 1')
    print('Critical section 2')</code>
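Beyond being shorter, the `with` form also releases the lock when the critical section raises. The sketch below demonstrates this and shows the `try`/`finally` code that would be needed to get the same guarantee with explicit calls; `risky_update` and `manual_update` are hypothetical names used only for illustration.

```python
import threading

lock = threading.Lock()


def risky_update():
    """Fail inside the critical section while holding the lock."""
    with lock:
        raise ValueError("failure inside the critical section")


def manual_update():
    """Equivalent protection written with explicit acquire/release."""
    lock.acquire()
    try:
        pass  # critical section would go here
    finally:
        lock.release()  # runs even if the critical section raises


try:
    risky_update()
except ValueError:
    pass

print(lock.locked())  # False: the with block released the lock despite the exception
manual_update()
```

A bare `acquire()` followed by `release()` with no `try`/`finally` would leave the lock held after an exception, and every other thread waiting on it would block forever.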