
When to Use Processes, Threads, or Coroutines in Python? A Practical Guide

This article explains the operating‑system concepts of processes, threads, and coroutines, compares their performance with Python code examples, discusses the impact of the GIL and DMA, and provides clear guidelines for choosing the right concurrency model based on CPU‑bound, I/O‑bound, or mixed workloads.

Liangxu Linux

Aug 13, 2020

What Is a Process?

A process is an OS‑provided abstraction that serves as the basic unit for resource allocation and scheduling; it is the concrete execution of a program. The OS loads the program’s code and static data into memory, creates a stack, allocates heap space, and then transfers CPU control to the new process.

The OS tracks each process through a Process Control Block (PCB) that stores identifiers, state, priority, file pointers, register contents, etc. A process typically goes through five states: initial, running, ready, waiting (blocked), and terminated.

Initial: created but not yet scheduled.

Running: currently executing on the CPU (only one process can be in this state on a single core).

Ready: prepared to run once scheduled.

Waiting: blocked on an event.

Terminated: finished execution.

Process Switching

On both single‑core and multi‑core CPUs, the OS can appear to run more processes at once than there are cores by performing context switches: it saves the current process’s context, restores the next process’s context, and transfers CPU control to that process.

Process Data Sharing

Each process receives its own virtual address space, an abstraction provided by virtual memory (VM). VM gives three main benefits: it keeps active memory pages in RAM while swapping others to disk, presents a consistent address space to each process, and protects each process’s memory from being overwritten by other processes.

Without inter‑process communication mechanisms, processes cannot directly share data because they operate in isolated virtual address spaces.

Python Multiprocessing Example

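import multiprocessing
import time

n = 0  # module-level global; each child process works on its own copy

def count(num):
    global n
    for i in range(100000):
        n += i
    # id(n) is only meaningful inside the process that prints it
    print("Process {0}:n={1},id(n)={2}".format(num, n, id(n)))

if __name__ == '__main__':
    start_time = time.time()
    processes = []
    for i in range(5):
        p = multiprocessing.Process(target=count, args=(i,))
        processes.append(p)
    for p in processes:
        p.start()  # start every child first so they run concurrently
    for p in processes:
        p.join()   # then wait for all of them to finish
    # The parent's n was never modified by the children
    print("Main:n={0},id(n)={1}".format(n, id(n)))
    end_time = time.time()
    print("Total time:{0}".format(end_time - start_time))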

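The output shows that each child process works on its own copy of n (the id values differ), while the main process still sees n = 0 because the variable is not shared across processes. In CPython, id(n) is the object's address within the printing process's own virtual address space, so values from different processes are not directly comparable.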

Process 1:n=4999950000,id(n)=139854202072440
Process 0:n=4999950000,id(n)=139854329146064
Process 2:n=4999950000,id(n)=139854202072400
Process 4:n=4999950000,id(n)=139854201618960
Process 3:n=4999950000,id(n)=139854202069320
Main:n=0,id(n)=9462720
Total time:0.03138256072998047
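The isolation seen in this output is the default, not a hard limit. As a minimal sketch, assuming a single shared integer is all that is needed, multiprocessing.Value places the number in shared memory so that every child updates the same value. The names add_many and run_demo and the worker counts are illustrative and do not come from the original article.

```python
import multiprocessing


def add_many(counter, times):
    """Increment a shared counter `times` times, locking each update."""
    for _ in range(times):
        # get_lock() returns the lock that guards this shared value.
        with counter.get_lock():
            counter.value += 1


def run_demo(workers=4, times=1000):
    # "q" is a signed 64-bit integer stored in shared memory.
    counter = multiprocessing.Value("q", 0)
    procs = [
        multiprocessing.Process(target=add_many, args=(counter, times))
        for _ in range(workers)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    print("Shared total:", run_demo())
```

With the defaults above, the parent should see a total of workers × times, because all children write to the same shared memory location under a lock.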

What Is a Thread?

A thread is the smallest unit of execution that the OS schedules onto a CPU. Threads share their process’s virtual address space, file descriptors, and signal handlers, but each thread has its own stack and thread‑local storage.

Thread management information is stored in a Thread Control Block (TCB) containing fields such as thread identifier, register set, execution state, priority, thread‑local storage, and signal mask.

Threads also have the same five states as processes: initial, running, waiting, ready, and terminated.

Process vs. Thread

Process: independent resource allocation and scheduling unit with its own virtual address space.

Thread: CPU‑scheduling unit that shares the process’s address space.

Threads are lighter weight; creation and destruction are faster than processes.

Because threads share memory, synchronization (mutexes, semaphores) is required.

A thread crash can bring down the whole process, whereas a process crash does not affect other processes.

In CPython, multithreading is often described as “pseudo‑multithreading” for CPU‑bound code, because the Global Interpreter Lock (GIL) lets only one thread execute Python bytecode at a time.
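The point about synchronization can be illustrated with a small sketch. It assumes a shared module-level total that several threads update, and it uses threading.Lock to make each read-modify-write step atomic. The names add_range and run_threads are hypothetical helpers chosen for this example.

```python
import threading

total = 0
total_lock = threading.Lock()


def add_range(limit):
    """Add 0..limit-1 to the shared total, one locked update at a time."""
    global total
    for i in range(limit):
        # Without the lock, two threads could read the same old value
        # of `total` and one of the updates would be lost.
        with total_lock:
            total += i


def run_threads(workers=5, limit=10000):
    global total
    total = 0
    threads = [
        threading.Thread(target=add_range, args=(limit,))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    print(run_threads())
```

Locking on every iteration is deliberately simple rather than fast. Accumulating a local sum in each thread and adding it to the total once under the lock would reduce contention.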

What Is a Coroutine?

A coroutine (also called a micro‑thread) is a user‑level, lightweight execution unit that is scheduled by the program itself rather than the OS kernel. Coroutines run within a single OS thread, can be paused and resumed without system calls, and avoid the overhead of thread creation and context switching.

They behave like sub‑routines that can yield control voluntarily.

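Because all coroutines on one event loop run in a single thread, they never execute in parallel on multiple CPU cores and do not contend with one another for the GIL.

Control changes hands only at explicit yield or await points, so most of the race conditions seen with preemptive threads do not arise and locks are rarely needed. State that must stay consistent across an await can still require a primitive such as asyncio.Lock.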
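A short asyncio sketch can make the voluntary hand-off concrete. It assumes two coroutines that record each step in a shared list and then yield with asyncio.sleep(0). The worker and main names are illustrative only.

```python
import asyncio


async def worker(name, steps, log):
    for step in range(steps):
        log.append((name, step))
        # This await is the only place where the coroutine gives up
        # control; the event loop then resumes the other worker.
        await asyncio.sleep(0)


async def main():
    log = []
    await asyncio.gather(worker("a", 2, log), worker("b", 2, log))
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
    # prints [('a', 0), ('b', 0), ('a', 1), ('b', 1)]
```

The interleaved order shows that both workers make progress inside one thread, switching only where the code says await.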

Choosing Between Process, Thread, and Coroutine

In CPython, the reference implementation of Python, the Global Interpreter Lock (GIL) ensures that only one thread executes Python bytecode at a time within a process. The GIL was introduced to protect reference‑count‑based memory management. Consequently, even on multi‑core CPUs, multithreaded Python code does not achieve true parallelism for CPU‑bound tasks.

Therefore, the naïve belief that “just use multithreading for concurrency in Python” is incorrect.

When to Use Which Model?

CPU‑intensive workloads: Use multiple processes (multiprocessing) to bypass the GIL and achieve true parallelism.

I/O‑intensive workloads: Threads can overlap I/O waits. The GIL still allows only one thread to run Python code at a time, but a thread releases the GIL while it blocks on I/O, so other threads can proceed. Because threads are also cheaper to create than processes, multithreading often outperforms multiprocessing for pure I/O.

Mixed CPU + I/O workloads: Combine multiprocessing (to utilize multiple cores) with coroutines/asyncio inside each process to handle many concurrent I/O operations efficiently.
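The mixed-workload guideline could be sketched as follows. This is one possible arrangement, assuming the work splits into independent batches: each worker process runs its own event loop for the I/O part and then does its CPU part. The functions fetch, handle_batch, process_batch, and run_all are hypothetical, and the I/O is simulated with asyncio.sleep.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


async def fetch(item):
    """Hypothetical I/O-bound step, simulated with a non-blocking sleep."""
    await asyncio.sleep(0.01)
    return item * 2


async def handle_batch(batch):
    # All items in the batch wait concurrently on one event loop.
    return await asyncio.gather(*(fetch(item) for item in batch))


def process_batch(batch):
    """Entry point for one worker process: run its own event loop."""
    results = asyncio.run(handle_batch(batch))
    # Stand-in for the CPU-bound part; each process has its own GIL.
    return sum(results)


def run_all(batches, workers=2):
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_batch, batches))


if __name__ == "__main__":
    print(run_all([[1, 2, 3], [4, 5, 6]]))
```

Batches spread across cores through the process pool, while the items inside each batch overlap their waits as coroutines.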

Performance Tests

CPU‑bound test (no sleep), run with threads; the output lines still carry the "Process" label:

Process 0:n=5756690257,id(n)=140103573185600
Process 2:n=10819616173,id(n)=140103573185600
Process 1:n=11829507727,id(n)=140103573185600
Process 4:n=17812587459,id(n)=140103573072912
Process 3:n=14424763612,id(n)=140103573185600
Main:n=17812587459,id(n)=140103573072912
Total time:0.1056210994720459

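The result shows that the global variable n is shared among threads: the value keeps growing from one line to the next, and the main thread sees the final total. If these threads ran the same count function as the multiprocessing example, five complete passes would add up to 5 × 4999950000 = 24999750000, so the smaller final value would point to updates lost in unsynchronized n += i operations.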

I/O‑bound test (1‑second sleep per task):

## Multiprocessing
Process 0 End
Process 3 End
Process 4 End
Process 2 End
Process 1 End
Total time:1.383193016052246
## Multithreading
Process 0 End
Process 4 End
Process 3 End
Process 1 End
Process 2 End
Total time:1.003425121307373

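Both versions overlap the one‑second waits, so neither takes anywhere near five seconds. Multithreading finishes faster because threads are cheaper to create and start than processes; the extra 0.38 s or so in the multiprocessing run is most likely process startup overhead.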
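The code for this test is not shown, so the following is only a sketch of how the multithreaded half might look. It assumes each task is a blocking time.sleep, and it uses a shorter delay than the article's one second to keep the run quick. The names io_task and run_io_threads are assumptions.

```python
import threading
import time


def io_task(num, delay, finished):
    # A blocking wait; the GIL is released while the thread sleeps.
    time.sleep(delay)
    finished.append(num)


def run_io_threads(workers=5, delay=0.2):
    finished = []
    start = time.perf_counter()
    threads = [
        threading.Thread(target=io_task, args=(i, delay, finished))
        for i in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return sorted(finished), elapsed


if __name__ == "__main__":
    done, elapsed = run_io_threads()
    print("Tasks finished:", done)
    print("Total time:{0}".format(elapsed))
```

Because the waits overlap, the total should be close to a single delay rather than workers × delay.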

Asyncio coroutine test:

total time: 1.001854419708252

Coroutines achieve comparable or better performance than threads for I/O‑bound scenarios, with even lower overhead.
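The asyncio test code is likewise not shown. A comparable sketch, assuming each task awaits asyncio.sleep and using hypothetical names io_task and run_io_coroutines, might look like this:

```python
import asyncio
import time


async def io_task(num, delay):
    # Non-blocking wait: the event loop runs other tasks meanwhile.
    await asyncio.sleep(delay)
    return num


async def run_io_coroutines(workers=5, delay=0.2):
    start = time.perf_counter()
    # gather() returns results in the order the awaitables were passed.
    results = await asyncio.gather(
        *(io_task(i, delay) for i in range(workers))
    )
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = asyncio.run(run_io_coroutines())
    print("Tasks finished:", results)
    print("total time:", elapsed)
```

No extra threads or processes are created here, which is where the lower overhead comes from.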

Conclusion

The article combines OS theory with Python code to clarify the differences between processes, threads, and coroutines, and provides practical guidance:

CPU‑bound: prefer multiprocessing.

I/O‑bound: prefer multithreading (or coroutines for higher concurrency).

Mixed workloads: combine multiprocessing with coroutines.

Understanding the GIL, context switching, and virtual memory helps developers select the most efficient concurrency model for their specific workload.
