Comprehensive Guide to Python Multiprocessing: Basics, IPC, Process Pools, and Best Practices
This article provides an in‑depth overview of Python’s multiprocessing module, covering its fundamentals, process creation, inter‑process communication methods such as Queue, Pipe, shared memory, process pools, synchronization techniques, and practical best‑practice guidelines for effective parallel programming.
Python's Global Interpreter Lock (GIL) limits the parallel execution ability of multithreaded programs, while multiprocessing offers an effective way to bypass the GIL and achieve true parallel computation. This article delves into Python multiprocessing techniques to help you fully leverage multi‑core CPU power.
1. Basics of Multiprocessing
Why Choose Multiprocessing?
Each process has its own Python interpreter and memory space.
Completely avoids the GIL, enabling true parallel computation.
Suitable for CPU‑bound tasks.
Processes are isolated; a crash in one does not affect others.
multiprocessing Module Overview
The standard library's multiprocessing module provides tools for creating and managing processes, with an API design similar to the threading module, reducing the learning curve.
```python
import multiprocessing

def worker():
    """Task executed by the child process"""
    print(f"Child process {multiprocessing.current_process().name} is running")

if __name__ == '__main__':
    processes = []
    for i in range(4):
        p = multiprocessing.Process(target=worker, name=f"worker-{i}")
        processes.append(p)
        p.start()
    for p in processes:
        p.join()  # Wait for all child processes to finish
```

2. Inter‑Process Communication (IPC)
Although multiprocessing offers strong isolation, it also introduces challenges for inter‑process communication. The multiprocessing module provides several IPC mechanisms.
Queue
```python
from multiprocessing import Process, Queue

def producer(q):
    for i in range(5):
        q.put(i)
        print(f"Produced {i}")

def consumer(q):
    while True:
        item = q.get()
        if item is None:  # Termination signal
            break
        print(f"Consumed {item}")

if __name__ == '__main__':
    q = Queue()
    procs = [
        Process(target=producer, args=(q,)),
        Process(target=consumer, args=(q,)),
    ]
    for p in procs:
        p.start()
    procs[0].join()  # Wait for producer to finish
    q.put(None)      # Send termination signal
    procs[1].join()
```

Pipe
```python
from multiprocessing import Process, Pipe

def sender(conn):
    conn.send("Hello from sender!")
    conn.close()

def receiver(conn):
    msg = conn.recv()
    print(f"Received message: {msg}")
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p1 = Process(target=sender, args=(child_conn,))
    p2 = Process(target=receiver, args=(parent_conn,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
```

Shared Memory
```python
from multiprocessing import Process, Value, Array

def increment(n, arr):
    n.value += 1
    for i in range(len(arr)):
        arr[i] *= 2

if __name__ == '__main__':
    num = Value('i', 0)                # 'i' denotes signed integer
    arr = Array('d', [1.0, 2.0, 3.0])  # 'd' denotes double precision float
    p = Process(target=increment, args=(num, arr))
    p.start()
    p.join()
    print(num.value)  # Output: 1
    print(arr[:])     # Output: [2.0, 4.0, 6.0]
```

3. Process Pool
For a large number of tasks, using a process pool avoids the overhead of repeatedly creating and destroying processes.
```python
from multiprocessing import Pool
import time

def square(x):
    time.sleep(1)  # Simulate a time-consuming operation
    return x * x

if __name__ == '__main__':
    with Pool(processes=4) as pool:  # Create a pool with 4 worker processes
        # map method: blocks until all results are ready
        results = pool.map(square, range(10))
        print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

        # apply_async method (asynchronous): returns an AsyncResult
        result = pool.apply_async(square, (20,))
        print(result.get(timeout=2))  # 400

        # imap method (lazy evaluation): yields results one at a time
        for res in pool.imap(square, range(5)):
            print(res)
```

4. Process Synchronization
Although processes do not share memory like threads, synchronization may still be required when accessing shared resources.
Lock
```python
# current_process must be imported; the original example used
# multiprocessing.current_process() without importing the module.
from multiprocessing import Process, Lock, current_process

def printer(item, lock):
    with lock:
        print(f"Process {current_process().name} prints: {item}")

if __name__ == '__main__':
    lock = Lock()
    items = ['A', 'B', 'C', 'D']
    processes = []
    for item in items:
        p = Process(target=printer, args=(item, lock))
        processes.append(p)
        p.start()
    for p in processes:
        p.join()
```

5. Sharing State Between Processes
Manager Object
Manager provides the ability to share complex data structures between processes.
```python
# current_process must be imported; the original example used
# multiprocessing.current_process() without importing the module.
from multiprocessing import Process, Manager, current_process

def worker(d, l):
    d[current_process().name] = 'value'
    l.append(current_process().pid)

if __name__ == '__main__':
    with Manager() as manager:
        shared_dict = manager.dict()
        shared_list = manager.list()
        processes = []
        for i in range(4):
            p = Process(target=worker, args=(shared_dict, shared_list))
            processes.append(p)
            p.start()
        for p in processes:
            p.join()
        print(shared_dict)
        print(shared_list)
```

6. Multiprocessing Best Practices
Avoid shared state: design a stateless architecture whenever possible to reduce inter‑process communication.
Set an appropriate number of processes, typically equal to the number of CPU cores or slightly more.
Be aware of resource consumption: each Python process has its own memory space, and many processes can consume significant memory.
Handle zombie processes: ensure proper calls to join() or set the daemon attribute.
Use a process pool: for many short‑lived tasks, prefer using Pool rather than creating individual processes.
Mind platform differences: on Windows (and on macOS by default) child processes are started with the spawn method rather than fork, so always protect the entry point with the if __name__ == '__main__': guard.
7. Multiprocessing vs Multithreading Decision Guide
| Feature            | Multiprocessing                  | Multithreading                         |
| ------------------ | -------------------------------- | -------------------------------------- |
| GIL impact         | None                             | Present                                |
| Memory isolation   | Fully isolated                   | Shared                                 |
| Creation overhead  | High                             | Low                                    |
| Communication cost | High                             | Low                                    |
| Suitable scenarios | CPU‑bound                        | I/O‑bound                              |
| Data sharing       | Requires IPC                     | Directly shared                        |
| Stability          | One crash does not affect others | One crash may affect the whole program |
Conclusion
Multiprocessing is a crucial technique for achieving true parallel computation in Python, especially for CPU‑bound tasks. Using the multiprocessing module, we can easily create and manage processes for efficient parallel processing. Mastering IPC and synchronization, and using process pools wisely, can significantly improve program performance.
In practice, choose between multiprocessing and multithreading based on task characteristics; sometimes a hybrid approach (multiple processes each with multiple threads) yields the best performance. Remember, concurrent programming is about balancing performance gains against added complexity.