
Comprehensive Guide to Python Multiprocessing: Basics, IPC, Process Pools, and Best Practices

This article provides an in‑depth overview of Python’s multiprocessing module, covering its fundamentals, process creation, inter‑process communication methods such as Queue, Pipe, shared memory, process pools, synchronization techniques, and practical best‑practice guidelines for effective parallel programming.


Python's Global Interpreter Lock (GIL) limits the parallel execution ability of multithreaded programs, while multiprocessing offers an effective way to bypass the GIL and achieve true parallel computation. This article delves into Python multiprocessing techniques to help you fully leverage multi‑core CPU power.

1. Basics of Multiprocessing

Why Choose Multiprocessing?

Each process has its own Python interpreter and memory space.

Completely avoids the GIL, enabling true parallel computation.

Suitable for CPU‑bound tasks.

Processes are isolated; a crash in one does not affect others.

multiprocessing Module Overview

The standard library's multiprocessing module provides tools for creating and managing processes, with an API design similar to the threading module, reducing the learning curve.

import multiprocessing

def worker():
    """Task executed by the child process"""
    print(f"Child process {multiprocessing.current_process().name} is running")

if __name__ == '__main__':
    processes = []
    for i in range(4):
        p = multiprocessing.Process(target=worker, name=f"worker-{i}")
        processes.append(p)
        p.start()
    for p in processes:
        p.join()  # Wait for all child processes to finish

2. Inter‑Process Communication (IPC)

Although multiprocessing offers strong isolation, it also introduces challenges for inter‑process communication. The multiprocessing module provides several IPC mechanisms.

Queue

from multiprocessing import Process, Queue

def producer(q):
    for i in range(5):
        q.put(i)
        print(f"Produced {i}")

def consumer(q):
    while True:
        item = q.get()
        if item is None:  # Termination signal
            break
        print(f"Consumed {item}")

if __name__ == '__main__':
    q = Queue()
    procs = [
        Process(target=producer, args=(q,)),
        Process(target=consumer, args=(q,))
    ]
    for p in procs:
        p.start()
    procs[0].join()  # Wait for producer to finish
    q.put(None)  # Send termination signal
    procs[1].join()
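
A single sentinel works because there is exactly one consumer; with several consumers, a common pattern is a JoinableQueue plus one sentinel per worker, so the producer can block until every item has been processed. A minimal sketch (the worker function and queue names here are illustrative, not part of the example above):

```python
from multiprocessing import Process, JoinableQueue, Queue

def worker(tasks, results):
    while True:
        item = tasks.get()
        if item is None:          # sentinel: shut this worker down
            tasks.task_done()
            break
        results.put(item * item)
        tasks.task_done()         # mark this item as fully processed

if __name__ == '__main__':
    tasks = JoinableQueue()
    results = Queue()
    consumers = [Process(target=worker, args=(tasks, results)) for _ in range(2)]
    for p in consumers:
        p.start()
    for i in range(6):
        tasks.put(i)
    for _ in consumers:
        tasks.put(None)           # one sentinel per consumer
    tasks.join()                  # blocks until every put() has a task_done()
    for p in consumers:
        p.join()
    collected = sorted(results.get() for _ in range(6))
    print(collected)              # [0, 1, 4, 9, 16, 25]
```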

Pipe

from multiprocessing import Process, Pipe

def sender(conn):
    conn.send("Hello from sender!")
    conn.close()

def receiver(conn):
    msg = conn.recv()
    print(f"Received message: {msg}")
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p1 = Process(target=sender, args=(child_conn,))
    p2 = Process(target=receiver, args=(parent_conn,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
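
Pipe() returns duplex connections by default, so the same connection pair can carry a request one way and the reply back. A minimal sketch of that round trip (the echo_worker name is illustrative):

```python
from multiprocessing import Process, Pipe

def echo_worker(conn):
    # Receive a request and send the reply back over the same connection
    request = conn.recv()
    conn.send(request.upper())
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()  # duplex=True by default
    p = Process(target=echo_worker, args=(child_conn,))
    p.start()
    parent_conn.send("ping")
    reply = parent_conn.recv()
    p.join()
    print(reply)  # PING
```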

Shared Memory

from multiprocessing import Process, Value, Array

def increment(n, arr):
    n.value += 1
    for i in range(len(arr)):
        arr[i] *= 2

if __name__ == '__main__':
    num = Value('i', 0)  # 'i' denotes signed integer
    arr = Array('d', [1.0, 2.0, 3.0])  # 'd' denotes double precision float
    p = Process(target=increment, args=(num, arr))
    p.start()
    p.join()
    print(num.value)  # Output: 1
    print(arr[:])     # Output: [2.0, 4.0, 6.0]
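
The example above is safe only because a single child touches the shared objects. If several processes update a Value concurrently, the read-modify-write in n.value += 1 can race; Value carries an internal lock exposed via get_lock(). A minimal sketch:

```python
from multiprocessing import Process, Value

def add_many(counter, n):
    for _ in range(n):
        with counter.get_lock():  # serialize the read-modify-write
            counter.value += 1

if __name__ == '__main__':
    counter = Value('i', 0)
    procs = [Process(target=add_many, args=(counter, 1000)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)  # 4000 -- guaranteed only because of the lock
```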

3. Process Pool

For a large number of tasks, using a process pool avoids the overhead of repeatedly creating and destroying processes.

from multiprocessing import Pool
import time

def square(x):
    time.sleep(1)  # Simulate a time‑consuming operation
    return x * x

if __name__ == '__main__':
    with Pool(processes=4) as pool:  # Create a pool with 4 worker processes
        # map method
        results = pool.map(square, range(10))
        print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
        # apply_async method (asynchronous)
        result = pool.apply_async(square, (20,))
        print(result.get(timeout=2))  # 400
        # imap method (lazy evaluation)
        for res in pool.imap(square, range(5)):
            print(res)
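
map passes a single argument to the worker function; when the function takes several arguments, Pool.starmap unpacks each tuple in the iterable. A small sketch (the function and variable names are illustrative):

```python
from multiprocessing import Pool

def power(base, exp):
    return base ** exp

if __name__ == '__main__':
    pairs = [(2, 3), (3, 2), (10, 0)]
    with Pool(processes=2) as pool:
        results = pool.starmap(power, pairs)  # unpacks each tuple into arguments
    print(results)  # [8, 9, 1]
```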

4. Process Synchronization

Although processes do not share memory like threads, synchronization may still be required when accessing shared resources.

Lock

import multiprocessing
from multiprocessing import Process, Lock

def printer(item, lock):
    with lock:
        print(f"Process {multiprocessing.current_process().name} prints: {item}")

if __name__ == '__main__':
    lock = Lock()
    items = ['A', 'B', 'C', 'D']
    processes = []
    for item in items:
        p = Process(target=printer, args=(item, lock))
        processes.append(p)
        p.start()
    for p in processes:
        p.join()

5. Sharing State Between Processes

Manager Object

Manager provides the ability to share complex data structures between processes.

import multiprocessing
from multiprocessing import Process, Manager

def worker(d, l):
    d[multiprocessing.current_process().name] = 'value'
    l.append(multiprocessing.current_process().pid)

if __name__ == '__main__':
    with Manager() as manager:
        shared_dict = manager.dict()
        shared_list = manager.list()
        processes = []
        for i in range(4):
            p = Process(target=worker, args=(shared_dict, shared_list))
            processes.append(p)
            p.start()
        for p in processes:
            p.join()
        print(shared_dict)
        print(shared_list)

6. Multiprocessing Best Practices

Avoid shared state: design a stateless architecture whenever possible to reduce inter‑process communication.

Set an appropriate number of processes, typically equal to the number of CPU cores or slightly more.

Be aware of resource consumption: each Python process has its own memory space, and many processes can consume significant memory.

Handle zombie processes: ensure proper calls to join() or set the daemon attribute.

Use a process pool: for many short‑lived tasks, prefer using Pool rather than creating individual processes.

Mind Windows compatibility: Windows (and macOS since Python 3.8) uses the spawn start method, which re-imports the main module in every child process, so the if __name__ == '__main__': guard is mandatory there.
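
The sizing advice above can be sketched with os.cpu_count(); the task function and workload here are illustrative:

```python
import os
from multiprocessing import Pool

def cpu_task(n):
    # Stand-in for a CPU-bound computation
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    n_workers = os.cpu_count() or 1  # fall back to 1 if the count is unknown
    with Pool(processes=n_workers) as pool:
        results = pool.map(cpu_task, [10_000] * n_workers)
    print(len(results) == n_workers)  # True
```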

7. Multiprocessing vs Multithreading Decision Guide

Feature | Multiprocessing | Multithreading
GIL Impact | None | Present
Memory Isolation | Fully isolated | Shared
Creation Overhead | High | Low
Communication Cost | High | Low
Suitable Scenarios | CPU-bound | I/O-bound
Data Sharing | Requires IPC | Directly shared
Stability | One crash does not affect others | One crash may affect the whole program
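
One way to apply this table in code is to choose the executor at runtime with concurrent.futures, which puts both models behind one interface. A sketch (the run_tasks helper and cpu_bound flag are hypothetical, not a standard API):

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def run_tasks(fn, args, cpu_bound):
    # CPU-bound work benefits from processes (no GIL contention);
    # I/O-bound work is usually fine with cheaper threads.
    executor_cls = ProcessPoolExecutor if cpu_bound else ThreadPoolExecutor
    with executor_cls(max_workers=4) as ex:
        return list(ex.map(fn, args))

def square(x):
    return x * x

if __name__ == '__main__':
    print(run_tasks(square, range(5), cpu_bound=True))   # [0, 1, 4, 9, 16]
    print(run_tasks(square, range(5), cpu_bound=False))  # [0, 1, 4, 9, 16]
```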

Conclusion

Multiprocessing is a crucial technique for achieving true parallel computation in Python, especially for CPU‑bound tasks. Using the multiprocessing module, we can easily create and manage processes for efficient parallel processing. Mastering IPC and synchronization, and using process pools wisely, can significantly improve program performance.

In practice, choose between multiprocessing and multithreading based on task characteristics; sometimes a hybrid approach (multiple processes each with multiple threads) yields the best performance. Remember, concurrent programming is about balancing performance gains against added complexity.

Python · Parallel Computing · Synchronization · IPC · multiprocessing · process pool
Written by

php中文网 Courses

php中文网's platform for the latest courses and technical articles, helping PHP learners advance quickly.
