
Master Python Multiprocessing: From Basics to Advanced Process Management

This article explains Python's multiprocessing module, covering process concepts, creation of single and multiple processes, process pools, locks, inter‑process communication methods such as Event, Pipe, Queue, semaphores, and data sharing techniques, with code examples and visual illustrations.

Python Crawling & Data Mining

Preface

A process is a running instance of a program, managed by the operating system. A system usually runs many processes at once, and the OS schedules them across the available CPU cores.

Below are screenshots of the Windows Task Manager showing many processes created by the 360 browser.

The Resource Monitor also displays detailed process and thread usage.

1. Basic Usage

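A process executes a program and can contain multiple threads. Each process has its own memory space and startup cost, so creating more processes than the workload needs wastes resources; large process counts are usually justified only in large systems.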

1.1 Create Process

1. Import the module

import multiprocessing as m

This import works (the class is then available as m.Process), but importing the class directly is more convenient:

from multiprocessing import Process
Process(group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None)

Parameters:

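group: reserved; should always be None (it exists only for compatibility with threading.Thread)
target: callable invoked by run()
args: tuple of positional arguments for target
kwargs: dict of keyword arguments for target
name: child process name
daemon: keyword-only daemon flag; None inherits the value from the creating process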

Common utility methods:

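# List the child processes that are still alive (as a side effect, joins any that have already finished)
multiprocessing.active_children()
# Number of CPUs in the system (may be more than the current process is allowed to use)
multiprocessing.cpu_count()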

2. Create a single process

Key methods:

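# Start the process (arranges for run() to be called in the child process)
start()
# The process's activity; calls target by default and can be overridden in a subclass
run()
# Terminate the process (SIGTERM on POSIX); exit handlers and finally clauses do not run
terminate()
# Check whether the process is alive
is_alive()
# Block until the process terminates or the optional timeout expires
join([timeout])
# Daemon flag (must be set before start() is called)
daemon
# Process name
name
# Process ID (None before the process is started)
pid
# Exit code (None if the process has not terminated yet)
exitcode
# Authentication key (a byte string)
authkey
# Handle that becomes ready when the process ends
sentinel
# Like terminate(), but uses SIGKILL on POSIX
kill()
# Release the resources held by the Process object (raises ValueError if it is still running)
close()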

Always guard process creation with:

if __name__ == '__main__':
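
As a minimal sketch of creating a single process under this guard, the example below starts one child and waits for it. The names greet and run_single are illustrative and do not come from the original article.

```python
from multiprocessing import Process, current_process


def greet(name):
    """Target function: runs inside the child process."""
    message = f"hello, {name}"
    print(message, "from process", current_process().name)
    return message  # the return value of a target is discarded by Process


def run_single():
    # Create the child; nothing runs until start() is called.
    p = Process(target=greet, args=("world",), name="worker-1")
    p.start()          # launch the child, which then calls greet("world")
    p.join()           # block until the child has finished
    return p.exitcode  # 0 means the child exited normally


if __name__ == "__main__":
    print("exit code:", run_single())
```

The guard matters most with the spawn start method (the default on Windows and macOS), where the child re-imports the main module; without it, the import would try to start the process again.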

Processes can also be created by subclassing Process.
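
A possible shape for such a subclass is sketched below. SquareProcess and its compute helper are hypothetical names chosen for illustration; the only requirements assumed here are calling the base __init__ and overriding run().

```python
from multiprocessing import Process


class SquareProcess(Process):
    """Hypothetical subclass: the work goes in run() instead of a target."""

    def __init__(self, number):
        super().__init__()  # initialize the Process machinery first
        self.number = number

    def compute(self):
        return self.number * self.number

    def run(self):
        # run() executes in the child process once start() has been called.
        print(f"{self.name}: {self.number} squared is {self.compute()}")


if __name__ == "__main__":
    p = SquareProcess(7)
    p.start()  # call start(), not run(): run() alone would execute in the parent
    p.join()
```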

3. Create multiple processes

Use a loop to start several processes, then join them all. CPU-bound work can then run in parallel on multiple cores, which shortens the total run time.
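
The loop might look like the sketch below. The work function is an illustrative stand-in for a CPU-bound task, not code from the original article.

```python
import os
from multiprocessing import Process


def work(index):
    """Illustrative CPU-bound task: sum the first `index` million integers."""
    total = sum(range(index * 1_000_000))
    print(f"task {index} ran in pid {os.getpid()}, total={total}")
    return total


if __name__ == "__main__":
    # Start every process first, then join them all. Joining inside the
    # start loop would run the tasks one after another instead of in parallel.
    processes = [Process(target=work, args=(i,)) for i in range(4)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print("exit codes:", [p.exitcode for p in processes])
```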

4. Process Pool

Pool simplifies resource management by reusing a fixed number of worker processes.

from multiprocessing import Pool
import multiprocessing as m
num = m.cpu_count()  # one worker per CPU
pool = Pool(num)

Common pool methods:

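apply(func, args, kwds)            # Synchronous: blocks until the result is ready
apply_async(func, args, kwds, callback, error_callback)  # Asynchronous: returns an AsyncResult immediately
terminate()                        # Stop the workers immediately, discarding pending tasks
join()                             # Wait for workers to exit (call close() or terminate() first)
close()                            # Prevent new tasks; workers exit once the queued tasks finish
map(func, iterable, chunksize)     # Parallel map, blocks until all results are ready
map_async(func, iterable, chunksize, callback, error_callback)  # Non-blocking map, returns an AsyncResult
imap(func, iterable, chunksize)    # Lazy version of map, yields results in order
imap_unordered(func, iterable, chunksize)  # Like imap, but yields results as they complete
starmap(func, iterable, chunksize) # Like map, but unpacks each item as the arguments of func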
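
To make a few of these methods concrete, here is a small sketch that runs map, starmap, imap_unordered and apply_async on one pool. The functions square, add and run_pool_demo are illustrative names, and the worker count of 2 is an arbitrary assumption.

```python
from multiprocessing import Pool


def square(x):
    return x * x


def add(a, b):
    return a + b


def run_pool_demo(workers=2):
    # Leaving the "with" block calls terminate(), so collect results inside it.
    with Pool(workers) as pool:
        mapped = pool.map(square, range(5))                   # ordered list
        starred = pool.starmap(add, [(1, 2), (3, 4)])         # unpacks each tuple
        lazy = sorted(pool.imap_unordered(square, range(5)))  # arrival order varies
        async_result = pool.apply_async(add, (10, 20))        # returns at once
        applied = async_result.get(timeout=10)                # wait for the value
    return mapped, starred, lazy, applied


if __name__ == "__main__":
    print(run_pool_demo())
```

The result of imap_unordered is sorted here only to make the output deterministic; in real use its appeal is that results can be handled as soon as each one arrives.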

For web crawlers, small tasks can use synchronous execution, while large crawls benefit from asynchronous (parallel) execution.

Serial example
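
The sketch below illustrates the synchronous case with Pool.apply. The fetch function is a hypothetical stand-in for downloading a page; the sleep only simulates network delay.

```python
import time
from multiprocessing import Pool


def fetch(page):
    """Stand-in for downloading one page; the sleep simulates network delay."""
    time.sleep(0.2)
    return f"page-{page}"


def crawl_serial(pages, workers=4):
    with Pool(workers) as pool:
        # apply() blocks until each call returns, so only one worker is
        # busy at a time even though the pool has several.
        return [pool.apply(fetch, (page,)) for page in pages]


if __name__ == "__main__":
    start = time.time()
    print(crawl_serial(range(4)))
    print(f"serial: {time.time() - start:.2f}s")
```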

Parallel example
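
The asynchronous case can be sketched with Pool.apply_async. As before, fetch is a hypothetical stand-in for a real download, and the 10-second timeout is an arbitrary assumption.

```python
import time
from multiprocessing import Pool


def fetch(page):
    """Stand-in for downloading one page; the sleep simulates network delay."""
    time.sleep(0.2)
    return f"page-{page}"


def crawl_parallel(pages, workers=4):
    with Pool(workers) as pool:
        # apply_async() returns immediately, so every task is queued first...
        pending = [pool.apply_async(fetch, (page,)) for page in pages]
        # ...and get() then collects each result, in submission order.
        return [result.get(timeout=10) for result in pending]


if __name__ == "__main__":
    start = time.time()
    print(crawl_parallel(range(4)))
    print(f"parallel: {time.time() - start:.2f}s")
```

With four workers and four simulated pages, the tasks overlap, so the elapsed time should be close to that of a single fetch plus the pool startup cost.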

5. Locks

Locks synchronize access to shared resources.

from multiprocessing import Lock
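
As an illustration of a lock protecting shared state, the sketch below has several processes increment one shared integer. The names deposit and run_deposits are hypothetical, and the shared integer is created with lock=False so that the explicit Lock is the only synchronization.

```python
from multiprocessing import Lock, Process, Value


def deposit(balance, lock, times):
    for _ in range(times):
        # "balance.value += 1" is a read-modify-write sequence; holding the
        # lock prevents two processes from interleaving it.
        with lock:
            balance.value += 1


def run_deposits(workers=4, times=1000):
    lock = Lock()
    balance = Value("i", 0, lock=False)  # raw shared int without a built-in lock
    procs = [Process(target=deposit, args=(balance, lock, times))
             for _ in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return balance.value


if __name__ == "__main__":
    print(run_deposits())  # workers * times when every increment is protected
```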

Re‑entrant locks (RLock) allow the same process or thread to acquire the lock multiple times; the lock must then be released once for each acquisition.

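import time
import multiprocessing as m
from multiprocessing import Process, RLock
def jc(num, lock1, lock2):
    # Acquire both locks; "with" releases them in reverse order, even on error
    with lock1, lock2:
        print('start')
        print(m.current_process().pid, 'run----', str(num))
    print('end')
if __name__ == '__main__':
    # Create the locks in the parent and pass them to each child, so that
    # every process shares the same locks under any start method
    lock1 = RLock()
    lock2 = RLock()
    s = time.time()
    aa = []
    for y in range(12):
        pp = Process(target=jc, args=(y, lock1, lock2))
        pp.start()
        aa.append(pp)
    for x in aa:
        x.join()
    e = time.time()
    print(e - s)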
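
The difference between Lock and RLock can be seen inside a single process. In the sketch below, try_twice is a hypothetical helper that acquires a lock and then attempts a second, non-blocking acquisition.

```python
from multiprocessing import Lock, RLock


def try_twice(lock_obj):
    """Acquire once, then try again without blocking; return the second result."""
    lock_obj.acquire()
    second = lock_obj.acquire(block=False)
    # Release once for each successful acquisition.
    lock_obj.release()
    if second:
        lock_obj.release()
    return second


if __name__ == "__main__":
    print("Lock :", try_twice(Lock()))   # the second attempt fails
    print("RLock:", try_twice(RLock()))  # the owner may acquire again
```

A blocking second acquire on a plain Lock would deadlock the process, which is why the sketch passes block=False.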

6. Inter‑process Communication

Event

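import time
from multiprocessing import Event, Process
def main(num, e):
    while True:
        if num < 5:
            e.clear()   # reset the flag to False
            print('clear')
        if num >= 5:
            e.wait(timeout=1)  # block until the flag is set, for at most 1 second
            e.set()            # set the flag to True, waking any waiters
            print('set')
        if num == 10:
            e.wait(timeout=3)
            e.clear()
            print('exit')
            break
        num += 1
        time.sleep(2)
if __name__ == '__main__':
    e = Event()  # created in the parent and passed to each child
    for y in range(10):
        pp = Process(target=main, args=(y, e))
        pp.start()
        pp.join()  # joining inside the loop runs the processes one at a time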
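
A more typical use of Event is one process signalling others that they may proceed. The sketch below assumes three waiting workers and a parent that sets the flag after one second; waiter and the worker names are illustrative.

```python
import time
from multiprocessing import Event, Process


def waiter(ready, name):
    print(f"{name}: waiting for the signal")
    signalled = ready.wait(timeout=5)  # True if set() was called in time
    print(f"{name}: {'go' if signalled else 'timed out'}")
    return signalled


if __name__ == "__main__":
    ready = Event()  # the flag starts as False
    workers = [Process(target=waiter, args=(ready, f"worker-{i}"))
               for i in range(3)]
    for w in workers:
        w.start()
    time.sleep(1)    # simulate setup work in the parent
    ready.set()      # wakes every process blocked in wait()
    for w in workers:
        w.join()
```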

Pipe

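p1, p2 = m.Pipe(duplex=True)  # True (default): bidirectional; False: p1 can only receive, p2 can only send
p1.send(data)   # send a picklable object
p2.recv()        # receive an object (blocks until one arrives)
p1.fileno()      # file descriptor or handle used by the connection
p1.poll([timeout])  # check whether data is available to read
p1.send_bytes(buffer, [offset, [size]])  # send raw bytes from a bytes-like object
p2.recv_bytes([maxlength])  # receive raw bytes as a bytes object
p2.recv_bytes_into(buffer, [offset])  # receive raw bytes into a writable buffer
p1.close()        # close the connection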
Queue

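from multiprocessing import Process, Queue
def fd(a):
    for y in range(10):
        a.put(y)  # insert
        print('insert:', str(y))
    a.put(None)  # sentinel: tells the consumer there is no more data

def df(b):
    while True:
        aa = b.get(True)  # remove (blocks until an item is available)
        if aa is None:
            break
        print('release:', str(aa))
if __name__ == '__main__':
    q = Queue()
    ff = Process(target=fd, args=(q,))
    dd = Process(target=df, args=(q,))
    ff.start()
    dd.start()
    ff.join()
    dd.join()  # the consumer exits on the sentinel, so terminate() is not needed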
7. Semaphore

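from multiprocessing import Semaphore
s = Semaphore(3)        # counter starts at 3
s.acquire()             # counter: 2
print(s.get_value())    # 2 (get_value() is not implemented on macOS)
s.release()             # counter: 3
print(s.get_value())    # 3
print(s.get_value())    # 3
s.release()             # counter: 4; a plain Semaphore may exceed its initial value
print(s.get_value())    # 4
s.release()             # counter: 5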
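
In practice a semaphore is usually used to cap how many processes use a resource at once. The sketch below assumes a limit of two slots shared by six tasks; use_resource and the sleep are illustrative.

```python
import time
from multiprocessing import Process, Semaphore


def use_resource(slots, index):
    with slots:  # acquire on entry, release on exit
        print(f"task {index} holds a slot")
        time.sleep(0.5)  # simulate work on the limited resource
    print(f"task {index} released its slot")


if __name__ == "__main__":
    slots = Semaphore(2)  # at most two tasks inside the block at once
    procs = [Process(target=use_resource, args=(slots, i)) for i in range(6)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```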

8. Data Sharing

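# Value type: a single ctypes value in shared memory
m.Value(typecode_or_type, *args)
# Array type: a fixed-length ctypes array in shared memory
m.Array(typecode_or_type, size_or_initializer)
# Dict and list types: the module has no dict() or list() function; create them through a Manager
# Manager shared objects (proxies backed by a separate manager process)
Manager().dict()
Manager().list()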
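
The sketch below shows these shared types being modified by a child process. The update function and the variable names are illustrative; the type codes "i" (signed int) and "d" (double) are the standard array-module codes.

```python
from multiprocessing import Array, Manager, Process, Value


def update(counter, numbers, shared_dict, shared_list):
    with counter.get_lock():  # a Value carries its own lock by default
        counter.value += 1
    for i in range(len(numbers)):
        numbers[i] *= 2  # an Array supports index assignment
    shared_dict["status"] = "done"  # the proxy forwards this to the manager
    shared_list.append(42)


if __name__ == "__main__":
    counter = Value("i", 0)
    numbers = Array("d", [1.0, 2.0])
    with Manager() as manager:  # starts a helper server process
        shared_dict = manager.dict()
        shared_list = manager.list()
        p = Process(target=update,
                    args=(counter, numbers, shared_dict, shared_list))
        p.start()
        p.join()
        print(counter.value, numbers[:], dict(shared_dict), list(shared_list))
```

Value and Array live in shared memory and are fast but limited to ctypes data, while Manager objects accept arbitrary picklable values at the cost of a round trip to the manager process on every access.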

Conclusion

The article provides a comprehensive overview of Python processes, demonstrating creation, pooling, synchronization, communication, and shared data techniques, enabling readers to apply multiprocessing effectively in their projects.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
