
Master Python Multiprocessing: From Fork to Process Pools

This article explains the concepts of processes and threads in Python, compares multi‑process, multi‑thread, and combined approaches, shows how to use fork, the multiprocessing module, process pools, subprocesses, and inter‑process communication with queues, and provides complete code examples with results.

Python Crawling & Data Mining

Oct 21, 2019


This article continues the Python advanced series and is the first to focus on processes and threads. The previous article introduced the concurrent.futures module for multi‑process and multi‑thread operations.

Concept

Concurrent programming means making progress on multiple tasks at the same time. It relies on two core concepts: processes and threads.

For an operating system, a task (or program) is a process. Opening a browser, WeChat, or two Notepad windows each creates a separate process.

Characteristics of a process:

The OS allocates memory and resources per process; each process has its own address space and data stack.

A process can create new processes using fork or spawn.

Processes have independent memory, so they communicate via IPC mechanisms such as pipes, signals, sockets, or shared memory.
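The independent address space can be observed directly. The following is a minimal sketch, not part of the original article; the names counter and increment are made up for illustration. The child changes its own copy of a module‑level variable, and the parent's copy stays untouched.

```python
from multiprocessing import Process

counter = 0  # module-level value; every process works on its own copy


def increment():
    """Runs in the child process and changes only the child's copy."""
    global counter
    counter += 1
    print('child sees counter =', counter)


if __name__ == '__main__':
    child = Process(target=increment)
    child.start()
    child.join()
    # The parent's copy is unchanged because the address spaces are separate.
    print('parent sees counter =', counter)
```

The child prints 1 and the parent prints 0, which is why processes need an IPC mechanism to exchange data.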

Within a process, multiple sub‑tasks are called threads. A process must contain at least one thread.
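In contrast to processes, threads live inside one process and share its memory. A small sketch of this, with the hypothetical helper names record and run_threads: every thread appends to the same list and reports the same PID.

```python
import os
import threading

results = []  # shared by every thread in this process


def record(label):
    """Store the label together with the PID of the process running the thread."""
    results.append((label, os.getpid()))


def run_threads(labels):
    """Start one thread per label, wait for all of them, and return the shared list."""
    threads = [threading.Thread(target=record, args=(label,)) for label in labels]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == '__main__':
    print(run_threads(['t1', 't2']))
    print('main thread:', threading.main_thread().name)
```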

When implementing concurrency, three solutions exist:

Multi‑process: multiple processes each with a single thread.

Multi‑thread: a single process with multiple threads.

Multi‑process + multi‑thread: multiple processes each spawning multiple threads (rarely used due to complexity).

Note: truly parallel execution requires a multi‑core CPU. On a single‑core CPU the operating system switches rapidly between tasks, giving each a short time slice, so they only appear to run simultaneously.
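To see how many tasks the current machine could run in parallel, the standard library exposes the number of logical CPUs. A minimal sketch; the helper name max_parallel_workers and its fallback value are assumptions for illustration.

```python
import os


def max_parallel_workers(default=1):
    """Return the number of logical CPUs, or `default` if it cannot be determined."""
    # os.cpu_count() may return None on some platforms.
    return os.cpu_count() or default


if __name__ == '__main__':
    print('logical CPUs:', max_parallel_workers())
```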

Python supports both multi‑process and multi‑thread programming.

Multi‑process

On Unix/Linux, the system call fork() creates a child process. The call returns twice: once in the parent (returning the child’s PID) and once in the child (returning 0).

Example using os.fork():

import os
print('Process (%s) start...' % os.getpid())
# Only works on Unix/Linux/Mac:
pid = os.fork()
if pid == 0:
    print('I am child process (%s) and my parent is %s.' % (os.getpid(), os.getppid()))
else:
    print('I (%s) just created a child process (%s).' % (os.getpid(), pid))

Running result (the order of the last two lines may vary, because parent and child are scheduled independently):

Process (876) start...
I (876) just created a child process (877).
I am child process (877) and my parent is 876.
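A parent usually also wants to know when the child has finished. The sketch below is one possible way to do that with os.waitpid, assuming a Unix‑like system; the function name run_child is hypothetical and not part of the original example.

```python
import os


def run_child(exit_code):
    """Fork a child that exits with `exit_code`; return the code seen by the parent."""
    pid = os.fork()  # Unix/Linux/macOS only
    if pid == 0:
        # Child: exit immediately without running the parent's cleanup handlers.
        os._exit(exit_code)
    # Parent: block until the child terminates, then decode its exit status.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


if __name__ == '__main__' and hasattr(os, 'fork'):
    print('child exit code:', run_child(3))
```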

Windows does not provide fork. The cross‑platform multiprocessing module offers a portable way to create processes and is used instead.
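How multiprocessing creates a child depends on its start method (fork or spawn, as mentioned earlier). The following sketch lists the methods available on the current platform and explicitly requests spawn, which exists on every platform including Windows. The function greet is a made‑up placeholder.

```python
import multiprocessing as mp


def greet(name):
    """Placeholder task executed in the child process."""
    print('hello from', name)


if __name__ == '__main__':
    print('available start methods:', mp.get_all_start_methods())
    print('default start method:', mp.get_start_method())
    # A context object exposes the same API as the module, bound to one start method.
    ctx = mp.get_context('spawn')
    p = ctx.Process(target=greet, args=('a spawned child',))
    p.start()
    p.join()
```

With spawn the child starts a fresh interpreter and re‑imports the main module, which is why the `if __name__ == '__main__':` guard matters.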

multiprocessing

The multiprocessing module offers a Process class. The following example compares a single‑process download with a multi‑process download.

Without multiprocessing:

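from random import randint
from time import time, sleep

def download_task(filename):
    '''Simulate downloading a file.'''
    print('Start downloading %s...' % filename)
    time_to_download = randint(5, 10)  # pretend the download takes 5-10 seconds
    sleep(time_to_download)
    print('%s downloaded! Took %d seconds' % (filename, time_to_download))

def download_without_multiprocess():
    '''Download the files one after another, without multiprocessing.'''
    start = time()
    download_task('Python.pdf')
    download_task('nazha.mkv')
    end = time()
    print('Total time: %.2f seconds.' % (end - start))

if __name__ == '__main__':
    download_without_multiprocess()

The result shows that the total time equals the sum of both tasks (≈18 seconds).

Start downloading Python.pdf...
Python.pdf downloaded! Took 9 seconds
Start downloading nazha.mkv...
nazha.mkv downloaded! Took 9 seconds
Total time: 18.00 seconds.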

With multiprocessing:

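from multiprocessing import Process
from random import randint
from time import time, sleep

def download_task(filename):
    '''Simulate downloading a file.'''
    print('Start downloading %s...' % filename)
    time_to_download = randint(5, 10)  # pretend the download takes 5-10 seconds
    sleep(time_to_download)
    print('%s downloaded! Took %d seconds' % (filename, time_to_download))

def download_multiprocess():
    '''Download the files in two separate processes.'''
    start = time()
    p1 = Process(target=download_task, args=('Python.pdf',))
    p1.start()
    p2 = Process(target=download_task, args=('nazha.mkv',))
    p2.start()
    # Wait for both children before measuring the elapsed time
    p1.join()
    p2.join()
    end = time()
    print('Total time: %.2f seconds.' % (end - start))

if __name__ == '__main__':
    download_multiprocess()

The two downloads now run at the same time, so the total drops to roughly the duration of the slower task (≈9.36 seconds).

Start downloading Python.pdf...
Start downloading nazha.mkv...
Python.pdf downloaded! Took 5 seconds
nazha.mkv downloaded! Took 9 seconds
Total time: 9.36 seconds.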
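When more than two processes are involved, keeping them in a list avoids repeating start() and join() by hand, and the exitcode attribute tells the parent how each child ended. A minimal sketch under those assumptions; task and run_all are hypothetical names, and the worker does nothing except exit with a given status.

```python
import sys
from multiprocessing import Process


def task(code):
    """Hypothetical worker that simply exits with the given status code."""
    sys.exit(code)


def run_all(codes):
    """Start one process per code, wait for all, and return their exit codes."""
    procs = [Process(target=task, args=(c,), name='worker-%d' % i)
             for i, c in enumerate(codes)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # exitcode is None while a process is running and an int once it has ended.
    return [p.exitcode for p in procs]


if __name__ == '__main__':
    print(run_all([0, 0, 2]))
```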

Pool

When many processes are needed, a process pool simplifies creation and management:

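from multiprocessing import Pool
from random import randint
from time import time, sleep

def download_task(filename):
    '''Simulate downloading a file.'''
    print('Start downloading %s...' % filename)
    time_to_download = randint(5, 10)
    sleep(time_to_download)
    print('%s downloaded! Took %d seconds' % (filename, time_to_download))

def download_multiprocess_pool():
    '''Download with multiple processes managed by a Pool.'''
    start = time()
    filenames = ['Python.pdf', 'nazha.mkv', 'something.mp4', 'lena.png', 'lol.avi']
    p = Pool(5)  # five worker processes
    for filename in filenames:
        # Submit the task without blocking
        p.apply_async(download_task, args=(filename,))
    print('Waiting for all subprocesses done...')
    p.close()  # no more tasks will be submitted
    p.join()   # wait for the workers to finish
    end = time()
    print('Total time: %.2f seconds.' % (end - start))

if __name__ == '__main__':
    download_multiprocess_pool()

Pool(5) creates five worker processes, apply_async submits each task without blocking, close() stops the pool from accepting new tasks, and join() waits for the workers to finish. close() must be called before join().

Waiting for all subprocesses done...
Start downloading Python.pdf...
Start downloading nazha.mkv...
Start downloading something.mp4...
Start downloading lena.png...
Start downloading lol.avi...
nazha.mkv downloaded! Took 5 seconds
lena.png downloaded! Took 6 seconds
something.mp4 downloaded! Took 7 seconds
Python.pdf downloaded! Took 8 seconds
lol.avi downloaded! Took 9 seconds
Total time: 9.80 seconds.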
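The pool example discards whatever the tasks return. If results are needed, one option is Pool.map, or the AsyncResult object that apply_async returns. The sketch below assumes a made‑up worker, file_size_kb, that stands in for real work, and uses the pool as a context manager.

```python
from multiprocessing import Pool


def file_size_kb(filename):
    """Hypothetical stand-in for real work: derive a fake size from the name."""
    return len(filename) * 100


if __name__ == '__main__':
    filenames = ['Python.pdf', 'nazha.mkv', 'lena.png']
    with Pool(processes=3) as pool:
        # map() blocks until every task is done and preserves the input order.
        sizes = pool.map(file_size_kb, filenames)
        # apply_async() returns an AsyncResult; get() waits for the value.
        single = pool.apply_async(file_size_kb, ('lol.avi',)).get(timeout=10)
    print(dict(zip(filenames, sizes)))
    print(single)
```

Leaving the with block terminates the pool, so all results should be collected inside it, as done here.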

Subprocess

The subprocess module runs external commands. Example using nslookup:

import subprocess

print('$ nslookup www.python.org')
r = subprocess.call(['nslookup', 'www.python.org'])
print('Exit code:', r)

Result:

$ nslookup www.python.org
Server:        192.168.19.4
Address:    192.168.19.4#53

Non-authoritative answer:
www.python.org    canonical name = python.map.fastly.net.
Name:    python.map.fastly.net
Address: 199.27.79.223

Exit code: 0
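subprocess.call only returns the exit code and lets the command print directly to the terminal. To capture the output as a string, subprocess.run (Python 3.7+ for capture_output) is one alternative. The sketch below runs a child Python interpreter instead of nslookup so that it works without network tools; the helper name run_python is an assumption.

```python
import subprocess
import sys


def run_python(code):
    """Run a short Python snippet in a child interpreter and capture its output."""
    return subprocess.run(
        [sys.executable, '-c', code],
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode bytes to str
        timeout=30,
    )


if __name__ == '__main__':
    result = run_python("print('hello from a child process')")
    print('Exit code:', result.returncode)
    print('Output:', result.stdout.strip())
```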

When input is required, communicate() can be used:

import subprocess

print('$ nslookup')
p = subprocess.Popen(['nslookup'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, err = p.communicate(b'set q=mx\npython.org\nexit\n')
print(output.decode('utf-8'))
print('Exit code:', p.returncode)

Result shows MX records for python.org and an exit code of 0.
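The same communicate() pattern can be tried without nslookup. In this sketch the child is a one‑line Python program (an assumption made purely for illustration) that upper‑cases every line it reads from standard input; send_lines is a hypothetical helper.

```python
import subprocess
import sys

# Child program used for illustration only: upper-case every input line.
CHILD_CODE = "import sys\nfor line in sys.stdin:\n    print(line.strip().upper())"


def send_lines(lines):
    """Feed `lines` to the child's stdin; return its output lines and exit code."""
    p = subprocess.Popen([sys.executable, '-c', CHILD_CODE],
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE)
    payload = ''.join(line + '\n' for line in lines).encode('utf-8')
    # communicate() writes the payload, closes stdin, and reads until the child exits.
    output, _ = p.communicate(payload, timeout=30)
    return output.decode('utf-8').splitlines(), p.returncode


if __name__ == '__main__':
    print(send_lines(['set q=mx', 'python.org', 'exit']))
```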

Inter‑process communication

The multiprocessing module also provides Queue and Pipe for data exchange. The following example uses a Queue. Because the reader loops forever, the parent terminates it once the writer has finished.

import os
from multiprocessing import Process, Queue
import random
from time import sleep

def write(q):
    print('Process to write: %s' % os.getpid())
    for value in ['A', 'B', 'C']:
        print('Put %s to queue...' % value)
        q.put(value)
        sleep(random.random())

def read(q):
    print('Process to read: %s' % os.getpid())
    while True:
        value = q.get(True)
        print('Get %s from queue.' % value)

def ipc_queue():
    '''Inter-process communication with a Queue.'''
    q = Queue()
    pw = Process(target=write, args=(q,))
    pr = Process(target=read, args=(q,))
    pw.start()
    pr.start()
    pw.join()
    pr.terminate()

if __name__ == '__main__':
    ipc_queue()

Running output:

Process to write: 24992
Put A to queue...
Process to read: 22836
Get A from queue.
Put B to queue...
Get B from queue.
Put C to queue...
Get C from queue.
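Pipe, the other mechanism mentioned above, connects exactly two endpoints. A minimal sketch, assuming a None sentinel as the agreed "no more data" signal so the receiver can stop by itself instead of being terminated; producer and consume are hypothetical names.

```python
from multiprocessing import Pipe, Process


def producer(conn, values):
    """Send each value, then a None sentinel (an assumed convention), then close."""
    for value in values:
        conn.send(value)
    conn.send(None)
    conn.close()


def consume(conn):
    """Receive values until the sentinel arrives."""
    received = []
    while True:
        value = conn.recv()
        if value is None:
            break
        received.append(value)
    return received


if __name__ == '__main__':
    parent_conn, child_conn = Pipe()  # two connected endpoints
    p = Process(target=producer, args=(child_conn, ['A', 'B', 'C']))
    p.start()
    print(consume(parent_conn))
    p.join()
```

The same sentinel idea also works with a Queue and avoids calling terminate() on the reader.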

References are provided at the end of the article. The next article will cover multithreading and how to choose between multi‑process and multi‑thread approaches.
