Build Distributed Python Processes with Multiprocessing Managers

This article explains why processes are preferred over threads for stability and multi‑machine distribution, introduces Python's multiprocessing.managers for networked task queues, and provides step‑by‑step code examples to create a distributed crawler that fetches image URLs and downloads them across several machines.

Python Crawling & Data Mining

Introduction

When choosing between threads and processes, processes are generally the more stable option, because a crash in one process does not bring down the others, and they can be distributed across multiple machines. Threads, by contrast, can only use the CPUs of the single machine they run on.

Python's multiprocessing module not only supports multi‑process execution, but its managers submodule also allows processes to be spread over several machines. A service process can act as a scheduler, distributing tasks to worker processes via network communication.

Case Study

In a web‑crawling scenario, one process collects image URLs and puts them into a queue, while other processes retrieve URLs from the queue to download and store the images locally.

To implement this in a distributed fashion, the queue must be exposed over the network so that processes on different machines can access it. This is what the managers submodule provides: it wraps a local queue and serves it over a network connection, so remote processes can call put and get on it as if it were local.
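
Before the full server and worker scripts, the idea of exposing a local queue over the network can be sketched in a single file. This is a minimal illustration, not the article's implementation: the names ServerManager, ClientManager, get_queue, start_server, connect and the key b'demo-key' are made up for the example, the server runs in a background thread instead of a separate process, and port 0 asks the operating system for any free port.

```python
import queue
import threading
from multiprocessing.managers import BaseManager

# A plain local queue that lives in the serving process.
local_queue = queue.Queue()

AUTHKEY = b'demo-key'  # hypothetical shared secret; must be bytes


class ServerManager(BaseManager):
    """Serving side: knows how to hand out the shared queue."""


class ClientManager(BaseManager):
    """Client side: only knows the registered name."""


# The server registers a callable; the client registers the name alone.
ServerManager.register('get_queue', callable=lambda: local_queue)
ClientManager.register('get_queue')


def start_server():
    """Serve the queue from a daemon thread and return the bound address."""
    manager = ServerManager(address=('127.0.0.1', 0), authkey=AUTHKEY)
    server = manager.get_server()
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server.address


def connect(address):
    """Connect to the server and return a proxy for the shared queue."""
    client = ClientManager(address=address, authkey=AUTHKEY)
    client.connect()
    return client.get_queue()


if __name__ == '__main__':
    address = start_server()
    producer_view = connect(address)
    consumer_view = connect(address)
    producer_view.put('ImageUrl_0')
    print(consumer_view.get(timeout=5))
```

Both proxies talk to the same underlying queue through a socket, so an item put through one connection can be taken through the other. In a real deployment the two connections would come from different machines.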

Implementation Example

Server (1.py) that creates and registers the shared queues:

from multiprocessing.managers import BaseManager
from multiprocessing import freeze_support, Queue
# Number of tasks
task_number = 10
# Task and result queues
task_queue = Queue(task_number)
result_queue = Queue(task_number)

def get_task():
    return task_queue

def get_result():
    return result_queue

class QueueManager(BaseManager):
    pass

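def win_run():
    QueueManager.register('get_task_queue', callable=get_task)
    QueueManager.register('get_result_queue', callable=get_result)
    # 127.0.0.1 only accepts workers on this machine; to distribute the work,
    # bind to an address that the other machines can reach.
    manager = QueueManager(address=('127.0.0.1', 8001), authkey='qiye'.encode())
    manager.start()
    try:
        # Access the queues through the manager so they go over the network
        task = manager.get_task_queue()
        result = manager.get_result_queue()
        for url in ["ImageUrl_" + str(i) for i in range(task_number)]:
            print('url is %s' % url)
            task.put(url)
        print('try get result')
        for i in range(task_number):
            print('result is %s' % result.get(timeout=10))
    except Exception as exc:
        # Report the actual failure, e.g. queue.Empty if no worker answers in time
        print('Manager error: %r' % exc)
    finally:
        manager.shutdown()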

if __name__ == '__main__':
    freeze_support()
    win_run()

Worker (2.py) that connects to the server and processes tasks:

import queue
import time
from multiprocessing.managers import BaseManager

class Manager(BaseManager):
    pass

# Register the names only; the callables live on the server
Manager.register('get_task_queue')
Manager.register('get_result_queue')

server_addr = '127.0.0.1'
print('Connect to server %s...' % server_addr)
# Port and authkey must match the server settings; authkey must be bytes in Python 3
m = Manager(address=(server_addr, 8001), authkey='qiye'.encode())
m.connect()

task = m.get_task_queue()
result = m.get_result_queue()

while True:
    try:
        # Another worker may take the last task first, so rely on the
        # timeout instead of checking task.empty() beforehand
        image_url = task.get(timeout=5)
    except queue.Empty:
        break
    print('run task download %s...' % image_url)
    time.sleep(1)  # stands in for the actual download
    result.put('%s--->success' % image_url)
print('worker exit.')
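
The worker above only simulates the download with time.sleep(1). A real download step might look roughly like the sketch below. It is an assumption rather than part of the original scripts: the function name download_image, the dest_dir parameter and the fallback file name image.bin are hypothetical, and only the standard library is used.

```python
import os
import urllib.parse
import urllib.request


def download_image(url, dest_dir, timeout=10):
    """Fetch url and save it under dest_dir; return the saved file path."""
    os.makedirs(dest_dir, exist_ok=True)
    # Derive a file name from the URL path; fall back to a fixed name.
    name = os.path.basename(urllib.parse.urlparse(url).path) or 'image.bin'
    path = os.path.join(dest_dir, name)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()
    with open(path, 'wb') as fh:
        fh.write(data)
    return path
```

In the worker loop, the time.sleep(1) line could then be replaced by a call such as download_image(image_url, 'images'), provided the task queue carries real URLs instead of the placeholder strings used in the example.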

Result

Start 1.py first, then run 2.py before the server's 10-second result timeout expires. The server puts ten placeholder image URLs on the task queue. The worker takes each URL, simulates the download, and puts a success message on the result queue, which the server then prints.

Conclusion

Python's distributed‑process interface is simple and well encapsulated, and it requires only basic Python knowledge, which makes it suitable for spreading heavy tasks across multiple machines. The example shows how a networked queue can be used to hand out tasks and collect results.
