Build Distributed Python Processes with Multiprocessing Managers
This article explains why processes are preferred over threads for stability and multi‑machine distribution, introduces Python's multiprocessing.managers for networked task queues, and provides step‑by‑step code examples to create a distributed crawler that fetches image URLs and downloads them across several machines.
Introduction
When choosing between threads and processes, processes are generally more stable, because a crash in one process does not take down the others, and they can be distributed across multiple machines. Threads, by contrast, are limited to the CPUs of a single machine.
Python's multiprocessing module not only supports multi‑process execution, but its managers submodule also allows processes to be spread over several machines. A service process can act as a scheduler, distributing tasks to worker processes via network communication.
Case Study
In a web‑crawling scenario, one process collects image URLs and puts them into a queue, while other processes retrieve URLs from the queue to download and store the images locally.
To implement this in a distributed fashion, the queue must be exposed over the network so that processes on different machines can access it. In effect, the manager wraps a local queue and makes it reachable over the network.
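The idea of wrapping a local queue for network access can be sketched in a single script. This is a minimal illustration, not the article's server: it serves the queue from a background thread through `get_server()` so that everything runs in one process, and the names `start_queue_server`, `fetch_all`, and `AUTHKEY` are hypothetical helpers introduced here.

```python
import queue
import threading
from multiprocessing.managers import BaseManager

AUTHKEY = b"demo-key"  # hypothetical shared secret; must be bytes in Python 3


def start_queue_server(task_queue):
    """Expose task_queue on a free localhost port and return the bound address."""

    class ServerManager(BaseManager):
        pass

    # The callable runs on the server side and hands out the one shared queue.
    ServerManager.register("get_task_queue", callable=lambda: task_queue)
    manager = ServerManager(address=("127.0.0.1", 0), authkey=AUTHKEY)
    server = manager.get_server()
    # Serve from a daemon thread so the sketch stays inside one process.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server.address


def fetch_all(address):
    """Connect like a remote worker would and take every queued item."""

    class ClientManager(BaseManager):
        pass

    # The client registers the name only; the queue itself lives on the server.
    ClientManager.register("get_task_queue")
    client = ClientManager(address=address, authkey=AUTHKEY)
    client.connect()
    remote_queue = client.get_task_queue()
    urls = []
    while True:
        try:
            urls.append(remote_queue.get_nowait())
        except queue.Empty:
            return urls


if __name__ == "__main__":
    local_queue = queue.Queue()
    for i in range(3):
        local_queue.put("ImageUrl_%d" % i)
    print(fetch_all(start_queue_server(local_queue)))
```

The client never touches `local_queue` directly. Every `get_nowait()` call travels over a socket to the server, which is what allows the same code to run on a different machine once the address points at a reachable host.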
Implementation Example
Server (1.py) that creates and registers the shared queues:
from multiprocessing.managers import BaseManager
from multiprocessing import freeze_support, Queue

# Number of tasks
task_number = 10

# Task and result queues
task_queue = Queue(task_number)
result_queue = Queue(task_number)


def get_task():
    return task_queue


def get_result():
    return result_queue


class QueueManager(BaseManager):
    pass


def win_run():
    # Register the queues under the names that workers will request
    QueueManager.register('get_task_queue', callable=get_task)
    QueueManager.register('get_result_queue', callable=get_result)
    # 127.0.0.1 accepts local connections only; bind to the machine's
    # network address to serve workers on other hosts
    manager = QueueManager(address=('127.0.0.1', 8001), authkey='qiye'.encode())
    manager.start()
    try:
        task = manager.get_task_queue()
        result = manager.get_result_queue()
        for url in ["ImageUrl_" + str(i) for i in range(task_number)]:
            print('url is %s' % url)
            task.put(url)
        print('try get result')
        for i in range(task_number):
            print('result is %s' % result.get(timeout=10))
    except Exception as exc:
        print('Manager error: %r' % exc)
    finally:
        manager.shutdown()


if __name__ == '__main__':
    freeze_support()
    win_run()

Worker (2.py) that connects to the server and processes tasks:
# coding: utf-8
import time
from multiprocessing.managers import BaseManager


class Manager(BaseManager):
    pass


# Register the names only; the queues themselves live on the server
Manager.register('get_task_queue')
Manager.register('get_result_queue')

server_addr = '127.0.0.1'
print('Connect to server %s...' % server_addr)
# Port and authkey must match the server settings; authkey must be bytes
m = Manager(address=(server_addr, 8001), authkey='qiye'.encode())
m.connect()
task = m.get_task_queue()
result = m.get_result_queue()
while not task.empty():
    image_url = task.get(True, timeout=5)
    print('run task download %s...' % image_url)
    time.sleep(1)  # stands in for the actual download
    result.put('%s--->success' % image_url)
print('worker exit.')

Result
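The server puts ten placeholder image URLs on the task queue. The worker script takes each URL from the queue, simulates the download with a one-second sleep, and puts a success message on the result queue. The server console lists each URL as it is queued, then prints one result line per completed task.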
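When several workers share the queue, checking `task.empty()` and then calling `task.get()` leaves a gap: another worker can take the last URL between the two calls, and `get()` then raises `queue.Empty` once its timeout expires. One possible alternative, sketched below under that assumption, drops the `empty()` check and treats the timeout itself as the signal to stop. The names `drain_tasks` and `handle` are hypothetical, and the function accepts any queue-like objects, including the manager proxies used by the worker.

```python
import queue


def drain_tasks(task_queue, result_queue, handle, timeout=5):
    """Process tasks until none arrives within `timeout` seconds.

    Returns the number of tasks this worker handled.
    """
    done = 0
    while True:
        try:
            # Block for at most `timeout` seconds waiting for the next task.
            item = task_queue.get(True, timeout)
        except queue.Empty:
            # No task showed up in time: assume the work is finished.
            return done
        result_queue.put(handle(item))
        done += 1


if __name__ == "__main__":
    tasks, results = queue.Queue(), queue.Queue()
    for i in range(3):
        tasks.put("ImageUrl_%d" % i)
    count = drain_tasks(tasks, results, lambda url: "%s--->success" % url, timeout=1)
    print("handled %d tasks" % count)
```

In the worker script, the call would look like `drain_tasks(task, result, download)`, where `download` is whatever function performs the real fetch and returns the status string.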
Conclusion
Python's distributed‑process interface is simple and well encapsulated, and using it requires only basic Python knowledge. This makes it suitable for environments that need to spread heavy tasks across multiple machines. The example shows how a networked queue can pass tasks to workers and collect their results.
