
Using ipyparallel for Parallel and Distributed Computing in Python

This article explains how to overcome Python's Global Interpreter Lock by installing ipyparallel, configuring parallel profiles, and using engines, DirectView, and LoadBalancedView to run both synchronous and asynchronous tasks, with code examples and performance comparisons.


Python's Global Interpreter Lock (GIL) prevents threads within a single process from executing Python bytecode in parallel. The ipyparallel package works around this by launching multiple engine processes and exposing views ( DirectView and LoadBalancedView ) that run tasks in parallel and can distribute work across a cluster.

Installation: install the package with pip install ipyparallel and create a parallel profile using ipython profile create --parallel --profile=myprofile . This sets up the profile configuration needed for parallel execution.

Simple word-count example: after downloading a text file with wget http://www.gutenberg.org/files/27287/27287-0.txt , a non-parallel word-count function is defined, followed by a parallel version that splits the file, pushes the counting functions to each engine, and aggregates the results. Timing with %time shows the parallel run takes about 10 ms of CPU time versus 4 ms for the non-parallel version: for a file this small, the overhead of splitting the work and aggregating the results outweighs the parallel speed-up.

<code># non‑parallel version
import re, io
from collections import defaultdict
non_word = re.compile(r'[\W\d]+', re.UNICODE)
common_words = {...}  # set of common stop words to skip (contents elided in the original)

def yield_words(filename):
    with io.open(filename, encoding='latin-1') as f:
        for line in f:
            for word in line.split():
                word = non_word.sub('', word.lower())
                if word and word not in common_words:
                    yield word

def word_count(filename):
    word_iterator = yield_words(filename)
    counts = defaultdict(int)
    for word in word_iterator:
        counts[word] += 1
    return counts
</code>
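The parallel version described above is not reproduced in the original excerpt. A minimal sketch of the split/count/merge approach follows, with hypothetical helper names ( count_chunk , merge_counts ) not taken from the article; the stop-word filtering is omitted here because the original's stop-word set is elided, and only the commented dispatch lines require running engines.

```python
import re
from collections import defaultdict

non_word = re.compile(r'[\W\d]+', re.UNICODE)

def count_chunk(lines):
    # count cleaned words in one chunk of lines (this runs on one engine)
    counts = defaultdict(int)
    for line in lines:
        for word in line.split():
            word = non_word.sub('', word.lower())
            if word:
                counts[word] += 1
    return dict(counts)

def merge_counts(partials):
    # combine the per-engine dictionaries into one total count
    merged = defaultdict(int)
    for d in partials:
        for w, c in d.items():
            merged[w] += c
    return dict(merged)

# With engines running, the dispatch is roughly:
#   chunks = ...  # the file's lines split into one list per engine
#   partials = dview.map_sync(count_chunk, chunks)
#   totals = merge_counts(partials)
```

Because count_chunk and merge_counts are plain functions with no shared state, each chunk can be counted independently and the per-engine results combined at the end, which is exactly the pattern the article times with %time.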

IPython magic and Client usage: by importing Client from ipyparallel (named IPython.parallel in older IPython releases), the user can list engine IDs, enable %autopx for automatic parallel execution, and run commands on specific engines with %px . Examples include synchronous mapping ( view.map_sync ) and asynchronous mapping ( view.map ); the timing output shows the overhead of parallelism for small tasks.

<code># start two engines
[escape@localhost ~]$ ipcluster start -n 2

# import and create client
from ipyparallel import Client  # 'from IPython.parallel import Client' on IPython < 4
rc = Client()
rc.ids  # [0, 1]

# synchronous map
v = rc[:]
result = v.map_sync(lambda x: x**2, range(10))

# asynchronous map
r = v.map(lambda x: x**2, range(10))
print(r.get())
</code>
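The %px and %autopx magics mentioned above do not appear in the snippet; an interactive session using them might look roughly like the following sketch (this assumes engines have already been started with ipcluster , as shown above):

<code># run a single statement on every engine
In [1]: from ipyparallel import Client
In [2]: rc = Client()
In [3]: %px import os

# toggle automatic parallel execution: every subsequent input
# line runs on all engines until %autopx is issued again
In [4]: %autopx
</code>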

DirectView and LoadBalancedView: rc[:] creates a DirectView that runs the same code on every engine, while rc.load_balanced_view() creates a LoadBalancedView that assigns each task to whichever engine is currently free. Benchmarks show that for trivial computations the single-process version is faster, but the load-balanced view distributes larger workloads more evenly.

<code># direct view map_sync
dview = rc[:]
%time dview.map_sync(lambda x: x**2, range(32))

# load‑balanced view map_sync
lview = rc.load_balanced_view()
%time lview.map_sync(lambda x: x**2, range(32))
</code>
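For work heavy enough to amortize the scheduling overhead, the load-balanced view starts to pay off. A hedged sketch follows, where busy_task is a hypothetical CPU-bound function not taken from the article; the commented dispatch lines assume a running ipcluster and the rc client from above.

```python
import math

def busy_task(n):
    # CPU-bound work: sum of square roots from 0 to n-1
    return sum(math.sqrt(i) for i in range(n))

# With engines running ( ipcluster start -n 2 ), dispatch would be:
#   lview = rc.load_balanced_view()
#   ar = lview.map(busy_task, [10**6] * 8)  # each task goes to an idle engine
#   results = ar.get()

# Single-process baseline for comparison:
serial = [busy_task(n) for n in (1000, 2000)]
```

Unlike the DirectView, which deals tasks out in fixed round-robin fashion, the LoadBalancedView keeps slower engines from becoming a bottleneck when task durations vary.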

Overall, the article demonstrates that while parallel execution with ipyparallel can halve CPU time for suitable tasks, the additional overhead of result aggregation and task scheduling means that for very small workloads a single‑process approach may still be more efficient.

performance, python, Parallel Computing, distributed computing, ipyparallel
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
