
Boost Python Performance: Multithreading, Multiprocessing, and Best Practices

This article explains how Python’s GIL affects concurrency, when to use multithreading versus multiprocessing, and provides practical tips on efficient inter‑process communication, iteration, string handling, sorting, file I/O, and leveraging the standard library to dramatically improve script performance.

Code Mala Tang

Jun 20, 2025

1. Multithreading and Multiprocessing

In standard CPython builds, threads cannot run CPU‑bound Python code in parallel because the Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at a time. Threads are still useful for I/O‑bound work, while separate processes each have their own interpreter and GIL and can therefore run on multiple cores.

Multithreading: Use when the program is limited by I/O (e.g., downloading files, network requests, disk reads). The GIL is released while a thread waits on I/O, allowing other threads to run.

Multiprocessing: Use for CPU‑bound tasks that benefit from true parallelism. Each process has its own interpreter and memory space, avoiding the GIL.

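Applied to the right kind of workload, either approach can cut runtimes substantially. Applied to the wrong one (threads for CPU‑bound code, or processes for many tiny tasks), it adds startup, scheduling, and serialization overhead without any performance gain.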

Best Practices

Prefer external libraries (NumPy, SciPy, PyTorch) for heavy numeric work; many of their C/Fortran/CUDA operations release the GIL while they run, so explicit multiprocessing is often unnecessary.

Use concurrent.futures.ThreadPoolExecutor for I/O‑bound tasks and concurrent.futures.ProcessPoolExecutor for CPU‑bound tasks.

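Limit the number of processes to the number of CPU cores. os.cpu_count() reports the number of logical CPUs and can return None, so provide a fallback such as os.cpu_count() or 1.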

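import concurrent.futures
import os
import time

def download_data(url):
    """Simulate an I/O-bound download; sleeping releases the GIL."""
    print(f"Starting download from {url}...")
    time.sleep(2)  # stand-in for network latency
    print(f"Finished download from {url}.")
    return f"Data from {url}"

def calculate_prime(number):
    """Return True if number is prime (trial division, CPU-bound)."""
    print(f"Calculating prime for {number}...")
    is_prime = number > 1 and all(number % i for i in range(2, int(number**0.5) + 1))
    print(f"Finished calculation for {number}.")
    return is_prime

# The guard is required with ProcessPoolExecutor: worker processes may
# re-import this module, and unguarded code would run again in each of them.
if __name__ == "__main__":
    urls = ["http://url.com/1", "http://url.com/2", "http://url.com/3"]
    print("\n--- Using ThreadPoolExecutor (I/O bound) ---")
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(urls)) as executor:
        results = list(executor.map(download_data, urls))
    print(f"ThreadPool results: {results}")

    numbers = [10000003, 10000007, 10000009]
    print("\n--- Using ProcessPoolExecutor (CPU bound) ---")
    workers = min(len(numbers), os.cpu_count() or 1)  # never more workers than cores
    with concurrent.futures.ProcessPoolExecutor(max_workers=workers) as executor:
        results = list(executor.map(calculate_prime, numbers))
    print(f"ProcessPool results: {results}")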

2. Efficient Inter‑Process Communication (IPC)

While multiprocessing provides true parallelism, communication between processes can become a bottleneck if large objects are passed or messages are sent frequently.

Passing large objects: Serializing and deserializing big data structures (e.g., large lists, NumPy arrays) can negate the benefits of parallelism.

Frequent small messages: Repeatedly sending tiny chunks adds overhead.

Optimizing these factors is crucial for scalable multiprocessing applications.

Best Practices

Use multiprocessing.Queue or Pipe for small messages.

Use multiprocessing.shared_memory (Python 3.8+) for truly large data shared across processes.

Use multiprocessing.Manager for shared data structures when convenience matters more than speed; every access goes through a proxy to a separate server process.

Batch data before sending through queues or pipes.

import multiprocessing
import numpy as np

def process_chunk(chunk_id, data_chunk):
    """Sum one chunk and double the result."""
    result = np.sum(data_chunk) * 2
    return f"Chunk {chunk_id} processed, sum doubled: {result}"

if __name__ == '__main__':
    large_array = np.random.rand(1_000_000)
    # Four large chunks mean four serialization round trips instead of many tiny ones.
    chunk_size = len(large_array) // 4
    chunks = [large_array[i:i + chunk_size] for i in range(0, len(large_array), chunk_size)]
    print("\n--- Using multiprocessing.Pool with batched chunks ---")
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.starmap(process_chunk, enumerate(chunks))
    for line in results:
        print(line)
    print("All chunks processed.")
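For the shared‑memory recommendation above, the following is a minimal sketch using only the standard library. The helper names write_values and read_sum are hypothetical, and both attach to the block from the same process purely to keep the example short; in a real program a worker process would receive the block's name and attach in the same way.

```python
import struct
from multiprocessing import shared_memory

def write_values(shm_name, values):
    """Attach to an existing block by name and write doubles into it."""
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        struct.pack_into(f"{len(values)}d", shm.buf, 0, *values)
    finally:
        shm.close()  # detach this handle; the block itself stays alive

def read_sum(shm_name, count):
    """Attach by name and sum the first `count` doubles without copying the block."""
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        return sum(struct.unpack_from(f"{count}d", shm.buf, 0))
    finally:
        shm.close()

values = [1.5, 2.5, 4.0]
# The owner creates the block (8 bytes per double) and is responsible for unlinking it.
block = shared_memory.SharedMemory(create=True, size=8 * len(values))
try:
    write_values(block.name, values)  # only the short name needs to be passed around
    total = read_sum(block.name, len(values))
finally:
    block.close()
    block.unlink()  # free the underlying memory once every user is done

print(total)
```

Only the block name and the element count cross the process boundary, so the cost of sending the data no longer grows with its size.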

3. Loops, Generators, and Efficient Iteration

Python offers highly optimized built‑in functions and iteration patterns. Ignoring them can severely degrade performance.

Best Practices

Direct iteration: Prefer for item in my_list over for i in range(len(my_list)) to avoid repeated indexing.

Use built‑in functions: sum(), min(), max(), any(), all(), zip(), enumerate() are usually faster than manual loops.

Generators for large or infinite data: Use generator expressions or yield to avoid MemoryError and reduce memory usage.
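The three practices above can be sketched together as follows. The data and the squares helper are made up for illustration.

```python
import sys

prices_cents = [1999, 549, 350]
quantities = [2, 10, 4]

# zip() pairs the two lists and sum() consumes a generator expression,
# so no index arithmetic or intermediate list is needed.
total_cents = sum(price * qty for price, qty in zip(prices_cents, quantities))

# enumerate() yields (position, item) pairs without range(len(...)).
labels = [f"{position}: {price}" for position, price in enumerate(prices_cents, start=1)]

def squares(limit):
    """Yield squares lazily, one at a time."""
    for n in range(limit):
        yield n * n

# Only as many squares are computed as are needed to find the first match.
first_even_square_over_50 = next(s for s in squares(1_000_000) if s > 50 and s % 2 == 0)

# A generator object stays small no matter how many items it will produce.
as_list = [n * n for n in range(100_000)]
as_generator = (n * n for n in range(100_000))
print(sys.getsizeof(as_list), sys.getsizeof(as_generator))
```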

4. Proper String Concatenation

Using + or += inside loops can create many temporary strings because strings are immutable, leading to repeated memory allocation and copying. Instead, collect the pieces and call "".join() once (use " ".join() if you want a space between the items).

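list_of_strs = ["alpha", "beta", "gamma"]  # example input

# Not recommended: each += may allocate and copy a new string
long_string = ""
for s in list_of_strs:  # naming the loop variable str would shadow the built-in
    long_string += s

# Recommended: one call that builds the result once
long_string = "".join(list_of_strs)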
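When the pieces are produced inside a loop rather than already sitting in a list, the same idea applies: accumulate them and join once at the end. The sketch below shows two common variants. The function names and sample rows are hypothetical.

```python
import io

def render_rows_join(rows):
    """Collect formatted pieces in a list, then join once."""
    parts = []
    for name, score in rows:
        parts.append(f"{name}: {score}\n")
    return "".join(parts)

def render_rows_stringio(rows):
    """Alternative: write into an in-memory text buffer."""
    buffer = io.StringIO()
    for name, score in rows:
        buffer.write(f"{name}: {score}\n")
    return buffer.getvalue()

rows = [("alice", 90), ("bob", 85)]
report = render_rows_join(rows)
print(report, end="")
```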

5. Efficient Sorting

Built‑in sorted() and list.sort() use Timsort (with Powersort's merge policy since Python 3.11), which is highly optimized for real‑world, partially ordered data. Use the key argument for custom objects.

Best Practices

Sorting custom objects: Use a lambda or operator.attrgetter as the key.

DSU (decorate‑sort‑undecorate) pattern: The key argument already calls the key function only once per element. When the key is expensive and needed for more than one sort, pre‑compute the keys once and reuse them.

Partial sorting: Use heapq.nsmallest() or heapq.nlargest() instead of sorting the entire list when only a few top elements are needed.

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
    def __repr__(self):
        return f"Person('{self.name}', {self.age})"

people = [Person('Alice', 30), Person('Bob', 25), Person('Charlie', 35)]
people.sort(key=lambda p: p.age)
print(people)
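The other two practices, partial sorting and DSU, can be sketched as below. This standalone example uses a namedtuple stand‑in for Person and a hypothetical expensive_key function; neither is part of the original code.

```python
import heapq
from collections import namedtuple
from operator import attrgetter

# Stand-in record type so the example runs on its own.
Person = namedtuple("Person", ["name", "age"])
people = [Person("Alice", 30), Person("Bob", 25), Person("Charlie", 35), Person("Dana", 28)]

# attrgetter builds the key function without a lambda.
by_age = sorted(people, key=attrgetter("age"))

# Partial sorting: fetch only the extremes instead of ordering everything.
youngest_two = heapq.nsmallest(2, people, key=attrgetter("age"))
oldest = heapq.nlargest(1, people, key=attrgetter("age"))

def expensive_key(word):
    """Hypothetical costly key; here just (length, lowercase form)."""
    return (len(word), word.lower())

words = ["pear", "Fig", "banana", "kiwi"]
# Decorate: compute each key once; the index breaks ties so items are never compared.
decorated = [(expensive_key(word), index, word) for index, word in enumerate(words)]
decorated.sort()                                  # Sort on the precomputed keys
sorted_words = [word for _, _, word in decorated]  # Undecorate

print(youngest_two, oldest, sorted_words)
```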

6. File I/O

Efficient file handling is essential for large data files.

Best Practices

Always use with open() as f to ensure proper closure.

Read small files with f.read(); for large files, iterate line‑by‑line or use generators.

Batch writes by collecting data in a list and writing once with f.write(''.join(lines)) or writing in large blocks.
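A minimal sketch of these file‑handling practices, using a temporary directory so it leaves nothing behind. The file name and the read_nonempty_lines helper are hypothetical.

```python
import os
import tempfile

def read_nonempty_lines(path):
    """Yield lines one at a time so the whole file is never held in memory."""
    with open(path, encoding="utf-8") as f:
        for line in f:  # the file object is itself a lazy line iterator
            line = line.rstrip("\n")
            if line:
                yield line

lines = [f"record {i}\n" for i in range(1000)]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "records.txt")

    # Batch write: one call with all the data instead of 1000 small writes.
    with open(path, "w", encoding="utf-8") as f:
        f.write("".join(lines))

    line_count = sum(1 for _ in read_nonempty_lines(path))
    last_line = None
    for last_line in read_nonempty_lines(path):
        pass  # keep only the most recent line

print(line_count, last_line)
```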

7. Leveraging Standard Library Modules

The collections and itertools modules provide highly optimized containers and iteration utilities that often outperform generic types.

Best Practices

collections.deque for O(1) appends/pops at both ends.

collections.Counter for fast frequency counting.

collections.defaultdict to avoid KeyError and simplify code.

itertools.chain, cycle, permutations, combinations, groupby, islice for efficient iteration patterns.

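from itertools import chain, cycle, permutations

# chain: iterate over several iterables as one, without building a combined list
list1 = [1, 2, 3]
list2 = [4, 5, 6]
for item in chain(list1, list2):
    print(item, end=' ')
print()  # end the line before the next output

# cycle: repeat a sequence endlessly; take only what you need
colors = cycle(['red', 'green', 'blue'])
for _ in range(5):
    print(next(colors))

# permutations: all ordered pairs of distinct letters
for p in permutations('ABC', 2):
    print(''.join(p))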
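The collections containers listed above, plus islice, can be sketched in the same spirit. The sample data is invented for illustration.

```python
from collections import Counter, defaultdict, deque
from itertools import islice

# deque with maxlen keeps only the most recent items, discarding old ones in O(1).
recent = deque(maxlen=3)
for event in ["a", "b", "c", "d"]:
    recent.append(event)

# deque as a double-ended queue: cheap operations at both ends.
queue = deque([1, 2, 3])
queue.appendleft(0)
last = queue.pop()

words = "the quick brown fox jumps over the lazy dog the end".split()

# Counter tallies frequencies in one pass.
counts = Counter(words)
most_common_word = counts.most_common(1)

# defaultdict creates the missing list on first access, so no KeyError handling.
by_length = defaultdict(list)
for word in words:
    by_length[len(word)].append(word)

# islice takes a slice of any iterable without copying it first.
first_three = list(islice(words, 3))

print(list(recent), list(queue), most_common_word, by_length[5], first_three)
```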

8. Choosing the Right Data Structure

Selecting appropriate structures dramatically reduces algorithmic complexity and runtime.

Best Practices

Use set for fast membership tests and duplicate removal.

Use dict for key‑value lookups.

Use np.array() for large numeric datasets.
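To make the membership‑test point concrete, here is a rough timing sketch. The sizes and repeat counts are arbitrary, and absolute numbers will vary by machine; the point is that a list scan is O(n) while a set lookup is O(1) on average.

```python
import timeit

haystack_list = list(range(100_000))
haystack_set = set(haystack_list)
needle = 99_999  # worst case for the list: the last element

list_time = timeit.timeit(lambda: needle in haystack_list, number=100)
set_time = timeit.timeit(lambda: needle in haystack_set, number=100)
print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")

# Duplicate removal: set() drops duplicates but not order;
# dict.fromkeys() keeps the first occurrence of each item in order.
items = ["b", "a", "b", "c", "a"]
unique_unordered = set(items)
unique_ordered = list(dict.fromkeys(items))
print(unique_ordered)
```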

9. Avoid Dot‑Lookup Overhead

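Calling a function as module.func() performs an attribute lookup on the module every time the call executes. In a hot loop, importing the function directly (or binding it to a local name) removes that lookup; outside tight loops the difference is usually negligible.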

# Not recommended
import math
a = math.sqrt(50)

# Recommended
from math import sqrt
a = sqrt(50)
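Whether this matters in practice is best settled by measuring. The sketch below times three hypothetical variants with timeit. On recent CPython versions the gap may be small, so treat the numbers as indicative only.

```python
import math
import timeit
from math import sqrt

def via_module(n=10_000):
    total = 0.0
    for i in range(n):
        total += math.sqrt(i)  # global lookup of math, then attribute lookup of sqrt
    return total

def via_direct_import(n=10_000):
    total = 0.0
    for i in range(n):
        total += sqrt(i)  # a single global lookup
    return total

def via_local_alias(n=10_000):
    local_sqrt = math.sqrt  # resolve once, outside the loop
    total = 0.0
    for i in range(n):
        total += local_sqrt(i)  # fast local variable access
    return total

for func in (via_module, via_direct_import, via_local_alias):
    elapsed = timeit.timeit(func, number=50)
    print(f"{func.__name__}: {elapsed:.4f}s")
```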

10. Avoid Global Variables

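Inside a function, local variables are stored in a fixed‑size array and accessed by index, whereas global names require dictionary lookups (module globals first, then builtins, following the LEGB rule). In tight loops or frequently called functions, prefer locals to reduce lookup cost.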
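A small sketch of the difference, with a hypothetical SCALE constant and two made‑up functions. The timings are printed rather than promised, since the gap depends on the interpreter version.

```python
import timeit

SCALE = 3  # module-level (global) constant

def scale_with_global(values):
    total = 0
    for value in values:
        total += value * SCALE  # global lookup on every iteration
    return total

def scale_with_local(values, scale=SCALE):
    # The default argument binds the global's value to a local name once,
    # when the function is defined.
    total = 0
    for value in values:
        total += value * scale  # local variable access
    return total

data = range(1000)
global_time = timeit.timeit(lambda: scale_with_global(data), number=500)
local_time = timeit.timeit(lambda: scale_with_local(data), number=500)
print(f"global: {global_time:.4f}s  local: {local_time:.4f}s")
```

Note that the default‑argument trick freezes the value at definition time, so it only suits values that do not change afterwards.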
