Boost Python Loops: Parallelism, Generators, and Profiling Made Easy
This guide shows how to accelerate slow Python for‑loops by leveraging multi‑core parallelism, memory‑efficient generators, and a suite of profiling tools, providing step‑by‑step code examples and practical tips to identify and fix performance bottlenecks.
Ever stared at a painfully slow for loop and thought, “Python, can you go faster?” It can, but you need to give it a hand.
This article explains how to make your for loops fly: use multiple CPU cores for parallel processing, employ generators to save memory, and finally use several profiling tools to pinpoint the real bottlenecks.
Part 1: From Slow Loops to Fast CPU
Assume you have a list of numbers and want to apply a compute‑intensive function to each:
import time
def heavy(x):
    time.sleep(0.5)  # simulate a heavy operation
    return x * x
nums = list(range(10))
results = [heavy(x) for x in nums]
This takes about 5 seconds. You can do better.
Solution: Parallelize with multiprocessing, map, starmap, and joblib
Step 1: Use multiprocessing.Pool
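from multiprocessing import Pool
if __name__ == "__main__":  # needed wherever worker processes re-import the script (e.g., Windows, macOS)
    with Pool() as pool:  # no argument: one worker per CPU the OS reports
        results = pool.map(heavy, nums)
By default the pool starts one worker process per available CPU core.
Save the script (e.g., test.py) and run it: python test.py. The ten 0.5-second calls now run in batches, one call per worker. With 10 or more cores the run finishes in roughly 0.5 seconds; with 4 cores it takes about 1.5 seconds, plus a little process start-up overhead.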
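How much the pool helps depends on how many workers actually run. As a rough sketch, assuming every call takes the same time and ignoring process start-up and pickling overhead, the wall time is the number of batches times the per-call time. The helper name estimate_wall_time is hypothetical, introduced here only for illustration.

```python
import math
import os


def estimate_wall_time(n_tasks, n_workers, seconds_per_task):
    """Idealized wall time: tasks run in batches of n_workers."""
    batches = math.ceil(n_tasks / n_workers)
    return batches * seconds_per_task


# Ten 0.5-second calls on machines with different worker counts.
for workers in (1, 4, 10):
    print(workers, "workers:", estimate_wall_time(10, workers, 0.5), "s")

# Pool() with no argument sizes itself from the CPU count the OS reports.
print("CPUs reported on this machine:", os.cpu_count())
```

Real runs will be somewhat slower than this estimate, since starting processes and moving arguments between them is not free.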
In Jupyter or IPython you can use joblib:
from joblib import Parallel, delayed
results = Parallel(n_jobs=-1)(delayed(heavy)(x) for x in nums)
print(results)
Step 2: When your function needs multiple arguments – use starmap
import time
from multiprocessing import Pool
def heavy_multi(x, y):
    time.sleep(0.5)
    return x * y
if __name__ == "__main__":
    xy = [(x, x+1) for x in range(10)]
    with Pool() as pool:
        results = pool.starmap(heavy_multi, xy)
    print(results)
Step 3: For more control, use joblib.Parallel
from joblib import Parallel, delayed
results = Parallel(n_jobs=-1)(delayed(heavy_multi)(x, y) for x, y in xy)
print(results)
Joblib also offers automatic task batching, a choice of backends, progress messages (the verbose parameter), and optional on-disk caching of results (joblib.Memory).
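One caveat about the examples so far: heavy only sleeps, so it is waiting rather than computing, and a waiting thread releases Python's global interpreter lock. For workloads of that kind (network calls, disk reads), a thread pool from the standard library's concurrent.futures can be a lighter alternative to processes. This is a swapped-in technique, not one of the steps above; heavy_io and its 0.05-second delay are stand-ins chosen to keep the demo short.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def heavy_io(x, delay=0.05):
    """Stand-in for an I/O-bound call: it waits instead of computing."""
    time.sleep(delay)
    return x * x


nums = list(range(10))

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as executor:
    # executor.map returns results in input order, like pool.map.
    results = list(executor.map(heavy_io, nums))
elapsed = time.perf_counter() - start

print(results)
print(f"Elapsed {elapsed:.2f} seconds")
```

For functions that genuinely burn CPU, threads will not speed things up on a standard CPython build, so Pool or joblib remains the right tool there.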
Part 2: Reduce Memory Usage with Generators
After parallelizing, your script may still crash if it tries to hold millions of elements in memory.
Use generators to process items lazily, avoiding large intermediate lists.
1. When a list becomes a problem
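nums = list(range(10_000))
squares = [x * x for x in nums]
At this size the lists are harmless. Grow the range to hundreds of millions of items, though, and both lists must sit in memory at once, which can exhaust the RAM of a typical laptop.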
2. Generator expression – a lazy alternative
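nums = range(10_000_000)  # range is lazy too, so a large bound costs no memory
squares = (x * x for x in nums)
Values are produced only when requested:
for val in squares:
    if val > 1_000_000_000:
        break
This loop uses almost no memory, and it stops at the first square above one billion without computing the rest.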
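To see the difference in numbers, the sketch below compares the size of a list with the size of the equivalent generator object. The size n = 100_000 is an arbitrary choice for illustration, and sys.getsizeof counts only the container itself, so the list's true footprint is larger than what is printed.

```python
import sys
from itertools import islice

n = 100_000
squares_list = [x * x for x in range(n)]
squares_gen = (x * x for x in range(n))

# getsizeof reports the container only, not the integers it refers to.
list_bytes = sys.getsizeof(squares_list)
gen_bytes = sys.getsizeof(squares_gen)
print(f"list: {list_bytes} bytes, generator: {gen_bytes} bytes")

# islice pulls just the first few values without computing the rest.
first_five = list(islice(squares_gen, 5))
print(first_five)
```

The generator's size stays the same no matter how large n gets, because it stores only where it is in the sequence, never the values themselves.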
3. From return to yield: write your own generator
def compute_all(nums):
    for x in nums:
        yield x * x
Now you can iterate without building a list.
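A small experiment makes the laziness visible. This sketch adds a print call to compute_all (an addition for demonstration only) so you can see exactly when the function body runs.

```python
def compute_all(nums):
    for x in nums:
        print(f"computing {x}")  # shows when the body actually runs
        yield x * x


gen = compute_all([1, 2, 3])  # nothing is printed yet: the body has not started
first = next(gen)             # runs the body up to the first yield
rest = list(gen)              # drains the remaining values
leftover = list(gen)          # a generator is single-use, so this is empty
print(first, rest, leftover)
```

The last line is worth remembering: once a generator is exhausted, iterating over it again yields nothing, so create a new one if you need a second pass.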
4. Combine generator pipelines
def read_lines(path):
    with open(path) as f:
        for line in f:
            yield line.strip()
def filter_empty(lines):
    return (line for line in lines if line)
def to_ints(lines):
    return (int(line) for line in lines)
lines = read_lines("bigfile.txt")
cleaned = filter_empty(lines)
numbers = to_ints(cleaned)
total = sum(numbers)
Each stage pulls one line at a time from the stage before it, so memory use stays small and constant even for a file far larger than RAM, such as a 100 GB file.
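If you want to try the pipeline without a huge file on hand, the sketch below runs the same three stages on a small temporary file. The sample contents and the name sample_path are made up for the demonstration.

```python
import os
import tempfile


def read_lines(path):
    with open(path) as f:
        for line in f:
            yield line.strip()


def filter_empty(lines):
    return (line for line in lines if line)


def to_ints(lines):
    return (int(line) for line in lines)


# Write a tiny sample file; the blank lines and stray spaces are deliberate.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("10\n\n 20 \n\n30\n")
    sample_path = tmp.name

try:
    total = sum(to_ints(filter_empty(read_lines(sample_path))))
finally:
    os.remove(sample_path)

print(total)  # 60
```

No stage does any work until sum starts asking for numbers; at that point each line flows through all three stages before the next line is read.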
5. Extra trick: yield from
def nested():
    for x in range(3):
        yield from subgen(x)
def subgen(x):
    yield x
    yield -x
yield from cleanly delegates to another generator.
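Delegation pays off most in recursive generators. As an illustration, the hypothetical flatten function below walks arbitrarily nested lists and delegates to itself with yield from; the nested and subgen functions from the example are repeated so the output can be checked.

```python
def flatten(items):
    """Yield the leaves of arbitrarily nested lists, depth first."""
    for item in items:
        if isinstance(item, list):
            yield from flatten(item)  # delegate to the recursive generator
        else:
            yield item


def subgen(x):
    yield x
    yield -x


def nested():
    for x in range(3):
        yield from subgen(x)


flat = list(flatten([1, [2, [3, 4]], 5]))
pairs = list(nested())
print(flat)   # [1, 2, 3, 4, 5]
print(pairs)  # [0, 0, 1, -1, 2, -2]
```

Without yield from, each delegation would need its own inner for loop that re-yields every value by hand.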
Part 3: Find and Fix the Real Performance Culprits
Even after parallelization and memory‑saving tricks, a script can still feel sluggish. Measure it.
1. Measure time with time.perf_counter()
import time
start = time.perf_counter()
result = sum(x * x for x in range(10_000_000))
end = time.perf_counter()
print(f"Elapsed {end - start:.2f} seconds")
2. Identify slow functions with cProfile
python -m cProfile -s time your_script.py
Sort the output by time to see which functions dominate.
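cProfile can also be driven from inside a program, which helps when you only want to profile one section. The sketch below uses the standard-library cProfile and pstats modules; slow_sum, fast_sum, and main are placeholder functions invented for the demonstration.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_sum(n):
    return n  # trivial on purpose, for contrast


def main():
    slow_sum(200_000)
    fast_sum(200_000)


profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Render the five costliest entries, sorted by time spent inside each function.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats(pstats.SortKey.TIME).print_stats(5)
report = buffer.getvalue()
print(report)
```

In the report, slow_sum should sit at the top of the tottime column, the same ordering that -s time produces on the command line.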
3. Visualize with snakeviz
pip install snakeviz
python -m cProfile -o output.prof your_script.py
snakeviz output.prof
This opens an interactive visualization of the profile (an icicle or sunburst chart, similar to a flame graph) in your browser.
4. Line‑by‑line analysis with line_profiler
pip install line_profiler
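@profile  # injected by kernprof at run time; no import needed
def slow_loop():
    total = 0
    for i in range(10_000_000):
        total += i * i
    return total
if __name__ == '__main__':
    slow_loop()
kernprof -l -v test_line_profiler.py
Shows the execution time of each line. Run the script through kernprof: under plain python the undefined @profile name raises a NameError.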
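Because kernprof supplies the profile name only while it is running the script, a decorated script is awkward to run normally. One common workaround, sketched here as an assumption rather than anything line_profiler requires, is a no-op fallback decorator. The n parameter is added only so the loop can be shortened for a quick run.

```python
try:
    profile  # kernprof injects this name into builtins when it runs the script
except NameError:
    def profile(func):
        """Fallback: leave the function untouched when kernprof is absent."""
        return func


@profile
def slow_loop(n=10_000_000):
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    print(slow_loop(100_000))
```

With this in place the same file works both ways: python runs it unprofiled, and kernprof -l -v profiles it line by line.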
5. Non‑intrusive profiling with pyinstrument
pip install pyinstrument
from pyinstrument import Profiler
profiler = Profiler()
profiler.start()
slow_function()  # the code you want to profile
profiler.stop()
print(profiler.output_text(unicode=True, color=True))
Or run from the command line: pyinstrument your_script.py.
6. Memory profiling with memory_profiler
pip install memory-profiler
from memory_profiler import profile
@profile
def compute_with_list():
    nums = list(range(10_000))
    squares = [x * x for x in nums]
    return sum(squares)
if __name__ == '__main__':
    compute_with_list()
python -m memory_profiler test_memory_profile.py
Shows line‑wise memory consumption.
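If installing a package is not an option, the standard library's tracemalloc module gives a coarser view of the same thing. This is a different tool from memory_profiler, shown here only as a dependency-free sketch; peak_bytes and compute_with_generator are hypothetical helpers, and n = 200_000 is an arbitrary size.

```python
import tracemalloc


def compute_with_list(n):
    squares = [x * x for x in range(n)]
    return sum(squares)


def compute_with_generator(n):
    return sum(x * x for x in range(n))


def peak_bytes(func, n):
    """Run func(n) and return (result, peak bytes allocated while it ran)."""
    tracemalloc.start()
    try:
        result = func(n)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


list_result, list_peak = peak_bytes(compute_with_list, 200_000)
gen_result, gen_peak = peak_bytes(compute_with_generator, 200_000)
print(f"list version peak:      {list_peak:,} bytes")
print(f"generator version peak: {gen_peak:,} bytes")
```

Both versions return the same sum, but the list version's peak should be far higher, which ties this section back to the generator advice in Part 2.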
7. Advanced tools: py‑spy, scalene, viztracer
py‑spy – sampling profiler that can attach to a running process without code changes.
pip install py-spy
py-spy top -- python your_script.py
py-spy record -o profile.svg -- python your_script.py
★ It can even profile a live process: py-spy top --pid 1234
scalene – reports per‑line CPU time, memory allocations, and copy activity.
pip install scalene
scalene your_script.py
viztracer – records a detailed trace to a JSON file (result.json by default) that you open in an interactive, Chrome‑style timeline viewer with vizviewer.
pip install viztracer
viztracer your_script.py
vizviewer result.json
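Performance‑analysis‑tool summary
time.perf_counter(): wall‑clock time of a code section.
cProfile: time per function; snakeviz visualizes its output.
line_profiler: execution time per line.
pyinstrument: non‑intrusive profiling, from code or the command line.
memory_profiler: memory consumption per line.
py‑spy: sampling profiler that attaches to a running process.
scalene: per‑line CPU time, memory allocations, and copy activity.
viztracer: detailed trace with an interactive viewer.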
Key takeaways: use parallelism when appropriate, stream data with generators to keep memory low, and always measure with profiling tools before guessing where to optimize.
Code Mala Tang
