Simple Techniques to Accelerate Python for‑loops (1.3× to 970× Speedup)
This article presents a collection of practical Python techniques, including list comprehensions, pre-computed lengths, set operations, skipping irrelevant iterations, inlining functions, generators, map, memoization, NumPy vectorization, filterfalse, and join, with concrete code examples and benchmark results for each.
In this tutorial we explore these straightforward methods, which can increase the speed of Python for loops by factors ranging from 1.3× to 970×, using the timeit module to measure baseline and improved performance.
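As a sketch of the measurement harness (the baseline/improved pair and input size here are illustrative, not the article's exact setup), timings like those quoted below can be reproduced with timeit:

```python
import timeit

def baseline(numbers):
    # Explicit loop with repeated append calls
    output = []
    for n in numbers:
        output.append(n ** 2.5)
    return output

def improved(numbers):
    # Same computation as a list comprehension
    return [n ** 2.5 for n in numbers]

numbers = list(range(100))
t0 = timeit.timeit(lambda: baseline(numbers), number=1_000)
t1 = timeit.timeit(lambda: improved(numbers), number=1_000)
print(f"baseline: {t0:.4f}s  improved: {t1:.4f}s  speedup: {t0 / t1:.2f}x")
```

The exact numbers vary by machine and Python version, which is why each section below reports its own measured ratio.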
1. List Comprehension
Replacing an explicit loop with a list comprehension halves the execution time.
# Baseline version (Inefficient way)
# Calculating the power of numbers
# Without using List Comprehension
def test_01_v0(numbers):
    output = []
    for n in numbers:
        output.append(n ** 2.5)
    return output

# Improved version (Using List Comprehension)
def test_01_v1(numbers):
    output = [n ** 2.5 for n in numbers]
    return output

Result: 2.00× speedup (32.158 ns → 16.040 ns per loop).
2. Compute Length Outside the Loop
Moving the length calculation out of the loop yields a 1.6× improvement.
# Baseline version (Length calculation inside loop)
def test_02_v0(numbers):
    output_list = []
    for i in range(len(numbers)):
        output_list.append(i * 2)
    return output_list

# Improved version (Length calculation outside loop)
def test_02_v1(numbers):
    my_list_length = len(numbers)
    output_list = []
    for i in range(my_list_length):
        output_list.append(i * 2)
    return output_list

Result: 1.64× speedup (112.135 ns → 68.304 ns per loop).
3. Use set for Membership Tests
Replacing repeated list membership tests with a set intersection accelerates the code by nearly 500× (note that the set version drops duplicates and does not preserve input order).
# Baseline version (nested loops)
def test_03_v0(list_1, list_2):
    common_items = []
    for item in list_1:
        if item in list_2:
            common_items.append(item)
    return common_items

# Improved version (using sets)
def test_03_v1(list_1, list_2):
    s_1 = set(list_1)
    s_2 = set(list_2)
    common_items = s_1.intersection(s_2)
    return common_items

Result: 498.17× speedup (9047.078 ns → 18.161 ns per loop).
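The same idea helps whenever a loop repeatedly tests membership in a list: converting the lookup side to a set once turns each in-check into an average O(1) operation, and this variant also preserves input order. A minimal sketch with illustrative inputs:

```python
list_1 = [1, 2, 3, 4, 5]
list_2 = [4, 5, 6, 7]

# O(n*m): every membership test scans list_2 from the start
slow = [x for x in list_1 if x in list_2]

# O(n+m): build the set once, then each lookup is O(1) on average
s_2 = set(list_2)
fast = [x for x in list_1 if x in s_2]

assert slow == fast == [4, 5]
```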
4. Skip Irrelevant Iterations
Design the loop to avoid unnecessary work, achieving roughly a 2× gain.
# Inefficient version
def function_do_something(numbers):
    for n in numbers:
        square = n * n
        if square % 2 == 0:
            return square
    return None

# Improved version
def function_do_something_v1(numbers):
    even_numbers = [i for i in numbers if i % 2 == 0]
    for n in even_numbers:
        square = n * n
        return square
    return None

Result: 1.94× speedup (16.912 ns → 8.697 ns per loop).
5. Inline Simple Functions
Inlining the body of a frequently‑called function removes call overhead, giving about a 1.35× improvement.
# Baseline (calls is_prime)
def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

def test_05_v0(n):
    count = 0
    for i in range(2, n + 1):
        if is_prime(i):
            count += 1
    return count

# Improved (inlined logic)
def test_05_v1(n):
    count = 0
    for i in range(2, n + 1):
        if i <= 1:
            continue
        for j in range(2, int(i**0.5) + 1):
            if i % j == 0:
                break
        else:
            count += 1
    return count

Result: 1.35× speedup (1271.188 ns → 939.603 ns per loop).
6. Use Generators
Generators evaluate lazily: calling a generator function returns immediately, and values are produced only as the sequence is consumed. Note that the benchmark below therefore times only the creation of the generator, not the full computation, which is why the measured gap is so large.
# Baseline (list‑based Fibonacci)
def test_08_v0(n):
    if n <= 1:
        return n
    f_list = [0, 1]
    for i in range(2, n + 1):
        f_list.append(f_list[i - 1] + f_list[i - 2])
    return f_list[n]

# Improved (generator‑based)
def test_08_v1(n):
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b

Result: 22.06× speedup (0.083 ns → 0.004 ns per loop).
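Keep in mind that the generator version returns a generator object rather than a number; the Fibonacci values appear only when it is consumed. A minimal sketch (fib_gen mirrors the improved version above):

```python
def fib_gen(n):
    # Lazily yields the first n Fibonacci numbers
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b

# Materialize only when the values are actually needed
first_ten = list(fib_gen(10))
# first_ten == [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```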
7. Use map()
Replacing an explicit loop with the built-in map function appears nearly a thousand times faster here, but map returns a lazy iterator, so the benchmark measures only its creation; the mapping work itself happens when the result is consumed.
def some_function_X(x):
    return x**2

def test_09_v0(numbers):
    output = []
    for i in numbers:
        output.append(some_function_X(i))
    return output

def test_09_v1(numbers):
    output = map(some_function_X, numbers)
    return output

Result: 970.69× speedup (4.402 ns → 0.005 ns per loop).
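Because map is lazy, a fair timing comparison should force evaluation, for example with list(). A minimal sketch (the square helper and sample input are illustrative):

```python
def square(x):
    return x ** 2

numbers = range(5)

# map builds a lazy iterator; list() forces the actual computation
result = list(map(square, numbers))
assert result == [0, 1, 4, 9, 16]
```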
8. Memoization with functools.lru_cache
Caching expensive recursive calls reduces the cost of computing Fibonacci numbers by over 50×.
# Inefficient recursive Fibonacci
def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)

# Efficient version using lru_cache
import functools

@functools.lru_cache()
def fibonacci_v2(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    return fibonacci_v2(n - 1) + fibonacci_v2(n - 2)

Result: 57.69× speedup (63.664 ns → 1.104 ns per loop).
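A small sketch of how the cache behaves (the fib helper and maxsize=None argument here are illustrative; cache_info() is part of the functools.lru_cache API):

```python
import functools

@functools.lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

assert fib(30) == 832040
# Each of fib(0)..fib(30) was computed exactly once and cached;
# any later call with the same argument is a cache hit
info = fib.cache_info()
assert info.currsize == 31
```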
9. Vectorization with NumPy
Replacing a Python loop with NumPy’s vectorized operations yields a ~28× improvement.
import numpy as np
def test_11_v0(n):
    output = 0
    for i in range(0, n):
        output = output + i
    return output

def test_11_v1(n):
    output = np.sum(np.arange(n))
    return output

Result: 28.13× speedup (32.936 ns → 1.171 ns per loop).
10. Avoid Creating Intermediate Lists (filterfalse)
Using itertools.filterfalse with the inverted predicate in place of filter sped up this benchmark by more than 100×.
# Baseline (creates intermediate list)
def test_12_v0(numbers):
    filtered_data = []
    for i in numbers:
        filtered_data.extend(list(
            filter(lambda x: x % 5 == 0, range(1, i**2))))
    return filtered_data

# Improved (filterfalse)
from itertools import filterfalse

def test_12_v1(numbers):
    filtered_data = []
    for i in numbers:
        filtered_data.extend(list(
            filterfalse(lambda x: x % 5 != 0, range(1, i**2))))
    return filtered_data

Result: 131.07× speedup (333167.790 ns → 2541.850 ns per loop).
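For reference, filterfalse keeps the elements for which the predicate is false, which is why the condition above is inverted relative to filter. A minimal illustrative sketch:

```python
from itertools import filterfalse

nums = range(1, 20)

# Keep x where the predicate (x % 5 != 0) is False, i.e. multiples of 5
multiples_of_5 = list(filterfalse(lambda x: x % 5 != 0, nums))
assert multiples_of_5 == [5, 10, 15]
```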
11. Efficient String Concatenation
Using ''.join() instead of the += operator reduces time complexity from O(n²) to O(n), giving about a 1.5× boost.
# Baseline (+= concatenation)
def test_13_v0(l_strings):
    output = ""
    for a_str in l_strings:
        output += a_str
    return output

# Improved (join)
def test_13_v1(l_strings):
    output_list = []
    for a_str in l_strings:
        output_list.append(a_str)
    return "".join(output_list)

Result: 1.54× speedup (32.423 ns → 21.051 ns per loop).
Summary
The techniques above demonstrate that simple, well-chosen Python idioms (list comprehensions, pre-computed values, set operations, generators, built-in functions like map, memoization, NumPy vectorization, filterfalse, and join) can collectively accelerate for-loop execution from modest 1.3× gains up to 970× improvements.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.