Simple Techniques to Accelerate Python for‑loops (1.3× to 970× Speedup)
This article presents a collection of practical Python techniques, including list comprehensions, pre-computed lengths, set operations, skipping irrelevant iterations, inlining functions, generators, map, memoization, NumPy vectorization, filterfalse, and join, with concrete code examples and benchmark results for each.
In this tutorial we explore these straightforward methods, which can increase the speed of Python for loops by factors ranging from 1.3× to 970×, using the timeit module to measure baseline and improved performance.
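As a sketch of the measurement harness (the baseline/improved pair and input size here are illustrative, not the article's exact setup), timings like those quoted below can be reproduced with timeit:

```python
import timeit

def baseline(numbers):
    # Explicit loop with repeated append calls
    output = []
    for n in numbers:
        output.append(n ** 2.5)
    return output

def improved(numbers):
    # Same computation as a list comprehension
    return [n ** 2.5 for n in numbers]

numbers = list(range(100))
t0 = timeit.timeit(lambda: baseline(numbers), number=1_000)
t1 = timeit.timeit(lambda: improved(numbers), number=1_000)
print(f"baseline: {t0:.4f}s  improved: {t1:.4f}s  speedup: {t0 / t1:.2f}x")
```

The exact numbers vary by machine and Python version, which is why each section below reports its own measured ratio.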
1. List Comprehension
Replacing an explicit loop with a list comprehension halves the execution time.
# Baseline version (Inefficient way)
# Calculating the power of numbers
# Without using List Comprehension
def test_01_v0(numbers):
    output = []
    for n in numbers:
        output.append(n ** 2.5)
    return output

# Improved version (Using List Comprehension)
def test_01_v1(numbers):
    output = [n ** 2.5 for n in numbers]
    return output

Result: 2.00× speedup (32.158 ns → 16.040 ns per loop).
2. Compute Length Outside the Loop
Moving the length calculation out of the loop yields a 1.6× improvement.
# Baseline version (Length calculation inside loop)
def test_02_v0(numbers):
    output_list = []
    for i in range(len(numbers)):
        output_list.append(i * 2)
    return output_list

# Improved version (Length calculation outside loop)
def test_02_v1(numbers):
    my_list_length = len(numbers)
    output_list = []
    for i in range(my_list_length):
        output_list.append(i * 2)
    return output_list

Result: 1.64× speedup (112.135 ns → 68.304 ns per loop).
3. Use set for Membership Tests
Replacing repeated list membership tests with a set intersection accelerates the code by nearly 500× (note that the set version drops duplicates and does not preserve input order).
# Baseline version (nested loops)
def test_03_v0(list_1, list_2):
    common_items = []
    for item in list_1:
        if item in list_2:
            common_items.append(item)
    return common_items

# Improved version (using sets)
def test_03_v1(list_1, list_2):
    s_1 = set(list_1)
    s_2 = set(list_2)
    common_items = s_1.intersection(s_2)
    return common_items

Result: 498.17× speedup (9047.078 ns → 18.161 ns per loop).
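The same idea helps whenever a loop repeatedly tests membership in a list: converting the lookup side to a set once turns each in-check into an average O(1) operation, and this variant also preserves input order. A minimal sketch with illustrative inputs:

```python
list_1 = [1, 2, 3, 4, 5]
list_2 = [4, 5, 6, 7]

# O(n*m): every membership test scans list_2 from the start
slow = [x for x in list_1 if x in list_2]

# O(n+m): build the set once, then each lookup is O(1) on average
s_2 = set(list_2)
fast = [x for x in list_1 if x in s_2]

assert slow == fast == [4, 5]
```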
4. Skip Irrelevant Iterations
Design the loop to avoid unnecessary work, achieving roughly a 2× gain.
# Inefficient version
def function_do_something(numbers):
    for n in numbers:
        square = n * n
        if square % 2 == 0:
            return square
    return None

# Improved version
def function_do_something_v1(numbers):
    even_numbers = [i for i in numbers if i % 2 == 0]
    for n in even_numbers:
        square = n * n
        return square
    return None

Result: 1.94× speedup (16.912 ns → 8.697 ns per loop).
5. Inline Simple Functions
Inlining the body of a frequently‑called function removes call overhead, giving about a 1.35× improvement.
# Baseline (calls is_prime)
def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

def test_05_v0(n):
    count = 0
    for i in range(2, n + 1):
        if is_prime(i):
            count += 1
    return count

# Improved (inlined logic)
def test_05_v1(n):
    count = 0
    for i in range(2, n + 1):
        if i <= 1:
            continue
        for j in range(2, int(i**0.5) + 1):
            if i % j == 0:
                break
        else:
            count += 1
    return count

Result: 1.35× speedup (1271.188 ns → 939.603 ns per loop).
6. Use Generators
Generators evaluate lazily: calling a generator function returns immediately, and values are produced only as the sequence is consumed. Note that the benchmark below therefore times only the creation of the generator, not the full computation, which is why the measured gap is so large.
# Baseline (list‑based Fibonacci)
def test_08_v0(n):
    if n <= 1:
        return n
    f_list = [0, 1]
    for i in range(2, n + 1):
        f_list.append(f_list[i - 1] + f_list[i - 2])
    return f_list[n]

# Improved (generator‑based)
def test_08_v1(n):
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b

Result: 22.06× speedup (0.083 ns → 0.004 ns per loop).
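Keep in mind that the generator version returns a generator object rather than a number; the Fibonacci values appear only when it is consumed. A minimal sketch (fib_gen mirrors the improved version above):

```python
def fib_gen(n):
    # Lazily yields the first n Fibonacci numbers
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b

# Materialize only when the values are actually needed
first_ten = list(fib_gen(10))
# first_ten == [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```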
7. Use map()
Replacing an explicit loop with the built-in map function appears nearly a thousand times faster here, but map returns a lazy iterator, so the benchmark measures only its creation; the mapping work itself happens when the result is consumed.
def some_function_X(x):
    return x**2

def test_09_v0(numbers):
    output = []
    for i in numbers:
        output.append(some_function_X(i))
    return output

def test_09_v1(numbers):
    output = map(some_function_X, numbers)
    return output

Result: 970.69× speedup (4.402 ns → 0.005 ns per loop).
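Because map is lazy, a fair timing comparison should force evaluation, for example with list(). A minimal sketch (the square helper and sample input are illustrative):

```python
def square(x):
    return x ** 2

numbers = range(5)

# map builds a lazy iterator; list() forces the actual computation
result = list(map(square, numbers))
assert result == [0, 1, 4, 9, 16]
```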
8. Memoization with functools.lru_cache
Caching expensive recursive calls reduces the cost of computing Fibonacci numbers by over 50×.
# Inefficient recursive Fibonacci
def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)

# Efficient version using lru_cache
import functools

@functools.lru_cache()
def fibonacci_v2(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    return fibonacci_v2(n - 1) + fibonacci_v2(n - 2)

Result: 57.69× speedup (63.664 ns → 1.104 ns per loop).
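A small sketch of how the cache behaves (the fib helper and maxsize=None argument here are illustrative; cache_info() is part of the functools.lru_cache API):

```python
import functools

@functools.lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

assert fib(30) == 832040
# Each of fib(0)..fib(30) was computed exactly once and cached;
# any later call with the same argument is a cache hit
info = fib.cache_info()
assert info.currsize == 31
```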
9. Vectorization with NumPy
Replacing a Python loop with NumPy’s vectorized operations yields a ~28× improvement.
import numpy as np
def test_11_v0(n):
    output = 0
    for i in range(0, n):
        output = output + i
    return output

def test_11_v1(n):
    output = np.sum(np.arange(n))
    return output

Result: 28.13× speedup (32.936 ns → 1.171 ns per loop).
10. Avoid Creating Intermediate Lists (filterfalse)
Using itertools.filterfalse with the inverted predicate in place of filter sped up this benchmark by more than 100×.
# Baseline (creates intermediate list)
def test_12_v0(numbers):
    filtered_data = []
    for i in numbers:
        filtered_data.extend(list(
            filter(lambda x: x % 5 == 0, range(1, i**2))))
    return filtered_data

# Improved (filterfalse)
from itertools import filterfalse

def test_12_v1(numbers):
    filtered_data = []
    for i in numbers:
        filtered_data.extend(list(
            filterfalse(lambda x: x % 5 != 0, range(1, i**2))))
    return filtered_data

Result: 131.07× speedup (333167.790 ns → 2541.850 ns per loop).
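For reference, filterfalse keeps the elements for which the predicate is false, which is why the condition above is inverted relative to filter. A minimal illustrative sketch:

```python
from itertools import filterfalse

nums = range(1, 20)

# Keep x where the predicate (x % 5 != 0) is False, i.e. multiples of 5
multiples_of_5 = list(filterfalse(lambda x: x % 5 != 0, nums))
assert multiples_of_5 == [5, 10, 15]
```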
11. Efficient String Concatenation
Using ''.join() instead of the += operator reduces time complexity from O(n²) to O(n), giving about a 1.5× boost.
# Baseline (+= concatenation)
def test_13_v0(l_strings):
    output = ""
    for a_str in l_strings:
        output += a_str
    return output

# Improved (join)
def test_13_v1(l_strings):
    output_list = []
    for a_str in l_strings:
        output_list.append(a_str)
    return "".join(output_list)

Result: 1.54× speedup (32.423 ns → 21.051 ns per loop).
Summary
The techniques above demonstrate that simple, well-chosen Python idioms (list comprehensions, pre-computed values, set operations, generators, built-in functions like map, memoization, NumPy vectorization, filterfalse, and join) can collectively accelerate for-loop execution from modest 1.3× gains up to 970× improvements.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.