
Simple Techniques to Speed Up Python For Loops by Up to 970×

This article demonstrates a collection of straightforward Python performance tricks—such as list comprehensions, external length calculation, set usage, loop skipping, code inlining, generators, map(), memoization, vectorization, filterfalse, and string joining—that together can accelerate for‑loops from modest 1.3× gains to dramatic 970× speed‑ups, with detailed benchmark results and code examples.

Python Programming Learning Circle

In this article we present several straightforward methods to accelerate Python for loops, measuring performance with the timeit module and reporting speed‑ups ranging from 1.3× to 970×.

1. List Comprehensions

<code># Baseline version (Inefficient way)
def test_01_v0(numbers):
    output = []
    for n in numbers:
        output.append(n ** 2.5)
    return output

# Improved version (Using List Comprehension)
def test_01_v1(numbers):
    output = [n ** 2.5 for n in numbers]
    return output
</code>

Result: 2.00× faster (32.158 ns → 16.040 ns per loop).

2. External Length Calculation

<code># Baseline version (len(numbers) is evaluated once, when range() is created)
def test_02_v0(numbers):
    output_list = []
    for i in range(len(numbers)):
        output_list.append(i * 2)
    return output_list

# Improved version (Length calculation outside for loop)
def test_02_v1(numbers):
    my_list_length = len(numbers)
    output_list = []
    for i in range(my_list_length):
        output_list.append(i * 2)
    return output_list
</code>

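Result: 1.64× faster (112.135 ns → 68.304 ns per loop). Note, however, that range(len(numbers)) calls len() only once, when the range object is created, so both versions compute the length exactly once.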
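range() receives the value of len(numbers) a single time, when the range object is built, and does not re-evaluate it on each iteration. The small sketch below makes this visible. It uses a hypothetical CountingList subclass that counts calls to its __len__ method:

```python
class CountingList(list):
    """Illustrative list subclass that counts calls to len()."""
    len_calls = 0

    def __len__(self):
        self.len_calls += 1
        return super().__len__()

numbers = CountingList([10, 20, 30, 40])
numbers.len_calls = 0  # start counting from here

output_list = []
for i in range(len(numbers)):  # len() runs once, when range() is built
    output_list.append(i * 2)

print(numbers.len_calls)  # 1
```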

3. Using set for Membership Tests

<code># Baseline version (each 'in' test scans list_2, so work grows as len(list_1) * len(list_2))
def test_03_v0(list_1, list_2):
    common_items = []
    for item in list_1:
        if item in list_2:
            common_items.append(item)
    return common_items

# Improved version (set intersection)
# Note: returns a set, so the order and duplicates of list_1 are not preserved
def test_03_v1(list_1, list_2):
    s_1 = set(list_1)
    s_2 = set(list_2)
    common_items = s_1.intersection(s_2)
    return common_items
</code>

Result: 498× faster (9047.078 ns → 18.161 ns per loop).
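set.intersection() returns a set, so it loses the order and any duplicates of list_1. If you need the list result of the baseline, one option still gets O(1) average membership tests: build a set from list_2 only and keep the pass over list_1. A sketch, with a hypothetical function name:

```python
def common_items_ordered(list_1, list_2):
    # Build the lookup set once; each 'in' test is then O(1) on average
    lookup = set(list_2)
    # Keeps list_1's order and duplicates, like the baseline loop
    return [item for item in list_1 if item in lookup]

print(common_items_ordered([3, 1, 2, 3, 5], [5, 3, 9]))  # [3, 3, 5]
```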

4. Skipping Irrelevant Iterations

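<code># Baseline version: squares each number until an even square is found
def function_do_something(numbers):
    for n in numbers:
        square = n * n
        if square % 2 == 0:
            return square
    return None

# Improved version: for integers, n * n is even only when n is even,
# so odd numbers are skipped before squaring
def function_do_something_v1(numbers):
    # Note: this builds the full list of even numbers before the loop runs
    even_numbers = [i for i in numbers if i % 2 == 0]
    for n in even_numbers:
        square = n * n
        return square  # returns on the first even number
    return None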
</code>

Result: 1.94× faster (16.912 ns → 8.697 ns per loop).
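The improved version above still scans every element to build even_numbers, even though it only uses the first one. One sketch keeps both the filter and the baseline's early exit by pairing a lazy generator expression with next(). The function name is hypothetical:

```python
def first_even_square(numbers):
    # For integers, n * n is even exactly when n is even, so filter on n first.
    # The generator expression is lazy: iteration stops at the first match.
    evens = (n for n in numbers if n % 2 == 0)
    n = next(evens, None)  # None when there is no even number
    return None if n is None else n * n

print(first_even_square([3, 5, 4, 6]))  # 16
```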

5. Code Inlining (Merging Functions)

<code># Baseline version (calls is_prime inside loop)
def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

def test_05_v0(n):
    count = 0
    for i in range(2, n + 1):
        if is_prime(i):
            count += 1
    return count

# Improved version (inline logic)
def test_05_v1(n):
    count = 0
    for i in range(2, n + 1):
        if i <= 1:
            continue
        for j in range(2, int(i**0.5) + 1):
            if i % j == 0:
                break  # divisor found: i is not prime
        else:
            # for...else: runs only when the loop finished without break
            count += 1
    return count
</code>

Result: 1.35× faster (1271.188 ns → 939.603 ns per loop).
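The inlined version depends on Python's for...else: the else block runs only when the inner loop finishes without hitting break. Below is a self-contained sketch of the same counting loop with one change: it uses math.isqrt() (Python 3.8+) instead of int(i**0.5). math.isqrt() returns the exact integer square root, so there is no floating-point rounding for very large i. The function name is hypothetical:

```python
import math

def count_primes_inline(n):
    """Count primes in [2, n] with trial division inlined into the loop."""
    count = 0
    for i in range(2, n + 1):
        for j in range(2, math.isqrt(i) + 1):
            if i % j == 0:
                break      # found a divisor: i is composite, skip the else
        else:
            count += 1     # loop ended without break: i is prime
    return count

print(count_primes_inline(100))  # 25
```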

6. Avoid Repetition (Pre‑computing Values)

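<code># Baseline version
def test_07_v0(n):
    result = 0
    for i in range(n):
        for j in range(n):
            result += i * j
    return result

# Improved version (pre‑compute matrix)
def test_07_v1(n):
    pv = [[i * j for j in range(n)] for i in range(n)]
    result = 0
    for i in range(n):
        # Sum the whole row: slicing pv[i][:i+1] would cover only j <= i
        # and return a different total from the baseline
        result += sum(pv[i])
    return result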
</code>

Result: 1.51× faster (139.146 ns → 92.325 ns per loop).
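The repeated work in the baseline is that the inner loop recomputes the same sum over j for every i. Because the sum of i * j over j equals i times the sum of j, that inner sum can be computed once and reused. For this particular expression the double sum also factorises into a closed form, (n(n-1)/2)². A sketch of both, with hypothetical function names:

```python
def pairwise_product_sum_loop(n):
    result = 0
    for i in range(n):
        for j in range(n):
            result += i * j
    return result

def pairwise_product_sum_hoisted(n):
    # The inner sum does not depend on i, so compute it once outside the loop
    s = sum(range(n))
    return sum(i * s for i in range(n))

def pairwise_product_sum_closed(n):
    # Sum over i and j of i*j = (0 + 1 + ... + (n-1))^2 = (n(n-1)/2)^2
    return (n * (n - 1) // 2) ** 2

print(pairwise_product_sum_hoisted(3), pairwise_product_sum_closed(3))  # 9 9
```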

7. Generators for Lazy Evaluation

<code># Baseline version (list-based Fibonacci: returns F(n), storing every value up to it)
def test_08_v0(n):
    if n <= 1:
        return n
    f_list = [0, 1]
    for i in range(2, n + 1):
        f_list.append(f_list[i - 1] + f_list[i - 2])
    return f_list[n]

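# Improved version (generator): yields F(0) through F(n-1) one at a time.
# Calling test_08_v1(n) only creates the generator; no values are computed
# until it is iterated.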
def test_08_v1(n):
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b
</code>

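Result: 22.06× faster (0.083 ns → 0.004 ns per loop). The two functions are not equivalent, however. test_08_v0 returns F(n). test_08_v1 yields F(0) through F(n-1) and does no work until the generator is iterated, so a like-for-like benchmark has to consume it.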
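To get a single value F(n) from a generator, the generator has to be advanced n steps. This sketch uses an infinite generator and itertools.islice(). Like test_08_v1, it keeps only two integers in memory instead of a growing list. The function names are hypothetical:

```python
from itertools import islice

def fib_gen():
    """Yield F(0), F(1), F(2), ... indefinitely."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

def nth_fib(n):
    # islice lazily skips the first n values; next() returns F(n)
    return next(islice(fib_gen(), n, None))

print(nth_fib(10))                 # 55
print(list(islice(fib_gen(), 6)))  # [0, 1, 1, 2, 3, 5]
```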

8. Using map() Instead of Explicit Loops

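<code># Hypothetical stand-ins so the snippet runs on its own
def some_function_X(x):
    return x * 2

numbers = range(1000)

# Baseline version
output = []
for i in numbers:
    output.append(some_function_X(i))

# Improved version: map() returns a lazy iterator, so some_function_X
# is not called until the result is consumed (e.g. with list(output))
output = map(some_function_X, numbers)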
</code>

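Result: 970.69× faster (4.402 ns → 0.005 ns per loop). Because map() returns a lazy iterator, the improved line does not call some_function_X at all until the result is consumed. For an equal-work comparison, time list(map(some_function_X, numbers)) against the loop.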
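Timing only the map(...) expression measures iterator creation, not the function calls, because map() is lazy. The sketch below makes that laziness visible. It uses a hypothetical some_function_X that records each call:

```python
calls = []

def some_function_X(x):
    # Hypothetical stand-in that records each call
    calls.append(x)
    return x * x

lazy = map(some_function_X, [1, 2, 3])
print(calls)         # [] : map() has not called the function yet

result = list(lazy)  # consuming the iterator triggers every call
print(calls)         # [1, 2, 3]
print(result)        # [1, 4, 9]
```

A map object can only be consumed once. After list(lazy), iterating lazy again yields nothing.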

9. Memoization with functools.lru_cache

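<code>import functools

# Inefficient recursive Fibonacci: recomputes the same subproblems
# an exponential number of times
def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)

# Efficient version using lru_cache: each fibonacci_v2(k) is computed once,
# and later calls with the same k return the cached result
@functools.lru_cache()
def fibonacci_v2(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    return fibonacci_v2(n - 1) + fibonacci_v2(n - 2)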
</code>

Result: 57.69× faster (63.664 ns → 1.104 ns per loop).
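cache_info() shows what the cache is doing. In this sketch the cache is unbounded (maxsize=None; functools.cache in Python 3.9+ is equivalent), so each n from 0 to 30 is computed exactly once. The cache also persists between calls. Repeated benchmark runs therefore mostly measure cache lookups unless fibonacci_v2.cache_clear() is called between them.

```python
import functools

@functools.lru_cache(maxsize=None)
def fibonacci_v2(n):
    if n < 2:
        return n
    return fibonacci_v2(n - 1) + fibonacci_v2(n - 2)

print(fibonacci_v2(30))            # 832040
info = fibonacci_v2.cache_info()
print(info.misses, info.currsize)  # 31 31: each n from 0 to 30 computed once
```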

10. Vectorization with NumPy

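<code>import numpy as np

n = 1_000_000  # hypothetical example size

# Baseline loop sum
output = 0
for i in range(n):
    output = output + i

# Vectorized version: np.arange builds the array and np.sum adds it
# without a Python-level loop
output = np.sum(np.arange(n))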
</code>

Result: 28.13× faster (32.936 ns → 1.171 ns per loop).
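The same idea applies to element-wise work, such as the n ** 2.5 loop from section 1. The sketch below assumes NumPy is installed, and the function names are hypothetical. Unlike Python ints, NumPy integers are fixed-width, so very large integer sums can overflow.

```python
import numpy as np

def powers_loop(numbers):
    # Python-level loop: one interpreted iteration per element
    return [x ** 2.5 for x in numbers]

def powers_vectorized(numbers):
    # A single array operation; the loop runs inside NumPy's compiled code
    return np.asarray(numbers, dtype=np.float64) ** 2.5

values = [float(k) for k in range(1, 1001)]
same = np.allclose(powers_loop(values), powers_vectorized(values))
print(same)  # True

# The summation from this section, checked against the closed form n(n-1)/2
n = 1000
print(int(np.arange(n).sum()), n * (n - 1) // 2)  # 499500 499500
```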

11. Avoid Creating Intermediate Lists (filterfalse)

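<code>from itertools import filterfalse

numbers = [10, 20, 30]  # hypothetical example input

# Baseline version: filter() keeps items where the predicate is true,
# and list() materialises each batch before extend() copies it
filtered_data = []
for i in numbers:
    filtered_data.extend(list(filter(lambda x: x % 5 == 0, range(1, i**2))))

# Improved version: filterfalse() keeps items where the predicate is false
# (here: multiples of 5), and extend() consumes the iterator directly
filtered_data = []
for i in numbers:
    filtered_data.extend(filterfalse(lambda x: x % 5 != 0, range(1, i**2)))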
</code>

Result: 131.07× faster (333167.790 ns → 2541.850 ns per loop).
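filter() and filterfalse() both still call the predicate once for every number in range(1, i**2). When the condition is as simple as "multiple of 5", a different technique skips the test altogether: generate the values directly with a stepped range(). A sketch, with hypothetical function names:

```python
from itertools import filterfalse

def multiples_filterfalse(numbers):
    out = []
    for i in numbers:
        out.extend(filterfalse(lambda x: x % 5 != 0, range(1, i ** 2)))
    return out

def multiples_step(numbers):
    # Generate only the multiples of 5 below i**2: no predicate call per element
    out = []
    for i in numbers:
        out.extend(range(5, i ** 2, 5))
    return out

print(multiples_step([4]))  # [5, 10, 15]
```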

12. Efficient String Concatenation with join()

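<code>l_strings = ["alpha", "beta", "gamma"]  # hypothetical example input

# Baseline version (using +=): strings are immutable, so each += may copy
# the accumulated text into a new string
output = ""
for a_str in l_strings:
    output += a_str

# Improved version (using join): collect the pieces, then build the result
# in a single pass; if l_strings is already a list of strings,
# "".join(l_strings) on its own is enough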
output_list = []
for a_str in l_strings:
    output_list.append(a_str)
output = "".join(output_list)
</code>

Result: 1.54× faster (32.423 ns → 21.051 ns per loop).
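When the pieces are produced inside the loop, str.join() can consume them from a generator expression, so no separate list-building loop is needed. join() requires every element to be a str, hence the explicit str() conversion in this sketch. The function name and comma separator are hypothetical:

```python
def to_csv_row(values):
    # str.join accepts any iterable of strings, so a generator expression
    # feeds it directly without an explicit append loop
    return ",".join(str(v) for v in values)

print(to_csv_row([1, 2.5, "x"]))  # 1,2.5,x
```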

Conclusion

The article introduces a variety of simple yet powerful techniques that can boost Python for loop performance from modest 1.3× improvements to extreme 970× speed‑ups, depending on the specific pattern and data size.

Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
