
Practical Python Performance Optimization Techniques

This article presents several practical Python performance‑optimization methods—including __slots__ for memory reduction, list comprehensions for faster loops, the lru_cache decorator for result caching, generators for low‑memory data processing, and local‑variable usage—to help developers write faster, more efficient code.


In performance‑critical scenarios, Python is often criticized for being slower than compiled languages, but features from the standard library can significantly improve both execution speed and memory usage. This article details five practical optimization techniques.

1. __slots__ mechanism: memory optimization

Python stores instance attributes in a dynamic dictionary, which adds overhead. Declaring __slots__ restricts attributes to a static structure, reducing memory consumption and speeding up attribute access.

<code>from pympler import asizeof

class person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

unoptimized_instance = person("Harry", 20)
print(f"UnOptimized memory instance: {asizeof.asizeof(unoptimized_instance)} bytes")
</code>

The unoptimized instance occupies 520 bytes. Adding __slots__ yields roughly a 75% memory reduction.

<code>from pympler import asizeof

class SlottedPerson:
    __slots__ = ['name', 'age']

    def __init__(self, name, age):
        self.name = name
        self.age = age

optimized_instance = SlottedPerson("Harry", 20)
print(f"Optimized memory instance: {asizeof.asizeof(optimized_instance)} bytes")
</code>

A benchmark comparing creation time and memory shows the slotted class is both smaller and faster.
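Such a benchmark can be sketched with the standard library alone; the class definitions repeat the examples above, and the instance count of 100,000 is an arbitrary choice:

```python
import time

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

class SlottedPerson:
    __slots__ = ['name', 'age']

    def __init__(self, name, age):
        self.name = name
        self.age = age

# Time bulk creation of each class; slotted instances skip the per-instance dict.
start = time.perf_counter()
regular = [Person("Harry", 20) for _ in range(100_000)]
regular_time = time.perf_counter() - start

start = time.perf_counter()
slotted = [SlottedPerson("Harry", 20) for _ in range(100_000)]
slotted_time = time.perf_counter() - start

print(f"Regular class creation: {regular_time:.4f} s")
print(f"Slotted class creation: {slotted_time:.4f} s")
```

On CPython the slotted class is typically measurably faster to instantiate, though exact numbers vary by machine and interpreter version.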

2. List comprehension: loop optimization

List comprehensions run their loop in optimized C code inside the CPython interpreter, making them typically 30–50% faster than equivalent for loops.

<code>import time
# Traditional for‑loop
start = time.perf_counter()
squares_loop = []
for i in range(1, 10_000_001):
    squares_loop.append(i ** 2)
end = time.perf_counter()
print(f"For loop: {end - start:.6f} seconds")

# List comprehension
start = time.perf_counter()
squares_comprehension = [i ** 2 for i in range(1, 10_000_001)]
end = time.perf_counter()
print(f"List comprehension: {end - start:.6f} seconds")
</code>

The test confirms the comprehension runs noticeably faster.

3. @lru_cache decorator: result caching

For functions with repeated calculations, functools.lru_cache stores results in memory, dramatically reducing execution time, especially for recursive algorithms.

<code>def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

import time
start = time.perf_counter()
print(f"Result: {fibonacci(35)}")
print(f"Time without cache: {time.perf_counter() - start:.6f} seconds")
</code>
<code>from functools import lru_cache
import time

@lru_cache(maxsize=128)
def fibonacci_cached(n):
    if n <= 1:
        return n
    return fibonacci_cached(n-1) + fibonacci_cached(n-2)

start = time.perf_counter()
print(f"Result: {fibonacci_cached(35)}")
print(f"Time with cache: {time.perf_counter() - start:.6f} seconds")
</code>

The cached version completes in microseconds, orders of magnitude faster than the uncached run.
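Beyond the speedup itself, the decorator exposes introspection hooks: functools.lru_cache adds cache_info() and cache_clear() to the wrapped function, which help verify the cache is actually being hit:

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def fibonacci_cached(n):
    if n <= 1:
        return n
    return fibonacci_cached(n - 1) + fibonacci_cached(n - 2)

fibonacci_cached(35)

# cache_info() reports cache effectiveness: hits, misses, and current size.
print(fibonacci_cached.cache_info())
# CacheInfo(hits=33, misses=36, maxsize=128, currsize=36)

# cache_clear() empties the cache, e.g. between benchmark runs.
fibonacci_cached.cache_clear()
```

With memoization, each of the 36 values fibonacci_cached(0)..fibonacci_cached(35) is computed exactly once (the misses); every other recursive call is served from the cache.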

4. Generators: memory‑efficient data handling

Generators produce items on‑demand, avoiding the need to store large collections in memory.

<code>import sys
big_data_list = [i for i in range(10_000_000)]
print(f"Memory usage for list: {sys.getsizeof(big_data_list)} bytes")
result = sum(big_data_list)
print(f"Sum of list: {result}")

big_data_generator = (i for i in range(10_000_000))
print(f"Memory usage for generator: {sys.getsizeof(big_data_generator)} bytes")
result = sum(big_data_generator)
print(f"Sum of generator: {result}")
</code>

The generator uses only a few hundred bytes, saving over 99.99 % of memory compared to the list.

In real projects, generators are ideal for processing large log files line‑by‑line:

<code>def log_file_reader(file_path):
    with open(file_path, 'r') as file:
        for line in file:
            yield line

error_count = sum(1 for line in log_file_reader("large_log_file.txt") if "ERROR" in line)
print(f"Total errors: {error_count}")
</code>
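Generator stages can also be chained into a lazy pipeline, so filtering and transforming never materialize the whole file at once. The following self-contained sketch assumes a hypothetical two-field log format and writes its own sample file so it runs anywhere:

```python
import os
import tempfile

# Write a small sample log (a stand-in for a real, much larger log file).
sample = "INFO start\nERROR disk full\nINFO ok\nERROR timeout\n"
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as f:
    f.write(sample)
    path = f.name

def read_lines(file_path):
    with open(file_path) as file:
        for line in file:
            yield line.rstrip("\n")

# Each stage is lazy: no stage ever holds the whole file in memory.
lines = read_lines(path)
errors = (line for line in lines if "ERROR" in line)
messages = (line.split(" ", 1)[1] for line in errors)

error_messages = list(messages)
print(error_messages)  # ['disk full', 'timeout']
os.remove(path)
```

Only the final list() call drives the pipeline; until then, no line is read from disk.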

5. Local variable optimization

Accessing local variables is faster than accessing globals: CPython reads locals from a fixed-size array via an indexed instruction (LOAD_FAST), while globals go through a dictionary lookup (LOAD_GLOBAL).

<code>import time

global_var = 10

def access_global():
    return global_var      # name resolved via a global dictionary lookup

def access_local():
    local_var = 10
    return local_var       # name resolved via a fast indexed local slot

start = time.perf_counter()
for _ in range(1_000_000):
    access_global()
global_time = time.perf_counter() - start

start = time.perf_counter()
for _ in range(1_000_000):
    access_local()
local_time = time.perf_counter() - start

print(f"Global access time: {global_time:.6f} seconds")
print(f"Local access time: {local_time:.6f} seconds")
</code>

The benchmark shows local access is roughly twice as fast.
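A common application of this is localizing a frequently used global or module attribute inside a hot loop. The sketch below (function names are my own) binds math.sqrt to a local variable once instead of resolving the module attribute on every iteration:

```python
import math
import time

N = 1_000_000

def sqrt_global(n):
    total = 0.0
    for i in range(n):
        total += math.sqrt(i)   # module attribute looked up every iteration
    return total

def sqrt_local(n):
    sqrt = math.sqrt            # one lookup, then a fast local reference
    total = 0.0
    for i in range(n):
        total += sqrt(i)
    return total

start = time.perf_counter()
sqrt_global(N)
print(f"global lookup: {time.perf_counter() - start:.6f} s")

start = time.perf_counter()
sqrt_local(N)
print(f"local lookup:  {time.perf_counter() - start:.6f} s")
```

Both functions return identical results; only the name-resolution cost differs, so the gap grows with the number of iterations.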

Performance Optimization Summary

Effective Python optimization combines memory‑saving techniques (__slots__, generators, local variables) with compute‑speed improvements (list comprehensions, lru_cache). Developers should apply these strategies judiciously, balancing speed gains against code readability and maintainability.
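Before applying any of these techniques, it is worth measuring where time actually goes. A minimal profiling sketch using the standard library's cProfile and pstats modules (the workload function here is a made-up hotspot):

```python
import cProfile
import io
import pstats

def workload():
    # A toy hotspot: sum of squares over a large range.
    return sum(i * i for i in range(200_000))

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Print the five most expensive entries by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

Optimizing a function that the profile shows is not a bottleneck costs readability for no measurable gain.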

Tags: performance optimization, memory management, Python, best practices, code profiling
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
