
Boost Python Performance: 20 Proven Tricks to Cut Execution Time

This article presents a comprehensive collection of Python performance‑boosting techniques—from choosing optimal data structures and using copy versus deepcopy, to leveraging generators, join, efficient string formatting, loop optimizations, C‑based modules, parallel programming, and the PyPy JIT—each illustrated with concrete code examples and benchmark results.

MaGe Linux Operations

Apr 7, 2018


Optimizing Python Algorithms and Code

Algorithmic time complexity has the greatest impact on program efficiency. In Python you can improve it by choosing appropriate data structures: for example, a membership test is O(n) on a list but O(1) on average on a set. Common algorithmic strategies include divide and conquer, branch and bound, greedy algorithms, and dynamic programming.
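To illustrate dynamic programming, one of the strategies listed above, the sketch below compares a naive recursive Fibonacci with a memoized version built on functools.lru_cache. The function names are hypothetical. The naive version recomputes the same subproblems exponentially many times. The cached version computes each subproblem only once.

```python
from functools import lru_cache

def fib_naive(n):
    # Exponential time: fib_naive(n - 2) is recomputed inside fib_naive(n - 1), and so on
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    # Each distinct n is computed once; later calls are served from the cache
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

print(fib_naive(25), fib_memo(25))
print(fib_memo(200))  # practical only because intermediate results are cached
```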

Reduce Redundant Data

Store large symmetric matrices using only the upper or lower triangular part, and represent matrices that are mostly zeros with sparse matrix formats.
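Here is a minimal sketch of the triangular-storage idea, assuming a plain list-backed class (the name SymmetricMatrix is hypothetical). It stores only the n(n+1)/2 entries on and above the diagonal. Accesses below the diagonal are mirrored into the upper triangle.

```python
class SymmetricMatrix:
    """Stores only the upper triangle (diagonal included) of an n x n symmetric matrix."""

    def __init__(self, n, fill=0.0):
        self.n = n
        self._data = [fill] * (n * (n + 1) // 2)  # n(n+1)/2 cells instead of n*n

    def _index(self, i, j):
        if i > j:  # mirror a lower-triangle access into the upper triangle
            i, j = j, i
        # Rows 0..i-1 hold n, n-1, ..., n-i+1 cells, so row i starts at i*n - i(i-1)/2
        return i * self.n - i * (i - 1) // 2 + (j - i)

    def __getitem__(self, ij):
        i, j = ij
        return self._data[self._index(i, j)]

    def __setitem__(self, ij, value):
        i, j = ij
        self._data[self._index(i, j)] = value


m = SymmetricMatrix(4)
m[1, 3] = 5.0
print(m[3, 1])        # reads the same cell as m[1, 3]
print(len(m._data))   # 10 cells for n = 4 instead of 16
```

For mostly-zero matrices, the formats in scipy.sparse (such as csr_matrix) are the usual ready-made option rather than a hand-written class.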

Use copy and deepcopy Wisely

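Assigning a dict or list to another name only creates a new reference to the same object. To duplicate the object, use copy.copy for a shallow copy or copy.deepcopy for a deep copy, which also copies every nested object recursively. Benchmarks show deepcopy is roughly ten times slower than a shallow copy.

import copy
a = list(range(100000))  # a real list: in Python 3, range() is immutable and both copy functions return it unchanged
%timeit -n 10 copy.copy(a)      # %timeit is an IPython/Jupyter magic command
%timeit -n 10 copy.deepcopy(a)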

The deepcopy version is an order of magnitude slower.
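Besides the speed difference, the two functions behave differently. This short example shows that a shallow copy shares nested objects with the original, while a deep copy does not.

```python
import copy

original = [[1, 2], [3, 4]]
alias = original                 # plain assignment: the very same object
shallow = copy.copy(original)    # new outer list, but the inner lists are shared
deep = copy.deepcopy(original)   # new outer list and new inner lists

original[0].append(99)

print(alias is original)  # True
print(shallow[0])         # [1, 2, 99]: the inner list is shared with original
print(deep[0])            # [1, 2]: fully independent
```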

Use dict or set for Fast Lookups

Python dict and set are both implemented as hash tables (similar to C++11's unordered_map), so membership tests take O(1) time on average.

a = range(1000)
s = set(a)
d = dict((i, 1) for i in a)
%timeit -n 10000 100 in d
%timeit -n 10000 100 in s

In this benchmark, dictionary lookups were slightly faster than set lookups, but a dict uses more memory than a set with the same keys.
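%timeit only works in IPython or Jupyter. A plain script can make a roughly equivalent measurement with the standard timeit module. This sketch also times a list, where membership is a linear scan, for comparison. Absolute timings depend on the machine.

```python
import timeit

a = list(range(1000))
s = set(a)
d = dict.fromkeys(a, 1)  # same contents as dict((i, 1) for i in a)

timings = {}
for label, container in [("list", a), ("set", s), ("dict", d)]:
    # 999 sits at the end of the list, so the list test scans almost every element
    timings[label] = timeit.timeit(lambda: 999 in container, number=10000)

for label, seconds in timings.items():
    print(f"{label:>4}: {seconds:.4f} s for 10000 lookups")
```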

Use Generators and yield

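# Creating a generator is nearly free: no element is produced until it is iterated
%timeit -n 100 a = (i for i in range(100000))
# The list comprehension builds all 100000 elements up front
%timeit -n 100 a = [i for i in range(100000)]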

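Creating a generator is far faster than building the list because its elements are produced lazily, and a generator uses constant memory, which matters for large data streams. However, when every element is consumed and the loop body is simple, iterating over a list can be faster: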

def yield_func(ls):
    for i in ls:
        yield i+1

def not_yield_func(ls):
    return [i+1 for i in ls]

ls = range(1000000)
%timeit -n 10 for i in yield_func(ls): pass
%timeit -n 10 for i in not_yield_func(ls): pass

When the loop can exit early with break, generators show a clear advantage, because the elements after the break are never computed.
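To make the early-exit effect visible, the sketch below counts how many elements each approach actually computes before finding the first value above a threshold. The counter and function names are illustrative only.

```python
computed = {"count": 0}

def plus_one(i):
    computed["count"] += 1  # record how many elements were actually produced
    return i + 1

def first_over(values, limit):
    found = None
    for v in values:
        if v > limit:
            found = v
            break  # stop at the first match
    return found

ls = range(1000000)

computed["count"] = 0
lazy_result = first_over((plus_one(i) for i in ls), 10)   # generator: computes on demand
lazy_count = computed["count"]

computed["count"] = 0
eager_result = first_over([plus_one(i) for i in ls], 10)  # list: computes every element first
eager_count = computed["count"]

print(lazy_result, lazy_count)    # 11 11
print(eager_result, eager_count)  # 11 1000000
```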

Optimize Loops

a = range(10000)
size_a = len(a)
%timeit -n 1000 for i in a: k = len(a)
%timeit -n 1000 for i in a: k = size_a

Moving loop-invariant calculations such as len(a) out of the loop roughly halved the execution time in this benchmark.
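The same principle applies to attribute lookups that do not change between iterations. A common CPython idiom binds a bound method such as out.append to a local name before the loop. This is a sketch with hypothetical function names. The gain may be modest on recent interpreters, so measure it.

```python
import timeit

data = list(range(10000))

def squares_plain(items):
    out = []
    for x in items:
        out.append(x * x)  # out.append is looked up on every iteration
    return out

def squares_hoisted(items):
    out = []
    append = out.append    # loop-invariant lookup moved out of the loop
    for x in items:
        append(x * x)
    return out

assert squares_plain(data) == squares_hoisted(data)
for fn in (squares_plain, squares_hoisted):
    print(fn.__name__, timeit.timeit(lambda: fn(data), number=200))
```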

Order Multiple Conditional Expressions

a = range(2000)
%timeit -n 100 [i for i in a if 10 < i < 20 or 1000 < i < 2000]
%timeit -n 100 [i for i in a if 1000 < i < 2000 or 10 < i < 20]
%timeit -n 100 [i for i in a if i % 2 == 0 and i > 1900]
%timeit -n 100 [i for i in a if i > 1900 and i % 2 == 0]

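Because and and or short-circuit, ordering matters. For and, put the condition most likely to be false first. For or, put the condition most likely to be true first. Here, i > 1900 rejects most items, and 1000 < i < 2000 matches far more items than 10 < i < 20, so the second and fourth variants are noticeably faster.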
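Counting how often the more expensive modulo test actually runs shows the effect of the ordering. The helper is_even is hypothetical.

```python
evaluations = {"mod": 0}

def is_even(i):
    evaluations["mod"] += 1  # count how often the modulo check actually runs
    return i % 2 == 0

a = range(2000)

evaluations["mod"] = 0
slow = [i for i in a if is_even(i) and i > 1900]  # modulo test runs for every item
mod_first = evaluations["mod"]

evaluations["mod"] = 0
fast = [i for i in a if i > 1900 and is_even(i)]  # modulo test runs only for i > 1900
mod_second = evaluations["mod"]

assert slow == fast
print(mod_first, mod_second)  # 2000 99
```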

Use join to Concatenate Strings

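a = [str(i) for i in range(1000)]  # join needs strings, not ints
def concat(items, s=''):  # wraps the += loop so %timeit can time it
    for i in items: s += i
    return s
%timeit -n 10000 concat(a)
%timeit -n 100000 ''.join(a)

In this benchmark, join was roughly five times faster than repeated += concatenation.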
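join accepts any iterable of strings, so non-string items are usually converted on the fly with a generator expression. Here is a minimal sketch (the function name is hypothetical):

```python
def to_csv_line(values):
    # Convert each value to str and join once, instead of growing a string with +=
    return ",".join(str(v) for v in values)

print(to_csv_line([1, 2.5, "x", None]))  # 1,2.5,x,None
```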

Choose Efficient String Formatting

s1, s2 = 'def', 'ghi'  # placeholder values
%timeit -n 100000 'abc%s%s' % (s1, s2)
%timeit -n 100000 'abc{0}{1}'.format(s1, s2)
%timeit -n 100000 'abc' + s1 + s2

Percent formatting is the slowest of the three, but the differences are small and all three are very fast. The author considers percent formatting the most readable.
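Outside IPython, timeit.repeat can run the same comparison. The sketch also checks that the three styles produce identical strings before timing them. The values of s1 and s2 are placeholders.

```python
import timeit

s1, s2 = "def", "ghi"  # placeholder operands
candidates = {
    "percent": lambda: "abc%s%s" % (s1, s2),
    "format": lambda: "abc{0}{1}".format(s1, s2),
    "concat": lambda: "abc" + s1 + s2,
}

# All three styles must produce the same result before timing them
assert len({fn() for fn in candidates.values()}) == 1

# Take the best of several runs to reduce noise from other processes
timings = {name: min(timeit.repeat(fn, number=100000, repeat=3))
           for name, fn in candidates.items()}
for name, best in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:>8}: {best:.4f} s per 100000 calls")
```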

Swap Variables Without a Temporary

%timeit -n 10000 a,b=1,2; c=a;a=b;b=c
%timeit -n 10000 a,b=1,2; a,b=b,a

Tuple unpacking (a, b = b, a) is more than twice as fast as swapping through a temporary variable.
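Tuple-unpacking swaps also work on list elements, which keeps in-place algorithms compact. Below is a sketch of an in-place reversal. The function name is hypothetical, and in practice list.reverse() already does this.

```python
def reverse_in_place(items):
    i, j = 0, len(items) - 1
    while i < j:
        items[i], items[j] = items[j], items[i]  # swap without a temporary variable
        i, j = i + 1, j - 1
    return items

print(reverse_in_place([1, 2, 3, 4, 5]))  # [5, 4, 3, 2, 1]
```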

Use if x is True Instead of if x == True

%timeit -n 100 [i for i in a if i == True]
%timeit -n 100 [i for i in a if i is True]

The identity test (is) is nearly twice as fast as the equality comparison, although the two tests do not select the same items.
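is compares object identity, while == compares values. Because bool is a subclass of int, 1 == True holds but 1 is True does not. Running both filters on the benchmark's data shows the difference. PEP 8 also advises against comparing booleans to True with either operator and recommends a plain truth test such as if x:.

```python
a = range(2000)
by_equality = [i for i in a if i == True]  # 1 == True, so the integer 1 matches
by_identity = [i for i in a if i is True]  # no int object is the True singleton
print(by_equality, by_identity)  # [1] []
```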

Use Chained Comparisons (x < y < z)

x, y, z = 1, 2, 3  # example values
%timeit -n 1000000 if x < y < z: pass
%timeit -n 1000000 if x < y and y < z: pass

Chained comparisons are slightly faster and more readable.
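A chained comparison is also shorter, and it evaluates its middle operand only once. The sketch uses a hypothetical function with a side effect to make that visible:

```python
calls = []

def middle():
    calls.append(1)  # record each evaluation
    return 5

x, z = 1, 10

calls.clear()
chained = x < middle() < z              # middle() is evaluated once
chained_calls = len(calls)

calls.clear()
expanded = x < middle() and middle() < z  # middle() is evaluated twice
expanded_calls = len(calls)

print(chained, chained_calls, expanded, expanded_calls)  # True 1 True 2
```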

Prefer while 1 Over while True

def while_1():
    n = 100000
    while 1:
        n -= 1
        if n <= 0: break

def while_true():
    n = 100000
    while True:
        n -= 1
        if n <= 0: break

%timeit -n 100 while_1()
%timeit -n 100 while_true()

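In Python 2.x, while 1 is noticeably faster because True is an ordinary built-in name, not a keyword, and must be looked up on every iteration. In Python 3, True is a keyword, so this difference no longer applies.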
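On Python 3, True is a keyword constant, so neither loop looks it up by name. The dis module can confirm this. The sketch redefines the two benchmark functions so it runs on its own. The helper opnames is hypothetical.

```python
import dis

def while_1():
    n = 100000
    while 1:
        n -= 1
        if n <= 0: break

def while_true():
    n = 100000
    while True:
        n -= 1
        if n <= 0: break

def opnames(func):
    # List the bytecode operation names the function compiles to
    return [ins.opname for ins in dis.get_instructions(func)]

for f in (while_1, while_true):
    # On Python 3, no LOAD_GLOBAL or LOAD_NAME is needed for True
    print(f.__name__, "LOAD_GLOBAL" in opnames(f), opnames(f))
```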

Use Exponentiation Operator ( ** ) Instead of pow

%timeit -n 10000 c = pow(2,20)
%timeit -n 10000 c = 2**20

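In this benchmark, ** is more than ten times faster, largely because CPython evaluates the constant expression 2**20 at compile time, so that line merely loads a precomputed constant. With variable operands, the difference shrinks to the cost of looking up and calling the built-in pow.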
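The sketch below checks where the speedup comes from, using hypothetical function names. It inspects the compiled constants of a function that returns 2**20, then times pow against ** with variable operands.

```python
import timeit

def literal_power():
    return 2**20         # folded to the constant 1048576 when the function is compiled

def variable_power(x, y):
    return x ** y        # computed at run time

def builtin_pow(x, y):
    return pow(x, y)     # same computation plus a built-in function call

print(1048576 in literal_power.__code__.co_consts)
print(variable_power(2, 20) == builtin_pow(2, 20) == 2**20)
for fn in (variable_power, builtin_pow):
    print(fn.__name__, timeit.timeit(lambda: fn(2, 20), number=100000))
```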

Prefer C‑Implemented Modules (e.g., cPickle, cStringIO)

import cPickle, pickle  # Python 2 only: Python 3 removed cPickle and cStringIO
a = range(10000)
%timeit -n 100 x = cPickle.dumps(a)
%timeit -n 100 x = pickle.dumps(a)

C-based modules can be an order of magnitude faster than their pure-Python counterparts.
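This is a minimal Python 3 sketch of the same pattern. Here pickle and io.StringIO take the roles of cPickle and cStringIO. On CPython, both already use C implementations without any special import.

```python
import io
import pickle  # CPython uses the C accelerator (_pickle) automatically when available

data = list(range(10000))
blob = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
assert pickle.loads(blob) == data  # round-trip check

buf = io.StringIO()  # C-implemented replacement for cStringIO.StringIO
buf.write("hello, ")
buf.write("world")
print(buf.getvalue())  # hello, world
```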

Fast Deserialization

import json, cPickle  # Python 2 only (cPickle)
a = range(10000)
s1 = str(a)
s2 = cPickle.dumps(a)
s3 = json.dumps(a)
%timeit -n 100 x = eval(s1)
%timeit -n 100 x = cPickle.loads(s2)
%timeit -n 100 x = json.loads(s3)

In this Python 2 benchmark, JSON deserialization was about three times faster than cPickle and twenty times faster than eval.
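A Python 3 version of this comparison might look like the sketch below. It replaces eval with ast.literal_eval, which parses literals without executing arbitrary code, so it is safe on untrusted input where eval is not. Relative timings on a modern interpreter may differ from the figures above.

```python
import ast
import json
import pickle
import timeit

a = list(range(10000))
s1 = str(a)
s2 = pickle.dumps(a)
s3 = json.dumps(a)

loaders = {
    "literal_eval": lambda: ast.literal_eval(s1),
    "pickle": lambda: pickle.loads(s2),
    "json": lambda: json.loads(s3),
}
for name, load in loaders.items():
    assert load() == a  # every format round-trips to the same list
    print(name, timeit.timeit(load, number=20))
```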

Use C Extensions

Several approaches let Python call C code: the CPython C API (writing an extension module against Python.h), ctypes for wrapping existing DLL/.so libraries, Cython for compiling C-like Python code, and cffi, a ctypes-like foreign function interface that is the recommended choice on PyPy. For bottleneck modules, they can yield speedups of up to hundreds of times.
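As a small illustration of the ctypes route, the sketch below calls cos from the system C math library. It assumes a Unix-like system where ctypes.util.find_library can locate libm. Declaring argtypes and restype tells ctypes how to convert the double argument and the return value.

```python
import ctypes
import ctypes.util
import math

# Assumption: a Unix-like OS. If find_library returns None, CDLL(None) falls back
# to the symbols already loaded into the running Python process.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

libm.cos.argtypes = [ctypes.c_double]  # declare the C signature: double cos(double)
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0), abs(libm.cos(1.0) - math.cos(1.0)) < 1e-12)
```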

Parallel Programming

Because of the GIL, only one thread executes Python bytecode at a time, so CPU-bound Python code needs multiprocessing for true parallelism. Use Process or Pool for CPU-bound tasks, multiprocessing.dummy (the same API backed by threads) for I/O-bound work, and a Manager to share state across processes.
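Here is a minimal sketch of both pool types, assuming a CPU-bound function (the names square and parallel_map are hypothetical). multiprocessing.Pool and multiprocessing.dummy.Pool share the same interface, so switching between processes and threads is a one-line change. The `if __name__ == "__main__":` guard is required where workers are started with spawn, the default on Windows and macOS.

```python
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool

def square(x):
    # Stand-in for real CPU-bound work
    return x * x

def parallel_map(func, items, use_processes=True, workers=4):
    # Processes sidestep the GIL for CPU-bound work; threads suit I/O-bound work
    pool_cls = Pool if use_processes else ThreadPool
    with pool_cls(workers) as pool:
        return pool.map(func, items)

if __name__ == "__main__":
    print(parallel_map(square, range(10)))                       # worker processes
    print(parallel_map(square, range(10), use_processes=False))  # worker threads
```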

PyPy – The Ultimate Speed Booster

PyPy, a Python implementation with a JIT compiler, is often cited as about 6× faster than CPython on average. It excels with pure-Python code but loses much of its advantage when C extensions other than those built with cffi are used.

Performance Profiling Tools

Besides timeit, use cProfile (e.g., python -m cProfile script.py) to identify hot functions and guide targeted optimizations.
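cProfile can also be driven from code, with pstats used to sort and print the results. Here is a sketch that profiles a hypothetical workload function:

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # Hypothetical workload: builds a full list just to sum it
    return sum([i * i for i in range(n)])

profiler = cProfile.Profile()
profiler.enable()
slow_sum(200000)
profiler.disable()

out = io.StringIO()
stats = pstats.Stats(profiler, stream=out).sort_stats("cumulative")
stats.print_stats(5)  # show only the 5 most expensive entries
print(out.getvalue())
```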

