Boost Python Performance: 20 Proven Tricks to Cut Execution Time
This article presents a comprehensive collection of Python performance‑boosting techniques—from choosing optimal data structures and using copy versus deepcopy, to leveraging generators, join, efficient string formatting, loop optimizations, C‑based modules, parallel programming, and the PyPy JIT—each illustrated with concrete code examples and benchmark results.
Optimizing Python Algorithms and Code
Algorithmic time complexity has the greatest impact on program efficiency. In Python you can often improve it by selecting an appropriate data structure: a membership test on a list is O(n), while on a set it is O(1) on average. Common algorithm design strategies include divide-and-conquer, branch-and-bound, greedy algorithms, and dynamic programming.
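The list-versus-set difference can be measured with the standard timeit module. This is a minimal sketch; the names items_list, items_set, and target are hypothetical, and the absolute timings depend on the machine.

```python
import timeit

# Hypothetical data: the same 10,000 integers stored as a list and as a set.
items_list = list(range(10000))
items_set = set(items_list)
target = 9999  # worst case for the list: the last element


def in_list():
    # Linear scan: every element may be compared before a match is found.
    return target in items_list


def in_set():
    # Hash lookup: constant time on average, regardless of size.
    return target in items_set


print("list membership:", timeit.timeit(in_list, number=200))
print("set membership: ", timeit.timeit(in_set, number=200))
```

Both functions return the same answer; only the cost of finding it differs.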
Reduce Redundant Data
Store large symmetric matrices using only the upper or lower triangular part, and represent matrices that are mostly zeros with sparse matrix formats.
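Both ideas can be sketched in plain Python. The class name SymmetricMatrix and the dictionary-of-keys layout below are illustrative assumptions, not something the original article prescribes; in practice a numerical library would usually provide these formats.

```python
class SymmetricMatrix:
    """Store only the upper triangle (with diagonal) of an n x n symmetric matrix."""

    def __init__(self, n):
        self.n = n
        # n * (n + 1) / 2 entries instead of n * n.
        self.data = [0.0] * (n * (n + 1) // 2)

    def _index(self, i, j):
        if i > j:
            i, j = j, i  # (i, j) and (j, i) share one slot
        # Row i starts after i earlier rows holding n, n-1, ..., n-i+1 entries.
        return i * self.n - i * (i - 1) // 2 + (j - i)

    def get(self, i, j):
        return self.data[self._index(i, j)]

    def set(self, i, j, value):
        self.data[self._index(i, j)] = value


# A mostly-zero matrix can keep only its non-zero entries, keyed by (row, column).
sparse = {(0, 2): 5.0, (3, 1): -1.5}


def sparse_get(matrix, i, j):
    # Missing keys are implicit zeros.
    return matrix.get((i, j), 0.0)
```

For a 4 x 4 symmetric matrix this keeps 10 values instead of 16, and the saving approaches one half as the matrix grows.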
Use copy and deepcopy Wisely
Assigning a dict or list to another name only creates a new reference to the same object. To duplicate an object, use copy.copy for a shallow copy or copy.deepcopy for a deep copy. Benchmarks show deepcopy is about ten times slower than a shallow copy.
import copy
a = list(range(100000))  # build a real list so the benchmark also works on Python 3
%timeit -n 10 copy.copy(a)
%timeit -n 10 copy.deepcopy(a)
The deepcopy version is an order of magnitude slower.
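The cost difference exists because the three operations do different amounts of work. The following sketch, using a hypothetical dictionary named original, shows what each one actually shares with the source object.

```python
import copy

# Hypothetical object with a nested, mutable value.
original = {"name": "config", "values": [1, 2, 3]}

alias = original                 # plain assignment: same object, new name
shallow = copy.copy(original)    # new dict, but the inner list is shared
deep = copy.deepcopy(original)   # new dict and a new inner list

original["name"] = "changed"     # rebinding a key affects only original/alias
original["values"].append(4)     # mutating the inner list affects every sharer

print(alias)     # sees both changes
print(shallow)   # keeps the old name, but sees the appended value
print(deep)      # sees neither change
```

A shallow copy is enough when the contained objects are immutable or never modified; deepcopy is only worth its cost when nested mutable state must be isolated.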
Use dict or set for Fast Lookups
Both Python dict and set are hash‑table based (similar to C++11 unordered_map) and provide O(1) lookup.
a = range(1000)
s = set(a)
d = dict((i, 1) for i in a)
%timeit -n 10000 100 in d
%timeit -n 10000 100 in s
Dictionary lookups are slightly faster but use more memory.
Use Generators and yield
a = range(100000)
%timeit -n 100 a = (i for i in range(100000))
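%timeit -n 100 a = [i for i in range(100000)]
Creating a generator is far cheaper than building the list because no element is computed until it is requested, and a generator consumes constant memory, which helps with large data streams. However, when every element is consumed, iterating over a list can be faster if the loop body is simple.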
def yield_func(ls):
    for i in ls:
        yield i+1
def not_yield_func(ls):
    return [i+1 for i in ls]
ls = range(1000000)
%timeit -n 10 for i in yield_func(ls): pass
%timeit -n 10 for i in not_yield_func(ls): pass
When the loop exits early with a break, generators show a clear advantage, because the remaining elements are never computed.
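The early-exit advantage can be made visible by counting how often the per-element work runs. In this sketch the helper names (expensive, squares_lazy, squares_eager, first_above) and the call counter are hypothetical.

```python
calls = {"count": 0}


def expensive(x):
    # Stand-in for costly per-element work; counts how often it runs.
    calls["count"] += 1
    return x * x


def squares_lazy(values):
    # Generator: computes one result at a time, only when asked.
    for v in values:
        yield expensive(v)


def squares_eager(values):
    # List: computes every result before the caller sees the first one.
    return [expensive(v) for v in values]


def first_above(results, threshold):
    # Stops at the first match, like a loop with break.
    for r in results:
        if r > threshold:
            return r
    return None


calls["count"] = 0
lazy_result = first_above(squares_lazy(range(1000)), 50)
lazy_calls = calls["count"]

calls["count"] = 0
eager_result = first_above(squares_eager(range(1000)), 50)
eager_calls = calls["count"]

print(lazy_result, lazy_calls)
print(eager_result, eager_calls)
```

Both versions find the same value, but the generator stops calling expensive as soon as the consumer stops iterating, while the list version has already paid for all 1000 elements.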
Optimize Loops
a = range(10000)
size_a = len(a)
%timeit -n 1000 for i in a: k = len(a)
%timeit -n 1000 for i in a: k = size_a
Moving invariant calculations (e.g., len(a)) out of the loop roughly halves the execution time in this benchmark.
Order Multiple Conditional Expressions
a = range(2000)
%timeit -n 100 [i for i in a if 10 < i < 20 or 1000 < i < 2000]
%timeit -n 100 [i for i in a if 1000 < i < 2000 or 10 < i < 20]
%timeit -n 100 [i for i in a if i % 2 == 0 and i > 1900]
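%timeit -n 100 [i for i in a if i > 1900 and i % 2 == 0]
Because and and or short-circuit, put the condition most likely to be false first in an and expression, and the condition most likely to be true first in an or expression. The second condition is then evaluated for far fewer items, which yields a noticeable speedup.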
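The effect of ordering can be checked without a timer by counting how many times each condition is evaluated. The predicate functions and the evaluations counter below are hypothetical helpers wrapped around the same two conditions used in the benchmark.

```python
evaluations = {"is_even": 0, "is_large": 0}


def is_even(i):
    evaluations["is_even"] += 1
    return i % 2 == 0


def is_large(i):
    evaluations["is_large"] += 1
    return i > 1900


def filter_and_count(first, second, data):
    # Reset the counters, filter with `first(i) and second(i)`,
    # and report how often each predicate actually ran.
    for key in evaluations:
        evaluations[key] = 0
    selected = [i for i in data if first(i) and second(i)]
    return selected, dict(evaluations)


data = range(2000)
even_first, counts_even_first = filter_and_count(is_even, is_large, data)
large_first, counts_large_first = filter_and_count(is_large, is_even, data)

print(counts_even_first)
print(counts_large_first)
```

The selected items are identical in both orders; only the number of predicate calls changes, because the second predicate runs only when the first one returns true.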
Use join to Concatenate Strings
# a is assumed to be a list of strings, e.g. a = [str(i) for i in range(1000)]
%%timeit -n 10000
s = ''
for i in a: s += i
%timeit -n 100000 s = ''.join(a)
join is roughly five times faster than repeated += concatenation.
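Outside IPython, the same comparison can be written with the timeit module. This sketch assumes a hypothetical list of 1000 short strings named parts; the measured ratio depends on the interpreter version and on the string sizes.

```python
import timeit

# Hypothetical input: 1000 short strings.
parts = [str(i) for i in range(1000)]


def concat_with_plus(strings):
    # Each += may allocate a new string and copy the accumulated data.
    result = ""
    for s in strings:
        result += s
    return result


def concat_with_join(strings):
    # join computes the total size once and copies each piece once.
    return "".join(strings)


print("+=  :", timeit.timeit(lambda: concat_with_plus(parts), number=1000))
print("join:", timeit.timeit(lambda: concat_with_join(parts), number=1000))
```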
Choose Efficient String Formatting
%timeit -n 100000 'abc%s%s' % (s1, s2)
%timeit -n 100000 'abc{0}{1}'.format(s1, s2)
%timeit -n 100000 'abc' + s1 + s2
The percent-formatting style is the slowest, but all three methods are very fast; percent-formatting is often preferred for readability.
Swap Variables Without a Temporary
%timeit -n 10000 a,b=1,2; c=a;a=b;b=c
%timeit -n 10000 a,b=1,2; a,b=b,a
Tuple unpacking (a,b=b,a) is more than twice as fast as using a temporary variable.
Use if is True Instead of if == True
%timeit -n 100 [i for i in a if i == True]
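%timeit -n 100 [i for i in a if i is True]
The identity test (is) is nearly twice as fast as the equality comparison. Note that the two tests are not equivalent: 1 == True is true, whereas 1 is True is false, so is True is only a safe replacement when the value is known to be a bool.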
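Before switching from == True to is True for speed, it helps to see where the two tests disagree. The sample list values below is hypothetical.

```python
# Hypothetical mixed input: booleans, numbers equal to 1 or 0, and a string.
values = [True, 1, 1.0, False, 0, "yes"]

# Equality: anything that compares equal to True passes, including 1 and 1.0.
equal_to_true = [v for v in values if v == True]

# Identity: only the True singleton itself passes.
identical_to_true = [v for v in values if v is True]

print(equal_to_true)
print(identical_to_true)
```

If the data can contain integers or floats that should count as true, the identity test silently drops them, so the faster form is a semantic change as well as an optimization.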
Use Chained Comparisons (x < y < z)
%timeit -n 1000000 if x < y < z: pass
%timeit -n 1000000 if x < y and y < z: pass
Chained comparisons are slightly faster and more readable.
Prefer while 1 Over while True
def while_1():
n = 100000
while 1:
n -= 1
if n <= 0: break
def while_true():
n = 100000
while True:
n -= 1
if n <= 0: break
%timeit -n 100 while_1()
%timeit -n 100 while_true()In Python 2.x, while 1 is noticeably faster because True is a global variable rather than a keyword.
Use the Exponentiation Operator (**) Instead of pow
%timeit -n 10000 c = pow(2,20)
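%timeit -n 10000 c = 2**20
The ** operator is more than ten times faster. Much of the gap in this benchmark comes from CPython folding the constant expression 2**20 at compile time, whereas pow(2, 20) is a function call evaluated at run time; with variable operands the difference is typically much smaller.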
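To see how much of the benchmark is due to constant operands, the comparison can be repeated with variables. This is a sketch for Python 3 on CPython; the names base and exponent are hypothetical, and the constant-folding check relies on a CPython implementation detail.

```python
import timeit

base, exponent = 2, 20

# Literal operands: CPython can precompute 2 ** 20 when compiling the statement.
print("2 ** 20            :", timeit.timeit("2 ** 20", number=1000000))
print("pow(2, 20)         :", timeit.timeit("pow(2, 20)", number=1000000))

# Variable operands: both forms must do the arithmetic at run time.
print("base ** exponent   :",
      timeit.timeit("base ** exponent", globals=globals(), number=1000000))
print("pow(base, exponent):",
      timeit.timeit("pow(base, exponent)", globals=globals(), number=1000000))

# Implementation detail: check whether the result is stored as a constant.
folded = 1048576 in compile("2 ** 20", "<demo>", "eval").co_consts
print("constant folded:", folded)
```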
Prefer C-Implemented Modules (e.g., cPickle, cStringIO)
import cPickle, pickle
a = range(10000)
%timeit -n 100 x = cPickle.dumps(a)
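%timeit -n 100 x = pickle.dumps(a)
C-based modules provide speedups of an order of magnitude over their pure-Python counterparts. This tip applies mainly to Python 2: in Python 3, pickle uses its C accelerator automatically and cStringIO has been replaced by io.StringIO.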
Fast Deserialization
import json, cPickle
a = range(10000)
s1 = str(a)
s2 = cPickle.dumps(a)
s3 = json.dumps(a)
%timeit -n 100 x = eval(s1)
%timeit -n 100 x = cPickle.loads(s2)
%timeit -n 100 x = json.loads(s3)
JSON deserialization is about three times faster than cPickle and twenty times faster than eval.
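A Python 3 version of this comparison might look like the sketch below, with pickle standing in for cPickle. The variable names are hypothetical, and the relative ordering on a modern interpreter may differ from the Python 2 figures quoted above, so measure on your own setup.

```python
import json
import pickle
import timeit

data = list(range(10000))

# Three serialized forms of the same list.
as_repr = repr(data)            # text that eval can parse back
as_pickle = pickle.dumps(data)  # binary pickle, default protocol
as_json = json.dumps(data)      # JSON text

loaders = [
    ("eval        ", lambda: eval(as_repr)),
    ("pickle.loads", lambda: pickle.loads(as_pickle)),
    ("json.loads  ", lambda: json.loads(as_json)),
]

for label, loader in loaders:
    print(label, timeit.timeit(loader, number=20))
```

Speed aside, eval and pickle.loads can execute arbitrary code from their input, so they should only be used on trusted data.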
Use C Extensions
Four main approaches let Python call C code: the CPython native API (requires Python.h), ctypes for wrapping existing DLL/SO files, Cython for writing C-like Python code, and cffi, which mirrors ctypes on PyPy. They can yield speed improvements of a hundredfold or more for bottleneck modules.
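Of these approaches, ctypes needs no compiler, so it is the easiest to sketch. The example below tries to call sqrt from the system C math library; the helper name load_c_sqrt is hypothetical, and the fallback to math.sqrt is an assumption added so the snippet still runs on platforms where the library cannot be located.

```python
import ctypes
import ctypes.util
import math


def load_c_sqrt():
    """Return sqrt from the C math library, or math.sqrt if it cannot be loaded."""
    path = ctypes.util.find_library("m")  # may be None, e.g. on Windows
    if path is None:
        return math.sqrt
    try:
        libm = ctypes.CDLL(path)
        c_func = libm.sqrt
    except (OSError, AttributeError):
        return math.sqrt
    # Declare the C signature: double sqrt(double).
    c_func.argtypes = [ctypes.c_double]
    c_func.restype = ctypes.c_double
    return c_func


c_sqrt = load_c_sqrt()
print(c_sqrt(9.0))
```

Declaring argtypes and restype matters: without them ctypes assumes int arguments and an int return value, which gives wrong results for a function that works with doubles.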
Parallel Programming
Because of the GIL, threads in CPython cannot execute Python bytecode in parallel, so true parallelism requires multiprocessing. Use Process or Pool for CPU-bound tasks, multiprocessing.dummy (thread-based, same API) for I/O-bound work, and Managers for shared state across processes.
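Since multiprocessing.Pool and multiprocessing.dummy.Pool share the same interface, one helper can switch between processes and threads. The function names sum_of_squares and map_in_pool are hypothetical, and the worker count of 4 is an arbitrary assumption.

```python
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool


def sum_of_squares(n):
    # CPU-bound stand-in task; defined at module level so processes can import it.
    return sum(i * i for i in range(n))


def map_in_pool(func, items, use_processes, workers=4):
    # Process pool for CPU-bound work, thread pool for I/O-bound work.
    pool_class = Pool if use_processes else ThreadPool
    with pool_class(workers) as pool:
        return pool.map(func, items)


# Thread-based run; safe to execute anywhere.
print(map_in_pool(sum_of_squares, [1, 2, 3, 4], use_processes=False))
```

To run the same work in separate processes, call map_in_pool(sum_of_squares, items, use_processes=True) from inside an if __name__ == "__main__": block, which the multiprocessing module requires on platforms that start workers by re-importing the main module.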
PyPy – The Ultimate Speed Booster
PyPy, a JIT‑compiled Python implementation, is typically 6× faster than CPython. It excels with pure‑Python code but loses its advantage when C extensions (other than cffi) are used.
Performance Profiling Tools
Besides timeit, use cProfile (e.g., python -m cProfile script.py) to identify hot functions and guide targeted optimizations.
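cProfile can also be driven from code when only part of a program should be measured. In this sketch the functions slow_sum, workload, and profile_report are hypothetical; the report is sorted by cumulative time.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # Deliberately simple hot function.
    total = 0
    for i in range(n):
        total += i * i
    return total


def workload():
    return [slow_sum(20000) for _ in range(20)]


def profile_report(func, top=5):
    # Profile a single call to func and return the formatted statistics.
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return stream.getvalue()


report = profile_report(workload)
print(report)
```

The ncalls and cumtime columns of the report point to the functions worth optimizing first.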
