Boost Your Python Speed: 20 Proven Tricks to Slash Execution Time
Learn how to dramatically improve Python performance by choosing optimal data structures, minimizing redundant data, using copy wisely, leveraging dict/set lookups, generators, efficient loops, string joining, proper formatting, fast variable swapping, concise comparisons, C extensions, multiprocessing, PyPy, and profiling tools, all backed by real benchmarks.
Algorithm time complexity has the biggest impact on program efficiency; in Python you can improve it by selecting appropriate data structures, e.g., list lookup is O(n) while set lookup is O(1). Different scenarios call for different strategies such as divide‑and‑conquer, branch‑and‑bound, greedy, and dynamic programming.
Reduce Redundant Data
Store large symmetric matrices as upper or lower triangular, and use sparse matrix representations when most elements are zero.
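As a minimal sketch of the triangular-storage idea, assuming a pure-Python setting and a hypothetical helper class named SymmetricMatrix (not from the original article), only the upper triangle is kept in a flat list and (i, j) is mapped to the same slot as (j, i):

```python
class SymmetricMatrix:
    """Hypothetical helper: keeps only the upper triangle of an n x n symmetric matrix."""

    def __init__(self, n):
        self.n = n
        # n * (n + 1) / 2 stored entries instead of n * n
        self.data = [0.0] * (n * (n + 1) // 2)

    def _index(self, i, j):
        if not (0 <= i < self.n and 0 <= j < self.n):
            raise IndexError("index out of range")
        if i > j:
            i, j = j, i  # (i, j) and (j, i) share one slot
        # Row i of the upper triangle starts after rows 0..i-1,
        # which hold n, n-1, ..., n-i+1 entries.
        return i * self.n - i * (i - 1) // 2 + (j - i)

    def get(self, i, j):
        return self.data[self._index(i, j)]

    def set(self, i, j, value):
        self.data[self._index(i, j)] = value


m = SymmetricMatrix(4)
m.set(3, 1, 2.5)       # writes the slot shared with (1, 3)
print(m.get(1, 3))     # reads the same value back
print(len(m.data))     # 10 stored entries instead of 16
```

For real numerical work a dedicated sparse or packed-matrix library is likely a better fit; the sketch only illustrates where the memory saving comes from.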
Use copy and deepcopy wisely
Assigning a dict or list to another name only creates a new reference to the same object; to copy the object itself, use copy.copy (shallow copy) or copy.deepcopy (recursive copy). The recursive copy is slower. Example benchmark:
import copy
a = list(range(100000))  # list() so that a real list is copied on Python 3 as well
%timeit -n 10 copy.copy(a)
%timeit -n 10 copy.deepcopy(a)
copy.deepcopy is about an order of magnitude slower.
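The difference between a plain reference, a shallow copy, and a deep copy can be sketched as follows. The dictionary contents are made up for illustration:

```python
import copy

original = {"name": "cfg", "ports": [80, 443]}

alias = original                 # no copy: a second name for the same dict
shallow = copy.copy(original)    # new dict, but the inner list is shared
deep = copy.deepcopy(original)   # new dict and a new inner list

original["name"] = "changed"
original["ports"].append(8080)

print(alias["name"])     # follows the original, since it is the same object
print(shallow["name"])   # keeps the old top-level value
print(shallow["ports"])  # sees the appended port, because the list is shared
print(deep["ports"])     # unaffected by the change
```

When the object contains no nested mutable data, the cheaper copy.copy is usually enough.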
Use dict or set for element lookup
Both dict and set are implemented as hash tables, so a membership test takes O(1) time on average. Benchmark:
a = range(1000)
s = set(a)
d = dict((i, 1) for i in a)
%timeit -n 10000 100 in d
%timeit -n 10000 100 in s
dict is slightly faster but uses more memory.
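Outside IPython, the same kind of comparison can be sketched with the standard timeit module. This version assumes Python 3 and contrasts a list with a set, using arbitrarily chosen sizes and repeat counts:

```python
import timeit

n = 10000
items_list = list(range(n))
items_set = set(items_list)
target = n - 1  # worst case for the list: the last element

list_hit = target in items_list  # linear scan
set_hit = target in items_set    # hash lookup

list_time = timeit.timeit(lambda: target in items_list, number=200)
set_time = timeit.timeit(lambda: target in items_set, number=200)
print("list: %.6f s, set: %.6f s" % (list_time, set_time))
```

Absolute timings depend on the machine; the point is that the list time grows with n while the set time stays roughly flat.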
Use generators and yield
Generators produce items lazily, so they use constant memory regardless of how many items they yield, and creating one is nearly free. Benchmarks:
%timeit -n 100 a = (i for i in range(100000))
%timeit -n 100 b = [i for i in range(100000)]
Creating a generator is faster because no elements are computed until it is iterated; however, iterating over an existing list can be slightly quicker unless an early exit is needed. Example with yield:
def yield_func(ls):
    for i in ls:
        yield i + 1
def not_yield_func(ls):
    return [i + 1 for i in ls]
ls = range(1000000)
%timeit -n 10 for i in yield_func(ls): pass
%timeit -n 10 for i in not_yield_func(ls): pass
Optimize loops
Avoid recomputing length inside loops:
a = range(10000)
size_a = len(a)
%timeit -n 1000 for i in a: k = len(a)
%timeit -n 1000 for i in a: k = size_a
Order of multiple conditional expressions
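Python evaluates and/or from left to right and stops as soon as the result is known (short-circuit evaluation). For and, place the condition that fails most often first; for or, place the condition that succeeds most often first, so that the remaining conditions are skipped as often as possible. Benchmarks show noticeable speed differences.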
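A small sketch of the effect, using two made-up predicates (rarely_true and slow_check are hypothetical names) and a counter that records how often the expensive one actually runs:

```python
counter = {"slow": 0}

def rarely_true(n):
    # Cheap test that fails for 99% of the inputs below.
    return n % 100 == 0

def slow_check(n):
    # Stand-in for an expensive test; it only counts its calls here.
    counter["slow"] += 1
    return n >= 0

def count_matches(numbers, cheap_first):
    counter["slow"] = 0
    if cheap_first:
        matches = sum(1 for n in numbers if rarely_true(n) and slow_check(n))
    else:
        matches = sum(1 for n in numbers if slow_check(n) and rarely_true(n))
    return matches, counter["slow"]

print(count_matches(range(1000), cheap_first=True))   # (10, 10)
print(count_matches(range(1000), cheap_first=False))  # (10, 1000)
```

Both orderings find the same matches, but putting the frequently failing condition first skips the expensive call for almost every element.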
Use join instead of string concatenation
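a = [str(i) for i in range(10000)]  # assumed input: a list of strings (the original does not define a here)
%%timeit
s = ''
for i in a:
    s += i
%%timeit
s = ''.join(a)
join is about five times faster than repeated concatenation.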
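A self-contained variant of this comparison, assuming Python 3 and the standard timeit module, might look like the sketch below. The function names and input size are illustrative choices:

```python
import timeit

parts = [str(i) for i in range(10000)]

def concat_loop(strings):
    # Repeated +=: each step may build a new string object.
    result = ""
    for s in strings:
        result += s
    return result

def concat_join(strings):
    # join computes the final size once and copies each piece once.
    return "".join(strings)

loop_time = timeit.timeit(lambda: concat_loop(parts), number=50)
join_time = timeit.timeit(lambda: concat_join(parts), number=50)
print("loop: %.4f s, join: %.4f s" % (loop_time, join_time))
```

The measured ratio varies by interpreter version, since CPython can sometimes optimize += on strings in place, so treat the numbers as indicative only.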
Choose efficient string formatting
s1, s2 = 'ax', 'bx'
%timeit -n 100000 'abc%s%s' % (s1, s2)
%timeit -n 100000 'abc{0}{1}'.format(s1, s2)
%timeit -n 100000 'abc' + s1 + s2
Percent formatting is the slowest, but all three methods are very fast; percent formatting is often considered the most readable.
Swap variables without a temporary
a, b = 1, 2
c = a; a = b; b = c  # old way, with a temporary
a, b = 1, 2
a, b = b, a  # tuple unpacking
Tuple unpacking is roughly twice as fast.
Use is for identity comparison
a = range(10000)
%timeit -n 100 [i for i in a if i == True]
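%timeit -n 100 [i for i in a if i is True]
Using is is nearly twice as fast as ==. Note that the two tests are not equivalent: == compares values (so 1 == True holds), while is checks whether both operands are the same object. Reserve is for singletons such as None, True, and False.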
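Because the speed gain comes with a change in meaning, it helps to see what each operator selects. The sample values below are made up for illustration:

```python
values = [0, 1, True, 2, 1.0, None]

# == compares by value: 1 and 1.0 are equal to True.
equal_to_true = [v for v in values if v == True]

# is compares identity: only the True singleton itself matches.
identical_to_true = [v for v in values if v is True]

# The idiomatic use of is: testing for None.
none_count = sum(1 for v in values if v is None)

print(len(equal_to_true), len(identical_to_true), none_count)  # 3 1 1
```

Switching from == to is is therefore only safe when the intent really is an identity check.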
Chained comparisons
x, y, z = 1,2,3
%timeit -n 1000000 if x < y < z: pass
%timeit -n 1000000 if x < y and y < z: pass
Chained comparison is slightly faster and more readable.
while 1 vs while True
def while_1():
    n = 100000
    while 1:
        n -= 1
        if n <= 0: break
def while_true():
    n = 100000
    while True:
        n -= 1
        if n <= 0: break
%timeit -n 100 while_1()
%timeit -n 100 while_true()
while 1 is noticeably faster in Python 2, where True is an ordinary built-in name that has to be looked up on every iteration. In Python 3, True is a keyword, so the two loops perform the same.
Use ** instead of pow
%timeit -n 10000 c = pow(2, 20)
%timeit -n 10000 c = 2**20
The exponentiation operator is over ten times faster.
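Part of that gap likely comes from the fact that CPython can fold a literal expression such as 2**20 into a constant at compile time, whereas pow(2, 20) is always a function call. The sketch below, assuming Python 3 and the standard timeit module, measures both the literal and the variable forms so the two effects can be told apart:

```python
import timeit

runs = 100000

# Literal operands: the operator form may be precomputed by the compiler.
literal_op = timeit.timeit("2 ** 20", number=runs)
literal_pow = timeit.timeit("pow(2, 20)", number=runs)

# Variable operands: both forms must compute the power at run time.
setup = "b, e = 2, 20"
variable_op = timeit.timeit("b ** e", setup=setup, number=runs)
variable_pow = timeit.timeit("pow(b, e)", setup=setup, number=runs)

print("literal : ** %.4f s, pow %.4f s" % (literal_op, literal_pow))
print("variable: ** %.4f s, pow %.4f s" % (variable_op, variable_pow))
```

With variable operands the difference is usually much smaller, since what remains is mostly the cost of the name lookup and the function call.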
Use C‑implemented modules (cPickle, cStringIO, cProfile)
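import cPickle, pickle
a = range(10000)
%timeit -n 100 x = cPickle.dumps(a)
%timeit -n 100 x = pickle.dumps(a)
C‑implemented versions are an order of magnitude faster. cPickle and cStringIO are Python 2 module names; in Python 3, pickle uses its C accelerator automatically when it is available, and cStringIO was replaced by io.StringIO.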
Best deserialization method
import json, cPickle
a = range(10000)
s1 = str(a)
s2 = cPickle.dumps(a)
s3 = json.dumps(a)
%timeit -n 100 x = eval(s1)
%timeit -n 100 x = cPickle.loads(s2)
%timeit -n 100 x = json.loads(s3)
json.loads is about three times faster than cPickle.loads and more than twenty times faster than eval.
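A Python 3 sketch of the same comparison is shown below. It swaps in pickle for cPickle and ast.literal_eval for eval (a safer parser for literals, not the function the original benchmarked), so the ratios are not expected to match the figures above:

```python
import ast
import json
import pickle
import timeit

data = list(range(10000))

as_repr = repr(data)
as_pickle = pickle.dumps(data)
as_json = json.dumps(data)

loaders = {
    "ast.literal_eval": lambda: ast.literal_eval(as_repr),
    "pickle.loads": lambda: pickle.loads(as_pickle),
    "json.loads": lambda: json.loads(as_json),
}

# Each loader should reconstruct the original list.
results = {name: load() for name, load in loaders.items()}

for name, load in loaders.items():
    print("%-17s %.4f s" % (name, timeit.timeit(load, number=20)))
```

Beyond speed, json and ast.literal_eval only rebuild plain data, while pickle.loads and eval can execute arbitrary code and should not be used on untrusted input.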
Use C extensions (CPython API, ctypes, Cython, cffi)
These allow Python code to call compiled C libraries. ctypes is often the fastest way to wrap existing C libraries. Cython can give hundreds‑fold speedups for computational kernels. cffi provides a convenient interface compatible with PyPy.
Parallel programming
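Because of the GIL, threads cannot execute Python bytecode in parallel, so use multiprocessing (Process, Pool) for CPU‑bound work. multiprocessing.dummy offers the same API backed by threads, which suits I/O‑bound tasks. multiprocessing.Manager (from the multiprocessing.managers module) lets processes share data, including across machines for distributed patterns.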
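The two pool flavours can be sketched side by side, assuming Python 3. The task functions cpu_task and io_task are hypothetical placeholders, with time.sleep standing in for a network or disk wait:

```python
import time
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool  # same API, backed by threads

def cpu_task(n):
    # CPU-bound placeholder: sum of squares below n.
    return sum(i * i for i in range(n))

def io_task(delay):
    # I/O-bound placeholder: sleeping releases the GIL like a real wait would.
    time.sleep(delay)
    return delay

def run_cpu_bound(inputs, workers=2):
    # Separate processes, so the work can use several cores.
    with Pool(processes=workers) as pool:
        return pool.map(cpu_task, inputs)

def run_io_bound(delays, workers=4):
    # Threads are enough here because the tasks mostly wait.
    with ThreadPool(workers) as pool:
        return pool.map(io_task, delays)
```

On platforms that start worker processes by spawning a fresh interpreter, run_cpu_bound should be called from under an if __name__ == "__main__": guard, for example run_cpu_bound([10**5, 10**5]).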
PyPy – a JIT‑powered Python
PyPy, implemented in RPython, can be more than six times faster than CPython thanks to its JIT compiler. However, C extensions (except those via cffi) may reduce or negate the speed gain.
Performance profiling tools
Besides %timeit, use cProfile to locate bottlenecks: python -m cProfile script.py reports call counts and execution time for each function.
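cProfile can also be driven from inside a program. The sketch below uses made-up workload functions (slow_sum and workload are hypothetical) to profile one call and print the five most expensive entries by cumulative time:

```python
import cProfile
import io
import pstats

def slow_sum(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

def workload():
    return [slow_sum(20000) for _ in range(20)]

profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

# Collect the report in a string so it can be printed or logged.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Sorting by cumulative time tends to surface the high-level functions that dominate the run, which is usually the right place to start optimizing.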
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
