Boost Your Python Speed: 20 Proven Tricks to Slash Execution Time
Learn how to dramatically improve Python performance by choosing optimal data structures, minimizing redundant data, using copy wisely, leveraging dict/set lookups, generators, efficient loops, string joining, proper formatting, fast variable swapping, concise comparisons, C extensions, multiprocessing, PyPy, and profiling tools, all backed by real benchmarks.
Algorithm time complexity has the biggest impact on program efficiency; in Python you can improve it by selecting appropriate data structures, e.g., list lookup is O(n) while set lookup is O(1). Different scenarios call for different strategies such as divide‑and‑conquer, branch‑and‑bound, greedy, and dynamic programming.
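As a quick illustration (a Python 3 sketch, not one of the article's original benchmarks), the list-vs-set lookup gap is easy to measure with timeit:

```python
import timeit

data = list(range(100000))
as_list = data
as_set = set(data)

# Membership test near the end of the list: the list must be scanned
# element by element (O(n)), while the set hashes the key once (O(1)).
list_time = timeit.timeit(lambda: 99999 in as_list, number=100)
set_time = timeit.timeit(lambda: 99999 in as_set, number=100)

print(set_time < list_time)  # True: the set lookup wins by a wide margin
```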
Reduce Redundant Data
Store large symmetric matrices as upper or lower triangular, and use sparse matrix representations when most elements are zero.
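A minimal sketch of the triangular-storage idea (the tri_index helper is hypothetical, not from the article): only the upper triangle of a symmetric n x n matrix is kept in a flat list, roughly halving memory use.

```python
n = 4

def tri_index(i, j, n=n):
    """Map (i, j) with i <= j to an index into the packed upper triangle."""
    if i > j:            # symmetry: A[i][j] == A[j][i]
        i, j = j, i
    # row i starts after rows 0..i-1, which hold (n) + (n-1) + ... entries
    return i * n - i * (i - 1) // 2 + (j - i)

packed = [0.0] * (n * (n + 1) // 2)   # n*(n+1)/2 entries instead of n*n

packed[tri_index(1, 3)] = 7.5
print(packed[tri_index(3, 1)])  # 7.5 -- same slot thanks to symmetry
```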
Use copy and deepcopy wisely
Assigning a dict or list creates a reference; to copy the whole object, use copy.copy or copy.deepcopy. The latter performs a recursive copy and is slower. Example benchmarks:
<code>import copy
a = range(100000)
%timeit -n 10 copy.copy(a)
%timeit -n 10 copy.deepcopy(a)
</code> copy.deepcopy is about an order of magnitude slower.
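The behavioral difference matters for nested objects; a small Python 3 sketch:

```python
import copy

nested = [[1, 2], [3, 4]]

shallow = copy.copy(nested)      # new outer list, shared inner lists
deep = copy.deepcopy(nested)     # recursively copies the inner lists too

nested[0].append(99)

print(shallow[0])  # [1, 2, 99] -- the shallow copy sees the mutation
print(deep[0])     # [1, 2]     -- the deep copy is unaffected
```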
Use dict or set for element lookup
Both dict and set are hash tables with O(1) average lookup. Benchmark:
<code>a = range(1000)
s = set(a)
d = dict((i,1) for i in a)
%timeit -n 10000 100 in d
%timeit -n 10000 100 in s
</code> dict is slightly faster but uses more memory.
Use generators and yield
Generators use constant memory regardless of size and can be faster for building collections. Benchmarks:
<code>%timeit -n 100 a = (i for i in range(100000))
%timeit -n 100 b = [i for i in range(100000)]
</code>Creating a generator is faster; however, iterating over a list can be slightly quicker unless an early exit is needed. Example with yield:
<code>def yield_func(ls):
    for i in ls:
        yield i + 1

def not_yield_func(ls):
    return [i + 1 for i in ls]

ls = range(1000000)
%timeit -n 10 for i in yield_func(ls): pass
%timeit -n 10 for i in not_yield_func(ls): pass
</code>
Optimize loops
Avoid recomputing length inside loops:
<code>a = range(10000)
size_a = len(a)
%timeit -n 1000 for i in a: k = len(a)
%timeit -n 1000 for i in a: k = size_a
</code>
Order of multiple conditional expressions
For and, place the condition that fails most often first; for or, place the condition that succeeds most often first. Benchmarks show noticeable speed differences.
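A sketch of why ordering matters (the filter functions are illustrative, not from the article): when the cheap, mostly-false test comes first in an and-chain, short-circuiting skips the expensive test for most elements.

```python
import timeit

values = list(range(1000))

def cheap_filter_first():
    # v % 97 == 0 is cheap and almost always False, so the costly
    # string test is short-circuited away for most elements.
    return [v for v in values if v % 97 == 0 and str(v).endswith('0')]

def expensive_filter_first():
    # The expensive str() conversion now runs for every element.
    return [v for v in values if str(v).endswith('0') and v % 97 == 0]

t_fast = timeit.timeit(cheap_filter_first, number=200)
t_slow = timeit.timeit(expensive_filter_first, number=200)
print(cheap_filter_first() == expensive_filter_first())  # True: same result
```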
Use join instead of string concatenation
<code>a = [str(n) for n in range(10000)]  # concatenation needs strings
%%timeit
s = ''
for i in a:
    s += i
%%timeit
s = ''.join(a)
</code> join is about five times faster than repeated concatenation.
Choose efficient string formatting
<code>s1, s2 = 'ax', 'bx'
%timeit -n 100000 'abc%s%s' % (s1, s2)
%timeit -n 100000 'abc{0}{1}'.format(s1, s2)
%timeit -n 100000 'abc' + s1 + s2
</code>Percent formatting is the slowest of the three, but all are very fast; percent formatting is often considered the most readable.
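On Python 3.6+ there is a fourth option these benchmarks predate: f-strings, which are generally the fastest because the interpolation is compiled directly into the bytecode. A quick sketch:

```python
s1, s2 = 'ax', 'bx'

percent = 'abc%s%s' % (s1, s2)
format_call = 'abc{0}{1}'.format(s1, s2)
concat = 'abc' + s1 + s2
fstring = f'abc{s1}{s2}'   # no method-call overhead at runtime

print(percent == format_call == concat == fstring)  # True
```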
Swap variables without a temporary
<code>a, b = 1, 2
c = a; a = b; b = c  # old way
a, b = 1, 2
a, b = b, a  # tuple unpacking
</code>Tuple unpacking is roughly twice as fast.
Use is for identity comparison
<code>a = range(10000)
%timeit -n 100 [i for i in a if i == True]
%timeit -n 100 [i for i in a if i is True]
</code>Using is is nearly twice as fast as ==. Note, however, that is tests object identity rather than value, so it is only a safe substitute for == when comparing against singletons such as None, True, or False.
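The identity/equality distinction in one short sketch:

```python
x = [1, 2]
y = [1, 2]

print(x == y)   # True:  equal values
print(x is y)   # False: two distinct list objects
print(x is x)   # True:  the idiomatic use of `is` is identity checks,
                #        e.g. `value is None`
```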
Chained comparisons
<code>x, y, z = 1,2,3
%timeit -n 1000000 if x < y < z: pass
%timeit -n 1000000 if x < y and y < z: pass
</code>Chained comparison is slightly faster and more readable.
while 1 vs while True
<code>def while_1():
    n = 100000
    while 1:
        n -= 1
        if n <= 0: break

def while_true():
    n = 100000
    while True:
        n -= 1
        if n <= 0: break

%timeit -n 100 while_1()
%timeit -n 100 while_true()
</code> while 1 is noticeably faster in Python 2 because True is a global name that must be looked up on every iteration, while 1 is a constant; in Python 3, True is a keyword and the two forms perform the same.
Use ** instead of pow
<code>%timeit -n 10000 c = pow(2,20)
%timeit -n 10000 c = 2**20
</code>The exponentiation operator is over ten times faster.
Use C‑implemented modules (cPickle, cStringIO, cProfile)
<code>import cPickle, pickle
a = range(10000)
%timeit -n 100 x = cPickle.dumps(a)
%timeit -n 100 x = pickle.dumps(a)
</code>C‑implemented versions are an order of magnitude faster.
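In Python 3 this split is gone: pickle transparently uses its C accelerator (_pickle) when available, and the io module replaces cStringIO. A Python 3 sketch of the equivalent workflow:

```python
import pickle, io

data = list(range(10000))
blob = pickle.dumps(data)           # C-accelerated automatically in Python 3
print(pickle.loads(blob) == data)   # True: round-trip is lossless

buf = io.BytesIO()                  # C-backed in-memory binary stream
buf.write(blob)
print(buf.getvalue() == blob)       # True
```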
Best deserialization method
<code>import json, cPickle
a = range(10000)
s1 = str(a)
s2 = cPickle.dumps(a)
s3 = json.dumps(a)
%timeit -n 100 x = eval(s1)
%timeit -n 100 x = cPickle.loads(s2)
%timeit -n 100 x = json.loads(s3)
</code> json.loads is about three times faster than cPickle.loads and more than twenty times faster than eval.
Use C extensions (CPython API, ctypes, Cython, cffi)
These allow Python code to call compiled C libraries. ctypes is often the fastest way to wrap existing C libraries. Cython can give hundreds-fold speedups for computational kernels. cffi provides a convenient interface that is also compatible with PyPy.
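A minimal ctypes sketch, assuming a Unix-like system where libc symbols are visible in the current process:

```python
import ctypes

# Loading with None exposes the symbols already linked into the current
# process, which includes the C standard library on Linux and macOS.
libc = ctypes.CDLL(None)

# Declare strlen's C signature so ctypes converts arguments correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"hello"))  # 5
```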
Parallel programming
Because of the GIL, use multiprocessing for CPU-bound work (Process, Pool) and multiprocessing.dummy, which exposes the same interface backed by threads, for I/O-bound tasks. multiprocessing.Manager enables shared data for distributed patterns.
PyPy – a JIT‑powered Python
PyPy, implemented in RPython, can be more than six times faster than CPython thanks to its JIT compiler. However, C extensions (except those via cffi) may reduce or negate the speed gain.
Performance profiling tools
Besides %timeit, use cProfile to locate bottlenecks: python -m cProfile script.py reports call counts and execution time for each function.
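cProfile can also be driven from code, which is handy for profiling a single function; a sketch using the stdlib pstats module to format the report:

```python
import cProfile, pstats, io

def busy():
    return sum(i * i for i in range(100000))

profiler = cProfile.Profile()
profiler.enable()
busy()
profiler.disable()

# Render the top entries, sorted by cumulative time, into a string.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)

report = stream.getvalue()
print("busy" in report)  # True: the profiled function shows up by name
```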