20 Proven Python Tricks to Supercharge Your Code Performance
This article presents 20 practical Python performance tips—from choosing O(1) data structures and reducing redundant data to leveraging generators, C extensions, multiprocessing, and profiling tools—complete with benchmark code snippets that demonstrate measurable speed gains across common coding patterns.
1. Optimize Algorithm Time Complexity
Algorithmic time complexity has the greatest impact on execution speed. In Python you can improve it by choosing appropriate data structures (a membership test is O(n) on a list but O(1) on average on a set) and by applying techniques such as divide-and-conquer, branch-and-bound, greedy algorithms, or dynamic programming.
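As one illustration of the dynamic programming idea, the following sketch memoizes a naive recursive Fibonacci function with functools.lru_cache. The function names are hypothetical and chosen only for this example, and the code assumes Python 3.

```python
from functools import lru_cache


def fib_naive(n):
    """Exponential time: the same subproblems are recomputed many times."""
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)


@lru_cache(maxsize=None)
def fib_memo(n):
    """Linear time: each subproblem is computed once and then cached."""
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)


# Both functions agree on small inputs.
print(fib_naive(20) == fib_memo(20))
# The memoized version stays fast where the naive one becomes impractical.
print(fib_memo(90))
```

The algorithm is unchanged; only the repeated work is removed, which is usually a far larger win than any micro-optimization.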
2. Reduce Redundant Data
Store large symmetric matrices using only the upper or lower triangle, and represent sparse matrices with specialized formats to avoid unnecessary zero elements.
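A minimal sketch of the symmetric-matrix idea, assuming a pure-Python flat list as storage. The class name SymmetricMatrix and its layout are illustrative assumptions, not something the article prescribes. Only the upper triangle is kept, so an n x n matrix needs n(n+1)/2 slots instead of n*n.

```python
class SymmetricMatrix:
    """Stores an n x n symmetric matrix using only its upper triangle."""

    def __init__(self, n):
        self.n = n
        # n*(n+1)/2 entries instead of n*n
        self.data = [0.0] * (n * (n + 1) // 2)

    def _index(self, i, j):
        if i > j:
            i, j = j, i  # (i, j) and (j, i) share one slot
        # Rows 0..i-1 hold n, n-1, ..., n-i+1 entries, so row i starts here:
        return i * self.n - i * (i - 1) // 2 + (j - i)

    def __getitem__(self, ij):
        return self.data[self._index(*ij)]

    def __setitem__(self, ij, value):
        self.data[self._index(*ij)] = value


m = SymmetricMatrix(3)
m[2, 0] = 5.0
print(m[0, 2], len(m.data))
```

For sparse matrices, a similar saving comes from storing only the nonzero entries, for example in a dict keyed by (row, column).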
3. Use copy and deepcopy Wisely
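Assigning a dict or list to another name only creates a second reference to the same object. To duplicate the object itself, use copy.copy() for a shallow copy or copy.deepcopy() for a recursive one. The recursive copy is significantly slower, as this benchmark shows (the snippets in this article are IPython sessions under Python 2, where range() returns a list):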
import copy
a=range(100000)
%timeit -n10 copy.copy(a) # copy
%timeit -n10 copy.deepcopy(a) # deepcopy
10 loops, best of 3: 1.55 ms per loop
10 loops, best of 3: 151 ms per loop
4. Use dict or set for Fast Lookups
Both dict and set are hash‑table based (similar to C++ unordered_map) with O(1) lookup time. Benchmark:
a=range(1000)
s=set(a)
d=dict((i,1) for i in a)
%timeit -n1000 100 in d
%timeit -n1000 100 in s
10000 loops, best of 3: 43.5 ns per loop
10000 loops, best of 3: 49.6 ns per loop
5. Use Generators and yield
Generators produce values lazily, so their memory footprint does not depend on the length of the sequence, and they often save time when only part of the sequence is consumed. In the first comparison the generator expression is only created, never iterated, which is why it looks so much faster than the list comprehension.
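The memory claim can be checked directly with sys.getsizeof. This is a small Python 3 sketch. The helper make_gen and the variable names are assumptions made for illustration.

```python
import sys


def make_gen(n):
    """Return a generator expression over n integers (nothing is computed yet)."""
    return (i for i in range(n))


n = 100_000
as_list = [i for i in range(n)]  # materializes every element up front
as_gen = make_gen(n)             # creates only a small generator object

print(sys.getsizeof(as_list))    # grows with n
print(sys.getsizeof(as_gen))     # small, independent of n

# Consuming the generator still yields every value, one at a time.
total = sum(as_gen)
print(total == sum(as_list))
```

Note that sys.getsizeof reports only the container itself, not the integers it refers to, so the real gap for the list is even larger.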
%timeit -n100 (i for i in range(100000))
%timeit -n100 [i for i in range(100000)]
100 loops, best of 3: 1.54 ms per loop
100 loops, best of 3: 4.56 ms per loop
When the whole sequence is iterated, a list comprehension can be slightly faster, but a generator wins when the loop can break early, because the remaining items are never computed. The yield statement creates a generator function:
def yield_func(ls):
    for i in ls:
        yield i+1
def not_yield_func(ls):
    return [i+1 for i in ls]
ls=range(1000000)
%timeit -n10 for i in yield_func(ls): pass
%timeit -n10 for i in not_yield_func(ls): pass
10 loops, best of 3: 63.8 ms per loop
10 loops, best of 3: 62.9 ms per loop
6. Optimize Loops
Avoid placing invariant calculations inside loops. Moving the length calculation outside the loop can halve execution time.
a=range(10000)
size_a=len(a)
%timeit -n1000 for i in a: k=len(a)
%timeit -n1000 for i in a: k=size_a
1000 loops, best of 3: 569 µs per loop
1000 loops, best of 3: 256 µs per loop
7. Order Multiple Condition Expressions
For and, place the condition that is less likely to be true first; for or, place the more likely condition first. Benchmarks illustrate the impact.
a=range(2000)
%timeit -n100 [i for i in a if 10<i<20 or 1000<i<2000]
%timeit -n100 [i for i in a if 1000<i<2000 or 10<i<20]
%timeit -n100 [i for i in a if i%2==0 and i>1900]
%timeit -n100 [i for i in a if i>1900 and i%2==0]
100 loops, best of 3: 287 µs per loop
100 loops, best of 3: 214 µs per loop
100 loops, best of 3: 128 µs per loop
100 loops, best of 3: 56.1 µs per loop
8. Use join to Concatenate Strings
Using ''.join() to build a string from an iterator is about five times faster than repeated += concatenation.
%%timeit -n10000
# a is a list of strings (its definition is not shown in the source)
s=''
for i in a: s+=i
10000 loops, best of 3: 59.8 µs per loop
%timeit -n100000 s=''.join(a)
100000 loops, best of 3: 11.8 µs per loop
9. Choose Efficient String Formatting
In this benchmark the % operator is the slowest of the three common methods and plain concatenation is the fastest, but all three finish in well under a microsecond, so the gap rarely matters. The % style is often preferred for readability.
s1,s2='ax','bx'
%timeit -n100000 'abc%s%s' % (s1,s2)
%timeit -n100000 'abc{0}{1}'.format(s1,s2)
%timeit -n100000 'abc' + s1 + s2
100000 loops, best of 3: 183 ns per loop
100000 loops, best of 3: 169 ns per loop
100000 loops, best of 3: 103 ns per loop
10. Swap Variables Without a Temporary
Using tuple unpacking a,b=b,a is more than twice as fast as the classic three‑step swap.
a,b=1,2
c=a; a=b; b=c # three‑step
%timeit -n10000 a,b=b,a
10000 loops, best of 3: 86 ns per loop
11. Use is for Boolean Comparison
Testing if x is True is roughly 1.5 times as fast as if x == True in this benchmark.
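The two tests are not interchangeable, which matters more than the speed difference. The following Python 3 sketch uses a made-up list of values to show how == True, is True, and a plain truthiness test select different items.

```python
values = [True, 1, 1.0, "yes", 0, False]

# == compares by value, so 1 and 1.0 count as equal to True.
equal_to_true = [v for v in values if v == True]  # noqa: E712

# is compares identity, so only the True singleton matches.
identical_to_true = [v for v in values if v is True]

# A bare condition tests truthiness, which also accepts non-empty strings.
truthy = [v for v in values if v]

print(equal_to_true)      # [True, 1, 1.0]
print(identical_to_true)  # [True]
print(truthy)             # [True, 1, 1.0, 'yes']
```

In most code a bare if x: is both the fastest and the most idiomatic choice; is True is appropriate only when the value must be exactly the boolean True.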
a=range(10000)
%timeit -n100 [i for i in a if i==True]
%timeit -n100 [i for i in a if i is True]
100 loops, best of 3: 531 µs per loop
100 loops, best of 3: 362 µs per loop
12. Use Chained Comparisons
Expression x < y < z is slightly faster and more readable than x < y and y < z.
x,y,z=1,2,3
%timeit -n1000000 if x<y<z: pass
%timeit -n1000000 if x<y and y<z: pass
1000000 loops, best of 3: 101 ns per loop
1000000 loops, best of 3: 121 ns per loop
13. Prefer while 1 Over while True in Python 2
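In Python 2, while 1 runs noticeably faster because True is an ordinary built-in name that can be reassigned, so it has to be looked up on every iteration, whereas 1 is a constant. In Python 3, True is a keyword and the two forms perform the same.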
def while_1():
    n=100000
    while 1:
        n-=1
        if n<=0: break
def while_true():
    n=100000
    while True:
        n-=1
        if n<=0: break
%timeit -n100 while_1()
%timeit -n100 while_true()
100 loops, best of 3: 3.69 ms per loop
100 loops, best of 3: 5.61 ms per loop
14. Use ** Instead of pow
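In this benchmark the exponentiation operator ** is more than ten times faster than the built-in pow function. Part of the gap comes from the literal operands: CPython evaluates 2**20 at compile time, so the second line mostly measures loading a constant, while pow(2,20) still pays for a name lookup and a function call.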
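To compare the two forms on operands that are not literals, a sketch like the following times both with variables using the standard timeit module. It assumes Python 3, the variable names base and exp are illustrative, and no particular timings are claimed because they depend on the machine and interpreter version.

```python
import timeit

# Operands are bound to names in the setup, so nothing is computed at compile time.
setup = "base, exp = 2, 20"

t_op = timeit.timeit("base ** exp", setup=setup, number=200_000)
t_pow = timeit.timeit("pow(base, exp)", setup=setup, number=200_000)

print(f"base ** exp    : {t_op:.4f} s")
print(f"pow(base, exp) : {t_pow:.4f} s")
```

Any remaining difference reflects the cost of looking up and calling pow rather than the cost of the exponentiation itself.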
%timeit -n10000 c=pow(2,20)
%timeit -n10000 c=2**20
10000 loops, best of 3: 284 ns per loop
10000 loops, best of 3: 16.9 ns per loop
15. Use C-implemented Packages (cPickle, cStringIO)
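Modules implemented in C, such as cPickle, can be about an order of magnitude faster than their pure-Python counterparts. This applies to Python 2; in Python 3, pickle uses its C implementation automatically and cStringIO has been replaced by io.StringIO and io.BytesIO.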
import cPickle, pickle
a=range(10000)
%timeit -n100 x=cPickle.dumps(a)
%timeit -n100 x=pickle.dumps(a)
100 loops, best of 3: 1.58 ms per loop
100 loops, best of 3: 17 ms per loop
16. Choose Efficient Deserialization
Deserializing with json.loads is about three times faster than cPickle.loads and twenty times faster than eval.
import json, cPickle
a=range(10000)
s1=str(a)
s2=cPickle.dumps(a)
s3=json.dumps(a)
%timeit -n100 eval(s1)
%timeit -n100 cPickle.loads(s2)
%timeit -n100 json.loads(s3)
100 loops, best of 3: 16.8 ms per loop
100 loops, best of 3: 2.02 ms per loop
100 loops, best of 3: 798 µs per loop
17. Use C Extensions
Python can call C libraries through the CPython native API, ctypes, Cython, or cffi. Each offers a different trade-off between ease of use and performance, and moving a hot loop into C often yields a several-fold speedup.
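Of these options, ctypes needs no compilation step. The sketch below calls strlen from the C standard library. It assumes a POSIX system (Linux or macOS), where ctypes.CDLL(None) falls back to the symbols already loaded in the process if find_library returns nothing; on Windows the library lookup would need to be different.

```python
import ctypes
import ctypes.util

# Locate the C standard library (POSIX assumption, see the note above).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

message = b"hello, ctypes"
length = libc.strlen(message)
print(length == len(message))
```

Declaring argtypes and restype is optional for simple calls, but it lets ctypes check arguments and convert the return value correctly.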
18. Parallel Programming with multiprocessing
Because of the GIL, true parallelism requires the multiprocessing module for CPU‑bound tasks, while multiprocessing.dummy provides a thread‑based interface for I/O‑bound workloads. Managers enable simple distributed data sharing.
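The two pool types share the same interface, so switching between processes and threads is a one-line change. In the following Python 3 sketch, cpu_task and the run_with_* helpers are hypothetical names introduced for illustration.

```python
from multiprocessing import Pool                      # process-based pool
from multiprocessing.dummy import Pool as ThreadPool  # same API, backed by threads


def cpu_task(n):
    """Hypothetical CPU-bound work: sum of squares below n."""
    return sum(i * i for i in range(n))


def run_with_processes(inputs, workers=2):
    # Separate processes sidestep the GIL, at the cost of pickling arguments and results.
    with Pool(workers) as pool:
        return pool.map(cpu_task, inputs)


def run_with_threads(inputs, workers=2):
    # Threads share memory and suit I/O-bound work; CPU-bound work stays serialized by the GIL.
    with ThreadPool(workers) as pool:
        return pool.map(cpu_task, inputs)


if __name__ == "__main__":
    inputs = [10_000, 20_000, 30_000]
    print(run_with_processes(inputs) == run_with_threads(inputs))
```

The if __name__ == "__main__" guard matters for the process pool: on platforms that start workers by re-importing the main module, it prevents each worker from recursively creating new pools.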
19. PyPy – A JIT‑Powered Python Interpreter
PyPy, implemented in RPython, can run some workloads about six times faster than CPython thanks to its just-in-time compiler, although code that relies heavily on C extensions may see a smaller benefit.
20. Profiling Tools
Beyond timeit, the cProfile module can profile entire scripts (e.g., python -m cProfile script.py) to locate bottlenecks for targeted optimization.
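cProfile can also be driven from inside a program, which is handy when only one code path is of interest. The sketch below profiles a hypothetical function, slow_sum, and prints the entries sorted by cumulative time; the function names and workload are assumptions for illustration.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical hot spot to profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def main():
    return [slow_sum(50_000) for _ in range(5)]


profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Sort by cumulative time and keep the ten most expensive entries.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

The report lists call counts and per-function times, which shows where optimization effort is likely to pay off.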
MaGe Linux Operations
