Python Code Optimization Techniques to Improve Execution Speed
This article presents a collection of Python performance optimization strategies, including avoiding global variables, minimizing attribute access, reducing unnecessary abstractions, eliminating data copying, leveraging short‑circuit logic, optimizing loops, using JIT compilation with numba, and selecting appropriate data structures to significantly speed up code execution.
0. Code Optimization Principles
This article introduces many Python speed‑up techniques. Before diving into details, you need to understand some basic optimization principles.
First principle: Do not premature optimize
Many developers start optimizing code immediately, but the premise of optimization is that the code works correctly. Optimizing too early can distract from overall performance goals.
Second principle: Weigh the cost of optimization
Optimization has a cost; solving all performance problems is almost impossible. Usually you trade time for space or space for time, and you must also consider development effort.
Third principle: Do not optimize irrelevant parts
If you try to optimize every part of the code, it becomes hard to read and understand. Identify the slow parts (often inner loops) and focus optimization there.
1. Avoid Global Variables
# Not recommended. Execution time: 26.8 seconds
import math
size = 10000
for x in range(size):
for y in range(size):
z = math.sqrt(x) + math.sqrt(y)Placing code at the global scope makes variable look‑ups slower than inside a function. Wrapping the script in a function can give a 15‑30% speed boost.
# Recommended. Execution time: 20.6 seconds
import math
def main():
size = 10000
for x in range(size):
for y in range(size):
z = math.sqrt(x) + math.sqrt(y)
main()2. Avoid . (Attribute Access)
2.1 Avoid module and function attribute access
# Not recommended. Execution time: 14.5 seconds
import math
def computeSqrt(size: int):
result = []
for i in range(size):
result.append(math.sqrt(i))
return result
def main():
size = 10000
for _ in range(size):
result = computeSqrt(size)
main()Each attribute access triggers __getattribute__ or __getattr__, which involve dictionary look‑ups and add overhead. Import the needed names directly to eliminate attribute access.
# First optimization. Execution time: 10.9 seconds
from math import sqrt
def computeSqrt(size: int):
result = []
for i in range(size):
result.append(sqrt(i))
return result
def main():
size = 10000
for _ in range(size):
result = computeSqrt(size)
main()Local variables are faster than globals; assigning frequently used functions (e.g., sqrt ) to a local variable speeds up the inner loop.
# Second optimization. Execution time: 9.9 seconds
import math
def computeSqrt(size: int):
result = []
sqrt = math.sqrt # local alias
for i in range(size):
result.append(sqrt(i))
return result
def main():
size = 10000
for _ in range(size):
result = computeSqrt(size)
main()Similarly, assigning list.append to a local variable removes repeated attribute look‑ups.
# Third optimization. Execution time: 7.9 seconds
import math
def computeSqrt(size: int):
result = []
append = result.append
sqrt = math.sqrt
for i in range(size):
append(sqrt(i))
return result
def main():
size = 10000
for _ in range(size):
result = computeSqrt(size)
main()2.2 Avoid class attribute access
# Not recommended. Execution time: 10.4 seconds
import math
from typing import List
class DemoClass:
def __init__(self, value: int):
self._value = value
def computeSqrt(self, size: int) -> List[float]:
result = []
for _ in range(size):
result.append(math.sqrt(self._value))
return result
def main():
size = 10000
for _ in range(size):
demo = DemoClass(size)
demo.computeSqrt(size)
main()Accessing self._value inside a tight loop is slower than using a local variable.
# Recommended. Execution time: 8.0 seconds
import math
from typing import List
class DemoClass:
def __init__(self, value: int):
self._value = value
def computeSqrt(self, size: int) -> List[float]:
result = []
sqrt = math.sqrt
val = self._value
for _ in range(size):
result.append(sqrt(val))
return result
def main():
size = 10000
for _ in range(size):
demo = DemoClass(size)
demo.computeSqrt(size)
main()3. Avoid Unnecessary Abstractions
# Not recommended. Execution time: 0.55 seconds
class DemoClass:
def __init__(self, value: int):
self.value = value
@property
def value(self) -> int:
return self._value
@value.setter
def value(self, x: int):
self._value = x
def main():
size = 1000000
for i in range(size):
demo = DemoClass(size)
v = demo.value
demo.value = i
main()Extra layers such as decorators, property getters/setters, or descriptors add overhead. Use plain attributes when possible.
# Recommended. Execution time: 0.33 seconds
class DemoClass:
def __init__(self, value: int):
self.value = value
def main():
size = 1000000
for i in range(size):
demo = DemoClass(size)
v = demo.value
demo.value = i
main()4. Avoid Unnecessary Data Copying
4.1 Avoid meaningless data copies
# Not recommended. Execution time: 6.5 seconds
def main():
size = 10000
for _ in range(size):
value = range(size)
value_list = [x for x in value]
square_list = [x * x for x in value_list]
main()Creating value_list is unnecessary and wastes memory.
# Recommended. Execution time: 4.8 seconds
def main():
size = 10000
for _ in range(size):
value = range(size)
square_list = [x * x for x in value]
main()4.2 Swap values without a temporary variable
# Not recommended. Execution time: 0.07 seconds
def main():
size = 1000000
for _ in range(size):
a = 3
b = 5
temp = a
a = b
b = temp
main()Using a temporary variable adds overhead.
# Recommended. Execution time: 0.06 seconds
def main():
size = 1000000
for _ in range(size):
a = 3
b = 5
a, b = b, a # tuple unpacking
main()4.3 Use join instead of + for string concatenation
# Not recommended. Execution time: 2.6 seconds
import string
from typing import List
def concatString(string_list: List[str]) -> str:
result = ''
for s in string_list:
result += s
return result
def main():
string_list = list(string.ascii_letters * 100)
for _ in range(10000):
result = concatString(string_list)
main()Using + creates a new immutable string each iteration, leading to O(n) intermediate objects.
# Recommended. Execution time: 0.3 seconds
import string
from typing import List
def concatString(string_list: List[str]) -> str:
return ''.join(string_list)
def main():
string_list = list(string.ascii_letters * 100)
for _ in range(10000):
result = concatString(string_list)
main()5. Exploit Short‑Circuit Behaviour of if
# Not recommended. Execution time: 0.05 seconds
from typing import List
def concatString(string_list: List[str]) -> str:
abbreviations = {'cf.', 'e.g.', 'ex.', 'etc.', 'flg.', 'i.e.', 'Mr.', 'vs.'}
result = ''
for s in string_list:
if s in abbreviations:
result += s
return result
def main():
for _ in range(10000):
lst = ['Mr.', 'Hat', 'is', 'Chasing', 'the', 'black', 'cat', '.']
concatString(lst)
main()Place the condition that is more likely to be true before the or part, and the condition that is likely false after the and part, so the interpreter can skip unnecessary checks.
# Recommended. Execution time: 0.03 seconds
from typing import List
def concatString(string_list: List[str]) -> str:
abbreviations = {'cf.', 'e.g.', 'ex.', 'etc.', 'flg.', 'i.e.', 'Mr.', 'vs.'}
result = ''
for s in string_list:
if s[-1] == '.' and s in abbreviations: # short‑circuit
result += s
return result
def main():
for _ in range(10000):
lst = ['Mr.', 'Hat', 'is', 'Chasing', 'the', 'black', 'cat', '.']
concatString(lst)
main()6. Loop Optimizations
6.1 Use for instead of while
# Not recommended. Execution time: 6.7 seconds
def computeSum(size: int) -> int:
sum_ = 0
i = 0
while i < size:
sum_ += i
i += 1
return sum_
def main():
size = 10000
for _ in range(size):
computeSum(size)
main()Python's for loop is faster than a manual while loop.
# Recommended. Execution time: 4.3 seconds
def computeSum(size: int) -> int:
sum_ = 0
for i in range(size):
sum_ += i
return sum_
def main():
size = 10000
for _ in range(size):
computeSum(size)
main()6.2 Use implicit for (built‑ins) instead of explicit loops
# Recommended. Execution time: 1.7 seconds
def computeSum(size: int) -> int:
return sum(range(size))
def main():
size = 10000
for _ in range(size):
computeSum(size)
main()6.3 Reduce work inside inner loops
# Not recommended. Execution time: 12.8 seconds
import math
def main():
size = 10000
for x in range(size):
for y in range(size):
z = math.sqrt(x) + math.sqrt(y)
main()Calling math.sqrt inside the inner loop repeats work.
# Recommended. Execution time: 7.0 seconds
import math
def main():
size = 10000
sqrt = math.sqrt
for x in range(size):
sqrt_x = sqrt(x) # compute once per outer iteration
for y in range(size):
z = sqrt_x + sqrt(y)
main()7. Use numba.jit for JIT compilation
# Recommended. Execution time: 0.62 seconds
import numba
@numba.jit
def computeSum(size: int) -> int:
sum_ = 0
for i in range(size):
sum_ += i
return sum_
def main():
size = 10000
for _ in range(size):
computeSum(size)
main()8. Choose Appropriate Data Structures
Built‑in structures such as list , tuple , dict , set , and str are implemented in C and are very fast. For frequent insert/delete operations, consider collections.deque , which provides O(1) operations at both ends. For fast look‑ups in a sorted collection, use the bisect module. For frequent min/max queries, the heapq module turns a list into a heap with O(1) access to the smallest element.
References
https://zhuanlan.zhihu.com/p/143052860
David Beazley & Brian K. Jones, *Python Cookbook*, 3rd edition, O'Reilly, 2013.
张颖 & 赖勇浩, *编写高质量代码:改善Python程序的91个建议*, 机械工业出版社, 2014.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.