Python Code Optimization Techniques for Faster Execution
This article presents practical Python performance optimization techniques, covering fundamental principles such as avoiding premature optimization, weighing trade‑offs, eliminating global variables, reducing attribute access, minimizing unnecessary abstractions, avoiding data copies, leveraging efficient loops, using short‑circuit logic, applying numba JIT, and selecting appropriate data structures to significantly speed up code execution.
Code Optimization Principles
This article introduces a collection of Python performance‑boosting techniques. Before diving into specific optimizations, it first outlines basic principles that should guide any code‑speed improvement effort.
First principle: Do not optimize prematurely
Many developers start writing code with performance as the primary goal, but correctness must come first. Optimizing before a program works correctly can obscure overall performance metrics and lead to misplaced effort.
Second principle: Weigh the cost of optimization
Optimization has a price; solving every performance problem is virtually impossible. Typical trade‑offs involve time versus space or development effort versus speed.
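The time-versus-space trade‑off mentioned above can be illustrated with a small sketch. The Fibonacci functions below are hypothetical examples chosen for this illustration, not taken from the article's benchmarks: the cached version spends memory on stored results in exchange for far fewer recursive calls.

```python
from functools import lru_cache


def fib_plain(n: int) -> int:
    """Naive recursion: no extra storage, but exponential running time."""
    return n if n < 2 else fib_plain(n - 1) + fib_plain(n - 2)


@lru_cache(maxsize=None)  # unbounded cache: trades memory for speed
def fib_cached(n: int) -> int:
    """Same recursion, but each distinct n is computed only once."""
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)


if __name__ == "__main__":
    print(fib_plain(20), fib_cached(20))
    # cache_info() reports how many results are being held in memory
    print(fib_cached.cache_info())
```

Whether the extra memory is worth it depends on how often the same arguments recur; a bounded maxsize is one way to cap the cost.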
Third principle: Do not optimize irrelevant parts
Optimizing every line makes code harder to read. Identify the slowest sections (usually inner loops) and focus improvements there; other parts can tolerate minor slow‑downs.
Avoid Global Variables
<code>
# Not recommended – execution time: 26.8 s
import math

size = 10000
for x in range(size):
    for y in range(size):
        z = math.sqrt(x) + math.sqrt(y)
</code>
Code placed at module level works with global variables, which CPython resolves through a dictionary lookup and which are therefore slower to access than locals. Wrapping the script in a function typically yields a 15‑30 % speed gain.
<code>
# Recommended – execution time: 20.6 s
import math

def main():
    # Inside a function, size, x, y and z are fast local variables
    size = 10000
    for x in range(size):
        for y in range(size):
            z = math.sqrt(x) + math.sqrt(y)

main()
</code>
Avoid Module and Function Attribute Access
<code>
# Not recommended – execution time: 14.5 s
import math

def computeSqrt(size: int):
    result = []
    for i in range(size):
        result.append(math.sqrt(i))
    return result

def main():
    size = 10000
    for _ in range(size):
        result = computeSqrt(size)

main()
</code>
Each use of the attribute access operator (the dot, as in math.sqrt) goes through the attribute lookup machinery (__getattribute__, with __getattr__ as a fallback), which involves dictionary operations and adds overhead. Importing the needed function directly with from ... import removes the module attribute lookup.
<code>
# First optimization – execution time: 10.9 s
from math import sqrt

def computeSqrt(size: int):
    result = []
    for i in range(size):
        result.append(sqrt(i))  # avoid math.sqrt
    return result

def main():
    size = 10000
    for _ in range(size):
        result = computeSqrt(size)

main()
</code>
Local variables are faster than globals; assigning sqrt to a local variable further reduces lookup time.
<code>
# Second optimization – execution time: 9.9 s
import math

def computeSqrt(size: int):
    result = []
    sqrt = math.sqrt  # local alias
    for i in range(size):
        result.append(sqrt(i))
    return result

def main():
    size = 10000
    for _ in range(size):
        result = computeSqrt(size)

main()
</code>
Even the list append method can be cached locally to avoid repeated attribute lookups.
<code>
# Recommended – execution time: 7.9 s
import math

def computeSqrt(size: int):
    result = []
    append = result.append  # bound method cached in a local
    sqrt = math.sqrt        # local alias
    for i in range(size):
        append(sqrt(i))
    return result

def main():
    size = 10000
    for _ in range(size):
        result = computeSqrt(size)

main()
</code>
Avoid Class Attribute Access
<code>
# Not recommended – execution time: 10.4 s
import math
from typing import List

class DemoClass:
    def __init__(self, value: int):
        self._value = value

    def computeSqrt(self, size: int) -> List[float]:
        result = []
        append = result.append
        sqrt = math.sqrt
        for _ in range(size):
            append(sqrt(self._value))  # attribute lookup on every iteration
        return result

def main():
    size = 10000
    for _ in range(size):
        demo_instance = DemoClass(size)
        result = demo_instance.computeSqrt(size)

main()
</code>
Accessing self._value on every iteration is slower than reading a local variable, so copy it into a local before the loop.
<code>
# Recommended – execution time: 8.0 s
import math
from typing import List

class DemoClass:
    def __init__(self, value: int):
        self._value = value

    def computeSqrt(self, size: int) -> List[float]:
        result = []
        append = result.append
        sqrt = math.sqrt
        value = self._value  # read the attribute once
        for _ in range(size):
            append(sqrt(value))
        return result

def main():
    size = 10000
    for _ in range(size):
        demo_instance = DemoClass(size)
        result = demo_instance.computeSqrt(size)

main()
</code>
Avoid Unnecessary Abstractions
<code>
# Not recommended – execution time: 0.55 s
class DemoClass:
    def __init__(self, value: int):
        self.value = value

    @property
    def value(self) -> int:
        return self._value

    @value.setter
    def value(self, x: int):
        self._value = x

def main():
    size = 1000000
    for i in range(size):
        demo_instance = DemoClass(size)
        value = demo_instance.value
        demo_instance.value = i

main()
</code>
Extra layers such as decorators, property getters/setters, and descriptors add a function call to every access. When they are not required, use plain attributes.
<code>
# Recommended – execution time: 0.33 s
class DemoClass:
    def __init__(self, value: int):
        self.value = value  # simple attribute, no getter/setter

def main():
    size = 1000000
    for i in range(size):
        demo_instance = DemoClass(size)
        value = demo_instance.value
        demo_instance.value = i

main()
</code>
Avoid Unnecessary Data Copying
4.1 Eliminate meaningless copies
<code>
# Not recommended – execution time: 6.5 s
def main():
    size = 10000
    for _ in range(size):
        value = range(size)
        value_list = [x for x in value]
        square_list = [x * x for x in value_list]

main()
</code>
The intermediate value_list is redundant: it copies every element into a second list that is used only once, costing both time and memory.
<code>
# Recommended – execution time: 4.8 s
def main():
    size = 10000
    for _ in range(size):
        value = range(size)
        square_list = [x * x for x in value]  # no extra copy

main()
</code>
4.2 Swap values without a temporary variable
<code>
# Not recommended – execution time: 0.07 s
def main():
    size = 1000000
    for _ in range(size):
        a = 3
        b = 5
        temp = a
        a = b
        b = temp

main()
</code>
<code>
# Recommended – execution time: 0.06 s
def main():
    size = 1000000
    for _ in range(size):
        a = 3
        b = 5
        a, b = b, a  # tuple unpacking

main()
</code>
The tuple-unpacking form avoids the extra temporary name; the measured gain is small.
4.3 Use join instead of + for string concatenation
<code>
# Not recommended – execution time: 2.6 s
import string
from typing import List

def concatString(string_list: List[str]) -> str:
    result = ''
    for str_i in string_list:
        result += str_i
    return result

def main():
    string_list = list(string.ascii_letters * 100)
    for _ in range(10000):
        result = concatString(string_list)

main()
</code>
Because strings are immutable, each + may create a new object and copy everything accumulated so far, which leads to O(n²) behavior in the worst case. str.join computes the total length first, allocates the result once, and copies each piece once.
<code>
# Recommended – execution time: 0.3 s
import string
from typing import List

def concatString(string_list: List[str]) -> str:
    return ''.join(string_list)  # single allocation

def main():
    string_list = list(string.ascii_letters * 100)
    for _ in range(10000):
        result = concatString(string_list)

main()
</code>
Leverage Short‑Circuit Evaluation
<code>
# Not recommended – execution time: 0.05 s
from typing import List

def concatString(string_list: List[str]) -> str:
    abbreviations = {'cf.', 'e.g.', 'ex.', 'etc.', 'flg.', 'i.e.', 'Mr.', 'vs.'}
    result = ''
    for str_i in string_list:
        if str_i in abbreviations:
            result += str_i
    return result

def main():
    for _ in range(10000):
        string_list = ['Mr.', 'Hat', 'is', 'Chasing', 'the', 'black', 'cat', '.']
        result = concatString(string_list)

main()
</code>
In an expression a and b, b is not evaluated when a is false; in a or b, b is not evaluated when a is true. To save time, put the condition most likely to be false first in an and expression, and the condition most likely to be true first in an or expression. Here the check str_i[-1] == '.' is false for most words, so the set lookup is skipped for them.
<code>
# Recommended – execution time: 0.03 s
from typing import List

def concatString(string_list: List[str]) -> str:
    abbreviations = {'cf.', 'e.g.', 'ex.', 'etc.', 'flg.', 'i.e.', 'Mr.', 'vs.'}
    result = ''
    for str_i in string_list:
        # Short-circuit: the set lookup runs only for words ending in '.'
        # (str_i[-1] assumes every string is non-empty)
        if str_i[-1] == '.' and str_i in abbreviations:
            result += str_i
    return result

def main():
    for _ in range(10000):
        string_list = ['Mr.', 'Hat', 'is', 'Chasing', 'the', 'black', 'cat', '.']
        result = concatString(string_list)

main()
</code>
Loop Optimizations
6.1 Prefer for over while
<code>
# Not recommended – execution time: 6.7 s
def computeSum(size: int) -> int:
    sum_ = 0
    i = 0
    while i < size:
        sum_ += i
        i += 1
    return sum_

def main():
    size = 10000
    for _ in range(size):
        sum_ = computeSum(size)

main()
</code>
for loops are generally faster than while loops in Python, because the while version performs the comparison and the increment as separate interpreted operations on every iteration.
<code>
# Recommended – execution time: 4.3 s
def computeSum(size: int) -> int:
    sum_ = 0
    for i in range(size):  # use a for loop
        sum_ += i
    return sum_

def main():
    size = 10000
    for _ in range(size):
        sum_ = computeSum(size)

main()
</code>
6.2 Use implicit iteration instead of explicit for
<code>
# Recommended – execution time: 1.7 s
def computeSum(size: int) -> int:
    return sum(range(size))  # implicit loop, runs inside the built-in sum

def main():
    size = 10000
    for _ in range(size):
        sum_ = computeSum(size)

main()
</code>
6.3 Reduce work inside inner loops
<code>
# Not recommended – execution time: 12.8 s
import math

def main():
    size = 10000
    sqrt = math.sqrt
    for x in range(size):
        for y in range(size):
            z = sqrt(x) + sqrt(y)

main()
</code>
sqrt(x) does not depend on y, yet the inner loop recomputes it on every iteration. Move it to the outer loop.
<code>
# Recommended – execution time: 7.0 s
import math

def main():
    size = 10000
    sqrt = math.sqrt
    for x in range(size):
        sqrt_x = sqrt(x)  # compute once per outer iteration
        for y in range(size):
            z = sqrt_x + sqrt(y)

main()
</code>
Use numba.jit for JIT Compilation
numba, a third-party package, compiles decorated Python functions to native machine code the first time they are called, which can reduce runtime dramatically for numeric loops.
<code>
# Recommended – execution time: 0.62 s
import numba

@numba.jit
def computeSum(size: int) -> int:
    sum_ = 0
    for i in range(size):
        sum_ += i
    return sum_

def main():
    size = 10000
    for _ in range(size):
        sum_ = computeSum(size)

main()
</code>
Choose Appropriate Data Structures
Built‑in structures such as list, tuple, dict, set, and str are implemented in C and are highly efficient. A list is a dynamic array, so inserting or deleting at its front is O(n); for frequent insertions and deletions at either end, collections.deque provides O(1) operations at both ends. Searching an unsorted list is O(n); if the list is kept sorted, bisect locates positions by binary search in O(log n). When the frequent operation is retrieving the smallest element, heapq maintains a list as a min-heap, so reading the minimum is O(1) and pushing or popping is O(log n).
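The following minimal sketch shows the three standard-library tools named above in use. The sample values and variable names are made up for illustration; only the collections, bisect, and heapq calls come from the standard library.

```python
import bisect
import heapq
from collections import deque

# deque: O(1) insertion and removal at both ends
queue = deque([1, 2, 3])
queue.appendleft(0)      # add at the front
queue.append(4)          # add at the back
first = queue.popleft()  # remove from the front

# bisect: binary search in a list that is kept sorted
scores = [10, 20, 30, 40]
bisect.insort(scores, 25)             # insert while preserving order
idx = bisect.bisect_left(scores, 30)  # O(log n) position lookup
found = idx < len(scores) and scores[idx] == 30

# heapq: a plain list maintained as a min-heap
tasks = [5, 1, 4, 2]
heapq.heapify(tasks)           # rearrange in place, O(n)
smallest = tasks[0]            # peek at the minimum, O(1)
popped = heapq.heappop(tasks)  # remove the minimum, O(log n)

if __name__ == "__main__":
    print(first, list(queue))
    print(scores, idx, found)
    print(smallest, popped, tasks[0])
```

Note that bisect.insort still shifts list elements, so the insertion itself is O(n); only the search is O(log n).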
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.