Save Up to 80% Memory in Python with 5 Built‑In Tricks
The article shows how to diagnose and dramatically cut Python’s memory usage by using built‑in tools such as sys.getsizeof, psutil, __slots__, generator expressions, memory‑mapped files (mmap) and string interning, providing concrete code examples, benchmarks and practical tips to avoid common pitfalls.
Why Python Seems Memory‑Hungry
A reader complained that a web‑scraping script crashed with an out‑of‑memory error after processing only part of a million‑row dataset, and wondered whether Python was simply the wrong language. The article explains that Python is not inherently wasteful; it simply provides a set of built‑in memory‑management utilities that most developers never use.
Step 1 – Diagnose Memory Usage (CT Scan)
Before optimizing, you need a clear picture of what consumes memory. The article demonstrates three lightweight tools:
sys.getsizeof(obj) – returns the size of a single object in bytes.
psutil.Process().memory_info().rss – reports the total resident memory of the current Python process.
pandas.DataFrame.info(memory_usage='deep') – gives a deep inspection of a DataFrame’s memory footprint.
Note: sys.getsizeof() only measures the container object itself; it does not include the memory of the elements inside lists or dictionaries. For a full accounting, sum the element sizes manually or use tracemalloc.
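As a sketch of that fuller accounting, the standard‑library tracemalloc module can trace every allocation made while a structure is built, capturing the element objects that sys.getsizeof misses:

```python
import sys
import tracemalloc

tracemalloc.start()

# Build a list of strings; tracemalloc records the allocations for the
# string elements too, which sys.getsizeof(data) alone would miss.
data = [str(i) * 10 for i in range(10_000)]

current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"Container only (sys.getsizeof): {sys.getsizeof(data)} bytes")
print(f"All traced allocations: {current} bytes (peak {peak})")
```

On a typical run the traced total is several times larger than the container size alone, because it includes the ten thousand string objects.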
Technique 1 – Use __slots__ to Slim Down Class Instances
Normal Python classes store attributes in a per‑instance __dict__, which is a heavy hash table. Declaring __slots__ replaces the dictionary with a fixed set of slots, removing the overhead.
import sys

class Article:
    def __init__(self, title, word_count):
        self.title = title
        self.word_count = word_count

class ArticleWithSlots:
    __slots__ = ['title', 'word_count']

    def __init__(self, title, word_count):
        self.title = title
        self.word_count = word_count

a1 = Article('Memory Issue', 1500)
a2 = ArticleWithSlots('Memory Solved', 1500)

memory_normal = sys.getsizeof(a1) + sys.getsizeof(a1.__dict__)
memory_slots = sys.getsizeof(a2)  # no __dict__
print(f"Normal instance total size: {memory_normal} bytes")
print(f"Slots instance total size: {memory_slots} bytes")

On a typical run the normal instance occupies about 152 bytes, while the slots‑based instance uses only 48 bytes. For data classes in Python 3.10+ you can enable the same benefit with @dataclass(slots=True):
from dataclasses import dataclass

@dataclass(slots=True)
class WeChatUser:
    openid: str
    nickname: str
    follow_time: int

Technique 2 – Replace Large Lists with Generators
List comprehensions create the entire list in memory, which quickly explodes for millions of items. Generator expressions keep only the “recipe” and produce items on demand.
import sys

list_numbers = [i for i in range(10000)]
print(f"List memory: {sys.getsizeof(list_numbers)} bytes")  # e.g., 87616 bytes

generator_numbers = (i for i in range(10000))
print(f"Generator memory: {sys.getsizeof(generator_numbers)} bytes")  # e.g., 112 bytes

# Real-world log processing: scan lazily, one line at a time
with open('access.log') as f:
    has_error = any('ERROR' in line for line in f)
print(has_error)

Note: Generators are single‑use; after they are exhausted they cannot be rewound. Create a new generator if you need to iterate again.
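A minimal sketch of that single‑use behavior, with one common way around it (wrapping the recipe in a function so a fresh generator is built on each call):

```python
gen = (i * i for i in range(5))
print(sum(gen))  # 30: this consumes the generator
print(sum(gen))  # 0: already exhausted, nothing left to yield

# Wrap the recipe in a function to get a fresh generator on each call
def squares():
    return (i * i for i in range(5))

print(sum(squares()))  # 30
print(sum(squares()))  # 30: every call starts over
```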
Technique 3 – Memory‑Map Large Files with mmap
When a file is too large to fit into RAM, reading it entirely with f.read() is disastrous. The mmap module creates a virtual memory region that represents the file; the OS loads only the accessed chunks.
import mmap

with open('massive_dataset.bin', 'r+b') as f:
    with mmap.mmap(f.fileno(), length=0, access=mmap.ACCESS_READ) as mm:
        print(f"First 10 bytes: {mm[:10]}")
        pos = mm.find(b'ERROR')
        if pos != -1:
            mm.seek(pos)
            chunk = mm.read(1024)
            # process chunk

Typical use cases include high‑concurrency configuration reads in game servers or financial trading systems, and fast scanning of terabyte‑scale datasets before training AI models.
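The snippet above assumes massive_dataset.bin already exists. A self‑contained variant that writes a small stand‑in file first, purely so the sketch runs end to end, could look like this:

```python
import mmap
import os
import tempfile

# Create a small stand-in file (in real use this would be the huge dataset)
fd, path = tempfile.mkstemp(suffix='.bin')
with os.fdopen(fd, 'wb') as f:
    f.write(b'A' * 1000 + b'ERROR: disk full' + b'B' * 1000)

with open(path, 'rb') as f:
    with mmap.mmap(f.fileno(), length=0, access=mmap.ACCESS_READ) as mm:
        # The OS pages in only the regions we actually touch
        pos = mm.find(b'ERROR')
        if pos != -1:
            mm.seek(pos)
            chunk = mm.read(16)
            print(f"Found at offset {pos}: {chunk}")

os.remove(path)
```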
Technique 4 – Choose Memory‑Efficient Data Types
Immutable tuples are more compact than mutable lists because they do not need extra space for dynamic resizing:
import sys

a_tuple = (1, 2, 3, 4, 5)
a_list = [1, 2, 3, 4, 5]
print(f"Tuple size: {sys.getsizeof(a_tuple)} bytes")  # e.g., 80
print(f"List size: {sys.getsizeof(a_list)} bytes")  # e.g., 120

For homogeneous numeric data, the array module stores values in a C‑style contiguous block, halving the memory compared with a list:
import sys, array

py_list = [i for i in range(1000)]
arr = array.array('i', [i for i in range(1000)])
print(f"List size: {sys.getsizeof(py_list)} bytes")  # e.g., ~8856
print(f"Array size: {sys.getsizeof(arr)} bytes")  # e.g., ~4064

For scientific workloads, NumPy ndarray and Pandas DataFrame already implement highly optimized memory layouts.
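Staying with the array module, a further squeeze (a sketch, not from the original article): pick the smallest typecode that still fits your values, and the block shrinks proportionally:

```python
import sys
import array

values = list(range(256))  # every value fits in one unsigned byte

arr_q = array.array('q', values)  # 8 bytes per element (signed long long)
arr_b = array.array('B', values)  # 1 byte per element (unsigned char)

print(f"'q' array: {sys.getsizeof(arr_q)} bytes")
print(f"'B' array: {sys.getsizeof(arr_b)} bytes")
```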
Technique 5 – String Interning to Share Identical Literals
CPython automatically interns string literals that look like identifiers, i.e. those made up only of letters, digits and underscores. (The exact rules are version‑dependent implementation details; for example, compile‑time constant folding only keeps string results up to about 4096 characters.) Interned strings share a single object, saving memory for repeated keys or tokens.
a = "hello_world"
b = "hello_world"
print(a is b) # True
c = "hello world!"
d = "hello world!"
print(c is d) # False
# Force interning for long strings
import sys
e = sys.intern("very long repeated sentence ...")
f = sys.intern("very long repeated sentence ...")
print(e is f)  # True

Common Pitfalls & Core Recap
__slots__ is not a silver bullet: Overusing it on classes with highly dynamic attributes or on a small number of instances yields negligible gains and reduces flexibility.
Inheritance traps: Subclasses that do not define their own __slots__ still have a __dict__. Multiple inheritance makes slot behavior more complex.
Generator one‑time use: After exhaustion a generator cannot be reused; recreate it or convert to a list if repeated iteration is needed.
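A minimal sketch of that inheritance trap: a subclass must declare its own __slots__ (even an empty one) or its instances regain a __dict__ and the saving silently disappears:

```python
class Base:
    __slots__ = ['x']

class Leaky(Base):      # no __slots__: instances get a __dict__ again
    pass

class Tight(Base):
    __slots__ = ['y']   # only the new attributes; 'x' is inherited

leaky = Leaky()
leaky.anything = 42     # allowed, because the per-instance dict is back
print(hasattr(leaky, '__dict__'))  # True

tight = Tight()
print(hasattr(tight, '__dict__'))  # False
```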
Diagnose first with sys.getsizeof and psutil.
Apply __slots__ for massive numbers of objects, generators for large sequences, and mmap for huge files.
Prefer immutable tuples, homogeneous array, and string interning for repetitive data.
Python’s flexibility does not have to come at the cost of memory. By understanding the underlying mechanisms and applying these built‑in techniques, developers can keep memory consumption low without installing third‑party packages.
References
Python official documentation – tracemalloc
Python 3.10+ dataclass enhancements (slots parameter)
Generator Expressions Best Practices
Huawei Cloud Community – Efficient use of Python mmap Memory‑Mapped File Support
Deep dive into Python string interning
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.