
Save Up to 80% Memory in Python with 5 Built‑In Tricks

This article shows how to diagnose and dramatically cut Python's memory usage with built-in tools such as sys.getsizeof, psutil, __slots__, generator expressions, memory-mapped files (mmap), and string interning, with concrete code examples, benchmarks, and practical tips for avoiding common pitfalls.

Data STUDIO

Why Python Seems Memory‑Hungry

A reader complained that a web-scraping script crashed with an out-of-memory error after processing only part of a million-row dataset, and wondered whether Python was the wrong language for the job. Python is not inherently wasteful; it simply ships with a set of built-in memory-management utilities that most developers never use.

Step 1 – Diagnose Memory Usage (CT Scan)

Before optimizing, you need a clear picture of what is consuming memory. Three lightweight tools cover most cases:

sys.getsizeof(obj) – returns the size of a single object in bytes.
psutil.Process().memory_info().rss – reports the total resident memory of the current Python process.
pandas.DataFrame.info(memory_usage='deep') – gives a deep inspection of a DataFrame's memory footprint.

Note: sys.getsizeof() measures only the container object itself; it does not include the memory of the elements inside a list or dictionary. For a full accounting you must sum the sizes recursively or use tracemalloc.
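The standard-library tracemalloc module can attribute allocations to the exact source line, which makes it the natural next step when sys.getsizeof is not enough. A minimal sketch:

```python
import tracemalloc

# Start tracing before the code you want to profile.
tracemalloc.start()

data = [str(i) * 10 for i in range(100_000)]  # deliberately allocate a lot

# Total traced memory right now, and the peak since start().
current, peak = tracemalloc.get_traced_memory()
print(f"Current: {current / 1024:.1f} KiB, peak: {peak / 1024:.1f} KiB")

# Group allocations by source line to find the biggest offenders.
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)

tracemalloc.stop()
```

Unlike sys.getsizeof, this accounts for every allocation made while tracing is active, including the elements inside containers.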

Technique 1 – Use __slots__ to Slim Down Class Instances

Normal Python classes store attributes in a per‑instance __dict__, which is a heavy hash table. Declaring __slots__ replaces the dictionary with a fixed set of slots, removing the overhead.

import sys

class Article:
    def __init__(self, title, word_count):
        self.title = title
        self.word_count = word_count

class ArticleWithSlots:
    __slots__ = ['title', 'word_count']
    def __init__(self, title, word_count):
        self.title = title
        self.word_count = word_count

a1 = Article('Memory Issue', 1500)
a2 = ArticleWithSlots('Memory Solved', 1500)

memory_normal = sys.getsizeof(a1) + sys.getsizeof(a1.__dict__)
memory_slots = sys.getsizeof(a2)  # no __dict__
print(f"Normal instance total size: {memory_normal} bytes")
print(f"Slots instance total size: {memory_slots} bytes")

On a typical run the normal instance occupies about 152 bytes, while the slots‑based instance uses only 48 bytes. For data classes in Python 3.10+ you can enable the same benefit with @dataclass(slots=True):

from dataclasses import dataclass

@dataclass(slots=True)
class WeChatUser:
    openid: str
    nickname: str
    follow_time: int

Technique 2 – Replace Large Lists with Generators

List comprehensions create the entire list in memory, which quickly explodes for millions of items. Generator expressions keep only the “recipe” and produce items on demand.

import sys

list_numbers = [i for i in range(10000)]
print(f"List memory: {sys.getsizeof(list_numbers)} bytes")  # e.g., 87616 bytes

generator_numbers = (i for i in range(10000))
print(f"Generator memory: {sys.getsizeof(generator_numbers)} bytes")  # e.g., 112 bytes

# Real‑world log processing
with open('access.log') as f:
    has_error = any('ERROR' in line for line in f)
print(has_error)

Note: Generators are single-use; after they are exhausted they cannot be rewound. Create a new generator if you need to iterate again.
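The single-use behaviour is easy to demonstrate, and wrapping generator creation in a function is a simple way to get a fresh one on demand:

```python
gen = (i * i for i in range(5))

print(sum(gen))  # 30: consumes the generator
print(sum(gen))  # 0: already exhausted, nothing left to yield

# Wrap creation in a function to get a fresh generator each time.
def squares():
    return (i * i for i in range(5))

print(sum(squares()))  # 30
print(sum(squares()))  # 30
```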

Technique 3 – Memory‑Map Large Files with mmap

When a file is too large to fit into RAM, reading it entirely with f.read() is disastrous. The mmap module creates a virtual memory region that represents the file; the OS loads only the accessed chunks.

import mmap

with open('massive_dataset.bin', 'r+b') as f:
    with mmap.mmap(f.fileno(), length=0, access=mmap.ACCESS_READ) as mm:
        print(f"First 10 bytes: {mm[:10]}")
        pos = mm.find(b'ERROR')
        if pos != -1:
            mm.seek(pos)
            chunk = mm.read(1024)
            # process chunk

Typical use cases include high‑concurrency configuration reads in game servers or financial trading systems, and fast scanning of terabyte‑scale datasets before training AI models.

Technique 4 – Choose Memory‑Efficient Data Types

Immutable tuples are more compact than mutable lists because they do not need extra space for dynamic resizing:

import sys
a_tuple = (1, 2, 3, 4, 5)
a_list = [1, 2, 3, 4, 5]
print(f"Tuple size: {sys.getsizeof(a_tuple)} bytes")  # 80
print(f"List size: {sys.getsizeof(a_list)} bytes")   # 120

For homogeneous numeric data, the array module stores values in a C‑style contiguous block, halving the memory compared with a list:

import sys, array
py_list = [i for i in range(1000)]
arr = array.array('i', [i for i in range(1000)])
print(f"List size: {sys.getsizeof(py_list)} bytes")   # ~8856
print(f"Array size: {sys.getsizeof(arr)} bytes")    # ~4064

For scientific workloads, NumPy ndarray and Pandas DataFrame already implement highly optimized memory layouts.
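As a rough illustration of why, a NumPy array stores its elements in a single typed buffer whose size you can read off directly from nbytes. This sketch assumes NumPy is installed (it is third-party, unlike the tools above):

```python
import sys

import numpy as np  # third-party; assumed installed

py_list = list(range(1000))
arr64 = np.arange(1000, dtype=np.int64)  # 8 bytes per element
arr32 = np.arange(1000, dtype=np.int32)  # 4 bytes per element

# getsizeof on a list counts only the container; each boxed int costs extra.
print(f"List container only: {sys.getsizeof(py_list)} bytes")
print(f"int64 ndarray data: {arr64.nbytes} bytes")  # 1000 * 8 = 8000
print(f"int32 ndarray data: {arr32.nbytes} bytes")  # 1000 * 4 = 4000
```

Choosing the smallest dtype that fits your data (int32 instead of int64, float32 instead of float64) is often the cheapest memory win in numeric code.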

Technique 5 – String Interning to Share Identical Literals

CPython automatically interns string literals that look like identifiers (only letters, digits and underscores). Identical interned literals share a single object, saving memory for repeated keys or tokens.

a = "hello_world"
b = "hello_world"
print(a is b)  # True

c = "hello world!"
d = "hello world!"
print(c is d)  # False

# Force interning for long strings
import sys
e = sys.intern("very long repeated sentence ...")
f = sys.intern("very long repeated sentence ...")
print(e is f)  # True

Common Pitfalls & Core Recap

__slots__ is not a silver bullet: Overusing it on classes with highly dynamic attributes or on a small number of instances yields negligible gains and reduces flexibility.

Inheritance traps: Subclasses that do not define their own __slots__ still have a __dict__. Multiple inheritance makes slot behavior more complex.
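The inheritance trap shows up directly in code: a subclass that omits __slots__ silently regains a per-instance __dict__, and the savings vanish. A minimal sketch:

```python
class Point:
    __slots__ = ('x', 'y')

class LabeledPoint(Point):  # forgot to declare __slots__
    pass

p = LabeledPoint()
p.x = 1                # stored in the inherited slot
p.label = 'origin'     # allowed: the subclass regained __dict__
print(p.__dict__)      # {'label': 'origin'} -- slot savings are gone

class LabeledPoint2(Point):
    __slots__ = ('label',)  # declare only the *new* attributes

q = LabeledPoint2()
q.label = 'fixed'
print(hasattr(q, '__dict__'))  # False: no per-instance dict
```

The rule of thumb: every class in the chain must declare __slots__ (listing only its own new attributes) for the optimization to hold.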

Generator one‑time use: After exhaustion a generator cannot be reused; recreate it or convert to a list if repeated iteration is needed.

Diagnose first with sys.getsizeof and psutil.

Apply __slots__ for massive numbers of objects, generators for large sequences, and mmap for huge files.

Prefer immutable tuples, homogeneous array, and string interning for repetitive data.

Python’s flexibility does not have to come at the cost of memory. By understanding the underlying mechanisms and applying these built‑in techniques, developers can keep memory consumption low without installing third‑party packages.

