
Monitoring Memory Usage of a Running Python Program

This article explains how to monitor memory consumption of Python data‑processing scripts using operating‑system tools, the built‑in tracemalloc module, the resource module for sampling, and a threaded MemoryMonitor class to continuously track peak usage with minimal intrusion.


When processing large data sets with Python libraries such as pandas and scikit‑learn, monitoring memory consumption becomes essential.

The simplest way is to rely on the operating system. A command like top gives a live overview, while ps -m -o %cpu,%mem,command lists processes sorted by memory usage, showing CPU%, memory% and the command line.

The -m flag orders the output by memory and the -o flag selects which fields to display (note that on Linux, where -m instead shows threads, ps aux --sort=-%mem gives the same memory-ordered listing). CPU percentages can exceed 100% on multi-core machines.
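The same OS-level figure can also be pulled programmatically for the current process by shelling out to ps. A minimal POSIX-only sketch (the "rss=" field name suppresses the header; units are kilobytes on Linux):

```python
import os
import subprocess

# Ask ps for this process's resident set size (RSS).
# "rss=" prints the value with no header line.
out = subprocess.run(
    ["ps", "-o", "rss=", "-p", str(os.getpid())],
    capture_output=True, text=True, check=True,
)
rss_kb = int(out.stdout.strip())
print(f"Current RSS: {rss_kb / 1024:.1f} MB")
```

This is handy for quick logging, but it only captures a single instant, which motivates the in-process approaches below.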

For more detailed insight, Python’s built‑in tracemalloc module (available since Python 3.4) records every memory allocation performed by the interpreter. A typical usage pattern is:

import tracemalloc
tracemalloc.start()
my_complex_analysis_method()
current, peak = tracemalloc.get_traced_memory()
print(f"Current memory usage is {current/10**6} MB; Peak was {peak/10**6} MB")
tracemalloc.stop()

While tracemalloc provides fine‑grained data, it incurs a noticeable performance penalty (about 30 % slowdown in the author’s tests).
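Beyond the current/peak totals, tracemalloc can attribute allocations to individual source lines via snapshots. A small sketch with a hypothetical workload standing in for real analysis code:

```python
import tracemalloc

tracemalloc.start()
# Hypothetical workload: allocate ~10 MB in small chunks.
data = [bytes(1_000) for _ in range(10_000)]
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics("lineno")  # group allocations by source line
for stat in top_stats[:3]:
    print(stat)  # e.g. file, line number, total size, allocation count
tracemalloc.stop()
```

The per-line breakdown is what justifies paying tracemalloc's overhead: it tells you not just how much memory was used, but which lines allocated it.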

An alternative is the standard-library resource module, which offers point-sample measurements of resources such as maximum resident set size (ru_maxrss):

import resource
usage = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
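One caveat: the units of ru_maxrss are platform-dependent (kilobytes on Linux, bytes on macOS), so a portable report needs a platform check. A short sketch:

```python
import resource
import sys

peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
# ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
divisor = 1024 ** 2 if sys.platform == "darwin" else 1024
print(f"Peak RSS so far: {peak / divisor:.1f} MB")
```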

Because resource only gives snapshots, a sampling loop is needed. The article defines a MemoryMonitor class that repeatedly calls resource.getrusage(...).ru_maxrss every 0.1 s, storing the maximum value.

import resource
from time import sleep

class MemoryMonitor:
    def __init__(self):
        self.keep_measuring = True

    def measure_usage(self):
        max_usage = 0
        while self.keep_measuring:
            max_usage = max(max_usage,
                resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
            sleep(0.1)
        return max_usage

The monitor runs in a separate thread using ThreadPoolExecutor. The main analysis function is submitted to the executor, and after it finishes, the monitor is stopped and the peak memory usage is reported:

from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor() as executor:
    monitor = MemoryMonitor()
    mem_thread = executor.submit(monitor.measure_usage)
    try:
        fn_thread = executor.submit(my_analysis_function)
        result = fn_thread.result()
    finally:
        monitor.keep_measuring = False
        max_usage = mem_thread.result()
    print(f"Peak memory usage: {max_usage}")

This approach enables continuous memory sampling with minimal intrusion, suitable for profiling Python data‑processing scripts.
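Putting the pieces together, here is a self-contained sketch of the whole pattern. The workload function allocate_some_memory is a hypothetical stand-in for a real analysis; the brief sleep simply gives the monitor time to take at least one sample:

```python
import resource
from concurrent.futures import ThreadPoolExecutor
from time import sleep


class MemoryMonitor:
    """Samples ru_maxrss every 0.1 s and keeps the largest value seen."""

    def __init__(self):
        self.keep_measuring = True

    def measure_usage(self):
        max_usage = 0
        while self.keep_measuring:
            max_usage = max(
                max_usage,
                resource.getrusage(resource.RUSAGE_SELF).ru_maxrss,
            )
            sleep(0.1)
        return max_usage


def allocate_some_memory():
    # Hypothetical workload: ~500 MB of small allocations.
    data = [bytes(10_000) for _ in range(50_000)]
    sleep(0.3)  # give the monitor a chance to sample the peak
    return len(data)


with ThreadPoolExecutor() as executor:
    monitor = MemoryMonitor()
    mem_thread = executor.submit(monitor.measure_usage)
    try:
        fn_thread = executor.submit(allocate_some_memory)
        result = fn_thread.result()
    finally:
        monitor.keep_measuring = False
        max_usage = mem_thread.result()
print(f"Peak memory usage: {max_usage}")
```

Setting keep_measuring in the finally block guarantees the sampling loop terminates even if the analysis function raises, so the executor's context manager can shut down cleanly.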

Tags: performance, resource, profiling, threading, memory-monitoring, tracemalloc
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
