Understanding Python Garbage Collection and Reference Counting

This article explains Python's garbage collection mechanism, covering object management, reference counting, cyclic reference handling, the mark‑and‑sweep algorithm, and generational collection thresholds, with illustrative code examples and detailed explanations of each step.

Byte Quality Assurance Team

Garbage Collection (GC) refers to the process of destroying unused objects and freeing memory that is no longer needed by a program.

In languages such as Python, Java, and Go, GC is automatic. In C and C++, by contrast, developers allocate and free memory manually, and mistakes there often lead to leaks, dangling pointers, and crashes.

Python's GC relies heavily on reference counting. Each object has an ob_refcnt field that tracks how many references point to it. When this count drops to zero, nothing can reach the object any longer, and CPython reclaims it immediately.
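A minimal sketch of what reaching zero looks like in practice, assuming CPython (other implementations may free the object later). The class name Resource is a hypothetical placeholder, and a weak reference is used as a probe because it observes the object without adding to its reference count.

```python
import weakref


class Resource:
    """Hypothetical placeholder type used only for this illustration."""


r = Resource()
probe = weakref.ref(r)              # a weak reference does not raise the count

alive_before = probe() is not None  # the object exists while r refers to it
del r                               # the last strong reference goes away
alive_after = probe() is not None   # on CPython the object is already gone

print(alive_before, alive_after)
```

On CPython this is expected to report that the object was alive before the del statement and gone right after it, without any explicit call into the gc module.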

The reference count increases when an object is created, bound to another name, passed as a function argument, or stored in a container. It decreases when a name is deleted or rebound, when a function returns and its local variables disappear, or when the object is removed from a container or the container is cleared.

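The following example uses sys.getrefcount() to observe reference count changes for lists and integers. The reported value includes the temporary reference created by passing the object as an argument, and the exact numbers vary between CPython versions.

import sys

print(sys.getrefcount([1]))
# 1 (only the temporary argument reference)

a = [1]
print(sys.getrefcount(a))
# 2 (the name a, plus the argument)

b = a
print(sys.getrefcount(a))
# 3 (a, b, plus the argument)

# Small integers are cached and shared across the interpreter, so the
# count is high and differs from run to run. On CPython 3.12 and later
# they are immortal and report a large constant value instead.
print(sys.getrefcount(1))
# 145 (in the author's run)

a = 1
print(sys.getrefcount(1))
# 146 (in the author's run: one more reference, from a)

# A large integer is not cached; the count here comes from the compiled
# constant and the argument, and also depends on the version.
print(sys.getrefcount(3846539865938746593784569))
# 3 (in the author's run)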
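The decreasing side can be observed the same way. The sketch below is an illustration rather than part of the original example: Payload is a hypothetical class, and the code compares counts against a baseline instead of relying on absolute numbers, since those depend on the interpreter version.

```python
import sys


class Payload:
    """Hypothetical placeholder type used only for this illustration."""


obj = Payload()
base = sys.getrefcount(obj)           # baseline, whatever the version reports

alias = obj                           # bound to another name: +1
after_alias = sys.getrefcount(obj)

box = [obj, obj]                      # stored twice in a container: +2
after_box = sys.getrefcount(obj)

box.clear()                           # container cleared: -2
del alias                             # name deleted: -1
after_cleanup = sys.getrefcount(obj)

print(after_alias - base, after_box - after_alias, after_cleanup - base)
```

Working with differences keeps the example meaningful even where the baseline itself is not the textbook value.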

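Python also employs a mark‑and‑sweep algorithm to handle cyclic references, which reference counting alone cannot resolve: objects that refer to each other keep one another's count above zero even after the program has lost every way to reach them.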
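A short sketch of such a cycle, assuming CPython. Node is a hypothetical class, automatic collection is switched off for the duration so that the timing is deterministic, and a weak reference serves as a probe that does not keep the object alive.

```python
import gc
import weakref


class Node:
    """Hypothetical two-object cycle used only for this illustration."""

    def __init__(self):
        self.partner = None


gc.disable()                         # keep automatic collection out of the way

a = Node()
b = Node()
a.partner = b                        # a refers to b
b.partner = a                        # b refers to a: a reference cycle
probe = weakref.ref(a)

del a, b                             # the program can no longer reach either object
alive_after_del = probe() is not None    # the cycle keeps both counts above zero

found = gc.collect()                 # run the cycle detector explicitly
alive_after_gc = probe() is not None

gc.enable()
print(alive_after_del, found, alive_after_gc)
```

gc.collect() returns the number of unreachable objects it found, so a value of at least two is expected here; the exact figure depends on the interpreter version and on any other garbage present.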

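The algorithm works in several steps. First, the objects to be examined are placed into a scanning pool, and each is assigned a gc_refs value copied from its reference count. Then the collector traverses the references held by every object in the pool and decrements the gc_refs of each object it points to, which cancels out references that originate inside the pool. Objects whose gc_refs reaches zero are moved to an unreachable pool, because nothing outside the pool refers to them directly.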

Next, any object in the unreachable pool that can still be reached from an object with a positive gc_refs is “rescued” and moved back. This ensures that objects still in use (for example, members of a cycle that a live object refers to) are not collected.

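Finally, the objects remaining in the unreachable pool are destroyed. The collector invokes each type's tp_finalize method where one is defined, then breaks the cycles (in CPython, through tp_clear) so that the reference counts fall to zero and the objects are deallocated.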
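The bookkeeping in the steps above can be sketched as a toy model in pure Python. This is not CPython's implementation: the function find_unreachable, the object names, and the dictionary-based graph are all hypothetical, chosen only to make the gc_refs arithmetic visible.

```python
def find_unreachable(refcounts, edges):
    """Toy model of the cycle detector; not CPython's actual code.

    refcounts: total reference count of each tracked object (like ob_refcnt)
    edges:     for each object, the tracked objects it refers to
    """
    # Step 1: copy every reference count into a working gc_refs value.
    gc_refs = dict(refcounts)

    # Step 2: cancel out references that come from inside the tracked set.
    for targets in edges.values():
        for dst in targets:
            gc_refs[dst] -= 1

    # Objects with gc_refs > 0 are referenced from outside the set.
    reachable = {name for name, count in gc_refs.items() if count > 0}

    # Step 3: rescue everything that can be reached from those objects.
    stack = list(reachable)
    while stack:
        src = stack.pop()
        for dst in edges[src]:
            if dst not in reachable:
                reachable.add(dst)
                stack.append(dst)

    # Whatever was not rescued is garbage.
    return set(refcounts) - reachable


# a <-> b is an orphaned cycle. c is held by a variable and refers to d,
# and d <-> e is a cycle that is still in use through c.
refcounts = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 1}
edges = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["e"], "e": ["d"]}

garbage = find_unreachable(refcounts, edges)
print(sorted(garbage))
```

In this graph d and e both drop to a gc_refs of zero after step 2, yet they are rescued in step 3 because c reaches them; only a and b remain unreachable.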

A special case occurs when an object's type defines the legacy tp_del slot: such objects are placed in a separate pool for manual cleanup to avoid double destruction. In CPython, this pool is exposed to programs as the gc.garbage list.
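For ordinary Python classes the situation is simpler on CPython 3.4 and later, where a __del__ method is implemented through tp_finalize rather than tp_del (PEP 442). The sketch below, which uses a hypothetical Tracked class, checks that a cycle whose members define __del__ is finalized and collected instead of being left for manual cleanup.

```python
import gc

finalized = []                       # names of objects whose __del__ has run


class Tracked:
    """Hypothetical class with a finalizer, used only for this illustration."""

    def __init__(self, name):
        self.name = name
        self.other = None

    def __del__(self):
        finalized.append(self.name)


gc.disable()                         # make the timing of collection explicit

x = Tracked("x")
y = Tracked("y")
x.other = y
y.other = x                          # cycle between two objects with finalizers
del x, y

before = list(finalized)             # nothing finalized yet: the cycle is intact
gc.collect()
after = sorted(finalized)            # both finalizers ran during collection
leftover = [o for o in gc.garbage if isinstance(o, Tracked)]

gc.enable()
print(before, after, leftover)
```

The order in which the two finalizers run is not specified, which is why the names are sorted before being inspected.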

Python uses generational GC, which divides tracked objects into generations. Objects that survive a collection are promoted to an older generation, which is collected less frequently. The default thresholds are (700, 10, 10): a generation 0 collection runs when the number of allocations minus the number of deallocations of tracked objects exceeds 700, every 10 collections of generation 0 trigger a generation 1 collection, and every 10 collections of generation 1 trigger a generation 2 collection. These defaults can differ between CPython versions.

import gc
# Query thresholds
print(gc.get_threshold())  # (700, 10, 10)
# Set new thresholds
gc.set_threshold(1000, 10, 10)
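A slightly fuller sketch of the same API, assuming CPython. It changes only the generation 0 threshold, reads the current per-generation counters with gc.get_count(), runs a collection limited to the youngest generation, and then restores the original settings. Only the first threshold is inspected afterwards, since the handling of the other two values has changed in some recent releases.

```python
import gc

original = gc.get_threshold()        # remember the interpreter's current settings

gc.set_threshold(1000, original[1], original[2])   # raise generation 0 only
updated = gc.get_threshold()

counts = gc.get_count()              # current counters, one per generation
gc.collect(0)                        # collect only the youngest generation

gc.set_threshold(*original)          # put the original thresholds back
restored = gc.get_threshold()

print(original, updated, counts, restored)
```

Restoring the thresholds matters in library code, because the settings are global to the interpreter and would otherwise leak into the rest of the program.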