How Python’s Garbage Collector Works: Reference Counting, Mark‑Sweep, and Generational GC Explained
This article explains Python's memory management, covering reference counting, its limitations with cyclic references, and how the interpreter supplements it with mark‑sweep and generational garbage‑collection algorithms, complete with code examples and detailed diagrams of the underlying mechanisms.
In Python, most objects are managed through reference counting, the most intuitive and simple garbage‑collection technique, but it suffers from performance overhead and the fatal weakness of circular references.
Immutable objects such as PyIntObject and PyStringObject can never form cycles because they never hold references to other objects. Circular references occur only among container objects (dict, list, class, interface, etc.), so a mark‑sweep collector is needed to break these cycles.
Reference‑Counting Strategy
Reference counting increments a counter when a reference is created or copied and decrements it when a reference is destroyed. When the counter reaches zero, the object can be immediately reclaimed.
def new_obj(size):
# pick a free chunk from the heap
obj = pick_chunk(size, free_list)
if obj is None:
raise RuntimeError("allocation failed")
else:
obj.ref_cnt = 1
return obj def update_ptr(ptr, obj):
inc_ref_cnt(obj)
dec_ref_cnt(ptr)
ptr = obj
def inc_ref_cnt(obj):
obj.ref_cnt += 1
def dec_ref_cnt(obj):
obj.ref_cnt -= 1
if obj.ref_cnt == 0:
# recursively free children
for child in children(obj):
dec_ref_cnt(child)
del objAdvantages : immediate reclamation (real‑time) and short maximum pause time because objects are freed as soon as their count drops to zero.
Disadvantages : high execution cost for every assignment, and inability to collect cycles.
Mark‑Sweep Strategy
Mark‑sweep works in two phases: first, all reachable objects are marked starting from the roots; second, unmarked objects are swept (freed).
def mark_sweep():
mark_phase() # mark reachable objects
sweep_phase() # free unmarked objects
def mark_phase():
for r in roots:
mark(r)
def mark(obj):
if not obj.mark:
obj.mark = True
for child in children(obj):
mark(child)
def sweep_phase():
for obj in heap:
if obj.mark:
obj.mark = False # prepare for next GC
else:
free_list.append(obj)Advantages : simple to implement and compatible with conservative GC.
Disadvantages : memory fragmentation and linear allocation cost because the whole heap must be scanned.
Generational Garbage Collection
Python divides objects into generations based on their "age" (how many collections they have survived). New objects start in generation 0; objects that survive a collection are promoted to the next generation. Younger generations are collected more frequently.
The heap consists of a "new" space and two survivor spaces (From and To). When the new space fills, a minor GC copies live objects to a survivor space or promotes them to the old generation.
def minor_gc():
to_survivor_free = to_survivor_start
for r in root_objects:
if r < old_start:
r = copy(r)
# process remembered set (old→young references)
i = 0
while i < rs_index:
has_new_obj = False
for child in children(rs[i]):
if child < old_start:
child = copy(child)
if child < old_start:
has_new_obj = True
if not has_new_obj:
rs[i].remembered = False
rs_index -= 1
swap(rs[i], rs[rs_index])
else:
i += 1
swap(from_survivor_start, to_survivor_start)When the old generation fills, a major GC runs using the mark‑sweep algorithm.
Container Objects and Tracking
Only container objects (lists, dicts, sets, classes, functions, etc.) are tracked by the GC. Non‑container objects (ints, floats, booleans, None) are untracked.
When a tracked object is allocated, _PyObject_GC_TRACK inserts it into generation 0’s doubly‑linked list:
static inline void _PyObject_GC_TRACK(PyObject *op) {
PyGC_Head *gc = _Py_AS_GC(op);
PyInterpreterState *interp = _PyInterpreterState_GET();
PyGC_Head *generation0 = interp->gc.generation0;
PyGC_Head *last = (PyGC_Head*)(generation0->_gc_prev);
_PyGCHead_SET_NEXT(last, gc);
_PyGCHead_SET_PREV(gc, last);
_PyGCHead_SET_NEXT(gc, generation0);
generation0->_gc_prev = (uintptr_t)gc;
}When the object is deallocated, _PyObject_GC_UNTRACK removes it from the list, and PyObject_GC_Del frees the memory.
GC Generations
Python maintains three generations (0, 1, 2). Each generation has a PyGC_Head list, a threshold, and a count. When generation 0’s count exceeds its threshold (default 700), gc_collect_generations is invoked, which may trigger a full or partial collection depending on which generation’s threshold is crossed.
static Py_ssize_t gc_collect_generations(PyThreadState *tstate) {
GCState *gcstate = &tstate->interp->gc;
for (int i = NUM_GENERATIONS-1; i >= 0; i--) {
if (gcstate->generations[i].count > gcstate->generations[i].threshold) {
if (i == NUM_GENERATIONS-1 &&
gcstate->long_lived_pending < gcstate->long_lived_total/4)
continue; // defer full collection
return gc_collect_with_callback(tstate, i);
}
}
return 0;
}Main Collection Algorithm ( gc_collect_main )
The collector works as follows:
Increment the count of the older generation (if any).
Reset the counts of the generations being collected.
Merge younger generations into the target generation.
Copy reference counts ( update_refs) and then decrement them ( subtract_refs) to find root objects.
Move objects with __del__ methods to a separate finalizers list.
Traverse the finalizers list to pull in any objects reachable from them.
Free objects that are unreachable and have no finalizers.
Append objects with finalizers to gc.garbage for user inspection.
Key helper functions include: deduce_unreachable: performs update_refs and subtract_refs, then separates reachable from unreachable objects. move_unreachable: moves objects whose copied reference count is zero to the unreachable list. finalize_garbage: calls tp_finalize (i.e., __del__) on finalizable objects. delete_garbage: either appends objects to gc.garbage (if DEBUG_SAVEALL) or calls their tp_clear method to break cycles before freeing.
Reference‑Counting Macros in CPython
#define _Py_XINCREF(op) if ((op) != NULL) Py_INCREF(op)
static inline void _Py_INCREF(PyObject *op) {
op->ob_refcnt++;
}
static inline void _Py_XDECREF(PyObject *op) {
if (op != NULL) Py_DECREF(op);
}
static inline void _Py_DECREF(PyObject *op) {
if (--op->ob_refcnt != 0) {
// debug checks omitted for brevity
} else {
_Py_Dealloc(op);
}
}For container types like dict, the deallocator first untracks the object, decrements references of all entries, frees the key/value arrays, and finally releases the memory (or returns the object to a free‑list).
static void dict_dealloc(PyDictObject *mp) {
PyObject_GC_UnTrack(mp);
if (mp->ma_values != NULL) {
if (mp->ma_values != empty_values) {
for (i = 0, n = mp->ma_keys->dk_nentries; i < n; i++)
Py_XDECREF(mp->ma_values[i]);
free_values(mp->ma_values);
}
dictkeys_decref(mp->ma_keys);
} else if (mp->ma_keys != NULL) {
dictkeys_decref(mp->ma_keys);
}
if (state->numfree < PyDict_MAXFREELIST && Py_IS_TYPE(mp, &PyDict_Type))
state->free_list[state->numfree++] = mp;
else
Py_TYPE(mp)->tp_free((PyObject *)mp);
}Putting It All Together
Python’s memory manager therefore uses a hybrid approach: fast reference counting for immediate reclamation, supplemented by a cyclic‑GC that runs periodically, employing both mark‑sweep and generational strategies to keep the overhead low while still being able to collect cycles.
References
Garbage‑collection algorithms and implementations
Python source code analysis
cpython‑internals documentation
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
MaGe Linux Operations
Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
