Unveiling Python’s Dict Cache Pool: How Dictionaries Reuse Memory
The article explains the internal cache‑pool mechanism of CPython dictionaries, detailing how PyDictObject and PyDictKeysObject are allocated, deallocated, and reused via free‑lists of size 80, and demonstrates the behavior with concrete code examples and benchmarks.
Python dictionaries store their keys in ma_keys and values in ma_values. When the hash table is a separate table , keys are kept in ma_keys and values in ma_values; when it is a combined table , both keys and values reside in ma_keys.
Destruction of a PyDictObject
When a PyDictObject is destroyed, the interpreter first decrements the reference count of each value (if a separate table) and then each key, finally releasing ma_values and ma_keys. For a combined table only the keys need their reference count decremented before ma_keys is freed.
// Objects/dictobject.c (simplified)
static void dict_dealloc(PyDictObject *mp) {
PyDictValues *values = mp->ma_values;
PyDictKeysObject *keys = mp->ma_keys;
PyObject_GC_UnTrack(mp);
Py_TRASHCAN_BEGIN(mp, dict_dealloc)
if (values != NULL) {
for (i = 0, n = mp->ma_keys->dk_nentries; i < n; i++)
Py_XDECREF(values->values[i]);
free_values(values);
dictkeys_decref(interp, keys);
} else if (keys != NULL) {
dictkeys_decref(interp, keys);
}
// cache‑pool handling omitted for brevity
Py_TRASHCAN_END
}Cache‑pool implementation
The free‑list is defined in pycore_dict_state.h as two arrays of length PyDict_MAXFREELIST (80): one for PyDictObject instances and one for PyDictKeysObject instances.
#define PyDict_MAXFREELIST 80
struct _Py_dict_state {
uint64_t global_version;
uint32_t next_keys_version;
#if PyDict_MAXFREELIST > 0
PyDictObject *free_list[PyDict_MAXFREELIST];
PyDictKeysObject *keys_free_list[PyDict_MAXFREELIST];
int numfree;
int keys_numfree;
#endif
PyDict_WatchCallback watchers[DICT_MAX_WATCHERS];
};During deallocation, if state->numfree < PyDict_MAXFREELIST and the object is a true PyDict, the pointer is stored at the tail of free_list and numfree is incremented. Otherwise the memory is returned to the system heap.
if (state->numfree < PyDict_MAXFREELIST && Py_IS_TYPE(mp, &PyDict_Type)) {
state->free_list[state->numfree++] = mp;
OBJECT_STAT_INC(to_freelist);
} else {
Py_TYPE(mp)->tp_free((PyObject *)mp);
}Creation from the cache‑pool
When a new dictionary is needed, new_dict checks state->numfree. If a cached object exists, it is popped from the tail, its reference count reset to 1, and reused; otherwise memory is allocated from the heap.
if (state->numfree) {
mp = state->free_list[--state->numfree];
assert(mp != NULL && Py_IS_TYPE(mp, &PyDict_Type));
OBJECT_STAT_INC(from_freelist);
_Py_NewReference((PyObject *)mp);
} else {
mp = PyObject_GC_New(PyDictObject, &PyDict_Type);
if (mp == NULL) return NULL;
}Cache‑pool for PyDictKeysObject
The keys object has its own free‑list with the same size. It is only used when three conditions hold:
The hash table’s capacity (log2 size) equals 3, i.e., the table holds 8 slots.
The free‑list still has space ( keys_numfree < PyDict_MAXFREELIST).
All keys are Unicode strings (the table is a Unicode table).
If these are satisfied, the keys object is stored in keys_free_list; otherwise it is freed with PyObject_Free.
if (DK_LOG_SIZE(keys) == PyDict_LOG_MINSIZE &&
state->keys_numfree < PyDict_MAXFREELIST &&
DK_IS_UNICODE(keys)) {
state->keys_free_list[state->keys_numfree++] = keys;
OBJECT_STAT_INC(to_freelist);
return;
}
PyObject_Free(keys);Empirical verification
Using a short Python script, the author shows that dictionary objects are reused from the cache‑list:
d1 = {k: 1 for k in "abcdef"}
print("id(d1):", id(d1))
del d1
d2 = {k: 1 for k in "abcdefghijk"}
print("id(d2):", id(d2))
# ids are identical, confirming reuseFurther, a ctypes inspection reveals the addresses of ma_keys for several dictionaries. When the table size is exactly 8 and all keys are strings, the same ma_keys address is reused (e.g., d3 and d5 share the same ma_keys), proving that the PyDictKeysObject cache works under the documented constraints.
Conclusion
The CPython interpreter maintains free‑lists for both PyDictObject and PyDictKeysObject, each with a capacity of 80 entries. This mechanism enables O(1) reuse of small dictionaries (capacity 8) and their key tables, reducing allocation overhead and memory fragmentation. Larger tables or non‑Unicode key tables bypass the cache to avoid excessive memory consumption.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Satori Komeiji's Programming Classroom
Python and Rust developer; I write about any topics you're interested in. Follow me! (#^.^#)
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
