How Python Dictionaries Resize: The Full Expansion Process
The article explains Python's dict resizing mechanism, detailing when the hash table grows (when used entries reach two‑thirds of capacity), how the new size is chosen as the smallest power‑of‑two ≥ ma_used × 3, and the exact steps the interpreter takes to reallocate and rebuild the underlying key and entry arrays.
When a Python dictionary’s used entry count reaches two‑thirds of its total capacity, the hash table must resize. The new capacity must satisfy two conditions: it is at least ma_used * 3 and it is a power of two. The smallest value meeting both constraints is computed by calculate_log2_keysize, which returns the logarithm (base 2) of the required capacity.
The resize is triggered by insertion_resize, which forwards the calculated log‑size to dictresize. dictresize allocates a new PyDictKeysObject for the dictionary, checks that the new size does not exceed the maximum allowed ( SIZEOF_SIZE_T * 8), and determines whether the dictionary uses a split table (separate key and value arrays) or a combined table.
If the dictionary uses a split table, the resize converts it to a combined table. For a split‑to‑generic conversion, each surviving PyDictUnicodeEntry is copied into a new PyDictKeyEntry, the key reference is retained, the hash is recomputed, and the value pointer is transferred. The hash index array is rebuilt with build_indices_generic. For a split‑to‑unicode conversion, the process is similar but uses build_indices_unicode. After copying, the old ma_values array is freed.
If the dictionary already uses a combined table, the resize handles three sub‑cases:
Generic → Generic: if no entries were deleted, memcpy copies the old entries; otherwise, only entries with non‑NULL values are copied.
Unicode → Unicode: analogous to the generic case, with an optional fast path when entry counts match.
Unicode → Generic or Generic → Unicode: entries are transformed between 16‑byte and 24‑byte layouts by iterating and assigning fields individually.
After moving the live entries, the hash index array is rebuilt (either build_indices_generic or build_indices_unicode), the old key object is decref’ed, and, if a split table was present, the old value object is freed. Finally, dk_usable and dk_nentries are updated to reflect the new capacity, and consistency is asserted.
The article also contrasts the historical single‑array implementation (which stored both entries and the index, causing high memory waste) with the modern two‑array design that separates the hash index from the key‑value entries, reducing memory overhead. It illustrates the structures of combined and split tables with diagrams and shows how a dictionary with three items ( "a": 1, "b": 2, "c": 3) looks in each layout.
In summary, Python’s dict resize algorithm ensures memory efficiency by growing the hash table only when necessary, selecting an optimal power‑of‑two size, and carefully migrating live entries while preserving key order and rebuilding indices.
// Objects/dictobject.c
#define GROWTH_RATE(d) ((d)->ma_used*3) // Objects/dictobject.c
static inline uint8_t calculate_log2_keysize(Py_ssize_t minsize) {
uint8_t log2_size;
for (log2_size = PyDict_LOG_MINSIZE;
((Py_ssize_t)1 << log2_size) < minsize;
log2_size++)
;
return log2_size;
} // Objects/dictobject.c
static int insertion_resize(PyInterpreterState *interp, PyDictObject *mp, int unicode) {
return dictresize(interp, mp,
calculate_log2_keysize(GROWTH_RATE(mp)), unicode);
} // Objects/dictobject.c
static int dictresize(PyInterpreterState *interp, PyDictObject *mp,
uint8_t log2_newsize, int unicode) {
// ... (implementation as described above) ...
return 0;
}Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Satori Komeiji's Programming Classroom
Python and Rust developer; I write about any topics you're interested in. Follow me! (#^.^#)
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
