Fundamentals 12 min read

How Python Dictionaries Resize: The Full Expansion Process

The article explains Python's dict resizing mechanism, detailing when the hash table grows (when used entries reach two‑thirds of capacity), how the new size is chosen as the smallest power‑of‑two ≥ ma_used × 3, and the exact steps the interpreter takes to reallocate and rebuild the underlying key and entry arrays.

Satori Komeiji's Programming Classroom
Satori Komeiji's Programming Classroom
Satori Komeiji's Programming Classroom
How Python Dictionaries Resize: The Full Expansion Process

When a Python dictionary’s used entry count reaches two‑thirds of its total capacity, the hash table must resize. The new capacity must satisfy two conditions: it is at least ma_used * 3 and it is a power of two. The smallest value meeting both constraints is computed by calculate_log2_keysize, which returns the logarithm (base 2) of the required capacity.

The resize is triggered by insertion_resize, which forwards the calculated log‑size to dictresize. dictresize allocates a new PyDictKeysObject for the dictionary, checks that the new size does not exceed the maximum allowed ( SIZEOF_SIZE_T * 8), and determines whether the dictionary uses a split table (separate key and value arrays) or a combined table.

If the dictionary uses a split table, the resize converts it to a combined table. For a split‑to‑generic conversion, each surviving PyDictUnicodeEntry is copied into a new PyDictKeyEntry, the key reference is retained, the hash is recomputed, and the value pointer is transferred. The hash index array is rebuilt with build_indices_generic. For a split‑to‑unicode conversion, the process is similar but uses build_indices_unicode. After copying, the old ma_values array is freed.

If the dictionary already uses a combined table, the resize handles three sub‑cases:

Generic → Generic: if no entries were deleted, memcpy copies the old entries; otherwise, only entries with non‑NULL values are copied.

Unicode → Unicode: analogous to the generic case, with an optional fast path when entry counts match.

Unicode → Generic or Generic → Unicode: entries are transformed between 16‑byte and 24‑byte layouts by iterating and assigning fields individually.

After moving the live entries, the hash index array is rebuilt (either build_indices_generic or build_indices_unicode), the old key object is decref’ed, and, if a split table was present, the old value object is freed. Finally, dk_usable and dk_nentries are updated to reflect the new capacity, and consistency is asserted.

The article also contrasts the historical single‑array implementation (which stored both entries and the index, causing high memory waste) with the modern two‑array design that separates the hash index from the key‑value entries, reducing memory overhead. It illustrates the structures of combined and split tables with diagrams and shows how a dictionary with three items ( "a": 1, "b": 2, "c": 3) looks in each layout.

In summary, Python’s dict resize algorithm ensures memory efficiency by growing the hash table only when necessary, selecting an optimal power‑of‑two size, and carefully migrating live entries while preserving key order and rebuilding indices.

// Objects/dictobject.c
#define GROWTH_RATE(d) ((d)->ma_used*3)
// Objects/dictobject.c
static inline uint8_t calculate_log2_keysize(Py_ssize_t minsize) {
    uint8_t log2_size;
    for (log2_size = PyDict_LOG_MINSIZE;
         ((Py_ssize_t)1 << log2_size) < minsize;
         log2_size++)
        ;
    return log2_size;
}
// Objects/dictobject.c
static int insertion_resize(PyInterpreterState *interp, PyDictObject *mp, int unicode) {
    return dictresize(interp, mp,
        calculate_log2_keysize(GROWTH_RATE(mp)), unicode);
}
// Objects/dictobject.c
static int dictresize(PyInterpreterState *interp, PyDictObject *mp,
                     uint8_t log2_newsize, int unicode) {
    // ... (implementation as described above) ...
    return 0;
}
Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

memory-managementPythonhash tabledictresizingcombined-tablesplit-table
Satori Komeiji's Programming Classroom
Written by

Satori Komeiji's Programming Classroom

Python and Rust developer; I write about any topics you're interested in. Follow me! (#^.^#)

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.