How Python’s dict Uses Hash Tables and Open Addressing
This article explains how Python implements dictionaries with hash tables, details the hash function for strings, demonstrates collision handling via open addressing and quadratic probing, and walks through the underlying C structures, initialization, insertion, resizing, and deletion processes.
Dictionary Basics
In Python, a dictionary is an associative array: a collection of key/value pairs in which each value is looked up by its key. It can be pictured as two parallel arrays, one of keys and one of values. Adding a pair stores the value under its key, and accessing a key that was never inserted raises KeyError.
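The basic behaviour can be tried directly. This is a minimal sketch; the keys and values are arbitrary sample data, not taken from any particular program.

```python
# Build a small dictionary by adding three key/value pairs.
d = {}
d['a'] = 1
d['b'] = 2
d['c'] = 3

print(d['b'])  # prints 2

# Looking up a key that was never inserted raises KeyError.
try:
    d['d']
    missing_key_raised = False
except KeyError:
    missing_key_raised = True

print(missing_key_raised)  # prints True
```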
Hash Tables
Python dictionaries are built on hash tables: keys are processed by a hash function to produce an index in an underlying array. The hash function aims to distribute keys uniformly, though collisions can occur when different keys share the same hash value.
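For string keys, the CPython 2.x implementation described here uses a simple, deterministic hash function; for example, hash('a') on a 64‑bit platform yields 12416037344. With an array length of 8, the index is the low three bits of the hash: hash('a') & 7 = 0. Keys 'b' and 'z' both map to index 3, demonstrating a collision. Python 3 randomizes string hashes per process by default, so these exact values will not reproduce there.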
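The figures above can be reproduced with a small reimplementation. This is a sketch of the classic CPython 2.x string hash as commonly described, written here in Python; the function names legacy_string_hash and slot_index are hypothetical, and the built-in hash() in a modern interpreter will give different results.

```python
def legacy_string_hash(s, bits=64):
    """Sketch of the old, non-randomized CPython 2.x string hash."""
    if not s:
        return 0
    mask = (1 << bits) - 1
    x = (ord(s[0]) << 7) & mask
    for ch in s:
        # Multiply, mix in the next character, and wrap like a C long.
        x = ((1000003 * x) ^ ord(ch)) & mask
    x ^= len(s)
    # Reinterpret the unsigned result as a signed value.
    if x >= 1 << (bits - 1):
        x -= 1 << bits
    # -1 is reserved as an error code in the C API.
    return -2 if x == -1 else x


def slot_index(hash_value, table_size=8):
    """Map a hash to a slot; table_size is assumed to be a power of two."""
    return hash_value & (table_size - 1)


print(legacy_string_hash('a'))               # prints 12416037344
print(slot_index(legacy_string_hash('a')))   # prints 0
print(slot_index(legacy_string_hash('b')))   # prints 3
print(slot_index(legacy_string_hash('z')))   # prints 3
```

Masking with table_size - 1 only works as a modulo because the table length is a power of two, which is why the length is always kept at 8, 16, 32 and so on.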
Open Addressing
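Python resolves collisions with open addressing: when the slot selected by the hash is already taken, it probes other slots of the same array. The probe sequence, described here as quadratic probing, comes from the recurrence j = 5*j + 1 + perturb, where perturb starts as the full hash and is shifted right by 5 bits after each step so that the higher hash bits also influence the sequence. The next candidate slot is j modulo the array length.
With an array length of 32 and a starting index of j = 3, an example probe sequence is 3 → 11 → 19 → 29 → 5 → 6 → 16 → 31 → 28 → 13 → 2 …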
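The recurrence can be sketched as a generator. This is an illustration under assumptions rather than the interpreter's code: the name probe_sequence is hypothetical, perturb is assumed to be added before it is shifted, and the exact slots visited depend on the full hash value, so this sketch is not expected to reproduce the example sequence above for an arbitrary hash.

```python
from itertools import islice

PERTURB_SHIFT = 5                 # assumed shift width
WORD_MASK = 0xFFFFFFFFFFFFFFFF    # emulate a 64-bit unsigned integer


def probe_sequence(hash_value, table_size):
    """Yield candidate slot indexes; table_size must be a power of two."""
    mask = table_size - 1
    perturb = hash_value & WORD_MASK
    j = hash_value & mask
    yield j
    while True:
        # Mix in the remaining hash bits, then discard the low ones.
        j = (5 * j + 1 + perturb) & WORD_MASK
        perturb >>= PERTURB_SHIFT
        yield j & mask


first_probes = list(islice(probe_sequence(3, 32), 33))
print(first_probes[:3])                      # prints [3, 19, 0]
print(set(first_probes) == set(range(32)))   # prints True
```

Once perturb has been shifted down to zero, the recurrence reduces to j = 5*j + 1 modulo the table length, which cycles through every slot of a power-of-two table. That property guarantees a probe eventually reaches a free slot as long as one exists.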
C Implementation of dict
Each entry in the underlying C table stores the key’s hash, the key, and the value. The dictionary structure itself has these important fields: ma_fill (active + dummy slots), ma_used (active slots), ma_mask (array length‑1, used for indexing), ma_table (the array), and ma_smalltable (the initial 8‑slot array).
Dictionary Initialization
When a dict is first created, PyDict_New() sets up the initial table: ma_table points at the built‑in 8‑slot ma_smalltable, ma_mask is set to 7 (the array length minus one), and the ma_fill and ma_used counters start at zero.
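The fields and their initial values can be modelled in Python. This is only a sketch of the layout: the class names Entry and DictState, the helper new_dict_state, and the entry field names me_hash, me_key, and me_value are assumptions made for illustration, and a Python list stands in for both ma_table and ma_smalltable.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional

MINSIZE = 8  # length of the initial small table


@dataclass
class Entry:
    me_hash: int    # cached hash of the key
    me_key: Any
    me_value: Any


@dataclass
class DictState:
    ma_fill: int = 0             # active + dummy slots
    ma_used: int = 0             # active slots
    ma_mask: int = MINSIZE - 1   # array length - 1, used for indexing
    ma_table: List[Optional[Entry]] = field(
        default_factory=lambda: [None] * MINSIZE
    )


def new_dict_state():
    """Return the state of a freshly created, empty dictionary."""
    return DictState()


state = new_dict_state()
print(state.ma_mask, len(state.ma_table))   # prints 7 8
```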
Adding Items
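Insertion uses PyDict_SetItem(), which takes the dictionary and the key/value pair. If the key is a string whose hash is already cached, that hash is reused; otherwise the hash is computed. It then calls insertdict() to store the pair. If the insertion consumed a new slot and the used slots (active plus dummy) now fill at least two‑thirds of the array, the table is resized to keep lookups close to O(1) on average.
The insertdict() function relies on lookdict_string() to find the slot for the key. The lookup applies the same hash‑and‑mask calculation and, if that slot holds a different key, follows the probing sequence until it finds either the key itself or an unused slot. If it passes a dummy slot along the way, it remembers the first one and returns it for reuse when the key turns out to be absent.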
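The lookup-then-store flow might be sketched as follows. The names find_slot and insert, the DUMMY sentinel, and the tuple layout (hash, key, value) are all hypothetical, and the hashes 3, 11, and 19 are made-up small values chosen so that every key starts at slot 3 of an 8-slot table. The sketch assumes the table always has at least one unused slot; otherwise the loop would never end.

```python
DUMMY = object()                  # marks a slot whose entry was deleted
WORD_MASK = 0xFFFFFFFFFFFFFFFF


def find_slot(table, key, key_hash):
    """Return the index holding key, or the index where it should go."""
    mask = len(table) - 1
    perturb = key_hash & WORD_MASK
    j = key_hash & mask
    free_slot = None
    while True:
        index = j & mask
        slot = table[index]
        if slot is None:
            # An unused slot ends the search: the key is absent.
            return index if free_slot is None else free_slot
        if slot is DUMMY:
            if free_slot is None:
                free_slot = index      # remember the first dummy seen
        elif slot[0] == key_hash and slot[1] == key:
            return index
        j = (5 * j + 1 + perturb) & WORD_MASK
        perturb >>= 5


def insert(table, key, value, key_hash):
    index = find_slot(table, key, key_hash)
    table[index] = (key_hash, key, value)
    return index


table = [None] * 8
first = insert(table, 'b', 1, 3)      # lands on slot 3
second = insert(table, 'z', 2, 11)    # collides at slot 3, probes on to slot 0
table[first] = DUMMY                  # simulate deleting 'b'
found = find_slot(table, 'z', 11)     # still reachable past the dummy
reused = insert(table, 'q', 3, 19)    # new key takes over the dummy slot
print(first, second, found, reused)   # prints 3 0 0 3
```

The search cannot stop at a dummy slot, because the key being sought may have been stored further along the same probe chain before the deletion happened.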
Resizing
When at least two‑thirds of the slots are occupied, dictresize() allocates a larger array to reduce future collisions. The requested minimum is four times the number of active slots for moderate sizes, or twice that number for very large tables. The new length is chosen by starting at 8 and doubling until it exceeds the required minimum (e.g., 8 → 16 → 32). For example, 6 active slots give a minimum of 24, so the new length becomes 32.
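The sizing rule can be worked through in a few lines. This is a sketch under assumptions: the function names are hypothetical, and the cutoff of 50000 active slots for switching from a factor of four to a factor of two is recalled from older CPython sources rather than stated in this article.

```python
MINSIZE = 8
LARGE_TABLE_CUTOFF = 50000   # assumed threshold for "very large" tables


def needs_resize(fill, table_size):
    """True once active + dummy slots reach two-thirds of the table."""
    return fill * 3 >= table_size * 2


def resized_length(used):
    """Smallest power of two above the requested minimum."""
    growth = 2 if used > LARGE_TABLE_CUTOFF else 4
    min_used = growth * used
    new_size = MINSIZE
    while new_size <= min_used:
        new_size <<= 1       # 8 -> 16 -> 32 -> ...
    return new_size


print(needs_resize(5, 8))      # prints False
print(needs_resize(6, 8))      # prints True
print(resized_length(6))       # prints 32
print(resized_length(60000))   # prints 131072
```

Because the minimum is computed from active slots only, dummy slots are dropped during a resize, and a table that has seen many deletions can come out the same size or smaller.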
Deleting Items
Deletion calls PyDict_DelItem(), which computes the key’s hash, locates the entry via lookdict_string(), and marks the slot as a dummy. The table does not shrink immediately; however, subsequent insertions may trigger a resize if the combined count of active and dummy slots crosses the threshold.
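The effect of deletion on the two counters can be seen in a small model. To keep it short, this sketch swaps in plain linear probing (step to the next slot) instead of the perturbed recurrence, never resizes, and uses integer keys so that hash(key) is predictable; the class TinyTable and its method names are hypothetical.

```python
DUMMY = object()   # marks a deleted entry


class TinyTable:
    """Fixed-size open-addressing table with linear probing (no resizing)."""

    def __init__(self, size=8):
        self.slots = [None] * size   # None means never used
        self.fill = 0                # active + dummy slots
        self.used = 0                # active slots only

    def _find(self, key):
        mask = len(self.slots) - 1
        j = hash(key) & mask
        free = None
        while True:
            slot = self.slots[j]
            if slot is None:
                return j if free is None else free
            if slot is DUMMY:
                free = j if free is None else free
            elif slot[0] == key:
                return j
            j = (j + 1) & mask       # simplified probing step

    def set(self, key, value):
        i = self._find(key)
        if self.slots[i] is None:
            self.fill += 1
            self.used += 1
        elif self.slots[i] is DUMMY:
            self.used += 1           # a reused dummy was already in fill
        self.slots[i] = (key, value)

    def get(self, key):
        slot = self.slots[self._find(key)]
        if slot is None or slot is DUMMY:
            raise KeyError(key)
        return slot[1]

    def delete(self, key):
        i = self._find(key)
        if self.slots[i] is None or self.slots[i] is DUMMY:
            raise KeyError(key)
        self.slots[i] = DUMMY        # keep the probe chain intact
        self.used -= 1               # fill is deliberately unchanged


t = TinyTable()
t.set(3, 'three')
t.set(11, 'eleven')          # collides with 3, stored in the next slot
t.delete(3)
print(t.fill, t.used)        # prints 2 1
print(t.get(11))             # prints eleven
```

If delete() wrote None instead of DUMMY, the lookup for key 11 would stop at the emptied slot and wrongly report the key as missing.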
