Fundamentals 20 min read

How Does a Python dict Key Map to an Index and How Are Collisions Resolved?

The article explains how Python maps a dict key to a hash‑table slot, the two classic collision‑resolution strategies (separate chaining and open addressing), why Python uses an iterative probing function, and walks through the core C functions that perform the lookup and slot‑finding logic.

Satori Komeiji's Programming Classroom
Satori Komeiji's Programming Classroom
Satori Komeiji's Programming Classroom
How Does a Python dict Key Map to an Index and How Are Collisions Resolved?

When two objects have equal hash values they inevitably map to the same slot; when their hash values differ they can still map to the same slot because the hash table has far fewer slots than the hash space, causing an index (slot) conflict.

Two common ways to resolve conflicts

Separate chaining : multiple key‑value pairs that map to the same slot are linked together in a list.

Open addressing : Python’s dict uses this method; if a slot is occupied the algorithm probes for another slot.

Open addressing in CPython

Python’s implementation uses an iterative probing function that incorporates the key’s hash value instead of a fixed offset. The probing sequence is generated by repeatedly shifting a perturb value (initially the hash) and adding i*5 + 1, then masking with the table size.

#define PERTURB_SHIFT 5
size_t perturb = hash;
size_t i = hash & mask;
for (;;) {
    // compute new index when a conflict occurs
    perturb >>= PERTURB_SHIFT;
    i = mask & (i*5 + perturb + 1);
    // repeat until an empty slot is found
}

The main lookup routine is _Py_dict_lookup. It first determines the dictionary’s storage kind (unicode, general, or split) and then calls one of three helper functions: unicodekeys_lookup_unicode – searches a unicode‑only table. unicodekeys_lookup_generic – searches a table that may contain non‑string keys. dictkeys_generic_lookup – searches a generic table.

All three follow the same probing logic: compute an initial index i = hash & mask, retrieve the stored entry index ix from the hash‑index array, and if ix >= 0 compare the stored key with the target key. If the keys differ, a conflict is detected and the probing formula above is applied to obtain a new i.

Key helper functions

static Py_ssize_t dictkeys_get_index(const PyDictKeysObject *keys, Py_ssize_t i) {
    // Choose the correct integer width for the index array
    if (DK_LOG_SIZE(keys) < 8) {
        const int8_t *indices = (const int8_t *)keys->dk_indices;
        return indices[i];
    } else if (DK_LOG_SIZE(keys) < 16) {
        const int16_t *indices = (const int16_t *)keys->dk_indices;
        return indices[i];
    } else if (DK_LOG_SIZE(keys) >= 32) {
        const int64_t *indices = (const int64_t *)keys->dk_indices;
        return indices[i];
    } else {
        const int32_t *indices = (const int32_t *)keys->dk_indices;
        return indices[i];
    }
}

The lookup functions ultimately return ix, the index of the key‑value pair in the entry array. To obtain the actual slot index i, CPython uses two additional functions:

static Py_ssize_t find_empty_slot(PyDictKeysObject *keys, Py_hash_t hash) {
    size_t mask = DK_MASK(keys);
    size_t i = hash & mask;
    Py_ssize_t ix = dictkeys_get_index(keys, i);
    for (size_t perturb = hash; ix >= 0; ) {
        perturb >>= PERTURB_SHIFT;
        i = (i*5 + perturb + 1) & mask;
        ix = dictkeys_get_index(keys, i);
    }
    return i;  // empty slot index
}

static Py_ssize_t lookdict_index(PyDictKeysObject *k, Py_hash_t hash, Py_ssize_t index) {
    size_t mask = DK_MASK(k);
    size_t perturb = (size_t)hash;
    size_t i = (size_t)hash & mask;
    for (;;) {
        Py_ssize_t ix = dictkeys_get_index(k, i);
        if (ix == index) return i;          // found the slot
        if (ix == DKIX_EMPTY) return DKIX_EMPTY;
        perturb >>= PERTURB_SHIFT;
        i = (i*5 + perturb + 1) & mask;
    }
}

Thus, _Py_dict_lookup tells whether a key exists (by returning ix), while find_empty_slot and lookdict_index return the slot index i needed for insertion or deletion.

Summary of the probing outcome

If the key exists, ix >= 0 and the corresponding slot index i can be retrieved.

If the key does not exist, the lookup returns DKIX_EMPTY (-1) and find_empty_slot yields the first free slot.

Python’s iterative probing therefore avoids the simple linear or quadratic steps used in many textbooks; it mixes the hash value into the perturbation to spread probes more uniformly and reduce clustering.

Diagram of Python dict slot probing
Diagram of Python dict slot probing
Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Pythonc-languagehash tablecollision-resolutiondictopen-addressingprobing
Satori Komeiji's Programming Classroom
Written by

Satori Komeiji's Programming Classroom

Python and Rust developer; I write about any topics you're interested in. Follow me! (#^.^#)

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.