Fundamentals 14 min read

How Python dict’s custom methods like get, setdefault, and popitem are implemented

This article dissects the CPython implementation of three dictionary methods—get, setdefault, and popitem—showing example usage, walking through the C source code, explaining hash look‑ups, entry handling, and the subtle state changes that occur when keys are missing or removed.

Satori Komeiji's Programming Classroom

Aug 26, 2024

How Python dict’s custom methods like get, setdefault, and popitem are implemented

Dictionary get method

The get method returns the value for a given key or a default value when the key is absent. Example usage demonstrates retrieving an existing key, a missing key returning None, and specifying a custom default.

#define DICT_GET_METHODDEF \
    {"get", _PyCFunction_CAST(dict_get), METH_FASTCALL, dict_get__doc__},

static PyObject *
 dict_get(PyDictObject *self, PyObject *const *args, Py_ssize_t nargs)
{
    // return value
    PyObject *return_value = NULL;
    // specified key
    PyObject *key;
    // default value, defaults to None
    PyObject *default_value = Py_None;
    // get accepts 1 or 2 positional arguments
    if (!_PyArg_CheckPositional("get", nargs, 1, 2)) {
        goto exit;
    }
    // args[0] is the key
    key = args[0];
    if (nargs < 2) {
        goto skip_optional;
    }
    // args[1] is the optional default value
    default_value = args[1];
skip_optional:
    return_value = dict_get_impl(self, key, default_value);
exit:
    return return_value;
}

static PyObject *
 dict_get_impl(PyDictObject *self, PyObject *key, PyObject *default_value)
{
    PyObject *val = NULL;
    Py_hash_t hash;  // hash value
    Py_ssize_t ix;   // index in the hash table
    // compute hash
    if (!PyUnicode_CheckExact(key) || (hash = unicode_get_hash(key)) == -1) {
        hash = PyObject_Hash(key);
        if (hash == -1)
            return NULL;
    }
    // look up the key in the hash table
    ix = _Py_dict_lookup(self, key, hash, &val);
    if (ix == DKIX_ERROR)
        return NULL;
    // key not found → use default value
    if (ix == DKIX_EMPTY || val == NULL) {
        val = default_value;
    }
    // increase reference count and return
    return Py_NewRef(val);
}

Dictionary setdefault method

The setdefault method behaves like get but inserts the key with the default value when the key is missing. The article shows that setdefault can be used to build nested dictionaries efficiently.

#define DICT_SETDEFAULT_METHODDEF \
    {"setdefault", _PyCFunction_CAST(dict_setdefault), METH_FASTCALL, dict_setdefault__doc_},

static PyObject *
 dict_setdefault(PyDictObject *self, PyObject *const *args, Py_ssize_t nargs)
{
    // return value
    PyObject *return_value = NULL;
    PyObject *key;
    PyObject *default_value = Py_None;
    if (!_PyArg_CheckPositional("setdefault", nargs, 1, 2)) {
        goto exit;
    }
    key = args[0];
    if (nargs < 2) {
        goto skip_optional;
    }
    default_value = args[1];
skip_optional:
    return_value = dict_setdefault_impl(self, key, default_value);
exit:
    return return_value;
}

static PyObject *
 dict_setdefault_impl(PyDictObject *self, PyObject *key, PyObject *default_value)
{
    PyObject *val;
    val = PyDict_SetDefault((PyObject *)self, key, default_value);
    return Py_XNewRef(val);
}

Example code demonstrates that when the key exists, setdefault returns the existing value; when the key does not exist, it inserts the key with the supplied default and returns that default. A practical use‑case builds a nested mapping of names, years, and counts using chained setdefault calls.

Dictionary popitem method

The popitem method removes and returns the last key‑value pair in the dictionary. The article contrasts it with pop, which removes a specific key, and provides a C‑level implementation that handles empty dictionaries, split tables, and updates internal version counters.

#define DICT_POPITEM_METHODDEF \
    {"popitem", (PyCFunction)dict_popitem, METH_NOARGS, dict_popitem__doc_},

static PyObject *
 dict_popitem(PyDictObject *self, PyObject *Py_UNUSED(ignored))
{
    return dict_popitem_impl(self);
}

static PyObject *
 dict_popitem_impl(PyDictObject *self)
{
    Py_ssize_t i, j;
    PyObject *res;
    uint64_t new_version;
    PyInterpreterState *interp = _PyInterpreterState_GET();
    // result tuple (key, value)
    res = PyTuple_New(2);
    if (res == NULL)
        return NULL;
    // empty dictionary → raise KeyError
    if (self->ma_used == 0) {
        Py_DECREF(res);
        PyErr_SetString(PyExc_KeyError, "popitem(): dictionary is empty");
        return NULL;
    }
    // if using split table, resize to combined table after pop
    if (self->ma_keys->dk_kind == DICT_KEYS_SPLIT) {
        if (dictresize(interp, self, DK_LOG_SIZE(self->ma_keys), 1)) {
            Py_DECREF(res);
            return NULL;
        }
    }
    // reset version because key set changes
    self->ma_keys->dk_version = 0;

    PyObject *key, *value;
    Py_hash_t hash;
    if (DK_IS_UNICODE(self->ma_keys)) {
        // keys are all strings
        PyDictUnicodeEntry *ep0 = DK_UNICODE_ENTRIES(self->ma_keys);
        i = self->ma_keys->dk_nentries - 1;
        // find the last entry with a non‑NULL value
        while (i >= 0 && ep0[i].me_value == NULL) {
            i--;
        }
        assert(i >= 0);
        key = ep0[i].me_key;
        new_version = _PyDict_NotifyEvent(interp, PyDict_EVENT_DELETED, self, key, NULL);
        hash = unicode_get_hash(key);
        value = ep0[i].me_value;
        // clear the entry
        ep0[i].me_key = NULL;
        ep0[i].me_value = NULL;
    } else {
        // mixed key types
        PyDictKeyEntry *ep0 = DK_ENTRIES(self->ma_keys);
        i = self->ma_keys->dk_nentries - 1;
        while (i >= 0 && ep0[i].me_value == NULL) {
            i--;
        }
        assert(i >= 0);
        key = ep0[i].me_key;
        new_version = _PyDict_NotifyEvent(interp, PyDict_EVENT_DELETED, self, key, NULL);
        hash = ep0[i].me_hash;
        value = ep0[i].me_value;
        ep0[i].me_key = NULL;
        ep0[i].me_hash = -1;
        ep0[i].me_value = NULL;
    }
    // mark the hash slot as dummy
    j = lookdict_index(self->ma_keys, hash, i);
    assert(j >= 0);
    assert(dictkeys_get_index(self->ma_keys, j) == i);
    dictkeys_set_index(self->ma_keys, j, DKIX_DUMMY);
    // build result tuple
    PyTuple_SET_ITEM(res, 0, key);
    PyTuple_SET_ITEM(res, 1, value);
    // update internal counters
    self->ma_keys->dk_nentries = i;
    self->ma_used--;
    self->ma_version_tag = new_version;
    ASSERT_CONSISTENT(self);
    return res;
}

The implementation shows that popitem walks backward from the last entry, skips entries that have been logically deleted (their value is NULL), removes the found entry, updates the hash‑table index to a dummy slot, and adjusts the dictionary’s size and version counters. The article also explains why the internal field dk_nentries is set to the index of the last live entry, allowing the freed slots to be reused for future insertions.

Conclusion

The three methods share a common pattern: argument validation, hash computation, lookup in the internal table, handling of missing keys, and careful reference‑count management. Understanding these C‑level details clarifies the observable Python behaviour and helps developers write more efficient code when manipulating dictionaries.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

data structures implementation get dict CPython popitem setdefault

Written by

Satori Komeiji's Programming Classroom

Python and Rust developer; I write about any topics you're interested in. Follow me! (#^.^#)

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.