How Python dict’s custom methods like get, setdefault, and popitem are implemented
This article dissects the CPython implementation of three dictionary methods—get, setdefault, and popitem—showing example usage, walking through the C source code, explaining hash look‑ups, entry handling, and the subtle state changes that occur when keys are missing or removed.
Dictionary get method
The get method returns the value for a given key or a default value when the key is absent. Example usage demonstrates retrieving an existing key, a missing key returning None, and specifying a custom default.
#define DICT_GET_METHODDEF \
{"get", _PyCFunction_CAST(dict_get), METH_FASTCALL, dict_get__doc__},
static PyObject *
dict_get(PyDictObject *self, PyObject *const *args, Py_ssize_t nargs)
{
// return value
PyObject *return_value = NULL;
// specified key
PyObject *key;
// default value, defaults to None
PyObject *default_value = Py_None;
// get accepts 1 or 2 positional arguments
if (!_PyArg_CheckPositional("get", nargs, 1, 2)) {
goto exit;
}
// args[0] is the key
key = args[0];
if (nargs < 2) {
goto skip_optional;
}
// args[1] is the optional default value
default_value = args[1];
skip_optional:
return_value = dict_get_impl(self, key, default_value);
exit:
return return_value;
}
static PyObject *
dict_get_impl(PyDictObject *self, PyObject *key, PyObject *default_value)
{
PyObject *val = NULL;
Py_hash_t hash; // hash value
Py_ssize_t ix; // index in the hash table
// compute hash
if (!PyUnicode_CheckExact(key) || (hash = unicode_get_hash(key)) == -1) {
hash = PyObject_Hash(key);
if (hash == -1)
return NULL;
}
// look up the key in the hash table
ix = _Py_dict_lookup(self, key, hash, &val);
if (ix == DKIX_ERROR)
return NULL;
// key not found → use default value
if (ix == DKIX_EMPTY || val == NULL) {
val = default_value;
}
// increase reference count and return
return Py_NewRef(val);
}Dictionary setdefault method
The setdefault method behaves like get but inserts the key with the default value when the key is missing. The article shows that setdefault can be used to build nested dictionaries efficiently.
#define DICT_SETDEFAULT_METHODDEF \
{"setdefault", _PyCFunction_CAST(dict_setdefault), METH_FASTCALL, dict_setdefault__doc_},
static PyObject *
dict_setdefault(PyDictObject *self, PyObject *const *args, Py_ssize_t nargs)
{
// return value
PyObject *return_value = NULL;
PyObject *key;
PyObject *default_value = Py_None;
if (!_PyArg_CheckPositional("setdefault", nargs, 1, 2)) {
goto exit;
}
key = args[0];
if (nargs < 2) {
goto skip_optional;
}
default_value = args[1];
skip_optional:
return_value = dict_setdefault_impl(self, key, default_value);
exit:
return return_value;
}
static PyObject *
dict_setdefault_impl(PyDictObject *self, PyObject *key, PyObject *default_value)
{
PyObject *val;
val = PyDict_SetDefault((PyObject *)self, key, default_value);
return Py_XNewRef(val);
}Example code demonstrates that when the key exists, setdefault returns the existing value; when the key does not exist, it inserts the key with the supplied default and returns that default. A practical use‑case builds a nested mapping of names, years, and counts using chained setdefault calls.
Dictionary popitem method
The popitem method removes and returns the last key‑value pair in the dictionary. The article contrasts it with pop, which removes a specific key, and provides a C‑level implementation that handles empty dictionaries, split tables, and updates internal version counters.
#define DICT_POPITEM_METHODDEF \
{"popitem", (PyCFunction)dict_popitem, METH_NOARGS, dict_popitem__doc_},
static PyObject *
dict_popitem(PyDictObject *self, PyObject *Py_UNUSED(ignored))
{
return dict_popitem_impl(self);
}
static PyObject *
dict_popitem_impl(PyDictObject *self)
{
Py_ssize_t i, j;
PyObject *res;
uint64_t new_version;
PyInterpreterState *interp = _PyInterpreterState_GET();
// result tuple (key, value)
res = PyTuple_New(2);
if (res == NULL)
return NULL;
// empty dictionary → raise KeyError
if (self->ma_used == 0) {
Py_DECREF(res);
PyErr_SetString(PyExc_KeyError, "popitem(): dictionary is empty");
return NULL;
}
// if using split table, resize to combined table after pop
if (self->ma_keys->dk_kind == DICT_KEYS_SPLIT) {
if (dictresize(interp, self, DK_LOG_SIZE(self->ma_keys), 1)) {
Py_DECREF(res);
return NULL;
}
}
// reset version because key set changes
self->ma_keys->dk_version = 0;
PyObject *key, *value;
Py_hash_t hash;
if (DK_IS_UNICODE(self->ma_keys)) {
// keys are all strings
PyDictUnicodeEntry *ep0 = DK_UNICODE_ENTRIES(self->ma_keys);
i = self->ma_keys->dk_nentries - 1;
// find the last entry with a non‑NULL value
while (i >= 0 && ep0[i].me_value == NULL) {
i--;
}
assert(i >= 0);
key = ep0[i].me_key;
new_version = _PyDict_NotifyEvent(interp, PyDict_EVENT_DELETED, self, key, NULL);
hash = unicode_get_hash(key);
value = ep0[i].me_value;
// clear the entry
ep0[i].me_key = NULL;
ep0[i].me_value = NULL;
} else {
// mixed key types
PyDictKeyEntry *ep0 = DK_ENTRIES(self->ma_keys);
i = self->ma_keys->dk_nentries - 1;
while (i >= 0 && ep0[i].me_value == NULL) {
i--;
}
assert(i >= 0);
key = ep0[i].me_key;
new_version = _PyDict_NotifyEvent(interp, PyDict_EVENT_DELETED, self, key, NULL);
hash = ep0[i].me_hash;
value = ep0[i].me_value;
ep0[i].me_key = NULL;
ep0[i].me_hash = -1;
ep0[i].me_value = NULL;
}
// mark the hash slot as dummy
j = lookdict_index(self->ma_keys, hash, i);
assert(j >= 0);
assert(dictkeys_get_index(self->ma_keys, j) == i);
dictkeys_set_index(self->ma_keys, j, DKIX_DUMMY);
// build result tuple
PyTuple_SET_ITEM(res, 0, key);
PyTuple_SET_ITEM(res, 1, value);
// update internal counters
self->ma_keys->dk_nentries = i;
self->ma_used--;
self->ma_version_tag = new_version;
ASSERT_CONSISTENT(self);
return res;
}The implementation shows that popitem walks backward from the last entry, skips entries that have been logically deleted (their value is NULL), removes the found entry, updates the hash‑table index to a dummy slot, and adjusts the dictionary’s size and version counters. The article also explains why the internal field dk_nentries is set to the index of the last live entry, allowing the freed slots to be reused for future insertions.
Conclusion
The three methods share a common pattern: argument validation, hash computation, lookup in the internal table, handling of missing keys, and careful reference‑count management. Understanding these C‑level details clarifies the observable Python behaviour and helps developers write more efficient code when manipulating dictionaries.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Satori Komeiji's Programming Classroom
Python and Rust developer; I write about any topics you're interested in. Follow me! (#^.^#)
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
