How Python Creates Dictionaries and Implements Their Operations
This article dissects CPython's dictionary implementation, explaining how a PyDictObject and its PyDictKeysObject are allocated, how method groups map to Python magic methods, and the detailed C logic for inserting, updating, and retrieving key‑value pairs, including __missing__ handling.
Dictionary creation
The interpreter creates a new dictionary via PyDict_New, which calls new_dict with the static Py_EMPTY_KEYS structure. The initial capacity is eight slots.
static PyDictKeysObject empty_keys_struct = { /* fields omitted */ };
#define Py_EMPTY_KEYS &empty_keys_struct
PyObject *PyDict_New(void) {
PyInterpreterState *interp = _PyInterpreterState_GET();
return new_dict(interp, Py_EMPTY_KEYS, NULL, 0, 0);
} new_dictallocates a PyDictObject, links its ma_keys field to a PyDictKeysObject, and initializes bookkeeping fields such as ma_used and ma_version_tag.
PyDictKeysObject allocation
The keys object is created by new_keys_object. It determines the entry size (16 bytes for all‑unicode keys, 24 bytes otherwise), computes the usable fraction of the hash table, selects the index‑byte size ( log2_bytes), allocates memory for the PyDictKeysObject plus the hash‑index and entry arrays, and initializes fields.
static PyDictKeysObject *new_keys_object(PyInterpreterState *interp,
uint8_t log2_size,
bool unicode) {
PyDictKeysObject *dk;
Py_ssize_t usable;
int log2_bytes;
size_t entry_size = unicode ? sizeof(PyDictUnicodeEntry)
: sizeof(PyDictKeyEntry);
// compute usable and log2_bytes based on log2_size (omitted for brevity)
dk = PyObject_Malloc(sizeof(PyDictKeysObject) + (size_t)1<<log2_bytes + entry_size*usable);
if (dk == NULL) { PyErr_NoMemory(); return NULL; }
dk->dk_refcnt = 1;
dk->dk_log2_size = log2_size;
dk->dk_log2_index_bytes = log2_bytes;
dk->dk_kind = unicode ? DICT_KEYS_UNICODE : DICT_KEYS_GENERAL;
dk->dk_nentries = 0;
dk->dk_usable = usable;
memset(&dk->dk_indices[0], 0xff, (size_t)1<<log2_bytes);
return dk;
}Method tables
Python objects expose three method groups. For dictionaries the relevant C structures are:
static PyNumberMethods dict_as_number = {
.nb_or = dict_or,
.nb_inplace_or = dict_ior,
};
static PySequenceMethods dict_as_sequence = {
.sq_contains = PyDict_Contains,
};
static PyMappingMethods dict_as_mapping = {
.mp_length = (lenfunc)dict_length,
.mp_subscript = (binaryfunc)dict_subscript,
.mp_ass_subscript = (objobjargproc)dict_ass_sub,
};Setting a key‑value pair
Assignment such as d["a"] = 1 invokes dict_ass_sub, which forwards to PyDict_SetItem (or PyDict_DelItem when the value is NULL). PyDict_SetItem increments reference counts and calls _PyDict_SetItem_Take2, which computes the hash if necessary and then calls either insert_to_emptydict or insertdict depending on whether the dictionary is empty.
static int dict_ass_sub(PyDictObject *mp, PyObject *v, PyObject *w) {
if (w == NULL)
return PyDict_DelItem((PyObject *)mp, v);
else
return PyDict_SetItem((PyObject *)mp, v, w);
}
int PyDict_SetItem(PyObject *op, PyObject *key, PyObject *value) {
assert(key && value);
return _PyDict_SetItem_Take2((PyDictObject *)op, Py_NewRef(key), Py_NewRef(value));
}Insertion logic (insertdict)
If the existing keys are all Unicode and the new key is not, the table is resized to a general‑key table.
The key is looked up with _Py_dict_lookup to obtain a slot index.
If the slot is empty ( DKIX_EMPTY), the dictionary may be resized, an empty slot is found, dk_indices is updated, and the key/value are written into a PyDictUnicodeEntry or PyDictKeyEntry (handling split tables if present).
If the key already exists, the stored value is replaced and the dictionary version is updated.
static int insertdict(PyInterpreterState *interp, PyDictObject *mp,
PyObject *key, Py_hash_t hash, PyObject *value) {
PyObject *old_value;
Py_ssize_t ix = _Py_dict_lookup(mp, key, hash, &old_value);
if (ix == DKIX_ERROR) goto Fail;
if (ix == DKIX_EMPTY) {
// possible resize, find empty slot, set entry fields
// update ma_used, ma_version_tag, dk_usable, dk_nentries
return 0;
}
// key exists – replace value if different
if (old_value != value) {
uint64_t new_version = _PyDict_NotifyEvent(interp, PyDict_EVENT_MODIFIED, mp, key, value);
// update split‑table or combined entry
mp->ma_version_tag = new_version;
}
Py_XDECREF(old_value);
return 0;
Fail:
Py_DECREF(value);
Py_DECREF(key);
return -1;
}Retrieving a value (dict_subscript)
Subscript access such as v = d["a"] calls dict_subscript. It computes the hash if needed, looks up the key, and returns a new reference to the value. If the key is missing, it checks for a subclass‑defined __missing__ method; otherwise it raises KeyError.
static PyObject *dict_subscript(PyDictObject *mp, PyObject *key) {
Py_hash_t hash;
if (!PyUnicode_CheckExact(key) || (hash = unicode_get_hash(key)) == -1) {
hash = PyObject_Hash(key);
if (hash == -1) return NULL;
}
PyObject *value;
Py_ssize_t ix = _Py_dict_lookup(mp, key, hash, &value);
if (ix == DKIX_ERROR) return NULL;
if (ix == DKIX_EMPTY || value == NULL) {
PyObject *missing = _PyObject_LookupSpecial((PyObject *)mp, &_Py_ID(__missing__));
if (missing) {
PyObject *res = PyObject_CallOneArg(missing, key);
Py_DECREF(missing);
return res;
}
_PyErr_SetKeyError(key);
return NULL;
}
return Py_NewRef(value);
}__missing__ hook example
class Dict(dict):
def __missing__(self, key):
return f"missing key: {key}"
d = Dict({"a": 1, "b": 2})
print(d["a"]) # 1
print(d["c"]) # missing key: cThe analysis covers the creation of PyDictObject and PyDictKeysObject, the method‑table definitions that map C functions to Python magic methods, and the concrete insertion and lookup paths used by CPython dictionaries.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Satori Komeiji's Programming Classroom
Python and Rust developer; I write about any topics you're interested in. Follow me! (#^.^#)
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
