Why Python Lists Aren’t Pre‑Allocated: Inside CPython’s Memory Management
Python’s list type appears flexible, but its underlying CPython implementation uses a dynamic resizing strategy without a pre‑allocated memory pool, allocating memory on demand via list_resize and PyMem_RESIZE, which this article explains through code analysis, memory size measurements, and practical recommendations.
Introduction
Python's list is a highly flexible array that can change length at will. Because of this convenience we often modify the list using insert, pop, or the more common append operations.
Example usage:
>> test = []
>>> test.append(1)
>>> test.append({2})
>>> test.append([3])
>>> print test
# output
[1, set([2]), [3]]Another example:
test = []
for i in range(4):
test.append(i)
print test
# output
[0, 1, 2, 3]While this feels pleasant, any scenario that dynamically changes the length of a list should immediately raise concerns about memory management.
Stingy Initialization
Inspired by pre‑allocation theory, one might assume that a list reserves a certain amount of memory on creation. In reality, Python lists are "low" on pre‑allocation.
Memory size experiment:
import sys
test = []
test_1 = [1]
print sys.getsizeof(test)
print sys.getsizeof(test_1) - sys.getsizeof(test)
# output
72 # size of empty list (total object size)
8 # size added by one element (pointer size)The CPython source shows that PyList_New allocates memory only for the requested number of elements, without a hidden pre‑allocation pool:
PyObject *
PyList_New(Py_ssize_t size)
{
PyListObject *op;
size_t nbytes;
if (size < 0) {
PyErr_BadInternalCall();
return NULL;
}
if ((size_t)size > PY_SIZE_MAX / sizeof(PyObject *))
return PyErr_NoMemory();
if (numfree) {
numfree--;
op = free_list[numfree];
_Py_NewReference((PyObject *)op);
} else {
op = PyObject_GC_New(PyListObject, &PyList_Type);
if (op == NULL)
return NULL;
}
nbytes = size * sizeof(PyObject *);
if (size <= 0)
op->ob_item = NULL;
else {
op->ob_item = (PyObject **) PyMem_MALLOC(nbytes);
if (op->ob_item == NULL) {
Py_DECREF(op);
return PyErr_NoMemory();
}
memset(op->ob_item, 0, nbytes);
}
Py_SIZE(op) = size;
op->allocated = size;
_PyObject_GC_TRACK(op);
return (PyObject *) op;
}When executing test = [1], two actions occur: (1) build an empty list of the appropriate length, and (2) fill it with the element using PyList_SET_ITEM:
#define PyList_SET_ITEM(op, i, v) (((PyListObject *)(op))->ob_item[i] = (v))Thus, list initialization does not involve a pre‑allocated memory pool; memory is allocated on demand.
Variable‑Length Mechanics
Every mutation— insert, pop, or append —invokes list_resize, which adjusts the list's memory footprint.
static int
list_resize(PyListObject *self, Py_ssize_t newsize)
{
PyObject **items;
size_t new_allocated;
Py_ssize_t allocated = self->allocated;
if (allocated >= newsize && newsize >= (allocated >> 1)) {
assert(self->ob_item != NULL || newsize == 0);
Py_SIZE(self) = newsize;
return 0;
}
new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);
if (new_allocated > PY_SIZE_MAX - newsize) {
PyErr_NoMemory();
return -1;
} else {
new_allocated += newsize;
}
if (newsize == 0)
new_allocated = 0;
items = self->ob_item;
if (new_allocated <= (PY_SIZE_MAX / sizeof(PyObject *)))
PyMem_RESIZE(items, PyObject *, new_allocated);
else
items = NULL;
if (items == NULL) {
PyErr_NoMemory();
return -1;
}
self->ob_item = items;
Py_SIZE(self) = newsize;
self->allocated = new_allocated;
return 0;
}In this function, newsize represents the total number of elements after the operation, while new_allocated is the new total capacity ("holes"). The list object stores both ob_size (actual element count) and allocated (capacity), with the invariant 0 <= ob_size <= allocated.
When list_resize runs, the new capacity is calculated as:
Determine a base: new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);
Check whether new_allocated + newsize exceeds PY_SIZE_MAX and raise an error if so;
Set the final capacity to new_allocated + newsize (or 0 if newsize is 0).
Empirical test of memory growth during successive appends:
#coding: utf8
import sys
test = []
raw_size = sys.getsizeof(test)
for i in range(1, 7):
test.append(1)
print "%d 次 append 减去空列表的内存大小:%s " % (i, sys.getsizeof(test) - raw_size)
# output
1 次 append 减去空列表的内存大小:32
2 次 append 减去空列表的内存大小:32
3 次 append 减去空列表的内存大小:32
4 次 append 减去空列表的内存大小:32
5 次 append 减去空列表的内存大小:64
6 次 append 减去空列表的内存大小:64The table below shows when the internal allocated value changes during appends:
Append #
Original length
Added elements
Original allocated
newsize
new_allocated
1
0
1
0
0+1=1
3+1=4
2
1
1
4
1+1=2
No change
3
2
1
4
2+1=3
No change
4
3
1
4
3+1=4
No change
5
4
1
4
4+1=5
3+5=8
6
5
1
8
5+1=6
No change
Understanding when allocated grows explains why memory usage can jump after certain numbers of appends.
Best Practice: List Comprehensions
When the goal is simply to build a list and the loop body is straightforward, prefer a list comprehension over an explicit for loop with append: test = [i for i in xrange(4)] List comprehensions allocate the exact required size up front, avoiding repeated calls to PyMem_RESIZE and reducing bytecode overhead such as SETUP_LOOP and CALL_FUNCTION. Use them only when the logic is simple and the result is needed as a list.
Remember: choose the right tool for the job; indiscriminate use of list comprehensions can lead to less readable code.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
