Fundamentals

Why NumPy Arrays Outperform Python Lists: Memory Model, Strides, and Views Explained

This article explores NumPy arrays' internal memory layout, data structures, and design choices—covering contiguous storage, strides, C/F contiguous layouts, views versus copies, and powerful indexing and slicing techniques—to reveal why they are dramatically faster than Python lists.


2. NumPy Array Basics

If you ask a Python developer why NumPy arrays are faster than Python lists, they often say it’s because NumPy is implemented in C. While true, the deeper reason is that NumPy stores homogeneous data in a contiguous memory block, enabling constant-time address computation and cache-friendly access.

2.1 Array Data Structure

Assume an array of eight elements, each 8 bytes (e.g., int64). The array occupies a single continuous block of memory, so the address of element i is calculated as

base_address + i * element_size

This computation has O(1) time complexity, independent of array size.

If you’re preparing for an interview, this is a common question. 😅

2.2 What About Python Lists?

Python lists are also arrays, but they store pointers to objects rather than the raw data. This allows elements of different types, but each pointer has the same size, so the list is essentially an array of object references.

# homogeneous list of integers
my_list = [0, 1, 2, 3, 4, 5, 6]
# heterogeneous list containing various types
my_fancy_list = ["Sascha", True, 42, 3.1415, None, False, lambda x: x**2]

Because the list stores pointers, accessing an element still costs O(1), but a second indirection is required to retrieve the actual object. The pointers are stored contiguously, while the objects themselves may reside anywhere in memory, making cache utilization poorer than for NumPy arrays.

The advantage of contiguous storage lies in CPU cache behavior and vectorization.

2.3 Why Are Arrays Faster Than Python Lists?

CPU caches hold recently accessed memory in cache lines (typically 64 bytes). When data is stored contiguously, a single cache line brings many successive elements into the cache, enabling rapid subsequent accesses. Python list elements are scattered across the heap, causing frequent cache misses.

Thus, iterating over a NumPy array benefits from high cache‑hit rates, while a Python list suffers from frequent cache misses, explaining the performance gap.
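A quick, hedged benchmark illustrates the gap (timings vary by machine; the array size and repeat count here are arbitrary choices):

```python
import timeit
import numpy as np

n = 1_000_000
py_list = list(range(n))  # an array of pointers to int objects
np_arr = np.arange(n)     # one contiguous int64 buffer

t_list = timeit.timeit(lambda: sum(py_list), number=10)
t_arr = timeit.timeit(lambda: np_arr.sum(), number=10)
print(f"list sum: {t_list:.4f}s, ndarray sum: {t_arr:.4f}s")
```

On typical hardware the ndarray sum is an order of magnitude faster, since `np_arr.sum()` streams over one cache-friendly buffer instead of chasing a pointer per element.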

3. NumPy ndarray Object

The core of NumPy is the ndarray, a homogeneous N‑dimensional array that stores data in a contiguous buffer and provides rich metadata.

3.1 What Is an ndarray?

An ndarray is a multi‑dimensional (N‑D) homogeneous array where each element has a fixed size (e.g., int64, float32). The array’s metadata includes shape, strides, itemsize, and data, which are crucial for performance.

Consider a simple one‑dimensional array:
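For example (a small sketch; the memory address shown by data will differ per run):

```python
import numpy as np

x = np.arange(8, dtype=np.int64)
print(x.shape)    # (8,)
print(x.itemsize) # 8 bytes per element
print(x.strides)  # (8,): skip 8 bytes to reach the next element
print(x.data)     # memoryview over the underlying buffer
```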

The array x contains eight int64 elements (itemsize = 8 bytes) with shape (8,). Elements are stored in consecutive memory locations, and the strides attribute tells how many bytes to skip to move to the next element along each axis.

The data attribute returns a memory‑view object pointing to the start of the buffer.

3.2 Multi‑dimensional Arrays, Strides, and Shape

Multi‑dimensional data is also stored in a single contiguous block; the shape and strides metadata tell NumPy how to interpret the linear memory as an N‑dimensional array.

Observe how strides and shape change when moving from 1‑D to 2‑D and 3‑D arrays: the underlying memory buffer is identical in every case; only the metadata changes.

The lowest‑indexed axis has the largest stride and the highest‑indexed axis has the smallest, reflecting NumPy’s default C‑contiguous (row‑major) layout.
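A short sketch (array sizes chosen arbitrarily) makes the stride ordering visible:

```python
import numpy as np

a1 = np.arange(24, dtype=np.int64)  # the same 192-byte buffer throughout
a2 = a1.reshape(4, 6)
a3 = a1.reshape(2, 3, 4)

print(a1.shape, a1.strides)  # (24,) (8,)
print(a2.shape, a2.strides)  # (4, 6) (48, 8)
print(a3.shape, a3.strides)  # (2, 3, 4) (96, 32, 8)
```

Note that the last axis always has stride 8 (one int64), confirming that the last axis varies fastest in C order.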

NumPy supports both C‑contiguous and Fortran‑contiguous (F‑contiguous) memory layouts.

3.3 C‑contiguous vs. F‑contiguous

In a C‑contiguous array the last axis varies fastest; in an F‑contiguous array the first axis varies fastest. Transposing a C‑contiguous array yields an F‑contiguous view.

NumPy tracks layout with the flags attribute, which indicates whether an array is C‑contiguous, F‑contiguous, or neither.

import numpy as np
x = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]])
print(x.flags)
# C_CONTIGUOUS : True   <---
# F_CONTIGUOUS : False  <---
# OWNDATA : True

y = x.transpose()
print(y.flags)
# C_CONTIGUOUS : False   <---
# F_CONTIGUOUS : True    <---
# OWNDATA : False
print(y.strides)
# (8, 24)
print(y.base)
# array([[ 0,  1,  2],
#        [ 3,  4,  5],
#        [ 6,  7,  8],
#        [ 9, 10, 11]])
print(id(y.base) == id(x))
# True
# --------------------------
# ---- create a copy ----
# --------------------------
y_copy = x.transpose().copy()
print(y_copy.flags)
# C_CONTIGUOUS : True
# F_CONTIGUOUS : False   <---
# OWNDATA : True    <---
print(y_copy.strides)
# (32, 8)
print(y_copy.base is None)
# True

3.4 Why Contiguous Memory Matters

When data is contiguous, the CPU can load whole cache lines that contain many successive elements, reducing memory‑bandwidth pressure. Many NumPy functions (e.g., np.min, np.max, np.sum) assume C‑contiguous layout for optimal performance.

Functions like np.ravel and np.reshape try to return a view of the original data; if the layout is not contiguous, they fall back to creating a copy.
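This view-or-copy behavior can be checked with np.shares_memory (a minimal sketch; the F-ordered array is constructed purely for illustration):

```python
import numpy as np

x = np.arange(12).reshape(3, 4)  # C-contiguous
v = x.ravel()                    # layout already matches: a view
print(np.shares_memory(v, x))    # True

f = np.asfortranarray(x)         # F-contiguous
c = f.ravel()                    # C-order ravel must copy
print(np.shares_memory(c, f))    # False
```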

4. Views and Copies

NumPy distinguishes between views and copies. A view shares the same data buffer as the original array and only modifies metadata; changes to a view affect the original array.

A copy allocates new memory and copies the data, so modifications are independent but cost more memory.

NumPy’s base attribute tells whether an array is a view (base points to the original) or a copy (base is None). The OWNDATA flag indicates ownership of the data buffer.
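These attributes can be checked directly (a small sketch; the slice bounds are arbitrary):

```python
import numpy as np

x = np.arange(10)
view = x[2:8]           # slicing returns a view
copy = x[2:8].copy()    # an explicit copy owns fresh memory

print(view.base is x)         # True: a view remembers its source
print(copy.base is None)      # True: a copy has no base
print(copy.flags["OWNDATA"])  # True

view[0] = 99
print(x[2])                   # 99: writes through to the original
copy[0] = -1
print(x[2])                   # still 99: the copy is independent
```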

5. Indexing and Slicing

NumPy’s powerful indexing and slicing mechanisms let you retrieve single elements, sub‑arrays, or arbitrarily complex selections.

5.1 Basic Indexing

Retrieving a single element is an O(1) operation: the address is computed as base + index * itemsize.

5.2 Slicing

Slicing returns a sub‑array. For a 1‑D slice, the starting offset is start * itemsize, and the step determines how many bytes to skip between elements.

In higher dimensions, strides for each axis are used to compute addresses.
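For instance (step and dtype chosen for illustration):

```python
import numpy as np

x = np.arange(8, dtype=np.int64)
s = x[1::2]         # start offset = 1 * 8 bytes, step = 2
print(s.strides)    # (16,): 2 elements * 8 bytes per step
print(s.base is x)  # True: still a view of the same buffer
```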

5.3 Advanced Indexing

Advanced indexing uses integer or boolean arrays to select arbitrary elements, producing a copy rather than a view.

import numpy as np
x = np.array([0,1,2,3,4,5,6,7])
# standard slicing
y1 = x[::2]
# advanced indexing with integer sequence
y2 = x[[0,2,4,6]]
y2 = x[np.array([0,2,4,6])]
# boolean indexing
y3 = x[[True, False, True, False, True, False, True, False]]
y3 = x[x % 2 == 0]
print(y1)  # [0 2 4 6]
print(y2)  # [0 2 4 6]
print(y3)  # [0 2 4 6]
print(y1.flags)
# C_CONTIGUOUS: False
# OWNDATA: False
print(y2.flags)
# C_CONTIGUOUS: True
# OWNDATA: True
print(y3.flags)
# C_CONTIGUOUS: True
# OWNDATA: True

Standard slicing returns a view; advanced indexing returns a C‑contiguous copy.

6. Summary of Part One

In this first part we introduced NumPy arrays, explained their contiguous memory layout, and showed why they outperform Python lists. We examined the ndarray metadata (shape, strides, itemsize, data), the distinction between views and copies, C‑ vs. F‑contiguous layouts, and the powerful indexing and slicing capabilities. The next part will dive deeper into NumPy’s internal mechanisms for avoiding copies and saving memory.

Written by

Code Mala Tang

Read source code together, write articles together, and enjoy spicy hot pot together.
