
Explore CPython Internals: Build and Understand Python’s Core

This comprehensive guide walks you through the CPython source tree, shows how to clone, configure, and compile version 3.8.0b3 on macOS, explains the language grammar, tokenization, memory management, reference counting, and garbage collection, and provides practical code snippets for each step.

MaGe Linux Operations

Introduction

This article is a long but useful guide for anyone who wants to learn CPython. It is split into five parts, each covering a different aspect of the interpreter.

Part 1 – Introducing CPython

CPython is the reference implementation of Python; other implementations include PyPy and Jython. The guide focuses on CPython source code version 3.8.0b3.

What’s in the source code?

The CPython repository contains documentation, grammar files, C header files, the standard library written in Python, C‑written modules, the parser, and build tools.

git clone https://github.com/python/cpython
cd cpython
git checkout v3.8.0b3

If Git is unavailable, download the ZIP archive from GitHub and extract it. The top‑level directory looks like:

cpython/
├── Doc        ← documentation
├── Grammar    ← language grammar
├── Include    ← C header files
├── Lib        ← Python standard library
├── Modules    ← C‑written standard modules
├── Objects    ← core object types
├── Parser     ← parser source
├── PC, PCbuild ← Windows build support
├── Programs   ← executables
├── Python     ← interpreter source
└── Tools      ← auxiliary tools

Compiling CPython on macOS

Install the Xcode command‑line tools:

xcode-select --install

Install OpenSSL and other dependencies via Homebrew:

brew install openssl xz zlib

Configure the build, pointing to the Homebrew‑installed libraries:

CPPFLAGS="-I$(brew --prefix zlib)/include" \
LDFLAGS="-L$(brew --prefix zlib)/lib" \
./configure --with-openssl=$(brew --prefix openssl) --with-pydebug

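Build the interpreter:

make -j2 -s

Run the resulting binary to verify the version. On macOS the binary is named python.exe, because the default filesystem is case‑insensitive and a file named python would collide with the Python directory: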

./python.exe
Python 3.8.0b3 (tags/v3.8.0b3:4336222407, Aug 21 2019, 10:00:03) [Clang 10.0.1 (clang-1001.0.46.4)] on darwin
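Beyond reading the banner, the build can be checked from inside the interpreter. The following is a minimal sketch, assuming it is run with the freshly built binary; the helper name describe_build is illustrative and not part of CPython. It relies on the Py_DEBUG configuration variable, which is set when configure is run with --with-pydebug, and on sys.gettotalrefcount, which exists only in debug builds.

```python
import sys
import sysconfig


def describe_build():
    """Return basic facts about the running interpreter (illustrative helper)."""
    return {
        "version": sys.version.split()[0],
        "implementation": sys.implementation.name,
        # Py_DEBUG is truthy for --with-pydebug builds; if the variable is
        # missing on a platform, this simply reports False.
        "debug_build": bool(sysconfig.get_config_var("Py_DEBUG")),
        # sys.gettotalrefcount is only compiled into debug builds.
        "has_total_refcount": hasattr(sys, "gettotalrefcount"),
    }


for key, value in describe_build().items():
    print(f"{key}: {value}")
```

Saved to a file and run with ./python.exe, this should report a debug build for the configuration shown above; a release interpreter reports False for both debug fields.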

What does the compiler do?

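The compiler translates Python source code into bytecode, a low‑level intermediate representation that is specific to CPython and is executed by its virtual machine rather than by the CPU directly. For imported modules, the bytecode is cached in .pyc files under __pycache__, so the source does not need to be recompiled on the next run unless it has changed.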
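The compilation step can be observed from Python itself. This is a small sketch using the built-in compile function and the standard dis module; the source string and the file name example.py are made up for illustration.

```python
import dis
import importlib.util

source = "x = 1 + 2\nprint(x)\n"

# compile() turns source text into a code object without executing it.
code = compile(source, "<example>", "exec")

# The raw bytecode is stored as bytes on the code object.
print(type(code.co_code), len(code.co_code), "bytes of bytecode")
print("names used:", code.co_names)

# dis prints a human-readable listing of the instructions.
dis.dis(code)

# Collect the instruction names for inspection.
opnames = [ins.opname for ins in dis.get_instructions(code)]

# Where CPython would cache the bytecode of a module named example.py.
pyc_path = importlib.util.cache_from_source("example.py")
print("cache file:", pyc_path)
```

The exact instructions differ between CPython versions, which is one reason .pyc files carry a version-specific tag in their name.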

Why is CPython written in C?

CPython’s core components, including the interpreter and many standard‑library modules, are implemented in C for performance and to provide low‑level access to operating‑system APIs. Some modules are pure Python or a mix of C and Python.

Python Language Specification

The language definition lives in Doc/reference as reStructuredText files such as compound_stmts.rst, datamodel.rst, and grammar.rst. These files are the source for the official Python reference guide at docs.python.org.

Grammar

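The grammar is stored in Grammar/Grammar using Extended‑BNF. The reference documentation describes the same rules in a similar notation; for example, the with statement appears in Doc/reference/compound_stmts.rst as: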

.. productionlist::
   with_stmt: "with" `with_item` ("," `with_item`)* ":" `suite`
   with_item: `expression` ["as" `target`]
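The two productions can be checked against what the parser actually produces. The sketch below uses the standard ast module to parse a with statement that has two with_item entries; the sample source is invented for illustration and is only parsed, never executed.

```python
import ast

source = "with open('a') as f, open('b') as g:\n    pass\n"

# ast.parse only builds the syntax tree; open() is never called.
tree = ast.parse(source)
with_node = tree.body[0]

print(type(with_node).__name__)

# Each with_item has an expression and an optional "as" target.
for item in with_node.items:
    target = item.optional_vars.id if item.optional_vars is not None else None
    print(ast.dump(item.context_expr), "as", target)

targets = [item.optional_vars.id for item in with_node.items]
```

The comma-separated repetition in with_stmt shows up as the items list, and the optional ["as" target] part shows up as optional_vars, which is None when no target is given.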

Using pgen

pgen reads the grammar file and generates the parser tables. After modifying Grammar/Grammar, regenerate them:

make regen-grammar

Changes to Grammar/Tokens are regenerated with make regen-token. Then rebuild CPython:

make -j4 -s

Running the new binary launches the REPL, where you can test the modified syntax.

Tokens

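The Tokens file defines the lexical token names (e.g., LPAR, RPAR, COLON). CPython has two tokenizers: the C implementation in Parser/tokenizer.c, which the interpreter itself uses, and a pure‑Python implementation in Lib/tokenize.py, exposed as the tokenize module. The module can display the token stream for a given source file.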
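A short sketch of the tokenize module in use follows. It tokenizes an in-memory string rather than a file; the sample function is made up for illustration.

```python
import io
import tokenize

source = "def add(a, b):\n    return a + b\n"

# generate_tokens() takes a readline callable that returns str lines.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

for tok in tokens:
    # tok.type reports the generic OP category for every operator;
    # tok.exact_type resolves it to names such as LPAR, RPAR, or COLON.
    name = tokenize.tok_name[tok.exact_type]
    print(name.ljust(10), repr(tok.string))

exact_names = [tokenize.tok_name[tok.exact_type] for tok in tokens]
```

The same stream can be printed for a file from the command line with the -m tokenize option, adding -e to show exact token names.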


Memory Management in Python

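CPython uses two mechanisms: reference counting and a cyclic garbage collector. The parser and compiler allocate their working memory from PyArena structures defined in Python/pyarena.c; an arena also keeps a Python list of the objects it owns so that they can be released together. Allocations that are too large for the small‑object allocator fall through to the raw allocator functions PyMem_RawMalloc and PyMem_RawRealloc.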

Reference Counting

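Every object has a reference count, stored in its ob_refcnt field. The macros Py_INCREF and Py_DECREF adjust this count. When the count drops to zero, Py_DECREF calls the type's deallocator (tp_dealloc), which typically releases the memory with PyObject_Free. PyArena_Free is separate: it releases a whole arena, together with the objects the arena owns, once the compiler has finished with it.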
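The effect of Py_INCREF and Py_DECREF can be watched from Python with sys.getrefcount. This is a minimal sketch; the absolute numbers vary between versions because the call itself holds a temporary reference, so only the differences are compared.

```python
import sys

value = object()

before = sys.getrefcount(value)

# Two new references: a second name and a list slot.
alias = value
container = [value]
after = sys.getrefcount(value)

# Dropping both references returns the count to where it started.
del alias
container.clear()
restored = sys.getrefcount(value)

print("added references:", after - before)
print("back to baseline:", restored == before)
```

Each binding or container slot that points at the object corresponds to an increment in C, and each removal to a decrement.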

Garbage Collection

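Reference counting alone cannot free objects that refer to each other in a cycle, because their counts never reach zero. CPython’s cyclic garbage collector handles this case: it runs automatically when the number of tracked allocations exceeds configurable thresholds, and it finds and frees unreachable cycles. It can be inspected via the gc module: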

import gc
gc.set_debug(gc.DEBUG_STATS)
print(gc.get_threshold())
print(gc.get_count())
gc.collect()
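To see the collector reclaim a cycle, the following sketch builds two objects that refer to each other and uses a weak reference to observe when they are freed. The Node class and make_cycle helper are hypothetical names for illustration. Automatic collection is disabled for the duration so that only the explicit gc.collect() call can free the cycle.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


def make_cycle():
    """Create two nodes that reference each other; return a weak reference."""
    a, b = Node(), Node()
    a.other, b.other = b, a
    # A weak reference does not keep the object alive.
    return weakref.ref(a)


was_enabled = gc.isenabled()
gc.disable()                 # keep the automatic collector out of the way
try:
    gc.collect()             # start from a clean state
    ref = make_cycle()
    # Reference counts never reached zero, so the cycle is still in memory.
    alive_before = ref() is not None
    unreachable = gc.collect()
    alive_after = ref() is not None
finally:
    if was_enabled:
        gc.enable()

print("alive before collect:", alive_before)
print("objects found unreachable:", unreachable)
print("alive after collect:", alive_after)
```

The number of unreachable objects reported depends on the CPython version, since instance dictionaries may or may not be counted separately.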

Conclusion

The first part introduced the source tree, the compilation steps, the Python language specification and grammar, tokens, and the basics of memory management. Understanding these fundamentals is essential for deeper exploration of the interpreter in subsequent parts.
