Fundamentals 14 min read

Understanding Python .pyc Files: How They Are Generated, Structured, and Imported

This article explains why and when Python creates .pyc files, shows how to compile them manually with py_compile, demonstrates importing compiled bytecode via importlib, dissects the .pyc file header and payload, and explores the CPython marshal implementation and simple bytecode obfuscation techniques.

Satori Komeiji's Programming Classroom
Satori Komeiji's Programming Classroom
Satori Komeiji's Programming Classroom
Understanding Python .pyc Files: How They Are Generated, Structured, and Imported

Triggering .pyc Generation

When a Python script is executed, the interpreter first compiles the source into a PyCodeObject and, if appropriate, writes it to a .pyc file. An import abc statement forces the creation of abc.pyc because the import machinery looks for a compiled version before loading the source.

Manual Compilation with py_compile

Any .py file can be compiled explicitly:

import py_compile
py_compile.compile("tools.py")

The compiled file appears in __pycache__/tools.cpython-312.pyc (the name encodes the interpreter version).

Importing a .pyc File Directly

Using importlib.machinery.SourcelessFileLoader you can load a compiled module without the source:

from importlib.machinery import SourcelessFileLoader
tools = SourcelessFileLoader(
    "tools",
    "__pycache__/tools.cpython-312.pyc"
).load_module()
print(tools.a)  # 1
print(tools.b)  # 你好啊

What a .pyc File Contains

Magic number : a version‑specific integer that ensures the interpreter loads a compatible file.

Timestamp : the source file’s last‑modification time; if the .pyc is older than the source, recompilation occurs.

Source size : the byte length of the original .py file.

Serialized PyCodeObject : the compiled bytecode and associated metadata, written using the marshal format.

Reading the header demonstrates these fields:

from importlib.util import MAGIC_NUMBER
import struct
with open("__pycache__/tools.cpython-312.pyc", "rb") as f:
    magic = f.read(4)
    timestamp = struct.unpack("<I", f.read(4))[0]
    size = struct.unpack("<I", f.read(4))[0]
print(MAGIC_NUMBER)   # b'\xcb\r
'
print(magic)            # b'\xcb\r
'
print(timestamp)        # e.g., 1726742711
print(size)             # e.g., 22

How CPython Writes a .pyc File

The core logic lives in Python/marshal.c. The function PyMarshal_WriteLongToFile writes the magic number, timestamp, and source size by converting a 32‑bit integer into four bytes via w_long, which in turn calls w_byte for each byte.

void PyMarshal_WriteLongToFile(long x, FILE *fp, int version) {
    char buf[4];
    WFILE wf;
    memset(&wf, 0, sizeof(wf));
    wf.fp = fp;
    wf.ptr = wf.buf = buf;
    wf.end = wf.ptr + sizeof(buf);
    wf.version = version;
    w_long(x, &wf);
    w_flush(&wf);
}

After the header, PyMarshal_WriteObjectToFile serializes the PyCodeObject by delegating to w_object, which dispatches based on the object's concrete type.

void PyMarshal_WriteObjectToFile(PyObject *x, FILE *fp, int version) {
    char buf[BUFSIZ];
    WFILE wf;
    if (PySys_Audit("marshal.dumps", "Oi", x, version) < 0) return;
    memset(&wf, 0, sizeof(wf));
    wf.fp = fp;
    wf.ptr = wf.buf = buf;
    wf.end = wf.ptr + sizeof(buf);
    wf.version = version;
    if (w_init_refs(&wf, version)) return;
    w_object(x, &wf);
    w_clear_refs(&wf);
    w_flush(&wf);
}

Serializing Different Object Types

The helper w_object checks the object against a series of exact‑type tests (None, True, False, integers, floats, strings, tuples, lists, dicts, etc.). For compound objects it writes a type tag via W_TYPE and then recursively serializes the contents. Example for a list:

W_TYPE(TYPE_LIST, p);
n = PyList_GET_SIZE(v);
W_SIZE(n, p);
for (i = 0; i < n; i++) {
    w_object(PyList_GET_ITEM(v, i), p);
}

And for a dict:

W_TYPE(TYPE_DICT, p);
pos = 0;
while (PyDict_Next(v, &pos, &key, &value)) {
    w_object(key, p);
    w_object(value, p);
}
w_object((PyObject *)NULL, p);  // NULL terminator

Every object is preceded by its type identifier (e.g., TYPE_LIST, TYPE_DICT) so that the loader can reconstruct the original structure.

Bytecode Obfuscation

Because .pyc files can be decompiled, a simple obfuscation technique inserts illegal bytecode instructions (e.g., loading a constant with an out‑of‑range index) followed by jump instructions that skip the offending bytes during execution. The interpreter ignores the illegal instructions due to the jump, but decompilers that attempt to decode every opcode will raise errors.

For stronger protection, the article recommends compiling the code with Cython to produce a binary .pyd module.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Pythonbytecodeobfuscationpy_compileimportlibpycmarshal
Satori Komeiji's Programming Classroom
Written by

Satori Komeiji's Programming Classroom

Python and Rust developer; I write about any topics you're interested in. Follow me! (#^.^#)

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.