How Codex CLI Compresses Context: Inside the compact() API
The article dissects Codex CLI's two compression paths—local LLM summarization for non‑Codex models and an encrypted compact() API for Codex models—by injecting prompts, extracting system, handoff, and compression prompts, and comparing them with open‑source references to reveal the underlying mechanism.
1. Two Compression Schemes in Codex CLI
For non‑Codex models, the CLI performs context compression locally by using an LLM to summarize the conversation with a compaction prompt; the resulting summary is fed to responses.create() together with a handoff prompt that explains the summary. For Codex models, the CLI calls the compact() API, which returns an encrypted blob whose internal processing is opaque.
2. Reverse‑Engineering the Compression Mechanism
2.1 Step One: Call compact()
The author sends a specially crafted user message to compact(). On the server side a “compactor” LLM reads both its hidden system prompt and the injected payload, causing the LLM to emit its own system prompt in the plaintext summary. This summary is then encrypted with AES, producing a blob that can only be decrypted on OpenAI’s servers.
2.2 Step Two: Call responses.create()
The encrypted blob and a second user message are passed to responses.create(). The server decrypts the blob, assembles the model’s context, and includes the original compression prompt (if the injection succeeded) plus an additional handoff prompt.
If the injection worked, the model’s output will contain three distinct prompts: the system prompt, the handoff prompt, and the compression prompt. Running extract_prompts.py on the raw response produces a color‑coded dump where yellow marks the system prompt, green the handoff prompt, and pink the compression prompt.
3. Verifying the Prompts Are Real
The extracted compression and handoff prompts are compared with the known prompts in the open‑source Codex CLI repository (files prompt.md and summary_prefix.md). Their high similarity indicates the extracted prompts are genuine rather than hallucinated by the model; different runs may yield slight variations.
4. Inferred Full Flow of compact()
Based on the extracted data, the author proposes a best‑guess diagram of the server‑side workflow for the compact() call, showing how the compactor LLM receives the hidden system prompt, processes the injected payload, produces a plaintext summary, encrypts it into an AES blob, and later decrypts it during responses.create() while appending a handoff prompt.
5. Open Questions
Why does Codex CLI use entirely different compression paths for Codex versus non‑Codex models when the underlying prompts are almost identical? Why encrypt the summary blob? The author admits these questions lack definitive answers and invites further investigation.
AI Tech Publishing
In the fast-evolving AI era, we thoroughly explain stable technical foundations.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
