Can Transformers Solve Any Computable Problem? RUC Study Shows Context Management Sets the Upper Bound
A recent ICML 2026 position paper clarifies that the computational power of a fixed Transformer model is limited by its context‑management strategy, distinguishing fixed‑system and scaling‑family settings and showing how five concrete management approaches span from constant‑space to full Turing‑completeness.
Fixed‑system vs. Scaling‑family
The paper separates two fundamentally different assumptions used in prior "Transformer is Turing‑complete" proofs. Fixed‑system refers to a single pretrained Transformer with immutable weights, a fixed context window length, and fixed numeric precision. Scaling‑family denotes a family of models whose context window, precision, or depth grow with input length. The former matches real‑world LLM deployments, while the latter describes an idealised, ever‑expanding model set.
Why Context Management Matters
For a fixed Transformer T, decoding rule D (e.g., greedy decoding), and context manager C, the authors formalise the system as a triple (T, D, C). When T and D are fixed, changing C alone can shift the system’s computational class across three distinct complexity levels.
Five Representative Context‑Management Strategies
1. Summary‑based Management (Constant‑space)
When the context window is about to overflow, the system issues a /compress command that replaces the entire history with a short summary token. This is the strategy used in Gemini CLI, Claude Code, and OpenAI Codex CLI.
Result: the system never exceeds constant‑space memory, equivalent to a constant‑space Turing machine; it can recognise at most regular languages and cannot reliably compare long strings, detect palindromes, or perform binary addition.
2. Append‑based Management (Linear‑space)
Tokens generated by the model are appended to the end of the full input sequence, while the context window slides forward like a FIFO buffer. When the window fills, the oldest token is evicted and the next token from the backlog enters the window.
Result: computational power matches that of a linear‑space Turing machine, which recognises deterministic context‑sensitive languages.
3. External Storage & Retrieval
Many modern LLM systems attach an unbounded external memory (e.g., a vector database). The context manager can write selected tokens to this storage and later retrieve relevant information back into the context window.
According to Schuurmans (2023), a self‑regressive Transformer with unrestricted read/write access to such a memory is Turing‑complete, despite a fixed context window.
4. Tool Calling
Agents generate a function‑call token, the system invokes the corresponding external tool, and the tool’s result is inserted back into the context. This pattern appears in ToolFormer, GPT function calling, Claude function calling, and Gemini function calling.
Result: because the invoked tool can be arbitrarily powerful (e.g., executing any Python program), the overall system attains full Turing‑completeness.
5. Multi‑token Decoding + Append
Schuurmans et al. (2024) relax the usual one‑token‑per‑step assumption, allowing the model to emit up to K tokens each step, which are then appended as in the append‑based scheme. Tokens may be real symbols or a special empty token ε that is ignored.
Result: when K = 1 the system is equivalent to a linear‑space TM; when K ≥ 2 the system becomes Turing‑complete, showing that the decoding interface itself can be the decisive factor.
Putting the Pieces Together
All five mechanisms can be placed on a single hierarchy: with the same fixed Transformer T, a summary‑based manager yields constant‑space power, an append‑based manager yields linear‑space power, and external storage, tool calling, or multi‑token decoding raise the system to full Turing‑completeness. Thus, the paper’s title is accurate – a Transformer’s Turing‑completeness hinges on its context‑management component.
Practical Recommendations
When claiming "Transformer is Turing‑complete", explicitly state the computational setting and any scaling assumptions.
Future theoretical work should treat the triplet (T, D, C) – a fixed model plus a concrete context manager – as the primary object of analysis.
Beyond qualitative Turing‑completeness, researchers should quantify resource budgets (window size, precision, external memory) and learnability criteria to obtain a more nuanced picture of model capabilities.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
