Artificial Intelligence 16 min read

Can Transformers Solve Any Computable Problem? RUC Study Shows Context Management Sets the Upper Bound

A recent ICML 2026 position paper clarifies that the computational power of a fixed Transformer model is limited by its context‑management strategy, distinguishing fixed‑system and scaling‑family settings and showing how five concrete management approaches span from constant‑space to full Turing‑completeness.

Machine Heart

Jun 12, 2026

Can Transformers Solve Any Computable Problem? RUC Study Shows Context Management Sets the Upper Bound

Fixed‑system vs. Scaling‑family

The paper separates two fundamentally different assumptions used in prior "Transformer is Turing‑complete" proofs. Fixed‑system refers to a single pretrained Transformer with immutable weights, a fixed context window length, and fixed numeric precision. Scaling‑family denotes a family of models whose context window, precision, or depth grow with input length. The former matches real‑world LLM deployments, while the latter describes an idealised, ever‑expanding model set.

Why Context Management Matters

For a fixed Transformer T, decoding rule D (e.g., greedy decoding), and context manager C, the authors formalise the system as a triple (T, D, C). When T and D are fixed, changing C alone can shift the system’s computational class across three distinct complexity levels.

Five Representative Context‑Management Strategies

1. Summary‑based Management (Constant‑space)

When the context window is about to overflow, the system issues a /compress command that replaces the entire history with a short summary token. This is the strategy used in Gemini CLI, Claude Code, and OpenAI Codex CLI.

Summary‑based context management compresses history into a short abstract.

Result: the system never exceeds constant‑space memory, equivalent to a constant‑space Turing machine; it can recognise at most regular languages and cannot reliably compare long strings, detect palindromes, or perform binary addition.

2. Append‑based Management (Linear‑space)

Tokens generated by the model are appended to the end of the full input sequence, while the context window slides forward like a FIFO buffer. When the window fills, the oldest token is evicted and the next token from the backlog enters the window.

Append‑based management uses a sliding window; new tokens are added at the tail and old tokens are dropped from the head.

Result: computational power matches that of a linear‑space Turing machine, which recognises deterministic context‑sensitive languages.

3. External Storage & Retrieval

Many modern LLM systems attach an unbounded external memory (e.g., a vector database). The context manager can write selected tokens to this storage and later retrieve relevant information back into the context window.

External storage provides an unbounded read‑write memory that the context manager can access.

According to Schuurmans (2023), a self‑regressive Transformer with unrestricted read/write access to such a memory is Turing‑complete, despite a fixed context window.

4. Tool Calling

Agents generate a function‑call token, the system invokes the corresponding external tool, and the tool’s result is inserted back into the context. This pattern appears in ToolFormer, GPT function calling, Claude function calling, and Gemini function calling.

Tool calling outsources computation to an external function and feeds the result back into the model.

Result: because the invoked tool can be arbitrarily powerful (e.g., executing any Python program), the overall system attains full Turing‑completeness.

5. Multi‑token Decoding + Append

Schuurmans et al. (2024) relax the usual one‑token‑per‑step assumption, allowing the model to emit up to K tokens each step, which are then appended as in the append‑based scheme. Tokens may be real symbols or a special empty token ε that is ignored.

Multi‑token decoding generates a tuple of tokens per step; empty tokens are discarded.

Result: when K = 1 the system is equivalent to a linear‑space TM; when K ≥ 2 the system becomes Turing‑complete, showing that the decoding interface itself can be the decisive factor.

Putting the Pieces Together

All five mechanisms can be placed on a single hierarchy: with the same fixed Transformer T, a summary‑based manager yields constant‑space power, an append‑based manager yields linear‑space power, and external storage, tool calling, or multi‑token decoding raise the system to full Turing‑completeness. Thus, the paper’s title is accurate – a Transformer’s Turing‑completeness hinges on its context‑management component.

Practical Recommendations

When claiming "Transformer is Turing‑complete", explicitly state the computational setting and any scaling assumptions.

Future theoretical work should treat the triplet (T, D, C) – a fixed model plus a concrete context manager – as the primary object of analysis.

Beyond qualitative Turing‑completeness, researchers should quantify resource budgets (window size, precision, external memory) and learnability criteria to obtain a more nuanced picture of model capabilities.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Transformer large language models Computational theory Turing completeness Context management

Written by

Machine Heart

Professional AI media and industry service platform

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.