How Recursive Language Models Enable Unlimited Context for LLMs
Recursive Language Models (RLM) offer a cost‑effective alternative to expanding LLM context windows: the prompt is stored as a variable and the model can recursively call sub‑models over it, letting it process well over 100,000 tokens. Experiments show superior performance and lower median cost compared to baseline approaches.
The author introduces Recursive Language Models (RLM) as a clever, economical technique for handling extremely long contexts without simply enlarging the model’s window.
Why do we need longer context?
LLM context length has surged from 4K to 128K and even 1M tokens, yet real‑world tasks such as code‑base QA (≈900K tokens), multi‑hop QA (≈8M tokens), and deep research often exceed these limits. GPT‑5’s performance collapses beyond roughly 272K tokens.
Traditional scaling approaches
Scale context length: increase the token window during training and inference, which demands massive compute, data, and engineering effort.
External architecture: employ vector databases, retrieval systems, or sliding windows to split long texts, risking loss of global information and added system complexity.
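For context, here is a toy sliding‑window splitter along these lines (the function name and parameters are illustrative, not from the paper):

```python
def sliding_window(text: str, window: int = 4000, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks so each fits a model's context window."""
    step = window - overlap
    return [text[i:i + window] for i in range(0, len(text), step)]
```

Each chunk can then be embedded, retrieved, or summarized independently, but no chunk ever sees the whole document, which is exactly where global information gets lost.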
RLM: putting the prompt into an external environment
Instead of forcing the entire prompt into the Transformer, RLM stores the prompt in a Python variable (e.g. context) and equips the model with tool‑like capabilities (a minimal sketch follows the list):
Store the whole prompt as a variable.
Provide a “Swiss‑army‑knife”: execute code (regex, slicing, pandas, etc.) and recursively invoke a sub‑model via llm_query().
Allow the model to decide which segment to read, how to split it, and whether to launch a sub‑task.
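A minimal sketch of this loop, assuming the caller supplies a hypothetical llm_query callable that sends a prompt to a sub‑model; in the real RLM, the root model writes and executes this kind of code itself inside a REPL environment:

```python
import re
from typing import Callable

def rlm_answer(context: str, question: str,
               llm_query: Callable[[str], str]) -> str:
    """Sketch of one RLM step: the full prompt lives in the Python
    variable `context`; the model only ever reads small slices of it."""
    # Ask a sub-model for a cheap keyword to pre-filter the huge context.
    keyword = llm_query(f"Give one search keyword for: {question}").strip()
    hits = [ln for ln in context.splitlines()
            if re.search(re.escape(keyword), ln, re.IGNORECASE)]
    # Recursive call: a sub-model reads only the matching fragment.
    snippet = "\n".join(hits)[:8000]
    return llm_query(f"Context:\n{snippet}\n\nQuestion: {question}")
```

The key inversion is that the long context becomes data in the environment rather than tokens in the model’s window.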
Experimental snapshot
Across four benchmark tasks, RLM dramatically outperforms baseline methods. Here, task complexity is measured by how the amount of required information scales with input length.
Cost curve
The median cost of an RLM call is at or below that of a direct GPT‑5 call on the same input.
Tail cost (95th percentile) can rise due to extreme recursion paths.
Overall, RLM is about three times cheaper than a full‑summary baseline.
Model behavior patterns
Analysis of recursion traces reveals three frequent patterns:
Regex filtering: apply regex to pre‑select keywords, then examine matching fragments.
Chunked recursion: split the input evenly by lines or files and invoke sub‑models on each chunk.
Variable stitching: store sub‑model outputs in variables and concatenate them at the end to form the final answer (a combined sketch of the last two patterns follows).
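The last two patterns compose naturally. A hedged sketch, reusing the same hypothetical llm_query callable as above:

```python
from typing import Callable

def chunked_recursion(context: str, question: str,
                      llm_query: Callable[[str], str],
                      n_chunks: int = 8) -> str:
    """Chunked recursion + variable stitching: split the input evenly
    by lines, query a sub-model per chunk, then stitch the results."""
    lines = context.splitlines()
    size = max(1, len(lines) // n_chunks)
    partials = []  # "variable stitching": sub-model outputs held in a variable
    for i in range(0, len(lines), size):
        chunk = "\n".join(lines[i:i + size])
        partials.append(llm_query(f"{chunk}\n\nExtract facts relevant to: {question}"))
    # Final recursive call over the concatenated partial answers.
    return llm_query("Notes:\n" + "\n".join(partials) + f"\n\nQuestion: {question}")
```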
Paper: RECURSIVE LANGUAGE MODELS, https://arxiv.org/pdf/2512.24601v1
Code: https://github.com/ysz/recursive-llm