How Recursive Language Models Enable Unlimited Context for LLMs
Recursive Language Models (RLM) offer a cost‑effective alternative to expanding LLM context windows: the prompt is stored as a variable and the model can recursively call sub‑models over it, letting it process well over 100,000 tokens. Experiments show superior performance and lower median cost compared to baseline approaches.
The author introduces Recursive Language Models (RLM) as a clever, economical technique for handling extremely long contexts without simply enlarging the model’s window.
Why do we need longer context?
LLM context length has surged from 4K to 128K and even 1M tokens, yet real‑world tasks such as code‑base QA (≈900K tokens), multi‑hop QA (≈8M tokens), and deep research often exceed these limits. GPT‑5’s performance collapses beyond roughly 272K tokens.
Traditional scaling approaches
Scale context length: increase the token window during training and inference, which demands massive compute, data, and engineering effort.
External architecture: employ vector databases, retrieval systems, or sliding windows to split long texts, risking loss of global information and added system complexity.
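For context, here is a toy sliding‑window splitter along these lines (the function name and parameters are illustrative, not from the paper):

```python
def sliding_window(text: str, window: int = 4000, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks so each fits a model's context window."""
    step = window - overlap
    return [text[i:i + window] for i in range(0, len(text), step)]
```

Each chunk can then be embedded, retrieved, or summarized independently, but no chunk ever sees the whole document, which is exactly where global information gets lost.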
RLM: putting the prompt into an external environment
Instead of forcing the entire prompt into the Transformer, RLM stores the prompt in a Python variable (e.g. context) and equips the model with tool‑like capabilities (a minimal sketch follows the list):
Store the whole prompt as a variable.
Provide a “Swiss‑army‑knife”: execute code (regex, slicing, pandas, etc.) and recursively invoke a sub‑model via llm_query().
Allow the model to decide which segment to read, how to split it, and whether to launch a sub‑task.
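A minimal sketch of this loop, assuming the caller supplies a hypothetical llm_query callable that sends a prompt to a sub‑model; in the real RLM, the root model writes and executes this kind of code itself inside a REPL environment:

```python
import re
from typing import Callable

def rlm_answer(context: str, question: str,
               llm_query: Callable[[str], str]) -> str:
    """Sketch of one RLM step: the full prompt lives in the Python
    variable `context`; the model only ever reads small slices of it."""
    # Ask a sub-model for a cheap keyword to pre-filter the huge context.
    keyword = llm_query(f"Give one search keyword for: {question}").strip()
    hits = [ln for ln in context.splitlines()
            if re.search(re.escape(keyword), ln, re.IGNORECASE)]
    # Recursive call: a sub-model reads only the matching fragment.
    snippet = "\n".join(hits)[:8000]
    return llm_query(f"Context:\n{snippet}\n\nQuestion: {question}")
```

The key inversion is that the long context becomes data in the environment rather than tokens in the model’s window.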
Experimental snapshot
Across four benchmark tasks, RLM dramatically outperforms baseline methods. Here, task complexity is measured by how the amount of required information scales with input length.
Cost curve
The median cost of an RLM call is at or below that of a direct GPT‑5 call on the same input.
Tail cost (95th percentile) can rise due to extreme recursion paths.
Overall, RLM is about three times cheaper than a full‑summary baseline.
Model behavior patterns
Analysis of recursion traces reveals three frequent patterns:
Regex filtering: apply regex to pre‑select keywords, then examine matching fragments.
Chunked recursion: split the input evenly by lines or files and invoke sub‑models on each chunk.
Variable stitching: store sub‑model outputs in variables and concatenate them at the end to form the final answer (a combined sketch of the last two patterns follows).
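The last two patterns compose naturally. A hedged sketch, reusing the same hypothetical llm_query callable as above:

```python
from typing import Callable

def chunked_recursion(context: str, question: str,
                      llm_query: Callable[[str], str],
                      n_chunks: int = 8) -> str:
    """Chunked recursion + variable stitching: split the input evenly
    by lines, query a sub-model per chunk, then stitch the results."""
    lines = context.splitlines()
    size = max(1, len(lines) // n_chunks)
    partials = []  # "variable stitching": sub-model outputs held in a variable
    for i in range(0, len(lines), size):
        chunk = "\n".join(lines[i:i + size])
        partials.append(llm_query(f"{chunk}\n\nExtract facts relevant to: {question}"))
    # Final recursive call over the concatenated partial answers.
    return llm_query("Notes:\n" + "\n".join(partials) + f"\n\nQuestion: {question}")
```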
Paper: RECURSIVE LANGUAGE MODELS, https://arxiv.org/pdf/2512.24601v1
Code: https://github.com/ysz/recursive-llm