How Titans and MIRAS Enable AI Models to Remember 1 Million Tokens

Google's Titans architecture and the MIRAS theoretical framework introduce a deep neural memory that lets large language models learn in real time, retain surprising information, and handle context windows of up to two million tokens, outperforming existing Transformers and linear RNNs on a range of benchmarks.

PaperAgent

Dec 6, 2025

At the NeurIPS 2025 conference, Google unveiled Titans, a new architecture that combines RNN speed with Transformer performance and supports up to 2 million tokens of context through a deep neural memory module.

The work spans two papers, “Titans” (the concrete architecture) and “MIRAS” (the theoretical blueprint), which together introduce “test‑time memorization”: the ability of a model to retain surprising information while it is running, without offline retraining.

Titans: Real‑time Learning of New Context

Titans adds a neural long‑term memory module implemented as a two‑layer multilayer perceptron, giving the model far greater expressive capacity than the fixed‑size vector or matrix memories of traditional RNNs. It actively selects which token relationships to store using a “surprise metric”: the gradient of the memory's loss on the incoming data, which measures how far the new input departs from what the memory currently expects.

Low surprise: expected inputs (e.g., the word “cat” when the model already expects an animal) generate small gradients and are not written to long‑term memory.

High surprise: unexpected inputs (e.g., an image of a banana peel in a financial report) produce large gradients and are written to long‑term memory.

Two mechanisms improve this process: (1) momentum, which combines instantaneous and recent surprise signals, and (2) adaptive weight decay that acts as a forgetting gate to free memory for longer sequences.
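The update described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. It assumes a two‑layer tanh MLP as the memory, a squared‑error associative loss between a key k and a value v, and the update S_t = eta * S_{t-1} - lr * grad, M_t = (1 - alpha) * M_{t-1} + S_t. The function names, the hidden size, and the fixed constants lr, eta, and alpha are hypothetical choices made for brevity, and the key and value are passed in directly rather than projected from an input token.

```python
import numpy as np

def init_memory(dim, hidden, rng):
    """Two-layer MLP memory: v_hat = W2 @ tanh(W1 @ k). Sizes are illustrative."""
    return {
        "W1": rng.normal(0.0, 0.1, (hidden, dim)),
        "W2": rng.normal(0.0, 0.1, (dim, hidden)),
    }

def recall(mem, k):
    """Read from memory: map a key to the value the memory currently predicts."""
    return mem["W2"] @ np.tanh(mem["W1"] @ k)

def loss_and_grads(mem, k, v):
    """Associative loss ||M(k) - v||^2 and its gradient w.r.t. the memory weights."""
    h = np.tanh(mem["W1"] @ k)
    err = mem["W2"] @ h - v
    d_out = 2.0 * err
    grads = {
        "W2": np.outer(d_out, h),
        "W1": np.outer((mem["W2"].T @ d_out) * (1.0 - h ** 2), k),
    }
    return float(err @ err), grads

def memory_step(mem, momentum, k, v, lr=0.02, eta=0.5, alpha=0.01):
    """One test-time update with momentum (eta) and weight-decay forgetting (alpha)."""
    loss, grads = loss_and_grads(mem, k, v)
    # "Surprise" is summarized here as the overall gradient norm (an assumption).
    surprise = float(np.sqrt(sum(np.sum(g ** 2) for g in grads.values())))
    new_mem, new_momentum = {}, {}
    for name in mem:
        new_momentum[name] = eta * momentum[name] - lr * grads[name]
        new_mem[name] = (1.0 - alpha) * mem[name] + new_momentum[name]
    return new_mem, new_momentum, loss, surprise

rng = np.random.default_rng(0)
mem = init_memory(dim=8, hidden=16, rng=rng)
momentum = {name: np.zeros_like(w) for name, w in mem.items()}
k, v = rng.normal(size=8), rng.normal(size=8)

losses, surprises = [], []
for _ in range(100):
    mem, momentum, loss, surprise = memory_step(mem, momentum, k, v)
    losses.append(loss)
    surprises.append(surprise)
```

Repeatedly presenting the same key and value pair drives both the loss and the surprise down, which mirrors the idea that an already memorized association stops triggering large writes. Because alpha shrinks the weights at every step, the loss settles slightly above zero instead of reaching it exactly.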

MIRAS: A Unified View of Sequence Modeling

MIRAS treats sequence models, from classic Transformers to fast linear RNNs, as different solutions to a single associative memory problem: efficiently merging new information with existing memories without letting essential concepts be forgotten.

The framework defines four design dimensions:

Memory architecture: the structure used to store information (vectors, matrices, or deep MLPs as in Titans).

Attentional bias: the internal learning objective that determines what the model prioritizes.

Retention gate: a regularizer that balances new learning against preservation of past knowledge.

Memory algorithm: the optimization algorithm that updates the memory state.
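One way to picture the four dimensions is as interchangeable parts of a single update step. The sketch below is illustrative only and is not taken from the MIRAS paper. It assumes a matrix memory (memory architecture), a squared‑error objective (attentional bias), simple weight decay (retention gate), and one plain gradient‑descent step (memory algorithm). All function names and the values of lr and strength are hypothetical.

```python
import numpy as np

def mse_bias_grad(M, k, v):
    """Attentional bias: gradient of 0.5 * ||M @ k - v||^2 w.r.t. a matrix memory M."""
    return np.outer(M @ k - v, k)

def l2_retention(M, strength):
    """Retention gate as weight decay: shrink the old memory toward zero."""
    return (1.0 - strength) * M

def gradient_descent(M_retained, grad, lr):
    """Memory algorithm: one plain gradient-descent step."""
    return M_retained - lr * grad

def miras_step(M, k, v, bias_grad, retention, algorithm, lr=0.5, strength=0.05):
    """Compose the three pluggable choices around a given memory M."""
    grad = bias_grad(M, k, v)
    return algorithm(retention(M, strength), grad, lr)

M = np.zeros((4, 4))                  # memory architecture: a 4x4 matrix
k = np.array([1.0, 0.0, 0.0, 0.0])    # key to store
v = np.array([0.0, 2.0, 0.0, 0.0])    # value associated with the key
for _ in range(20):
    M = miras_step(M, k, v, mse_bias_grad, l2_retention, gradient_descent)
recalled = M @ k
```

With these particular choices the step reduces to a delta‑rule‑style update with decay. The recalled value converges to roughly 0.91 times v rather than v itself, because the retention term keeps pulling the memory back toward zero while the objective pulls it toward the new association. Swapping any of the three functions changes the model without touching the rest of the step.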

Beyond Mean Squared Error

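Most sequence models rely on mean squared error (MSE) or dot‑product similarity for both the learning objective and the retention term, which can make them sensitive to outliers and limit their expressive power. MIRAS offers a generative framework for exploring a richer design space of loss functions and regularizers, including non‑Euclidean objectives.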

Three MIRAS‑derived models are presented:

YAAD: uses Huber loss, a gentler penalty for large errors, so the memory does not overreact to occasional outliers.

MONETA: employs generalized norm penalties for stricter control over what is remembered and what is forgotten.

MEMORA: constrains the memory to behave like a probability map, so that each update stays controlled and balanced for maximal stability.
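The motivation for a Huber‑style objective can be seen by comparing gradients. The sketch below shows only the general property of the Huber loss and does not reproduce YAAD's actual formulation; the function names and the threshold delta = 1.0 are assumptions chosen for illustration.

```python
def squared_loss_grad(err):
    """Gradient of 0.5 * err**2: grows without bound as the error grows."""
    return err

def huber_loss(err, delta=1.0):
    """Quadratic for small errors, linear beyond the threshold delta."""
    if abs(err) <= delta:
        return 0.5 * err * err
    return delta * (abs(err) - 0.5 * delta)

def huber_grad(err, delta=1.0):
    """Gradient of the Huber loss: equal to err inside [-delta, delta], clipped outside."""
    return max(-delta, min(delta, err))

errors = [0.1, 0.5, 1.0, 10.0]  # the last value plays the role of an outlier
mse_updates = [squared_loss_grad(e) for e in errors]
huber_updates = [huber_grad(e) for e in errors]
```

For the outlier, the squared‑error gradient is 10.0 while the Huber gradient is capped at 1.0, so a single extreme input moves the memory no more than an error at the threshold would.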

Experiments and Results

Titans and the MIRAS variants (YAAD, MONETA, MEMORA) were benchmarked against state‑of‑the‑art architectures such as Transformer++, Mamba‑2, and Gated DeltaNet on genomic (DNA) modeling, time‑series forecasting, standard language‑modeling datasets (C4, WikiText), and zero‑shot reasoning tasks (HellaSwag, PIQA). Titans consistently achieved lower perplexity and higher accuracy. The gap was largest on the BABILong benchmark, where it handled context windows of up to 2 million tokens and outperformed much larger models such as GPT‑4 despite having far fewer parameters.

Ablation studies showed that deeper memory modules yield better perplexity and scale more gracefully with longer sequences.

Model card: https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-Pro-Model-Card.pdf
Blog: https://research.google/blog/titans-miras-helping-ai-have-long-term-memory
Titans: https://arxiv.org/abs/2501.00663
MIRAS: https://arxiv.org/abs/2504.13173
