Can AI Achieve Human‑Like Long‑Term Memory? Inside Google’s Titans Architecture

Google’s newly unveiled Titans architecture tackles AI’s “forgetfulness” by embedding a Neural Long‑Term Memory (LMM) module that updates its weights during inference via test‑time training and a MIRAS surprise metric, enabling contexts of over 2 million tokens with linear O(N) compute and superior benchmark results versus GPT‑4 with RAG.

1. The Problem: Why Current AI Forgets

Before Titans, long‑context AI relied on either Transformers (e.g., GPT‑4) or recurrent models (e.g., LSTM, Mamba). Transformers keep a precise record of the context, but their compute grows quadratically, O(N²), with context length, so the window must be capped and everything beyond it is lost. Recurrent models compress the entire text into a fixed‑size state, which loses detail and lets newer information overwrite older memories.
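To make the trade‑off concrete, here is a tiny PyTorch sketch (illustrative only, not code from either architecture) contrasting the two regimes: a KV cache that grows with the context versus a fixed‑size recurrent state:

```python
import torch

d = 64                         # model dimension
n = 10_000                     # tokens seen so far
tokens = torch.randn(n, d)     # embeddings of the context

# Transformer: the KV cache grows with every token, and each new query
# attends over all of it, so total compute over a sequence scales as O(N^2).
query = torch.randn(1, d)
scores = query @ tokens.T      # cost of this single step grows linearly with n

# RNN-style model: the whole history is squeezed into one fixed-size state,
# so per-step cost is constant but old details get overwritten.
state = torch.zeros(d)
for tok in tokens:             # streaming pass over the same context
    state = torch.tanh(state + tok)   # toy recurrence; lossy compression
```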

2. Core Breakthrough: Storing Memory in Parameters

Titans proposes moving memory out of a transient cache and into the model’s weights. During inference, the new Neural Long‑Term Memory (LMM) module performs “test‑time training”: as new facts are encountered, it runs gradient‑descent steps on the memory module’s own parameters, effectively rewriting part of its “brain” on the fly.
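A minimal sketch of the idea, assuming a small MLP as the memory module and a simple key→value recall loss; MemoryMLP, write_to_memory, and the learning rate are illustrative stand‑ins, not the actual Titans implementation:

```python
import torch
import torch.nn as nn

class MemoryMLP(nn.Module):
    """The memory lives in these weights, which are updated during inference."""
    def __init__(self, d: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d, 2 * d), nn.SiLU(), nn.Linear(2 * d, d))

    def forward(self, k: torch.Tensor) -> torch.Tensor:
        return self.net(k)

d = 64
memory = MemoryMLP(d)
lr = 1e-2

def write_to_memory(keys: torch.Tensor, values: torch.Tensor) -> None:
    """One inference-time gradient step: store the key -> value association."""
    loss = ((memory(keys) - values) ** 2).mean()  # how badly memory recalls this fact
    loss.backward()
    with torch.no_grad():
        for p in memory.parameters():             # SGD step on memory weights only;
            p -= lr * p.grad                      # the rest of the model stays frozen
            p.grad = None

# As each new chunk arrives, its key/value projections are written into memory.
keys, values = torch.randn(8, d), torch.randn(8, d)
write_to_memory(keys, values)
```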

3. MIRAS: Remember Only What Is Surprising

The MIRAS framework computes a “surprise metric” for each incoming token. Repeated, low‑surprise content yields small gradients and is effectively ignored; unexpected, high‑surprise information generates large gradients and is written into memory. Paired with an adaptive forgetting gate, this lets Titans manage millions of tokens within a fixed parameter budget.
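Here is a toy version of surprise‑gated writing using a linear associative memory; the update loosely follows the momentum‑plus‑forgetting rule described in the Titans paper, but eta, alpha, and theta are illustrative values, not the paper’s:

```python
import torch

d = 64
M = torch.zeros(d, d, requires_grad=True)    # toy linear memory: value ≈ M @ key
S = torch.zeros(d, d)                        # running "surprise" (gradient momentum)
eta, alpha, theta = 0.9, 0.01, 0.1           # illustrative decay/forget/step sizes

def surprised_write(key: torch.Tensor, value: torch.Tensor) -> float:
    """Write (key -> value) into M; write strength scales with surprise."""
    global S
    loss = ((M @ key - value) ** 2).mean()   # recall error for this fact
    grad, = torch.autograd.grad(loss, M)     # large gradient = surprising input
    with torch.no_grad():
        S = eta * S - theta * grad           # past surprise decays, new surprise adds
        M.mul_(1 - alpha).add_(S)            # forgetting gate frees capacity
    return grad.norm().item()                # the surprise score

# A fact seen repeatedly becomes unsurprising: its gradient shrinks, so almost
# nothing new is written. A novel fact would produce a large write again.
k, v = torch.randn(d), torch.randn(d)
for _ in range(3):
    print(surprised_write(k, v))             # surprise drops as the fact is stored
```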

4. Architecture Details

Inference works like summarisation (a toy sketch follows the steps below):

1. Split the long document into chunks.

2. The LMM reads each chunk and updates its parameters, forming a running memory outline of the document.

3. When generating a response, the LMM produces a “memory vector” that is concatenated to the current context.

4. The attention mechanism processes only this memory vector together with the most recent sentences, keeping computation linear in document length.
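Putting the four steps together, a self‑contained toy loop might look like this; encode, lmm_update, and lmm_retrieve are hypothetical stand‑ins for the real components:

```python
import torch

d = 64
M = torch.zeros(d, d)                         # the LMM, as a toy linear memory

def encode(text: str) -> torch.Tensor:
    """Toy encoder: a deterministic pseudo-random vector per string."""
    g = torch.Generator().manual_seed(abs(hash(text)) % (2 ** 31))
    return torch.randn(d, generator=g)

def lmm_update(x: torch.Tensor, lr: float = 0.1) -> None:
    """Step 2: write a chunk into the memory weights (one gradient-style step)."""
    global M
    err = M @ x - x                           # how poorly memory recalls this chunk
    M = M - lr * torch.outer(err, x) / d

def lmm_retrieve(q: torch.Tensor) -> torch.Tensor:
    """Step 3: query the weights for a compact memory vector."""
    return M @ q

# Step 1: chunk the document, then stream each chunk through the memory.
for chunk in ["chunk one ...", "chunk two ...", "chunk three ..."]:
    lmm_update(encode(chunk))

# Step 4: attention runs over just [memory vector + recent context], a handful
# of vectors regardless of document length, which keeps inference linear.
question = encode("where was the needle?")
context = torch.stack([lmm_retrieve(question), question])
attn = torch.softmax(context @ context.T / d ** 0.5, dim=-1) @ context
```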

5. Empirical Results

In a “needle‑in‑a‑haystack” benchmark, Titans outperformed GPT‑4 with retrieval‑augmented generation (RAG). Key metrics are:

Context length: Titans – 2 million+ tokens; GPT‑4 (RAG) – 128k (native)

BABILong accuracy: Titans – significantly higher; GPT‑4 – depends on retrieval quality

Inference complexity: Titans – linear O(N); GPT‑4 – quadratic O(N²)

These results demonstrate that AI is moving from a stateless to a stateful paradigm, where memory is not merely cached but ingrained in the model’s parameters.

Transformer · AI Architecture · long‑term memory · Test‑Time Training · Google Titans · MIRAS
Written by

ShiZhen AI

Tech blogger with over 10 years of experience at leading tech firms; an AI efficiency and delivery expert focused on AI productivity. Covers tech gadgets, AI‑driven efficiency, and leisure for the AI leisure community. 🛰 szzdzhp001