How Cursor’s Composer 2 Leverages Self‑Summarization and RL for Long‑Horizon Tasks
This article examines Cursor’s Composer 2 model, covering its self‑summarization reinforcement‑learning workflow, the limitations of conventional compression methods, token‑efficient results on the CursorBench benchmark, and a hard Terminal‑Bench case study in which token usage drops dramatically while performance improves.
Background
Cursor’s Composer 2 model is built on the Kimi K2.5 RL model (model ID kimi‑k2p5‑rl‑0317‑s515‑fast). It incorporates a self‑summarization mechanism trained within a reinforcement‑learning (RL) loop, enabling it to handle tasks that exceed the model’s native context window.
Self‑Summarization RL Loop
The training loop operates as follows:
1. Composer generates tokens until a fixed‑length trigger point is reached.
2. A synthetic query is inserted, asking the model to summarize the current context.
3. The model is given “thinking time” to produce an optimal, compact summary.
4. Composer resumes execution from the summarized context (including plan state, remaining tasks, the number of prior summaries, etc.) and repeats the cycle.
The generated summary is treated as part of the RL reward: high‑quality summaries receive higher reward weight, while poor summaries are down‑weighted, encouraging retention of high‑value information.
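The cycle above can be sketched in a few lines of Python. Everything here is illustrative: the `generate`/`summarize` callables, the trigger size, and the resume format are assumptions for the sketch, not Cursor’s actual training code, and the reward weighting is only noted in a comment.

```python
# Illustrative self-summarization rollout loop (assumed interface, not
# Cursor's real API). Tokens are approximated by whitespace words so the
# sketch runs standalone; the real trigger would be 40k/80k model tokens.

TRIGGER_TOKENS = 20                       # tiny trigger for the demo
SUMMARY_QUERY = "please summarize the dialogue."

def run_with_self_summarization(generate, summarize, task, max_cycles=5):
    """Generate until the trigger point, summarize, resume from the summary."""
    context, history = task, []
    output = ""
    for _ in range(max_cycles):
        # 1. Generate tokens until the fixed-length trigger point.
        output = generate(context)
        if len(output.split()) < TRIGGER_TOKENS:
            return output, history        # task finished within budget
        # 2.-3. Insert a synthetic query and produce a compact summary.
        summary = summarize(output + " " + SUMMARY_QUERY)
        # In training, each stored summary's quality would weight the RL
        # reward (good summaries up-weighted, poor ones down-weighted).
        history.append(summary)
        # 4. Resume from the summary instead of the full history.
        context = f"[summary #{len(history)}] {summary}"
    return output, history

# Toy stand-ins so the sketch runs end to end:
gen = lambda ctx: ctx + " step" * 30      # always grows past the trigger
summ = lambda text: f"plan: continue ({len(text.split())} tokens seen)"

out, summaries = run_with_self_summarization(gen, summ, "fix the build")
```

Because the toy generator never finishes, every cycle ends in a summarization step; a real rollout would terminate once the task completes within the token budget.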
Limitations of Conventional Compression
Typical agent frameworks handle context overflow either by prompting the model to summarize the conversation or by sliding the context window and discarding older tokens. Both approaches risk dropping critical information, which degrades performance on long‑running tasks.
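For contrast, here is a toy sketch of the sliding‑window approach described above. The token list and budget are invented for illustration; the point is simply that truncation drops the oldest tokens wholesale, including anything critical they contained.

```python
# Toy sliding-window context: once the budget is exceeded, the oldest
# tokens are discarded outright, which is the forgetting failure mode
# that degrades long-running tasks.

def slide_window(tokens, budget):
    """Keep only the most recent `budget` tokens."""
    return tokens[-budget:] if len(tokens) > budget else tokens

# A critical early fact followed by a long run of routine steps:
history = ["API_KEY=abc123"] + [f"step {i}" for i in range(100)]
trimmed = slide_window(history, budget=50)

# The early, critical fact is gone after truncation:
assert "API_KEY=abc123" not in trimmed
```

A learned summary, by contrast, can carry that early fact forward in compressed form rather than discarding it by position.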
Token‑Efficient Compression Results
Experiments compared Composer’s self‑summarization against a heavily tuned prompt‑based compression baseline on challenging software‑engineering benchmarks (CursorBench Hard). The baseline produced summaries of more than 5,000 tokens, whereas Composer’s self‑summaries averaged roughly 1,000 tokens from a prompt as short as “please summarize the dialogue.”
In two context‑constrained settings (80k‑token and 40k‑token trigger points), self‑summarization achieved higher accuracy while using only one‑fifth of the tokens and reusing the KV cache:
80k‑token trigger: 47.9% vs. 46.7% (baseline)
40k‑token trigger: 47.3% vs. 44.3% (baseline)
Case Study: Solving a Hard Terminal‑Bench Problem
The task prompt reads: “I have provided /app/doomgeneric/ (Doom source), a special doomgeneric_img.c that writes each frame to /tmp/frame.bmp, and vm.js that looks for doomgeneric_mips and runs it. Please figure out the rest.”
This task, named make‑doom‑for‑mips, is extremely challenging; many strong models fail to solve it. An early Composer checkpoint solved it after 170 rollouts, generating over 100,000 tokens of self‑summaries that were compressed to roughly 1,000 tokens, which guided the solution.
Future Directions
Embedding compression directly into the RL training loop gives Composer a clear mechanism for propagating key information, enabling longer and more complex processes such as multi‑agent coordination. Continued improvements in training are expected to expand the capabilities of agentic AI systems.
For the original blog post, see:
https://cursor.com/blog/self-summarization