How Cursor’s Composer 2 Leverages Self‑Summarization and RL for Long‑Horizon Tasks
This article examines Cursor’s Composer 2 model, covering its self‑summarization reinforcement‑learning workflow, the limitations of conventional compression methods, token‑efficient results on the CursorBench benchmark, and a hard Terminal‑Bench case study in which token usage drops dramatically while performance improves.
Background
Cursor’s Composer 2 model is built on the Kimi K2.5 RL model (model ID kimi‑k2p5‑rl‑0317‑s515‑fast). It incorporates a self‑summarization mechanism trained within a reinforcement‑learning (RL) loop, enabling it to handle tasks that exceed the model’s native context window.
Self‑Summarization RL Loop
The training loop operates as follows:
1. Composer generates tokens until a fixed‑length trigger point is reached.
2. A synthetic query is inserted, asking the model to summarize the current context.
3. The model is given “thinking time” to produce an optimal, compact summary.
4. Composer resumes execution from the summarized context (including plan state, remaining tasks, the number of prior summaries, etc.) and repeats the cycle.
The generated summary is treated as part of the RL reward: high‑quality summaries receive higher reward weight, while poor summaries are down‑weighted, encouraging retention of high‑value information.
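The cycle above can be sketched in a few lines of Python. Everything here is illustrative: the `generate`/`summarize` callables, the trigger size, and the resume format are assumptions for the sketch, not Cursor’s actual training code, and the reward weighting is only noted in a comment.

```python
# Illustrative self-summarization rollout loop (assumed interface, not
# Cursor's real API). Tokens are approximated by whitespace words so the
# sketch runs standalone; the real trigger would be 40k/80k model tokens.

TRIGGER_TOKENS = 20                       # tiny trigger for the demo
SUMMARY_QUERY = "please summarize the dialogue."

def run_with_self_summarization(generate, summarize, task, max_cycles=5):
    """Generate until the trigger point, summarize, resume from the summary."""
    context, history = task, []
    output = ""
    for _ in range(max_cycles):
        # 1. Generate tokens until the fixed-length trigger point.
        output = generate(context)
        if len(output.split()) < TRIGGER_TOKENS:
            return output, history        # task finished within budget
        # 2.-3. Insert a synthetic query and produce a compact summary.
        summary = summarize(output + " " + SUMMARY_QUERY)
        # In training, each stored summary's quality would weight the RL
        # reward (good summaries up-weighted, poor ones down-weighted).
        history.append(summary)
        # 4. Resume from the summary instead of the full history.
        context = f"[summary #{len(history)}] {summary}"
    return output, history

# Toy stand-ins so the sketch runs end to end:
gen = lambda ctx: ctx + " step" * 30      # always grows past the trigger
summ = lambda text: f"plan: continue ({len(text.split())} tokens seen)"

out, summaries = run_with_self_summarization(gen, summ, "fix the build")
```

Because the toy generator never finishes, every cycle ends in a summarization step; a real rollout would terminate once the task completes within the token budget.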
Limitations of Conventional Compression
Typical agent frameworks handle context overflow either by prompting the model to summarize the conversation or by sliding the context window and discarding older tokens. Both approaches risk dropping critical information, which degrades performance on long‑running tasks.
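For contrast, here is a toy sketch of the sliding‑window approach described above. The token list and budget are invented for illustration; the point is simply that truncation drops the oldest tokens wholesale, including anything critical they contained.

```python
# Toy sliding-window context: once the budget is exceeded, the oldest
# tokens are discarded outright, which is the forgetting failure mode
# that degrades long-running tasks.

def slide_window(tokens, budget):
    """Keep only the most recent `budget` tokens."""
    return tokens[-budget:] if len(tokens) > budget else tokens

# A critical early fact followed by a long run of routine steps:
history = ["API_KEY=abc123"] + [f"step {i}" for i in range(100)]
trimmed = slide_window(history, budget=50)

# The early, critical fact is gone after truncation:
assert "API_KEY=abc123" not in trimmed
```

A learned summary, by contrast, can carry that early fact forward in compressed form rather than discarding it by position.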
Token‑Efficient Compression Results
Experiments compared Composer’s self‑summarization against a heavily tuned prompt‑based compression baseline on challenging software‑engineering benchmarks (CursorBench Hard). The baseline produced summaries of more than 5,000 tokens, whereas Composer’s self‑summaries averaged roughly 1,000 tokens from a prompt as short as “please summarize the dialogue.”
In two context‑constrained settings (80k‑token and 40k‑token trigger points), self‑summarization achieved higher accuracy while using only one‑fifth of the tokens and reusing the KV cache:
80k‑token trigger: 47.9% vs. 46.7% (baseline)
40k‑token trigger: 47.3% vs. 44.3% (baseline)
Case Study: Solving a Hard Terminal‑Bench Problem
The task prompt reads: “I have provided /app/doomgeneric/ (Doom source), a special doomgeneric_img.c that writes each frame to /tmp/frame.bmp, and vm.js that looks for doomgeneric_mips and runs it. Please figure out the rest.”
This task, named make‑doom‑for‑mips, is extremely challenging; many strong models fail to solve it. An early Composer checkpoint solved it after 170 rollouts, generating over 100,000 tokens of self‑summaries that were compressed to roughly 1,000 tokens, which guided the solution.
Future Directions
Embedding compression directly into the RL training loop gives Composer a clear mechanism for propagating key information, enabling longer and more complex processes such as multi‑agent coordination. Continued improvements in training are expected to expand the capabilities of agentic AI systems.
For the original blog post, see:
https://cursor.com/blog/self-summarization