Cursor’s Composer 2 Beats Claude Opus 4.6 with ‘Ankle‑Cut’ Pricing and a New Self‑Summary Reinforcement‑Learning Method

Cursor’s newly released Composer 2 model surpasses Claude Opus 4.6 on benchmarks such as Terminal‑Bench 2.0, offers dramatically lower token pricing, and achieves these gains by introducing a novel self‑summary reinforcement‑learning technique that compresses long‑context tasks while preserving critical information.

Machine Learning Algorithms & Natural Language Processing

Composer 2 performance and pricing

Composer 2, Cursor’s programming‑focused LLM, outperforms Claude Opus 4.6 on all evaluated benchmarks, including Terminal‑Bench 2.0 and SWE‑bench Multilingual. On Terminal‑Bench 2.0 its score lies between those of Claude Opus 4.6 and GPT‑5.4 (above the former, below the latter), a substantial capability increase.

Pricing (per million tokens):

Standard Composer 2 – input $0.50, output $2.50

Composer 2 Fast – input $1.50, output $7.50 (higher throughput)
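At these rates, per‑job costs are easy to estimate. A minimal sketch, assuming a hypothetical job size; the `job_cost` helper and the token counts are illustrative, not part of Cursor’s API:

```python
# Hypothetical cost calculator using the published per-million-token rates.
PRICES = {
    "composer-2":      {"input": 0.50, "output": 2.50},
    "composer-2-fast": {"input": 1.50, "output": 7.50},
}

def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one job at the listed per-1M-token rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a job with 200k input tokens and 20k output tokens.
print(f"{job_cost('composer-2', 200_000, 20_000):.3f}")       # 0.150
print(f"{job_cost('composer-2-fast', 200_000, 20_000):.3f}")  # 0.450
```

The Fast tier triples both rates in exchange for higher throughput, so the same job costs exactly three times as much.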

Self‑summary reinforcement‑learning method

Cursor introduced a reinforcement‑learning (RL) technique called “self‑summary”. The model learns to generate concise summaries of its own intermediate context, and these summaries feed into the reward signal: a good summary increases the reward, while information loss incurs a penalty.
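Cursor has not published the reward itself, but the shaping described above might look roughly like the following sketch, in which `summary_reward`, the token budget, and the key‑fact retention check are all illustrative assumptions:

```python
# Hypothetical sketch of a self-summary reward signal (not Cursor's actual
# implementation): the task reward is shaped by a bonus for concise summaries
# and a penalty when key facts from the context are dropped.

def summary_reward(task_reward: float,
                   summary_tokens: int,
                   key_facts: set[str],
                   summary_text: str,
                   max_tokens: int = 1000,
                   loss_penalty: float = 0.5) -> float:
    # Brevity bonus: full credit only when the summary fits the budget.
    brevity = min(1.0, max_tokens / max(summary_tokens, 1))
    # Retention check: fraction of key facts still mentioned in the summary.
    kept = sum(1 for fact in key_facts if fact in summary_text)
    retention = kept / max(len(key_facts), 1)
    # Forgetting is penalized; a good, complete summary adds to the reward.
    return task_reward + brevity * retention - loss_penalty * (1.0 - retention)
```

Under this shaping, a short summary that keeps every key fact earns the full bonus, while one that drops half of them loses both bonus and penalty mass.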

During inference the model follows a loop:

1. Generate tokens until a fixed‑length trigger is reached.

2. Insert a synthetic query that asks the model to summarize the current context.

3. Provide a drafting space in which the model constructs an optimal summary, then compress it.

4. Feed the compressed summary, together with state information (remaining tasks, previous summaries, planning state), back to step 1.

This mechanism is trained, not a runtime hack; the RL objective explicitly rewards retaining useful information and penalizes forgetting.
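The loop above can be sketched as follows; `generate`, `TRIGGER_TOKENS`, and the placeholder decoding behavior are stand‑ins for illustration, not Cursor’s real interface:

```python
# Minimal sketch of the self-summary inference loop described above.
# `generate` and `TRIGGER_TOKENS` are stand-ins, not Cursor's real API.
TRIGGER_TOKENS = 4096

def generate(prompt: str, max_tokens: int) -> str:
    """Stand-in for the model's decoding call (placeholder behavior)."""
    return prompt[-max_tokens:]

def run_with_self_summary(task: str, rounds: int = 3) -> str:
    context = task
    summaries: list[str] = []
    for _ in range(rounds):
        # 1. Generate until the fixed-length trigger is reached.
        context += generate(context, TRIGGER_TOKENS)
        if len(context) < TRIGGER_TOKENS:
            break
        # 2. Insert a synthetic query asking the model to summarize itself.
        draft = generate("Summarize the current context:\n" + context, 512)
        # 3. Compress the drafted summary.
        summary = generate("Compress:\n" + draft, 128)
        summaries.append(summary)
        # 4. Feed the compressed summary plus state back as the new context.
        context = f"Task: {task}\nSummaries so far: {' | '.join(summaries)}"
    return context
```

The key point the sketch preserves is that the working context is periodically replaced by a compressed summary plus state, rather than growing without bound.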

Quantitative comparison with traditional summarization

Traditional summarization approaches for long‑context programming tasks require more than 5,000 tokens for a single task and still produce compressed outputs of 5,000+ tokens. Composer 2 uses a single short prompt (e.g., “Please summarize the conversation”), and its compressed output averages about 1,000 tokens, roughly one‑fifth the token usage. The reduction in token count correlates with an approximately 50% drop in compression‑induced errors.
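A quick back‑of‑the‑envelope check of the quoted figures:

```python
# Sanity check on the token figures quoted above (illustrative arithmetic only).
traditional_output = 5000   # tokens produced by traditional summarization
composer2_output = 1000     # average compressed output reported for Composer 2

ratio = composer2_output / traditional_output
print(f"Composer 2 uses {ratio:.0%} of the tokens")  # 20%, i.e. one-fifth
```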

Long‑chain task demonstration: Doom on MIPS

Composer 2 was tasked with running Doom on a MIPS architecture, a benchmark that forces the model to modify code, compile, and iteratively debug. After 170 interaction rounds the model had compressed more than 100,000 tokens of intermediate data into roughly 1,000 tokens and produced a correct solution.

Future direction

Cursor has announced a forthcoming Composer 3, indicating continued rapid iteration of the self‑summary capability.

