Why Cursor’s Composer 2 Beats Claude Opus 4.6 in Performance and Cost

Cursor’s new Composer 2 model outperforms Claude Opus 4.6 on benchmarks such as Terminal‑Bench 2.0, cuts pricing to $0.5 per million input tokens and $2.5 per million output tokens, and introduces a self‑summary reinforcement‑learning technique that dramatically reduces context loss in long‑running coding tasks.

Java Architect Essentials

Model Overview

Cursor released Composer 2, a programming‑focused large language model that claims higher capability than Claude Opus 4.6 while offering substantially lower token pricing.

Benchmark Performance

In internal evaluations, Composer 2 achieved large gains across all tested suites, notably:

Terminal‑Bench 2.0 – scored between GPT‑5.4 and Claude Opus 4.6, indicating a clear edge over most existing models.

SWE‑bench Multilingual – similarly strong results (exact scores omitted for brevity).

Pricing

Standard Composer 2 rates are:

$0.5 USD per 1 M input tokens (≈ ¥3.5).

$2.5 USD per 1 M output tokens (≈ ¥17.2).

The “Fast” variant trades higher cost for speed:

$1.5 USD per 1 M input tokens (≈ ¥10.3).

$7.5 USD per 1 M output tokens (≈ ¥51.7).
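For quick budgeting, the rates above can be wrapped in a small helper. This is an illustrative sketch, not an official Cursor SDK; the `RATES` table simply encodes the per‑million‑token prices quoted in this article.

```python
# Illustrative cost estimator for the Composer 2 rates quoted above.
# RATES encodes USD per 1M tokens; the numbers come from this article,
# not from an official Cursor API.
RATES = {
    "standard": {"input": 0.5, "output": 2.5},
    "fast": {"input": 1.5, "output": 7.5},
}

def estimate_cost_usd(input_tokens: int, output_tokens: int, tier: str = "standard") -> float:
    """Estimated USD cost of a single request at the quoted rates."""
    rate = RATES[tier]
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000
```

For example, a request with 200k input tokens and 20k output tokens would cost $0.15 at standard rates and $0.45 on the Fast variant.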

Self‑summary Reinforcement‑Learning Method

Composer 2 is trained with a novel RL objective called “self‑summary”. The model learns to generate its own meeting‑style notes during long tasks, turning summarization into a trained capability rather than a runtime trick.

1. Composer generates text until a fixed token‑length trigger is reached.

2. A synthetic query is inserted, asking the model to summarize the current context.

3. The model receives a drafting space, constructs an optimal summary, and emits a compressed context.

4. Composer resumes generation from the compressed context, which now contains the summary plus state information (plan state, remaining tasks, number of previous summaries, etc.).

During RL training, reward signals are tied to summary quality:

Good summaries → downstream steps succeed more often → higher reward.

Poor summaries → task failure → penalty.

This drives the model to learn what information to retain and what to discard.

Compression vs. Traditional Summarization

On a set of high‑difficulty software‑engineering tasks, traditional summarization required more than 5,000 tokens per compression and still produced long outputs. Composer 2 uses a single prompt (“Please summarize the conversation”) and produces compressed outputs averaging ~1,000 tokens, about one‑fifth the token count, while reducing error rates by roughly 50%.
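As a quick sanity check on those figures, using the rounded numbers quoted in this article:

```python
# Rounded figures from the article for one compression step.
traditional_tokens = 5000   # traditional summarization: more than 5,000 tokens
self_summary_tokens = 1000  # Composer 2 self-summary: roughly 1,000 tokens

ratio = self_summary_tokens / traditional_tokens
print(f"self-summary uses {ratio:.0%} of the tokens")  # prints "self-summary uses 20% of the tokens"
```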

Long‑chain Task Demonstration

Composer 2 was tested on a classic hard problem: running Doom on a MIPS architecture. The task demanded code modification, compilation, debugging, and iterative refinement. Most models stalled, but Composer 2 completed it after 170 interaction rounds, compressing more than 100,000 tokens of intermediate context down to roughly 1,000 tokens.

Future Outlook

Internal tests show that integrating compression into the training loop enables Composer 2 to pass key information efficiently and improve capability on difficult tasks. Cursor has hinted at an upcoming Composer 3 model.

References

https://x.com/mntruell/status/2034729462211002505

https://x.com/RoboIntellect/status/2034693646822580431?s=20

https://x.com/cursor_ai/status/2033967614309835069?s=20

Source: QbitAI (量子位, ID: QbitAI)
Tags: benchmark, reinforcement learning, Cursor, pricing, AI programming, self-summary, Composer 2
Written by

Java Architect Essentials

Committed to sharing quality articles and tutorials to help Java programmers progress from junior to mid-level to senior architect. We curate high-quality learning resources, interview questions, videos, and projects from across the internet to help you systematically improve your Java architecture skills. Follow and reply '1024' to get Java programming resources. Learn together, grow together.
