Why Cursor’s Composer 2 Beats Claude Opus 4.6 in Performance and Price
Cursor’s new Composer 2 programming model outperforms Claude Opus 4.6 on benchmarks like Terminal‑Bench 2.0 and SWE‑bench Multilingual, while slashing token costs to $0.5/M input and $2.5/M output, thanks to a novel self‑summary reinforcement‑learning technique that enables efficient long‑context processing.
Composer 2 Overview
Cursor released Composer 2, a programming‑focused large language model designed for high‑quality code generation at low cost. The standard version is priced at $0.5/M input tokens and $2.5/M output tokens; a faster variant (Composer 2 Fast) costs $1.5/M input and $7.5/M output tokens.
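As a quick sanity check on what those rates mean per request, here is a minimal sketch using the prices quoted above; the request sizes in the example are illustrative assumptions, not measured workloads.

```python
# Hypothetical cost estimate for a single request at the per-million-token
# rates quoted above; model names and request sizes are illustrative.
PRICES = {
    "composer-2":      {"input": 0.5, "output": 2.5},   # $ per 1M tokens
    "composer-2-fast": {"input": 1.5, "output": 7.5},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one request at the quoted rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a 40k-token prompt producing a 3k-token completion (assumed sizes).
print(f"{request_cost('composer-2', 40_000, 3_000):.4f}")       # ~$0.0275
print(f"{request_cost('composer-2-fast', 40_000, 3_000):.4f}")  # ~$0.0825
```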
Benchmark Performance
Composer 2 shows large gains on all evaluated benchmarks, including Terminal‑Bench 2.0 and SWE‑bench Multilingual. On Terminal‑Bench 2.0, its score falls between those of GPT‑5.4 and Claude Opus 4.6.
Self‑Summary Reinforcement Learning
The performance‑cost balance is achieved through a new reinforcement‑learning (RL) technique called self‑summary. During training, the model is rewarded for producing concise, accurate summaries of its own context and penalized for losing critical information. This makes the ability to “take notes” an intrinsic skill rather than a prompt‑engineering trick.
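Cursor has not published the exact reward, but a minimal sketch of the shape described above (reward retention, penalize bloat) might look like the following; the weights, the length budget, and the fact‑recall scorer are all illustrative assumptions.

```python
def self_summary_reward(summary_tokens: int,
                        facts_recalled: float,
                        length_budget: int = 1_000,
                        alpha: float = 1.0,
                        beta: float = 0.5) -> float:
    """Illustrative reward: pay for recalled facts, charge for length overrun.

    facts_recalled -- fraction (0..1) of critical facts from the full context
                      that some checker still finds in the summary (assumed scorer).
    summary_tokens -- length of the generated summary in tokens.
    """
    accuracy_term = alpha * facts_recalled                 # reward retention
    overrun = max(0, summary_tokens - length_budget)
    conciseness_penalty = beta * overrun / length_budget   # penalize bloat
    return accuracy_term - conciseness_penalty
```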
Workflow
1. Composer generates tokens until a predefined length trigger fires.
2. A synthetic query is inserted, asking the model to summarize the current context.
3. The model is given a drafting space in which to produce an optimal summary.
4. The generated summary is merged with the task state, forming a compressed context that is fed back to step 1.
This loop enables continuous context compression and retention across very long interactions; a code sketch follows below.
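A minimal sketch of that loop, assuming a hypothetical `model.generate` interface, a simple token counter, and an arbitrary 40k-token trigger (none of which are Cursor's actual API or thresholds):

```python
# Hypothetical sketch of the self-summary loop described above.
SUMMARY_TRIGGER = 40_000          # assumed context-length trigger (tokens)
SUMMARIZE_QUERY = "Please summarize the conversation"

def run_with_self_summary(model, task_prompt: str, max_rounds: int = 200):
    context = task_prompt
    for _ in range(max_rounds):
        # Step 1: generate until the model stops or the length trigger fires.
        output = model.generate(context)
        context += output

        if model.count_tokens(context) >= SUMMARY_TRIGGER:
            # Steps 2-3: insert the synthetic query and let the model
            # draft a summary of everything so far.
            summary = model.generate(context + "\n" + SUMMARIZE_QUERY)
            # Step 4: merge the summary with the task state to form a
            # compressed context, then loop back to generation.
            context = task_prompt + "\n[Summary of progress]\n" + summary

        if model.task_complete(output):
            return output
    return None
```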
Token Efficiency
In high‑difficulty software‑engineering tasks, traditional summarization prompts can require more than 5k tokens, whereas Composer's simple prompt "Please summarize the conversation" yields compressed outputs averaging ~1k tokens, about one‑fifth the token usage and with roughly 50% fewer errors.
Challenging Task Example
Task: run the Doom game on a MIPS architecture using provided source code, a custom frame writer, and a VM script.
Most models stalled, but Composer solved the problem after 170 interaction rounds, compressing more than 100k tokens into a 1k‑token summary and successfully executing Doom on MIPS.
Future Direction
Cursor has hinted at a forthcoming Composer 3, which is expected to extend the self‑summary mechanism and further improve performance.
