Why Cursor’s Composer 2 Beats Claude Opus 4.6 in Performance and Price
Cursor’s new Composer 2 programming model outperforms Claude Opus 4.6 on benchmarks like Terminal‑Bench 2.0 and SWE‑bench Multilingual, while slashing token costs to $0.5/M input and $2.5/M output, thanks to a novel self‑summary reinforcement‑learning technique that enables efficient long‑context processing.
Composer 2 Overview
Cursor released Composer 2, a programming‑focused large language model designed for high‑quality code generation at low cost. The standard version is priced at $0.5/M input tokens and $2.5/M output tokens; a faster variant (Composer 2 Fast) costs $1.5/M input and $7.5/M output tokens.
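As a quick sanity check on what those rates mean per request, here is a minimal sketch using the prices quoted above; the request sizes in the example are illustrative assumptions, not measured workloads.

```python
# Hypothetical cost estimate for a single request at the per-million-token
# rates quoted above; model names and request sizes are illustrative.
PRICES = {
    "composer-2":      {"input": 0.5, "output": 2.5},   # $ per 1M tokens
    "composer-2-fast": {"input": 1.5, "output": 7.5},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one request at the quoted rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a 40k-token prompt producing a 3k-token completion (assumed sizes).
print(f"{request_cost('composer-2', 40_000, 3_000):.4f}")       # ~$0.0275
print(f"{request_cost('composer-2-fast', 40_000, 3_000):.4f}")  # ~$0.0825
```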
Benchmark Performance
Composer 2 shows large gains on all evaluated benchmarks, including Terminal‑Bench 2.0 and SWE‑bench Multilingual. On Terminal‑Bench 2.0, its score falls between those of GPT‑5.4 and Claude Opus 4.6.
Self‑Summary Reinforcement Learning
The performance‑cost balance is achieved through a new reinforcement‑learning (RL) technique called self‑summary. During training, the model is rewarded for producing concise, accurate summaries of its own context and penalized for losing critical information. This makes the ability to “take notes” an intrinsic skill rather than a prompt‑engineering trick.
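Cursor has not published the exact reward, but a minimal sketch of the shape described above (reward retention, penalize bloat) might look like the following; the weights, the length budget, and the fact‑recall scorer are all illustrative assumptions.

```python
def self_summary_reward(summary_tokens: int,
                        facts_recalled: float,
                        length_budget: int = 1_000,
                        alpha: float = 1.0,
                        beta: float = 0.5) -> float:
    """Illustrative reward: pay for recalled facts, charge for length overrun.

    facts_recalled -- fraction (0..1) of critical facts from the full context
                      that some checker still finds in the summary (assumed scorer).
    summary_tokens -- length of the generated summary in tokens.
    """
    accuracy_term = alpha * facts_recalled                 # reward retention
    overrun = max(0, summary_tokens - length_budget)
    conciseness_penalty = beta * overrun / length_budget   # penalize bloat
    return accuracy_term - conciseness_penalty
```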
Workflow
1. Composer generates tokens until a predefined length trigger fires.
2. A synthetic query is inserted, asking the model to summarize the current context.
3. The model is given a drafting space in which to produce an optimal summary.
4. The generated summary is merged with the task state, forming a compressed context that is fed back to step 1.
This loop enables continuous context compression and retention across very long interactions; a code sketch follows below.
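A minimal sketch of that loop, assuming a hypothetical `model.generate` interface, a simple token counter, and an arbitrary 40k-token trigger (none of which are Cursor's actual API or thresholds):

```python
# Hypothetical sketch of the self-summary loop described above.
SUMMARY_TRIGGER = 40_000          # assumed context-length trigger (tokens)
SUMMARIZE_QUERY = "Please summarize the conversation"

def run_with_self_summary(model, task_prompt: str, max_rounds: int = 200):
    context = task_prompt
    for _ in range(max_rounds):
        # Step 1: generate until the model stops or the length trigger fires.
        output = model.generate(context)
        context += output

        if model.count_tokens(context) >= SUMMARY_TRIGGER:
            # Steps 2-3: insert the synthetic query and let the model
            # draft a summary of everything so far.
            summary = model.generate(context + "\n" + SUMMARIZE_QUERY)
            # Step 4: merge the summary with the task state to form a
            # compressed context, then loop back to generation.
            context = task_prompt + "\n[Summary of progress]\n" + summary

        if model.task_complete(output):
            return output
    return None
```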
Token Efficiency
In high‑difficulty software‑engineering tasks, traditional summarization prompts can require more than 5k tokens, whereas Composer's simple prompt "Please summarize the conversation" yields compressed outputs averaging ~1k tokens, about one‑fifth the token usage and with roughly 50% fewer errors.
Challenging Task Example
Task: run the Doom game on a MIPS architecture using provided source code, a custom frame writer, and a VM script.
Most models stalled, but Composer solved the problem after 170 interaction rounds, compressing more than 100k tokens into a 1k‑token summary and successfully executing Doom on MIPS.
Future Direction
Cursor has hinted at a forthcoming Composer 3, which is expected to extend the self‑summary mechanism and further improve performance.
