
Claude 4 vs Claude 3.7: Real‑World Coding Benchmarks and Hands‑On Review in Cursor

This article evaluates Anthropic's Claude 4 (especially Claude‑4‑Sonnet) within the Cursor IDE, presenting benchmark scores on SWE‑bench, detailed prompts for UI, frontend, architecture, and backend generation, visual results, and a balanced list of strengths and remaining issues.

Eric Tech Circle

May 24, 2025


Official Introduction

Anthropic released the next-generation Claude models, Claude Opus 4 and Claude Sonnet 4, positioning them as new standards for coding, advanced reasoning, and AI agents. The announcement also included:

Extended thinking with tool use (beta): the models can call tools such as web search while reasoning.

Parallel tool execution, more precise instruction following, and enhanced memory when granted local file access.

Claude Code now supports GitHub Actions, native VS Code and JetBrains integration, and in‑file editing.

New API capabilities: code‑execution tool, MCP connector, file API, and up to one‑hour prompt caching.

Claude 4 Model Performance

On SWE-bench, a benchmark built from real-world software-engineering tasks, Claude Opus 4 scored 72.5% and Claude Sonnet 4 scored 72.7%, surpassing previous models and demonstrating strong performance on complex, long-running tasks. Claude Opus 4 also scored 43.2% on Terminal-bench.

Data source: https://www.anthropic.com/news/claude-4

Cursor Pricing

In Cursor's request-based quota, a single Claude-3.7-Sonnet call counts as 1 request and its "thinking" variant counts as 2. Claude-4-Sonnet counts as 0.5 requests per call and Claude-4-Sonnet-Thinking as 0.75, a rate currently offered as a temporary discount.
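To make the discount concrete, the per-call costs quoted above can be turned into simple arithmetic. The sketch below is illustrative only: the model keys are labels chosen for this example rather than official Cursor identifiers, the function names are hypothetical, and the 500-unit budget in the usage note is an assumed figure, not one taken from the article.

```python
from fractions import Fraction

# Quota units consumed per call, as quoted in this article.
# The keys are illustrative labels, not official Cursor model identifiers.
COST_PER_CALL = {
    "claude-3.7-sonnet": Fraction(1),
    "claude-3.7-sonnet-thinking": Fraction(2),
    "claude-4-sonnet": Fraction(1, 2),
    "claude-4-sonnet-thinking": Fraction(3, 4),
}


def quota_used(model: str, calls: int) -> Fraction:
    """Return the quota units consumed by `calls` calls to `model`."""
    if calls < 0:
        raise ValueError("calls must be non-negative")
    return COST_PER_CALL[model] * calls


def calls_affordable(model: str, budget: int) -> int:
    """Return how many whole calls to `model` fit into `budget` quota units."""
    if budget < 0:
        raise ValueError("budget must be non-negative")
    return int(Fraction(budget) // COST_PER_CALL[model])


if __name__ == "__main__":
    for name in COST_PER_CALL:
        used = float(quota_used(name, 100))
        print(f"{name}: 100 calls use {used} quota units")
```

Under these rates, an assumed budget of 500 units covers twice as many Claude-4-Sonnet calls as Claude-3.7-Sonnet calls, and `calls_affordable("claude-4-sonnet-thinking", 500)` rounds down to whole calls. Exact fractions are used so that the 0.75 rate never introduces rounding surprises.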

Practical Test Setup

The author used Cursor 0.50.5 with Claude‑4‑Sonnet. The latest Cursor version can be downloaded from the following repository:

https://github.com/flyeric0212/cursor-history-links

Product UX Prototype Prompt

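You are a full-stack engineer proficient in UI design and product planning. Your goal is to develop a "Fitness Pilates" iOS app. Your core task is to produce a complete set of app prototypes (as HTML pages) to support the subsequent development work.
Key points:
- Define features and pages: work out the core feature modules of the "Fitness Pilates" app and, based on those modules, plan the list of HTML pages that need to be designed.
- Product and UI/UX design:
  - As a product manager, plan the app's key features, page flows, and interaction logic.
  - As a designer, deliver an attractive, user-friendly UI/UX in the style of a modern iOS app.
Technical specifications:
- Use HTML5, Font Awesome, Tailwind CSS, and whatever JavaScript is necessary (for basic interactions).
- Use Unsplash for image assets.
- Keep the code concise and readable.
Output requirements:
- Create a prototype consisting of multiple HTML pages.
- Name the main page index.html; it may embed or link to the other pages.
- Name each non-main HTML file after its core feature (in English, e.g. courses.html, profile.html).
- Generate every page in the style of an iOS app.
- In index.html, show two main feature-module entries or page previews per row.
- All output (including code comments and page text) must always be in Simplified Chinese.
- Apply top-tier UX judgment and aesthetic standards to create a satisfying design.
Start designing immediately and output the HTML prototype code described above, beginning with index.html and followed by the other core feature pages you have planned.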

Screenshots comparing the Claude-4-Sonnet and Claude-3.7-Sonnet results are shown below.

Actual Experience

Code generation consistency: Claude 3.7 produced complete end-to-end code that needed no manual fixes, while Claude 4 occasionally failed to apply parts of the generated HTML to the files, so those parts had to be re-applied manually.

UI design style: Both models delivered comparably complete designs, but Claude 4 used richer color saturation and a stronger visual hierarchy, whereas Claude 3.7 favored a more minimalist look.

Frontend Single‑Page Application

Prompt used:

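Use React + Vite to build a single-page blog application with four pages: Home, Books, About Us, and Contact Us. Use a navigation menu to switch between pages without a full reload, and make sure the transitions are smooth and the user experience is good.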

Screenshots of the resulting application are included.

Actual Experience

End-to-end code generation: The entire project, from initialization to a runnable application, was produced with zero manual intervention, which noticeably shortened development time.

Smart feature expansion: From a minimal prompt, the model added rich functionality on its own, which makes it approachable for non-technical users and supports iterative refinement through natural language.

Frontend stack optimization: Compared with Claude 3.7 and 3.5, Claude 4 integrated an icon system, color themes, micro-interactions, and Unsplash images, reflecting a more mature engineering approach.

Architecture Diagram Generation

Two PlantUML prompts were used:

1) C4 Architecture Prompt

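Use PlantUML to create C4 model diagrams for an enterprise integration platform. The platform must integrate multiple internal systems and external services to achieve data synchronization and business-process automation. Create the following C4 views: a system context diagram, a container diagram, and a component diagram. The systems include ERP, CRM, HR, finance, a data warehouse, a supplier portal, a customer portal, a payment service, and a regulatory platform. The containers include an API gateway, an ESB, a microservice cluster, a message queue, a data-processing engine, a rules engine, a monitoring and logging system, and a security authentication service. At the component level, show the internal structure of the ESB and the data-processing engine in detail, and add descriptions.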

2) DDD Layered Architecture Prompt

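Use PlantUML to create an internal code-layering architecture diagram for an order system based on domain-driven design (DDD), showing the complete layered structure. The layers are: user interface layer, application layer, domain layer, and infrastructure layer. Describe the call relationships between the layers, and use different colors to distinguish them in the component diagram. Include one example entity class and briefly explain the key concepts.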

Resulting diagrams are shown below.

Actual Experience

Diagram generation efficiency: A complete set of PlantUML code was produced in a single pass, with each file clearly organized for maintainability.

Component relationship modeling: The model expressed component relationships with higher precision, reducing the need for repeated prompt adjustments.

Architecture completeness: Both the C4 and DDD diagrams were accurate; Claude 4 offered richer detail in domain modeling.
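For readers less familiar with the layering that the DDD prompt asks the model to draw, the four layers and their call direction can be sketched in code. This is a minimal illustration written for this review, not output from Claude 4 and not the author's order system: every class and function name is hypothetical, and Python is used only for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from decimal import Decimal
from typing import Protocol


# --- Domain layer: entities and business rules, no framework code ---
@dataclass
class OrderLine:
    sku: str
    quantity: int
    unit_price: Decimal


@dataclass
class Order:
    """Entity: identified by order_id rather than by its attribute values."""
    order_id: str
    lines: list[OrderLine] = field(default_factory=list)
    status: str = "DRAFT"

    def add_line(self, sku: str, quantity: int, unit_price: Decimal) -> None:
        if self.status != "DRAFT":
            raise ValueError("only draft orders can be modified")
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self.lines.append(OrderLine(sku, quantity, unit_price))

    def total(self) -> Decimal:
        return sum((l.unit_price * l.quantity for l in self.lines), Decimal("0"))

    def place(self) -> None:
        if not self.lines:
            raise ValueError("cannot place an empty order")
        self.status = "PLACED"


class OrderRepository(Protocol):
    """Interface owned by the domain; the infrastructure layer implements it."""
    def save(self, order: Order) -> None: ...
    def get(self, order_id: str) -> Order: ...


# --- Infrastructure layer: technical detail behind the domain interface ---
class InMemoryOrderRepository:
    def __init__(self) -> None:
        self._orders: dict[str, Order] = {}

    def save(self, order: Order) -> None:
        self._orders[order.order_id] = order

    def get(self, order_id: str) -> Order:
        return self._orders[order_id]


# --- Application layer: orchestrates a use case, holds no business rules ---
class PlaceOrderService:
    def __init__(self, orders: OrderRepository) -> None:
        self._orders = orders

    def place_order(self, order_id: str, items: list[tuple[str, int, Decimal]]) -> Decimal:
        order = Order(order_id)
        for sku, quantity, unit_price in items:
            order.add_line(sku, quantity, unit_price)
        order.place()
        self._orders.save(order)
        return order.total()


# --- User interface layer: translates external input into application calls ---
def handle_place_order(service: PlaceOrderService, payload: dict) -> dict:
    items = [(i["sku"], i["quantity"], Decimal(i["price"])) for i in payload["items"]]
    total = service.place_order(payload["order_id"], items)
    return {"order_id": payload["order_id"], "total": str(total)}
```

Calls flow downward from the user interface layer through the application layer into the domain layer, while the infrastructure layer depends on the domain's repository interface rather than the other way around. That dependency direction is the relationship the layered diagram is meant to show.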

Spring Boot Backend Project

Prompt for project initialization:

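Using Gradle, create a Spring Boot project in the current project with the following requirements: 1. The project's root package is top.flyeric; 2. Spring Boot version 3.4.4, JDK 21; 3. Use JPA as the ORM and H2 as the database; 4. The core model is Book, with the fields id, title, author, isbn, description, price, publicationDate, publisher, pageCount, inStock, coverImageUrl, genre, language; 5. Use an MVC layered architecture.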

Prompt for building front‑end static resources:

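Implement a Web UI on top of the current project's CRUD API, build the front-end code, and integrate it into the Spring Boot application's static resources. Also add 10 test records.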

Resulting UI screenshots are included.

Actual Experience

IDE ecosystem compatibility: Cursor lagged behind professional IDEs (e.g., IntelliJ IDEA) in debugging tools, dependency analysis, and the overall Java development experience.

Project initialization strategy: The model generated the project files directly instead of invoking the Gradle wrapper scripts, which can introduce unpredictable bugs.

Plugin support: Lombok annotation processing was unavailable, which forced a fallback to JavaBeans with hand-written getters and setters.

Dependency management adaptation: When faced with version conflicts, the model downgraded Spring Boot to 3.1 and the JDK to 17 on its own, deviating from the specified stack.

Intelligent error recovery: The model detected errors through the integrated terminal and fixed them itself, closing the repair loop without additional instructions.

Front-end integration optimization: Despite using a traditional static-resource deployment, the generated UI and interaction design surpassed the quality of what Junie produces in IntelliJ IDEA.
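The version drift noted above (Spring Boot 3.4.4 and JDK 21 requested, 3.1 and JDK 17 delivered) is the kind of deviation a small check can catch after each agent run. The following is a minimal sketch, not something the author used. It assumes the generated build file declares the Spring Boot plugin with an `id ... version ...` line and pins the JDK through a Gradle toolchain (`JavaLanguageVersion.of(...)`); other ways of setting the Java version are not covered. The function and constant names are hypothetical, and Python is used only for brevity.

```python
import re

# Versions requested in the project-initialization prompt.
EXPECTED_SPRING_BOOT = "3.4.4"
EXPECTED_JDK = 21

# Matches both the Groovy DSL (id 'x' version 'y') and the Kotlin DSL (id("x") version "y").
BOOT_PLUGIN_RE = re.compile(
    r"""id\s*\(?\s*['"]org\.springframework\.boot['"]\s*\)?\s*version\s*['"]([^'"]+)['"]"""
)
# Assumption: the JDK is pinned through a Gradle toolchain block.
TOOLCHAIN_RE = re.compile(r"JavaLanguageVersion\.of\(\s*(\d+)\s*\)")


def check_stack(build_gradle: str) -> list[str]:
    """Return human-readable deviations from the requested stack (empty if none)."""
    problems = []

    boot = BOOT_PLUGIN_RE.search(build_gradle)
    if boot is None:
        problems.append("Spring Boot plugin version not found")
    elif boot.group(1) != EXPECTED_SPRING_BOOT:
        problems.append(
            f"Spring Boot is {boot.group(1)}, expected {EXPECTED_SPRING_BOOT}"
        )

    jdk = TOOLCHAIN_RE.search(build_gradle)
    if jdk is None:
        problems.append("JDK toolchain version not found")
    elif int(jdk.group(1)) != EXPECTED_JDK:
        problems.append(f"JDK is {jdk.group(1)}, expected {EXPECTED_JDK}")

    return problems


if __name__ == "__main__":
    # Hypothetical build file showing the kind of downgrade described above.
    sample = """
    plugins {
        id 'org.springframework.boot' version '3.1.0'
    }
    java {
        toolchain { languageVersion = JavaLanguageVersion.of(17) }
    }
    """
    for problem in check_stack(sample):
        print(problem)
```

Running a check like this after each generation step turns a silent downgrade into an explicit failure that can be pasted back into the chat as a correction.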

Overall Advantages

Exceptional front‑end capabilities: richer UI/UX design and higher visual quality.

Accurate architecture diagram generation with high logical completeness.

Strong automatic error detection and self‑repair, reducing manual intervention.

One‑shot generation of usable code for most scenarios.

Remaining Issues

File application instability: some generated code fails to be correctly written to files.

Backend support gaps: Java ecosystem features (debugging, dependency management) are weaker than in dedicated IDEs.

Instruction adherence: the model may downgrade versions or otherwise alter the specification when it runs into difficulties.

Conclusion

The author plans to adopt Claude‑4‑Sonnet as the primary coding model, especially for front‑end development and design tasks, expecting future model improvements to resolve the current shortcomings.


Written by

Eric Tech Circle

Backend team lead & architect with 10+ years experience, full‑stack engineer, sharing insights and solo development practice.
