July 2025 AI SQL Benchmark: New Leaders & Deep Dive into Large SQL & DB Migration

The July 2025 SCALE report evaluates the latest large AI models on advanced SQL tasks. It adds new entrants such as Claude 3.5 Sonnet and the stable Gemini 2.5 releases, extends the benchmark with large‑SQL and domestic database conversion metrics, and provides rankings and analysis of model performance across SQL optimization, dialect conversion, and understanding.

Aikesheng Open Source Community

1. Monthly Overview and Key Highlights

In July 2025, competition among large AI models in code generation and understanding, and in SQL capabilities in particular, intensified. This SCALE evaluation adds Claude 3.5 Sonnet, Claude Sonnet 4, and the stable Gemini 2.5 series, and upgrades the benchmark to test complex, real‑world database migration scenarios.

2. Benchmark Updates

We expanded the SQL dialect conversion dataset and added two new metrics: “Large SQL Conversion” (complex statements longer than 100 lines) and “Domestic Database Conversion” (Oracle → OceanBase). The goal is to assess accuracy and logical consistency on ultra‑long scripts, stored procedures, and functions.

New Metric: Large SQL Conversion

Models often lose context or produce syntax errors on very long queries. The benchmark measures their ability to preserve logic across multi‑layered joins, nested queries, and temporary tables.
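As a rough illustration of the size threshold behind this metric (statements longer than 100 lines), the sketch below flags a statement as "large" by counting its non‑blank lines. The function name, the decision to ignore blank lines, and the synthetic query are assumptions made for illustration; the report does not describe how SCALE counts lines.

```python
def is_large_sql(sql: str, threshold: int = 100) -> bool:
    """Return True when the statement has more than `threshold` non-blank lines."""
    non_blank = [line for line in sql.splitlines() if line.strip()]
    return len(non_blank) > threshold


# Build a synthetic 120-line statement (hypothetical column and table names).
columns = ",\n".join(f"    col_{i}" for i in range(118))
long_query = f"SELECT\n{columns}\nFROM orders"
short_query = "SELECT 1 FROM dual"

print(is_large_sql(long_query), is_large_sql(short_query))
```

A line count is only a proxy: a short statement with deeply nested subqueries can be harder to convert than a long, flat column list.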

New Metric: Domestic Database Conversion

As organizations move from commercial databases to domestic (Chinese‑developed) ones, we evaluate automatic translation from commercial to domestic systems, covering variable declarations, flow control, and exception handling.
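To make the three construct families concrete, here is a minimal sketch that tags which of them a PL/SQL block exercises. The keyword patterns, the function name, and the sample block are hypothetical, and this is not the benchmark's own method. Keyword matching is deliberately naive and would also match inside string literals or comments.

```python
import re

# Hypothetical keyword patterns for the three construct families named above.
CONSTRUCT_PATTERNS = {
    "variable_declaration": re.compile(r"\bDECLARE\b|:=", re.IGNORECASE),
    "flow_control": re.compile(r"\b(IF|LOOP|WHILE|FOR|CASE)\b", re.IGNORECASE),
    "exception_handling": re.compile(r"\bEXCEPTION\b|\bRAISE\b", re.IGNORECASE),
}


def tag_constructs(plsql: str) -> set[str]:
    """Return the construct families whose keywords appear in the block."""
    return {
        name
        for name, pattern in CONSTRUCT_PATTERNS.items()
        if pattern.search(plsql)
    }


# Made-up Oracle PL/SQL block that touches all three families.
sample = """
DECLARE
  v_total NUMBER := 0;
BEGIN
  FOR r IN (SELECT amount FROM orders) LOOP
    v_total := v_total + r.amount;
  END LOOP;
EXCEPTION
  WHEN OTHERS THEN
    RAISE;
END;
"""

print(sorted(tag_constructs(sample)))
```

A tagger like this could help group test cases by the constructs they contain, so that conversion failures can be traced to a specific family.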

3. Rankings and Focus Analysis

SQL Optimization Top 5

SQLFlash – 88.5

DeepSeek‑R1 – 71.6

Claude Sonnet 4 – 70.9

Qwen3‑235B‑A22B – 69.1

GPT‑o4‑mini – 68.4

SQL Dialect Conversion Top 5

GPT‑o4‑mini – 83.3

Qwen3‑235B‑A22B – 81.3

DeepSeek‑R1 – 80.2

Gemini 2.5 Flash – 79.3

Claude Sonnet 4 – 77.1

SQL Understanding Top 5

Gemini 2.5 Flash – 82.3

Gemini 2.5 Pro – 82.0

GPT‑o1 – 81.3

GPT‑o4‑mini – 80.8

DeepSeek‑R1 – 80.5

Deep‑Dive Model Analyses

Claude Sonnet 4 shows balanced performance (SQL optimization 70.9, dialect conversion 77.1, understanding 79.3) but lags in deep optimization and in large‑SQL conversion (41.2). Its domestic DB conversion score of 97.4 is near the top.

Gemini 2.5 Pro (stable) improves syntax‑error detection from 89.5 to 100 and raises dialect conversion from 67.1 to 72.2, demonstrating a solid upgrade over the preview version.
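The size of that upgrade can be worked out directly from the scores quoted above. This is a small sketch; the dictionary keys and function name are illustrative labels, not identifiers from the report.

```python
# Scores quoted in the paragraph above, as (preview, stable) pairs.
gemini_25_pro = {
    "syntax_error_detection": (89.5, 100.0),
    "dialect_conversion": (67.1, 72.2),
}


def score_gains(scores: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Return the point gain per metric, rounded to one decimal place."""
    return {
        metric: round(after - before, 1)
        for metric, (before, after) in scores.items()
    }


gains = score_gains(gemini_25_pro)
print(gains)
```

Rounding to one decimal matches the precision of the published scores and avoids floating‑point noise in the differences.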

Domestic DB conversion case: many models misinterpret Oracle’s CAST ({ expr | MULTISET (subquery) } AS type_name) syntax, incorrectly assuming that OceanBase lacks MULTISET support when, per the report, the opposite is true.
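One way to catch this kind of error in review is to check whether a converted statement silently dropped the construct. The sketch below is an assumption-laden illustration: the regular expression, function names, table names, and the `name_list_t` collection type are all made up, and a regex is no substitute for a real SQL parser.

```python
import re

# Matches "CAST ( MULTISET (" with flexible whitespace and any letter case.
CAST_MULTISET = re.compile(r"\bCAST\s*\(\s*MULTISET\s*\(", re.IGNORECASE)


def uses_cast_multiset(sql: str) -> bool:
    """Return True when the statement contains a CAST(MULTISET(...)) call."""
    return CAST_MULTISET.search(sql) is not None


def construct_dropped(source_sql: str, converted_sql: str) -> bool:
    """Return True when the source uses CAST(MULTISET ...) but the output does not."""
    return uses_cast_multiset(source_sql) and not uses_cast_multiset(converted_sql)


# Hypothetical Oracle statement; table and type names are invented.
oracle_sql = """
SELECT d.dept_id,
       CAST(MULTISET(SELECT e.name
                       FROM employees e
                      WHERE e.dept_id = d.dept_id) AS name_list_t) AS staff
  FROM departments d
"""

# A hypothetical model output that rewrote the construct away.
converted_sql = "SELECT d.dept_id FROM departments d"

print(construct_dropped(oracle_sql, converted_sql))
```

A flagged statement is only a prompt for manual review; whether removing the construct was actually wrong depends on what the target database supports.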

4. Model Changes This Month

Added models: Claude 3.5 Sonnet (Anthropic, June 2024) and Claude Sonnet 4 thinking (Anthropic, May 2025).

Upgraded versions: Qwen3‑235B‑A22B‑Thinking → Qwen3‑235B‑A22B‑Thinking‑2507, Qwen3‑235B‑A22B‑Instruct → Qwen3‑235B‑A22B‑Instruct‑2507, Gemini 2.5 Pro → stable, Gemini 2.5 Flash → stable.

5. Summary and Outlook

The deeper benchmark dimensions show that only a few top models handle large‑SQL conversion well, which marks it as a key direction for future work. Claude Sonnet 4 and the stable Gemini 2.5 series add fresh competition, and upcoming evaluations will include SQLShift and more complex mixed‑scenario datasets.

6. Expert Commentary

Han Feng, CCIA executive and former Oracle ACE, emphasizes that the SCALE leaderboard establishes a standardized “AI for SQL” evaluation, guiding developers, DBAs, and decision‑makers toward reliable model selection and accelerating AI‑DB integration.


The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.
