July 2025 AI SQL Benchmark: New Leaders & Deep Dive into Large SQL & DB Migration
The July 2025 SCALE report evaluates the latest large AI models on advanced SQL tasks. It adds new entrants such as Claude 3.5 Sonnet and the stable Gemini 2.5 releases, extends the benchmark with large‑SQL and domestic database conversion metrics, and provides detailed rankings and analyses of model performance across SQL optimization, dialect conversion, and SQL understanding.
1. Monthly Overview and Key Highlights
In July 2025, competition among large AI models in code generation and understanding intensified, especially around SQL capabilities. This SCALE evaluation adds Claude 3.5 Sonnet, Claude Sonnet 4, and the stable Gemini 2.5 series, and upgrades the benchmark to test complex, real‑world database migration scenarios.
2. Benchmark Updates
We expanded the SQL dialect conversion dataset and added two new metrics: “Large SQL Conversion” (complex statements of more than 100 lines) and “Domestic Database Conversion” (Oracle → OceanBase). The goal is to assess accuracy and logical consistency on ultra‑long scripts, stored procedures, and functions.
New Metric: Large SQL Conversion
Models often lose context or produce syntax errors on very long queries. The benchmark measures their ability to preserve logic across multi‑layered joins, nested queries, and temporary tables.
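To make the idea of a "large" statement concrete, here is a minimal Python sketch that builds a rough structural profile of a SQL script: non‑blank line count, number of joins, and an estimate of subquery nesting. The function name `sql_profile` and the heuristics are assumptions for illustration only; this is not how SCALE scores conversions, and the regex approach ignores string literals and comments.

```python
import re


def sql_profile(script: str) -> dict:
    """Rough structural profile of a SQL script (heuristic, not a parser)."""
    lines = [ln for ln in script.splitlines() if ln.strip()]
    upper = script.upper()

    depth = 0
    max_select_depth = 0
    # Walk parentheses and SELECT keywords in order; the parenthesis depth
    # at each SELECT approximates how deeply that query is nested.
    for token in re.findall(r"\(|\)|\bSELECT\b", upper):
        if token == "(":
            depth += 1
        elif token == ")":
            depth = max(depth - 1, 0)
        else:
            max_select_depth = max(max_select_depth, depth)

    return {
        "lines": len(lines),
        "joins": len(re.findall(r"\bJOIN\b", upper)),
        "select_depth": max_select_depth,
        # Threshold taken from the ">100 lines" description of the metric.
        "is_large": len(lines) > 100,
    }
```

Comparing the profile of a source script with the profile of its converted output is one cheap way to spot a conversion that silently dropped a join or flattened a nested query, although matching profiles do not prove the logic was preserved.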
New Metric: Domestic Database Conversion
As organizations move from commercial databases to domestic (China‑developed) ones, we evaluate automatic translation between the two, covering variable declarations, flow control, and exception handling.
3. Rankings and Focus Analysis
SQL Optimization Top 5
SQLFlash – 88.5
DeepSeek‑R1 – 71.6
Claude Sonnet 4 – 70.9
Qwen3‑235B‑A22B – 69.1
GPT‑o4‑mini – 68.4
SQL Dialect Conversion Top 5
GPT‑o4‑mini – 83.3
Qwen3‑235B‑A22B – 81.3
DeepSeek‑R1 – 80.2
Gemini 2.5 Flash – 79.3
Claude Sonnet 4 – 77.1
SQL Understanding Top 5
Gemini 2.5 Flash – 82.3
Gemini 2.5 Pro – 82.0
GPT‑o1 – 81.3
GPT‑o4‑mini – 80.8
DeepSeek‑R1 – 80.5
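The three lists above can be cross‑referenced with a short Python sketch that finds the models appearing in every Top 5 and computes an unweighted mean of their three scores. The scores are copied from the lists; the unweighted mean is an assumption made here for illustration and is not an official SCALE composite. The helper names `models_in_all` and `unweighted_mean` are hypothetical.

```python
from statistics import mean

# Top-5 scores as listed in this report (ASCII hyphens used in model names).
optimization = {
    "SQLFlash": 88.5, "DeepSeek-R1": 71.6, "Claude Sonnet 4": 70.9,
    "Qwen3-235B-A22B": 69.1, "GPT-o4-mini": 68.4,
}
dialect = {
    "GPT-o4-mini": 83.3, "Qwen3-235B-A22B": 81.3, "DeepSeek-R1": 80.2,
    "Gemini 2.5 Flash": 79.3, "Claude Sonnet 4": 77.1,
}
understanding = {
    "Gemini 2.5 Flash": 82.3, "Gemini 2.5 Pro": 82.0, "GPT-o1": 81.3,
    "GPT-o4-mini": 80.8, "DeepSeek-R1": 80.5,
}


def models_in_all(*boards: dict) -> list:
    """Return the models that appear in every leaderboard, sorted by name."""
    common = set(boards[0])
    for board in boards[1:]:
        common &= set(board)
    return sorted(common)


def unweighted_mean(model: str, *boards: dict) -> float:
    """Simple average of one model's scores across the given leaderboards."""
    return round(mean(board[model] for board in boards), 2)


boards = (optimization, dialect, understanding)
common = models_in_all(*boards)
averages = {m: unweighted_mean(m, *boards) for m in common}
```

Only DeepSeek‑R1 and GPT‑o4‑mini place in all three Top 5 lists, which is why the sketch restricts the average to them rather than guessing scores that the lists do not show.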
Deep‑Dive Model Analyses
Claude Sonnet 4 shows balanced performance (SQL optimization 70.9, dialect conversion 77.1, understanding 79.3) but lags in deep optimization and in large‑SQL conversion (41.2). Its domestic database conversion score of 97.4 is near the top.
Gemini 2.5 Pro (stable) improves syntax‑error detection from 89.5 to 100 and raises dialect conversion from 67.1 to 72.2, demonstrating a solid upgrade over the preview version.
Domestic DB conversion case: many models misinterpret Oracle’s CAST ({ expr | MULTISET (subquery) } AS type_name) syntax. They incorrectly assume that OceanBase lacks MULTISET support, when according to the report the opposite is true.
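The failure mode described above can be illustrated with a small Python check that flags a conversion which dropped a CAST(MULTISET(...)) construct present in the source. The Oracle statement, its table names (`employees`, `departments`), the collection type `name_list_t`, and the helper `drops_multiset` are all hypothetical and exist only for this sketch; the regex is a heuristic, not a SQL parser.

```python
import re

# Hypothetical Oracle statement; table and type names are illustrative only.
ORACLE_SQL = """
SELECT d.department_id,
       CAST(MULTISET(SELECT e.last_name
                       FROM employees e
                      WHERE e.department_id = d.department_id)
            AS name_list_t) AS names
  FROM departments d
"""

# Matches the opening of CAST(MULTISET( ... with flexible whitespace.
CAST_MULTISET = re.compile(r"\bCAST\s*\(\s*MULTISET\s*\(", re.IGNORECASE)


def drops_multiset(source_sql: str, converted_sql: str) -> bool:
    """True if the source uses CAST(MULTISET(...)) but the conversion does not."""
    in_source = bool(CAST_MULTISET.search(source_sql))
    in_converted = bool(CAST_MULTISET.search(converted_sql))
    return in_source and not in_converted
```

A check like this only detects that the construct disappeared. Whether a rewrite without MULTISET is still logically equivalent has to be verified separately, for example by running both statements against test data.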
4. Model Changes This Month
Added models: Claude 3.5 Sonnet (Anthropic, June 2024) and Claude Sonnet 4 thinking (Anthropic, May 2025).
Upgraded versions: Qwen3‑235B‑A22B‑Thinking → Qwen3‑235B‑A22B‑Thinking‑2507, Qwen3‑235B‑A22B‑Instruct → Qwen3‑235B‑A22B‑Instruct‑2507, Gemini 2.5 Pro → stable, Gemini 2.5 Flash → stable.
5. Summary and Outlook
The deeper benchmark dimensions highlight that only a few top models handle large‑SQL conversion well, pointing to key future research directions. Claude Sonnet 4 and the stable Gemini 2.5 series inject fresh competition, while upcoming evaluations will include SQLShift and more complex mixed‑scenario datasets.
6. Expert Commentary
Han Feng, CCIA executive and former Oracle ACE, emphasizes that the SCALE leaderboard establishes a standardized “AI for SQL” evaluation, guiding developers, DBAs, and decision‑makers toward reliable model selection and accelerating AI‑DB integration.
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade open‑source MySQL tools and services, releases a premium open‑source component each year on October 24 (“1024”), and continuously operates and maintains these components.