
July 2025 AI SQL Benchmark: New Leaders & Deep Dive into Large SQL & DB Migration

The July 2025 SCALE report evaluates the latest large language models on advanced SQL tasks, adds new entrants such as Claude 3.5 Sonnet and the stable Gemini 2.5 releases, extends the benchmark with large-SQL and domestic-database conversion metrics, and ranks and analyzes model performance across SQL optimization, dialect conversion, and SQL understanding.

Aikesheng Open Source Community

Aug 4, 2025

1. Monthly Overview and Key Highlights

In July 2025, competition among large language models in code generation and understanding, particularly in SQL, intensified. This SCALE evaluation adds Claude 3.5 Sonnet, Claude Sonnet 4, and the stable Gemini 2.5 series, and extends the benchmark to cover complex, real-world database migration scenarios.

2. Benchmark Updates

We expanded the SQL dialect conversion dataset and added two new metrics: “Large SQL Conversion” (complex statements longer than 100 lines) and “Domestic Database Conversion” (Oracle → OceanBase). The goal is to assess accuracy and logical consistency on very long scripts, stored procedures, and functions.

New Metric: Large SQL Conversion

Models often lose context or produce syntax errors on very long queries. The benchmark measures their ability to preserve logic across multi‑layered joins, nested queries, and temporary tables.
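As a rough illustration of the ">100-line" threshold mentioned for this metric, the sketch below classifies a script as "large" by counting its non-blank lines. The report does not say how lines are counted, so the counting rule, the constant name, and the sample queries are all assumptions made for this example.

```python
# Threshold taken from the article's ">100-line" description of the metric.
LARGE_SQL_MIN_LINES = 100


def count_sql_lines(sql: str) -> int:
    """Count non-blank lines in a SQL script (assumed counting rule)."""
    return sum(1 for line in sql.splitlines() if line.strip())


def is_large_sql(sql: str) -> bool:
    """Return True when the script exceeds the assumed line threshold."""
    return count_sql_lines(sql) > LARGE_SQL_MIN_LINES


# Hypothetical inputs: a 3-line query and a generated 151-line query.
short_query = "SELECT id\nFROM orders\nWHERE total > 10;"
long_query = (
    "\n".join(f"SELECT {i} AS n FROM dual UNION ALL" for i in range(150))
    + "\nSELECT 150 AS n FROM dual;"
)

print(is_large_sql(short_query), is_large_sql(long_query))
```

A line count is only a proxy for the difficulty described above; nesting depth and the number of joins or temporary tables would matter at least as much in practice.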

New Metric: Domestic Database Conversion

As organizations shift to domestic (Chinese-developed) databases, we evaluate automatic translation from commercial systems such as Oracle to domestic ones such as OceanBase, covering variable declarations, flow control, and exception handling.
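The three construct families named for this metric can serve as a crude sanity check on a converted procedure. The sketch below is not the SCALE scoring method. It is a hypothetical keyword check that reports which construct families appear in the source block but are missing from the converted one. The pattern names and both sample blocks are invented for illustration.

```python
import re
from typing import Set

# Hypothetical construct families, named after the three areas the article lists.
CONSTRUCT_PATTERNS = {
    "variable_declaration": re.compile(r"\bDECLARE\b", re.IGNORECASE),
    "flow_control": re.compile(r"\b(IF|LOOP|WHILE|FOR)\b", re.IGNORECASE),
    "exception_handling": re.compile(r"\bEXCEPTION\b", re.IGNORECASE),
}


def constructs_present(plsql: str) -> Set[str]:
    """Return the construct families whose keywords occur in the block."""
    return {name for name, pat in CONSTRUCT_PATTERNS.items() if pat.search(plsql)}


def missing_after_conversion(source: str, converted: str) -> Set[str]:
    """Families present in the source block but absent from the converted one."""
    return constructs_present(source) - constructs_present(converted)


# Invented sample: the "converted" block silently dropped the exception handler.
source_block = """
DECLARE
  v_total NUMBER := 0;
BEGIN
  IF v_total = 0 THEN
    v_total := 1;
  END IF;
EXCEPTION
  WHEN OTHERS THEN
    NULL;
END;
"""
converted_block = """
DECLARE
  v_total NUMBER := 0;
BEGIN
  IF v_total = 0 THEN
    v_total := 1;
  END IF;
END;
"""

print(missing_after_conversion(source_block, converted_block))
```

Keyword matching ignores comments and string literals, so a check like this can only flag candidates for review; it cannot confirm that the converted logic is equivalent.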

3. Rankings and Focus Analysis

SQL Optimization Top 5

SQLFlash – 88.5

DeepSeek‑R1 – 71.6

Claude Sonnet 4 – 70.9

Qwen3‑235B‑A22B – 69.1

GPT‑o4‑mini – 68.4

SQL Dialect Conversion Top 5

GPT‑o4‑mini – 83.3

Qwen3‑235B‑A22B – 81.3

DeepSeek‑R1 – 80.2

Gemini 2.5 Flash – 79.3

Claude Sonnet 4 – 77.1

SQL Understanding Top 5

Gemini 2.5 Flash – 82.3

Gemini 2.5 Pro – 82.0

GPT‑o1 – 81.3

GPT‑o4‑mini – 80.8

DeepSeek‑R1 – 80.5
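To read the three lists together, the sketch below finds the models that appear in all three Top 5 rankings and averages their published scores. The unweighted mean is an assumption made for illustration, not a SCALE metric. Model names use plain ASCII hyphens.

```python
# Scores copied from the three Top 5 lists above.
optimization = {
    "SQLFlash": 88.5,
    "DeepSeek-R1": 71.6,
    "Claude Sonnet 4": 70.9,
    "Qwen3-235B-A22B": 69.1,
    "GPT-o4-mini": 68.4,
}
dialect_conversion = {
    "GPT-o4-mini": 83.3,
    "Qwen3-235B-A22B": 81.3,
    "DeepSeek-R1": 80.2,
    "Gemini 2.5 Flash": 79.3,
    "Claude Sonnet 4": 77.1,
}
understanding = {
    "Gemini 2.5 Flash": 82.3,
    "Gemini 2.5 Pro": 82.0,
    "GPT-o1": 81.3,
    "GPT-o4-mini": 80.8,
    "DeepSeek-R1": 80.5,
}

# Models that place in the Top 5 of every category.
in_all_three = sorted(set(optimization) & set(dialect_conversion) & set(understanding))

# Unweighted mean across the three categories (an illustrative assumption).
mean_scores = {
    model: round(
        (optimization[model] + dialect_conversion[model] + understanding[model]) / 3, 2
    )
    for model in in_all_three
}

print(in_all_three)
print(mean_scores)
```

Only DeepSeek-R1 and GPT-o4-mini place in all three lists. Models outside a Top 5 have no published score in that category here, so they are left out rather than estimated.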

Deep‑Dive Model Analyses

Claude Sonnet 4 shows balanced performance (SQL optimization 70.9, dialect conversion 77.1, understanding 79.3) but lags in deep optimization and large-SQL conversion (41.2). Its domestic database conversion score of 97.4 is near the top.

Gemini 2.5 Pro (stable) improves syntax‑error detection from 89.5 to 100 and raises dialect conversion from 67.1 to 72.2, demonstrating a solid upgrade over the preview version.

Domestic DB conversion case: many models misinterpret Oracle’s CAST({ expr | MULTISET (subquery) } AS type_name) syntax and incorrectly assume that OceanBase lacks MULTISET support, when in fact OceanBase does support it.
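One way to surface this case in a conversion pipeline is to flag statements that use the CAST(MULTISET(...)) form, so a reviewer can confirm the converter kept the expression instead of rewriting it. The sketch below is a minimal regex check under that assumption. The table names and the collection type `name_list_t` are hypothetical.

```python
import re

# Matches "CAST ( MULTISET (" with arbitrary whitespace, case-insensitively.
CAST_MULTISET = re.compile(r"\bCAST\s*\(\s*MULTISET\s*\(", re.IGNORECASE)


def uses_cast_multiset(sql: str) -> bool:
    """Return True when the statement contains a CAST(MULTISET(...)) expression."""
    return CAST_MULTISET.search(sql) is not None


# Hypothetical Oracle statement: collects employee names per department
# into a user-defined nested table type.
oracle_sql = """
SELECT d.dept_id,
       CAST(MULTISET(SELECT e.name
                     FROM employees e
                     WHERE e.dept_id = d.dept_id) AS name_list_t) AS names
FROM departments d
"""

# An ordinary scalar CAST, which should not be flagged.
plain_cast = "SELECT CAST(price AS NUMBER(10, 2)) FROM items"

print(uses_cast_multiset(oracle_sql), uses_cast_multiset(plain_cast))
```

Running the same check on both the source and the converted statement gives a quick signal when a model has dropped or replaced the MULTISET expression.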

4. Model Changes This Month

Added models: Claude 3.5 Sonnet (Anthropic, June 2024) and Claude Sonnet 4 thinking (Anthropic, May 2025).

Upgraded versions: Qwen3‑235B‑A22B‑Thinking → Qwen3‑235B‑A22B‑Thinking‑2507, Qwen3‑235B‑A22B‑Instruct → Qwen3‑235B‑A22B‑Instruct‑2507, Gemini 2.5 Pro → stable, Gemini 2.5 Flash → stable.

5. Summary and Outlook

The new benchmark dimensions show that only a few top models handle large-SQL conversion well, which marks it as a key direction for future work. Claude Sonnet 4 and the stable Gemini 2.5 series add fresh competition, and upcoming evaluations will include SQLShift and more complex mixed-scenario datasets.

6. Expert Commentary

Han Feng, CCIA executive and former Oracle ACE, emphasizes that the SCALE leaderboard establishes a standardized “AI for SQL” evaluation, guiding developers, DBAs, and decision‑makers toward reliable model selection and accelerating AI‑DB integration.
