How DB‑Surfer’s Agentic NL2SQL Beats the Spider 2.0 Benchmark

Alibaba Cloud’s DB‑Surfer NL2SQL agent combines task‑planning, metadata linking, and a modular architecture to achieve state‑of‑the‑art performance on the Spider 2.0‑Snow benchmark and is now integrated into DataWorks Copilot, dramatically improving enterprise data query efficiency.

Alibaba Cloud Big Data AI Platform

Background

In the digital era, enterprise data grows explosively, but many business users cannot query data directly because they do not know SQL. NL2SQL technology, powered by large language models, can translate natural language like “show the product with the highest sales last month” into complex SQL statements.
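To make the translation concrete, here is a minimal sketch of one SQL statement that the example question could map to, run against an in-memory SQLite database. The `sales` table, its columns, and the rows are invented for illustration and are not taken from DB‑Surfer or Spider 2.0. The reference date is passed as a parameter so that "last month" is reproducible.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (product TEXT, sale_date TEXT, amount REAL);
INSERT INTO sales VALUES
  ('Widget', '2025-07-03', 120.0),
  ('Widget', '2025-07-19',  80.0),
  ('Gadget', '2025-07-11', 150.0),
  ('Gadget', '2025-08-02', 900.0);
""")

# One possible translation of "show the product with the highest sales
# last month". "Last month" is the calendar month before :today.
QUERY = """
SELECT product, SUM(amount) AS total_sales
FROM sales
WHERE sale_date >= date(:today, 'start of month', '-1 month')
  AND sale_date <  date(:today, 'start of month')
GROUP BY product
ORDER BY total_sales DESC
LIMIT 1;
"""

top_product = conn.execute(QUERY, {"today": "2025-08-15"}).fetchone()
print(top_product)
```

Even this small question requires a date window, an aggregation, and a ranking, which hints at why longer questions over real schemas are hard to translate in a single step.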

Challenges

Real‑world databases have complex schemas, diverse SQL dialects, and deeply nested logic, making pure LLM‑based NL2SQL insufficient.

DB‑Surfer Agent

Alibaba Cloud PAI and DataWorks jointly developed DB‑Surfer, an end‑to‑end NL2SQL agent framework designed for large‑scale, high‑complexity database queries. It follows a “total‑divide‑total” collaborative architecture with three stages, all guided by task planning: query intent preprocessing, code‑agent execution, and multi‑source post‑processing.
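The three stages can be pictured as a simple pipeline over shared state. The sketch below is only an illustration of that shape: `QueryState`, the stage functions, the fixed plan, and the placeholder SQL are all hypothetical names and values, and a real system would call an LLM and a database where the stubs are.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QueryState:
    """Hypothetical state object passed from stage to stage."""
    question: str
    plan: List[str] = field(default_factory=list)
    candidates: List[str] = field(default_factory=list)
    final_sql: Optional[str] = None


def preprocess_intent(state: QueryState) -> QueryState:
    # Stage 1 ("total"): consider the whole question and derive a plan.
    # Stub: a fixed plan stands in for LLM-driven task planning.
    state.plan = ["link metadata", "draft SQL", "validate SQL"]
    return state


def run_code_agent(state: QueryState) -> QueryState:
    # Stage 2 ("divide"): work through the plan one step at a time.
    for step in state.plan:
        if step == "draft SQL":
            state.candidates.append("SELECT 1")  # placeholder candidate
    return state


def postprocess(state: QueryState) -> QueryState:
    # Stage 3 ("total"): consolidate the candidates into a single answer.
    state.final_sql = state.candidates[0] if state.candidates else None
    return state


def answer(question: str) -> QueryState:
    state = QueryState(question)
    for stage in (preprocess_intent, run_code_agent, postprocess):
        state = stage(state)
    return state


result = answer("show the product with the highest sales last month")
print(result.plan, result.final_sql)
```

Keeping each stage as a function over the same state object is one way to get the modularity the article describes, since a stage can be swapped out without touching the others.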

Key Innovations

Achieved 59.78% execution accuracy on the Spider 2.0‑Snow benchmark, ranking first as of August 2025.

Uses joint task planning and metadata linking to give the agent clear execution guidance, improving efficiency and purposefulness.

Modular design and a data‑flywheel knowledge accumulation mechanism enable continuous evolution and easy integration of external tools.
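As a rough illustration of metadata linking, the sketch below matches words in the question against table and column names and returns only the parts of the schema that look relevant. The catalog and the keyword-overlap rule are assumptions made for this example. The article does not describe DB‑Surfer's actual linking method, which is likely far more sophisticated.

```python
import re

# Hypothetical catalog, invented for illustration.
SCHEMA = {
    "sales": ["product", "sale_date", "amount"],
    "customers": ["customer_id", "region"],
    "inventory": ["product", "warehouse", "quantity"],
}


def link_metadata(question, schema):
    """Return tables (and matching columns) whose names appear in the question."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    linked = {}
    for table, columns in schema.items():
        # Split snake_case column names so "sale_date" can match "date".
        hits = [c for c in columns if words & set(c.split("_"))]
        if table in words or hits:
            linked[table] = hits
    return linked


linked = link_metadata(
    "show the product with the highest sales last month", SCHEMA
)
print(linked)
```

Passing only the linked tables to the planning step narrows what the agent has to consider, which is the kind of execution guidance the second point refers to.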

Performance Comparison

DB‑Surfer (59.78%) outperforms the leading baselines WindAgent (59.05%), ReFoRCE (37.11%), and Spider‑Agent (31.08%). Its lead over WindAgent is narrow, at 0.73 percentage points, while its advantage over ReFoRCE and Spider‑Agent exceeds 20 points in extremely complex database environments.
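The margins follow directly from the execution-accuracy figures quoted in this article. A short calculation, using only those numbers:

```python
# Execution accuracy (%) on Spider 2.0-Snow, as quoted in the article.
scores = {
    "DB-Surfer": 59.78,
    "WindAgent": 59.05,
    "ReFoRCE": 37.11,
    "Spider-Agent": 31.08,
}

# Percentage-point lead of DB-Surfer over each baseline.
margins = {
    name: round(scores["DB-Surfer"] - score, 2)
    for name, score in scores.items()
    if name != "DB-Surfer"
}

for name, margin in margins.items():
    print(f"{name}: +{margin:.2f} points")
```

The gap is 0.73 points over WindAgent, 22.67 over ReFoRCE, and 28.70 over Spider‑Agent.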

Product Integration

The agentic NL2SQL technology is deeply integrated into Alibaba Cloud’s DataWorks Copilot, allowing users to generate and execute SQL via natural language. Since launch, Copilot has generated over 32 million lines of code, serving more than 60 000 analysts and developers and improving data‑development efficiency by an average of 35%.

Conclusion

DB‑Surfer’s breakthrough on Spider 2.0 and its deployment in DataWorks Copilot mark a milestone for Alibaba Cloud’s AI‑driven data platform, turning complex SQL queries into conversational interactions and making data insights accessible to business users.

[Figure: DB‑Surfer performance chart]
[Figure: DataWorks Copilot demo]
Written by

Alibaba Cloud Big Data AI Platform

The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
