How DB‑Surfer’s Agentic NL2SQL Beats the Spider 2.0 Benchmark
Alibaba Cloud’s DB‑Surfer NL2SQL agent combines task planning, metadata linking, and a modular architecture to reach state‑of‑the‑art execution accuracy on the Spider 2.0‑Snow benchmark. It is now integrated into DataWorks Copilot, where it lets enterprise users query data in natural language.
Background
Enterprise data is growing explosively, yet many business users cannot query it directly because they do not know SQL. NL2SQL technology, powered by large language models, translates a natural‑language request such as “show the product with the highest sales last month” into the corresponding SQL statement, which can be complex.
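To make the example request concrete, here is a minimal sketch of the kind of SQL an NL2SQL system might produce for it, run against an in‑memory SQLite database. The `sales` table, its columns, and its rows are invented for illustration, and “last month” is pinned to July 2025 so the result is reproducible; none of this comes from DB‑Surfer itself.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (product TEXT, sale_date TEXT, amount REAL);
INSERT INTO sales VALUES
  ('Laptop',  '2025-07-03', 1200.0),
  ('Laptop',  '2025-07-19',  900.0),
  ('Monitor', '2025-07-11', 1500.0),
  ('Monitor', '2025-08-02', 5000.0);
""")

# One possible SQL rendering of
# "show the product with the highest sales last month",
# with "last month" fixed to July 2025 (an assumption for this sketch).
GENERATED_SQL = """
SELECT product, SUM(amount) AS total_sales
FROM sales
WHERE sale_date >= '2025-07-01' AND sale_date < '2025-08-01'
GROUP BY product
ORDER BY total_sales DESC
LIMIT 1;
"""

top_product = conn.execute(GENERATED_SQL).fetchone()
print(top_product)
```

Even this short request needs a date filter, an aggregation, and a ranking step, which hints at how quickly the generated SQL grows on real schemas.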
Challenges
Real‑world databases have complex schemas, diverse SQL dialects, and deeply nested query logic, so an LLM used on its own as a one‑shot translator is often not enough.
DB‑Surfer Agent
Alibaba Cloud PAI and DataWorks jointly developed the NL2SQL Agent called DB‑Surfer, an end‑to‑end framework designed for large‑scale, high‑complexity database queries. It follows a “total‑divide‑total” collaborative architecture with three stages: query intent preprocessing, code‑agent execution, and multi‑source post‑processing, guided by task planning.
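The three stages can be pictured as a simple pipeline that passes shared state from one stage to the next. The sketch below only illustrates that shape: the class, the function names, the hard‑coded plan, and the placeholder SQL are all hypothetical, and the real stages would call an LLM and the target database rather than return fixed strings.

```python
from dataclasses import dataclass, field


@dataclass
class QueryState:
    """State handed from stage to stage (hypothetical structure)."""
    question: str
    plan: list = field(default_factory=list)
    partials: list = field(default_factory=list)
    final_sql: str = ""


def preprocess_intent(state):
    # "Total": look at the whole question and produce a task plan.
    # The plan is hard-coded here; a real system would derive it.
    state.plan = ["find monthly sales per product", "pick the top product"]
    return state


def run_code_agent(state):
    # "Divide": work through the plan one step at a time.
    # A real agent would call an LLM and the database at this point.
    state.partials = [
        f"-- step {i}: {step}" for i, step in enumerate(state.plan, 1)
    ]
    return state


def postprocess(state):
    # "Total": merge the per-step outputs into one final answer.
    state.final_sql = "\n".join(state.partials) + "\nSELECT 1;  -- placeholder query"
    return state


def answer(question):
    state = QueryState(question=question)
    for stage in (preprocess_intent, run_code_agent, postprocess):
        state = stage(state)
    return state


result = answer("show the product with the highest sales last month")
print(result.final_sql)
```

Keeping each stage as a function over the same state object is one way to get the modularity the article describes: a stage can be swapped or extended without touching the others.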
Key Innovations
Achieved 59.78% execution accuracy on the Spider 2.0‑Snow benchmark, ranking first as of August 2025.
Uses joint task planning and metadata linking to give the agent clear execution guidance, improving efficiency and purposefulness.
Modular design and a data‑flywheel knowledge accumulation mechanism enable continuous evolution and easy integration of external tools.
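The article does not describe how DB‑Surfer performs metadata linking. As a rough illustration of the general idea (narrowing a large schema to the tables and columns a question is likely to need), here is a deliberately naive keyword‑overlap sketch. The catalog, the function names, and the matching rule are all assumptions made for this example.

```python
import re

# Hypothetical catalog: table name -> column names.
CATALOG = {
    "orders": ["order_id", "customer_id", "order_date", "total_amount"],
    "customers": ["customer_id", "customer_name", "region"],
    "web_logs": ["session_id", "url", "timestamp"],
}


def tokens(text):
    """Split an identifier or sentence into lowercase alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def link_metadata(question, catalog):
    """Keep tables whose name or columns share a word with the question."""
    question_words = tokens(question)
    linked = {}
    for table, columns in catalog.items():
        hits = [c for c in columns if tokens(c) & question_words]
        if hits or tokens(table) & question_words:
            linked[table] = hits
    return linked


linked = link_metadata("total amount per customer region", CATALOG)
print(linked)
```

Here the unrelated `web_logs` table is dropped, so a downstream agent would see a smaller schema. A production system would need something far more robust than word overlap, such as semantic matching and value lookups.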
Performance Comparison
DB‑Surfer (59.78%) narrowly leads WindAgent (59.05%) and outperforms ReFoRCE (37.11%) and Spider‑Agent (31.08%) by more than 20 points, demonstrating its strength in extremely complex database environments.
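The gaps between these scores can be checked with simple arithmetic. The snippet below uses only the execution‑accuracy figures quoted in this article.

```python
# Execution accuracy (%) on Spider 2.0-Snow, as quoted in the article.
scores = {
    "DB-Surfer": 59.78,
    "WindAgent": 59.05,
    "ReFoRCE": 37.11,
    "Spider-Agent": 31.08,
}

# Lead of DB-Surfer over each baseline, in percentage points.
margins = {
    name: round(scores["DB-Surfer"] - score, 2)
    for name, score in scores.items()
    if name != "DB-Surfer"
}

for name, margin in margins.items():
    print(f"{name}: +{margin:.2f} points")
```

The lead over WindAgent is 0.73 points, while the leads over ReFoRCE and Spider‑Agent are 22.67 and 28.70 points respectively.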
Product Integration
The agentic NL2SQL technology is deeply integrated into Alibaba Cloud’s DataWorks Copilot, allowing users to generate and execute SQL through natural language. Since launch, Copilot has generated over 32 million lines of code, served more than 60,000 analysts and developers, and improved data‑development efficiency by an average of 35%.
Conclusion
DB‑Surfer’s breakthrough on Spider 2.0 and its deployment in DataWorks Copilot mark a milestone for Alibaba Cloud’s AI‑driven data platform, turning complex SQL queries into conversational interactions and making data insights accessible to business users.
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
