Can Databases Teach Themselves? Exploring Agent‑Based Self‑Explaining Text‑to‑SQL
This article introduces the Agents‑Companion paradigm for Text‑to‑SQL: self‑describing database agents autonomously mine schema, statistics, and semantics to generate high‑quality evidence, bridging the gap between academic benchmarks and industrial deployment and significantly improving query accuracy.
Background
Relational databases contain large amounts of business data, but the steep learning curve of SQL limits access for non‑technical users. Text‑to‑SQL (NL2SQL) aims to translate natural‑language questions into executable SQL, yet most benchmarks (e.g., BIRD, Spider) assume fully annotated schemas and manually provided external evidence.
Problem
In real deployments, databases often lack comments and suffer from ambiguous column names, undefined value ranges, and missing enumerations. When external evidence is removed from the BIRD benchmark, state‑of‑the‑art models lose more than 15% accuracy, with severe degradation on the hardest queries.
Agents‑Companion Paradigm
Agents‑Companion treats each research object (table, API, model) as an autonomous agent that summarizes its own knowledge (identity, location, capabilities, tags) and exposes this information to a central LLM. This creates a closed loop of offline profiling, online routing, and on‑demand evidence generation.
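The paper's agent interface is not spelled out here, so the sketch below is only one plausible encoding of such a self‑description: the field names (identity, location, capabilities, tags) come from the paragraph above, while the class, method, and example values are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class AgentProfile:
    """Self-description a database agent exposes to the central LLM.

    Field names follow the article (identity, location, capabilities,
    tags); the concrete structure is a sketch, not the paper's API.
    """
    identity: str                   # e.g. "table:orders"
    location: str                   # e.g. "warehouse_db.public.orders"
    capabilities: list[str] = field(default_factory=list)  # evidence it can supply
    tags: list[str] = field(default_factory=list)          # routing hints

    def to_prompt_line(self) -> str:
        """Render one line of the agent registry shown to the router LLM."""
        caps = ", ".join(self.capabilities) or "none"
        return (f"{self.identity} @ {self.location} | "
                f"capabilities: {caps} | tags: {', '.join(self.tags)}")

orders = AgentProfile(
    identity="table:orders",
    location="warehouse_db.public.orders",
    capabilities=["join-path", "enum-domain", "column-stats"],
    tags=["sales", "transactions"],
)
print(orders.to_prompt_line())
```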
Core Mechanisms
Offline Schema Mining: Build a digital portrait of each table/column (DDL, foreign keys, sample values, distribution statistics such as mean, quantiles, frequent enums).
Online Query Routing: Analyze the natural‑language question to decide which evidence types are needed (numerical reasoning, domain knowledge, synonym mapping, enumeration explanation) and select the appropriate agents.
On‑Demand Evidence Generation: Produce structured natural‑language evidence (JOIN path descriptions, semantic constraints, logical completion hints) and inject it into LLM prompts. A sketch of how the three stages chain together follows this list.
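As a rough illustration of the closed loop, here is one way the stages could be wired. The names and signatures are my own, not the paper's: `call_llm` stands in for any chat‑completion client, and the three helpers are sketched in the per‑agent sections below.

```python
def answer_question(question: str, db_path: str, call_llm) -> str:
    """Closed loop: offline profiling -> routing -> evidence -> SQL.

    `call_llm` is a placeholder for any chat-completion client; the
    three helper functions are defined in the per-agent sketches below.
    """
    profile = mine_schema(db_path)           # offline, cacheable
    needed = route_query(question, profile)  # which evidence types?
    evidence = generate_evidence(question, profile, needed)
    prompt = ("Translate the question into SQL.\n"
              f"Schema:\n{profile['ddl']}\n"
              f"Evidence:\n{evidence}\n"
              f"Question: {question}\nSQL:")
    return call_llm(prompt)
```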
Schema Mining Agent
Extracts DDL, foreign‑key relations, and a sample of column values (see the profiling sketch after this list).
Uses an LLM to summarize “column name + context + sample values” into field descriptions, candidate synonyms, unit hints, and value glossaries.
Creates a few‑shot QA knowledge base by abstracting historical (question, SQL) pairs into templates and SQL skeletons for cross‑database semantic priors.
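A minimal, SQLite‑specific sketch of the offline profiling pass. The PRAGMA calls and cardinality threshold are illustrative choices, and the LLM summarization step (field descriptions, synonyms, unit hints) is omitted; a real agent would feed each column profile to the model afterwards.

```python
import sqlite3
import statistics

def mine_schema(db_path: str, sample_rows: int = 5, enum_limit: int = 20) -> dict:
    """Offline profiling: DDL, join paths, per-column samples and stats."""
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    ddl = "\n".join(row[0] for row in cur.execute(
        "SELECT sql FROM sqlite_master"
        " WHERE type = 'table' AND sql IS NOT NULL AND name NOT LIKE 'sqlite_%'"))
    profile = {"ddl": ddl, "columns": {}, "join_paths": []}
    tables = [r[0] for r in cur.execute(
        "SELECT name FROM sqlite_master"
        " WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    for table in tables:
        # Foreign keys become human-readable join-path descriptions.
        for fk in cur.execute(f'PRAGMA foreign_key_list("{table}")').fetchall():
            # Row layout: (id, seq, ref_table, from_col, to_col, ...)
            profile["join_paths"].append(
                f"{table} joins {fk[2]} via {table}.{fk[3]} = {fk[2]}.{fk[4]}")
        for _, col, col_type, *_ in cur.execute(
                f'PRAGMA table_info("{table}")').fetchall():
            samples = [r[0] for r in cur.execute(
                f'SELECT "{col}" FROM "{table}" WHERE "{col}" IS NOT NULL LIMIT ?',
                (sample_rows,))]
            distinct = cur.execute(
                f'SELECT COUNT(DISTINCT "{col}") FROM "{table}"').fetchone()[0]
            info = {"type": col_type, "samples": samples, "distinct": distinct}
            if distinct and distinct <= enum_limit:
                # Low-cardinality columns are treated as enumerations.
                info["enum"] = [r[0] for r in cur.execute(
                    f'SELECT DISTINCT "{col}" FROM "{table}"')]
            if col_type.upper() in ("INTEGER", "INT", "REAL") and samples:
                vals = [r[0] for r in cur.execute(
                    f'SELECT "{col}" FROM "{table}" WHERE "{col}" IS NOT NULL')]
                # Mean as a stand-in for fuller distribution statistics
                # (quantiles and frequency counts would be computed similarly).
                info["mean"] = statistics.fmean(vals) if vals else None
            profile["columns"][f"{table}.{col}"] = info
    conn.close()
    return profile
```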
Query Routing Agent
Detects required evidence types from the user query.
Applies schema‑knowledge for semantic expansion (synonym replacement, coreference resolution) to improve column alignment.
For enumerated fields, automatically extracts the value domain to avoid mismatched WHERE clauses; a routing sketch follows.
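A keyword‑rule sketch of the routing decision, assuming the profile produced by `mine_schema` above. The trigger words are illustrative; the paper's router agent would more plausibly use an LLM classifier than a handful of regexes.

```python
import re

# Illustrative trigger rules for evidence-type detection.
EVIDENCE_TRIGGERS = {
    "numerical_reasoning": re.compile(r"top|average|percent|ratio|at least", re.I),
    "synonym_mapping": re.compile(r"client|buyer|revenue|turnover", re.I),
}

def route_query(question: str, profile: dict) -> list[str]:
    """Decide which evidence types a question needs (keyword sketch)."""
    needed = {name for name, pat in EVIDENCE_TRIGGERS.items()
              if pat.search(question)}
    # Any mention of an enumerated column triggers a value-domain lookup,
    # so generated WHERE clauses can match actually stored values.
    for qualified, info in profile["columns"].items():
        column = qualified.split(".", 1)[1]
        if "enum" in info and column.lower() in question.lower():
            needed.add("enumeration")
    return sorted(needed)
```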
Evidence Generation Agent
Structural consistency evidence: Generates natural‑language descriptions of join paths, e.g., “customer joins order via customer_id”.
Semantic constraint evidence: Converts intents such as “after 2020” into executable conditions like year > 2020.
Logical completion evidence: Retrieves similar cases to suggest complex reasoning patterns (e.g., using the RANK() window function for top‑10% filtering); a rendering sketch follows this list.
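Continuing the same sketch, a toy evidence renderer covering the evidence types above. Join paths come straight from the mined foreign keys; the regex rules and the `metric` placeholder in the RANK() hint are stand‑ins for what the system derives with an LLM and a retrieved case base.

```python
import re

def generate_evidence(question: str, profile: dict, needed: list[str]) -> str:
    """Render structured natural-language evidence lines for the prompt."""
    # 1. Structural consistency: plain-language join-path descriptions.
    lines = list(profile.get("join_paths", []))
    # 2. Semantic constraints: vague intent -> executable condition.
    m = re.search(r"after (\d{4})", question, re.IGNORECASE)
    if m:
        lines.append(f'"after {m.group(1)}" means the condition: '
                     f"year > {m.group(1)}")
    # 3. Enumeration explanation: expose value domains of flagged columns.
    if "enumeration" in needed:
        for qualified, info in profile["columns"].items():
            if "enum" in info:
                lines.append(f"{qualified} takes values: {info['enum']}")
    # 4. Logical completion: suggest a reasoning pattern for hard aggregates.
    if "numerical_reasoning" in needed and re.search(r"top\s*10\s*%", question):
        lines.append(
            "for top-10% filtering, rank rows with "
            "RANK() OVER (ORDER BY metric DESC) "
            "and keep rank <= 0.1 * COUNT(*) OVER ()")
    return "\n".join(lines)
```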
Experimental Validation
Evaluated on the BIRD dataset (95 real databases across 37 domains) with all external evidence removed. Adding the Agents‑Companion components restored performance for three state‑of‑the‑art methods:
Schema Mining Agent alone yields large gains on simple and medium‑difficulty queries.
The full three‑stage system provides the greatest boost on challenging queries, confirming that the loop of knowledge deposition, semantic activation, and evidence construction is the primary source of improvement.
Conclusion
Agents‑Companion demonstrates an industrial‑grade Text‑to‑SQL solution that eliminates reliance on manually curated evidence by mining rich, first‑hand knowledge from the database itself. The approach offers a low‑cost, robust, and explainable path for deploying conversational data analysis in real‑world settings.