Can Databases Teach Themselves? Exploring Agents‑Based Self‑Explaining Text‑to‑SQL

This article introduces the Agents‑Companion paradigm for Text‑to‑SQL, detailing how self‑describing database agents autonomously mine schema, statistics and semantics to generate high‑quality evidence, thereby bridging the gap between academic research and industrial deployment and significantly improving query accuracy.

Amap Tech

Background

Relational databases contain large amounts of business data, but the steep learning curve of SQL limits access for non‑technical users. Text‑to‑SQL (NL2SQL) aims to translate natural‑language questions into executable SQL, yet most benchmarks (e.g., BIRD, Spider) assume fully annotated schemas and manually provided external evidence.

Problem

In real deployments, databases often lack column comments, use ambiguous column names, and leave value ranges and enumerations undefined. When external evidence is removed from the BIRD benchmark, state‑of‑the‑art models lose more than 15% accuracy, with the steepest degradation on the hardest queries.

Agents‑Companion Paradigm

Agents‑Companion treats each research object (table, API, model) as an autonomous agent that summarizes its own knowledge (identity, location, capabilities, tags) and exposes this information to a central LLM. This creates a closed loop of offline profiling, online routing, and on‑demand evidence generation.
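To make the idea concrete, the self‑description each agent publishes can be sketched as a small "card" structure. This is an illustrative sketch only: the field names (identity, location, capabilities, tags) come from the text above, but the class and method names are hypothetical, not the paper's actual interface.

```python
from dataclasses import dataclass, field

@dataclass
class AgentCard:
    """Hypothetical self-description an agent exposes to the central LLM."""
    identity: str                      # e.g. a table name
    location: str                      # e.g. a fully qualified path
    capabilities: list = field(default_factory=list)
    tags: list = field(default_factory=list)

    def summary(self) -> str:
        # Compact one-line summary suitable for injection into an LLM context.
        caps = ", ".join(self.capabilities) or "none"
        return (f"{self.identity} @ {self.location} | "
                f"capabilities: {caps} | tags: {', '.join(self.tags)}")

card = AgentCard("orders", "warehouse.sales.orders",
                 ["join via customer_id", "date-range filters"], ["sales"])
print(card.summary())
```

The central LLM never needs raw table dumps; it reasons over these compact summaries and asks individual agents for detail on demand.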

Core Mechanisms

Offline Schema Mining: Build a digital portrait of each table/column (DDL, foreign keys, sample values, and distribution statistics such as means, quantiles, and frequent enumerations).

Online Query Routing: Analyze the natural‑language question to decide which evidence types are needed (numerical reasoning, domain knowledge, synonym mapping, enumeration explanation) and select the appropriate agents.

On‑Demand Evidence Generation: Produce structured natural‑language evidence (JOIN‑path descriptions, semantic constraints, logical‑completion hints) and inject it into LLM prompts.
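The final step of this loop, injecting the generated evidence into the LLM prompt, can be sketched as simple prompt assembly. Everything here is an illustrative placeholder (function name, prompt layout, sample evidence), not the system's actual prompt template.

```python
def build_prompt(question, schema_portrait, evidence):
    """Assemble an LLM prompt from the profiled schema and generated evidence."""
    parts = ["### Schema", schema_portrait,
             "### Evidence", *evidence,
             "### Question", question,
             "Write one SQL query."]
    return "\n".join(parts)

prompt = build_prompt(
    "How many orders were placed after 2020?",
    "orders(order_id, customer_id, year)",
    ["customer joins order via customer_id",   # structural consistency
     "'after 2020' means year > 2020"])        # semantic constraint
print(prompt.splitlines()[0])  # → ### Schema
```

The point of the design is that the evidence section is generated on demand per question, rather than curated by hand as in the BIRD annotations.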

Schema Mining Agent

Extracts DDL, foreign‑key relations, and a sample of column values.

Uses an LLM to summarize “column name + context + sample values” into field descriptions, candidate synonyms, unit hints, and value glossaries.

Creates a few‑shot QA knowledge base by abstracting historical (question, SQL) pairs into templates and SQL skeletons for cross‑database semantic priors.
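The statistical half of this profiling step can be sketched directly against a database. Below is a minimal, self‑contained illustration using SQLite and the standard library; the function name and the shape of the "portrait" dictionary are assumptions, and the real agent additionally passes such portraits to an LLM to produce field descriptions and synonyms.

```python
import sqlite3
import statistics
from collections import Counter

def profile_column(conn, table, column, sample_limit=1000):
    """Build a per-column 'portrait': sample values, frequent enums, stats."""
    rows = conn.execute(
        f"SELECT {column} FROM {table} WHERE {column} IS NOT NULL LIMIT ?",
        (sample_limit,)).fetchall()
    values = [r[0] for r in rows]
    portrait = {"table": table, "column": column,
                "samples": values[:5],                       # raw examples for the LLM
                "frequent": Counter(values).most_common(3)}  # candidate enum values
    if values and all(isinstance(v, (int, float)) for v in values):
        portrait["mean"] = statistics.mean(values)
        portrait["quantiles"] = statistics.quantiles(values, n=4)
    return portrait

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?)", [(v,) for v in (10, 20, 20, 40)])
print(profile_column(conn, "orders", "amount"))
```

Because this runs offline, even expensive full-column statistics are affordable; the online stages only consume the cached portraits.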

Query Routing Agent

Detects required evidence types from the user query.

Applies schema‑knowledge for semantic expansion (synonym replacement, coreference resolution) to improve column alignment.

For enumerated fields, automatically extracts the value domain to avoid mismatched WHERE clauses.
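A toy version of this routing logic is sketched below. Keyword heuristics stand in for the LLM-based analysis the article describes, and the synonym table and enum domains would in practice come from the schema mining stage; all names here are hypothetical.

```python
# Synonym map assumed to be produced by offline schema mining.
SYNONYMS = {"revenue": "amount", "client": "customer"}

def route_query(question, enum_domains):
    """Decide which evidence types a question needs and expand synonyms."""
    q = question.lower()
    needs = set()
    if any(w in q for w in ("average", "total", "top", "percent")):
        needs.add("numerical_reasoning")
    for word, column in SYNONYMS.items():
        if word in q:
            needs.add("synonym_mapping")
            q = q.replace(word, column)      # semantic expansion toward column names
    for column, domain in enum_domains.items():
        if any(str(v).lower() in q for v in domain):
            needs.add("enumeration_explanation")  # guard WHERE-clause values
    return q, sorted(needs)

q, needs = route_query("Total revenue from active clients",
                       {"status": ["active", "closed"]})
print(needs)  # → ['enumeration_explanation', 'numerical_reasoning', 'synonym_mapping']
```

Only the agents matching the detected evidence types are invoked, which keeps per-query cost proportional to what the question actually requires.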

Evidence Generation Agent

Structural consistency evidence: Generates natural‑language descriptions of join paths, e.g., "customer joins order via customer_id".

Semantic constraint evidence: Converts intents such as "after 2020" into executable conditions like year > 2020.

Logical completion evidence: Retrieves similar cases to suggest complex reasoning patterns (e.g., using the RANK() window function for top‑10% filtering).
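The first two evidence types above can be sketched as plain string generation over mined metadata. This is a minimal illustration under assumed data shapes; the intent-to-condition mapping here is a hard-coded stand-in for what the article describes as LLM-driven conversion.

```python
def structural_evidence(foreign_keys):
    """Render JOIN-path descriptions from (table_a, column, table_b) triples."""
    return [f"{a} joins {b} via {col}" for a, col, b in foreign_keys]

def semantic_constraint(intent):
    """Map a coarse temporal intent to an executable condition (illustrative)."""
    mapping = {"after 2020": "year > 2020", "before 2020": "year < 2020"}
    return mapping.get(intent)

fks = [("customer", "customer_id", "order")]
evidence = structural_evidence(fks) + [semantic_constraint("after 2020")]
print(evidence)  # → ['customer joins order via customer_id', 'year > 2020']
```

Logical completion evidence works differently: it is retrieved from the few-shot QA knowledge base built during schema mining rather than generated from metadata.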

Experimental Validation

Evaluated on the BIRD dataset (95 real databases across 37 domains) with all external evidence removed. Adding the Agents‑Companion components restored performance for three leading SOTA methods:

Schema Mining Agent alone yields large gains on simple and medium‑difficulty queries.

The full three‑stage system provides the greatest boost on challenging queries, confirming that the loop of knowledge deposition, semantic activation, and evidence construction is the primary source of improvement.

Conclusion

Agents‑Companion demonstrates an industrial‑grade Text‑to‑SQL solution that eliminates reliance on manually curated evidence by mining rich, first‑hand knowledge from the database itself. The approach offers a low‑cost, robust, and explainable path for deploying conversational data analysis in real‑world settings.

Tags: AI, LLM, Text-to-SQL, agents, Database Mining
Written by Amap Tech

Official Amap technology account showcasing all of Amap's technical innovations.