
From ChatBI to DataAgent: How Enterprise AI Moves from Demo to Trusted Production

A live discussion with data platform leaders reveals that the real challenge of AI‑driven data agents lies not in model strength but in building a stable, explainable semantic layer, managing prompt versus fine‑tuning trade‑offs, ensuring trustworthy multi‑turn conversations, and aligning cost with business value for production deployment.

DataFunTalk

Background and Participants

The DataFunSummit livestream on April 9 featured host Du Shujun and two guests: Yang Zhouzhi, technical lead of Xiaohongshu’s data analysis platform, and Yan Lingang, partner and product lead at Guanyuan Data. The conversation focused on practical engineering realities of moving a Data Agent from prototype to production.

1. The Semantic Layer Is the Foundation

Rather than debating whether to use a Cube, a View, or an API abstraction, the panel emphasized that a unified semantic layer is essential for accurate data retrieval. Yang highlighted the difficulty of mapping user‑facing terms (e.g., "industry", "advertiser") to stored values, so Xiaohongshu invests heavily in dimension‑value mapping, pre‑extracting high‑frequency values into an acceleration engine and using semantic understanding to translate natural language queries into correct data objects.

Yan added that the semantic layer can take different forms—To‑SQL, To‑DSL, or intent‑recognition pipelines—depending on the scenario, but the key is correct understanding before generation.

2. Prompt Engineering vs. Supervised Fine‑Tuning (SFT)

Yang explained that prompt engineering works well early on, boosting test‑set performance to 60‑70%, but quickly hits diminishing returns as context windows become saturated. To break the ceiling, Xiaohongshu introduced SFT at later stages, raising accuracy to around 85%.

Yan argued that heavy model fine‑tuning is often uneconomical for product vendors because customers switch models frequently. Instead, Guanyuan Data stabilizes the system with robust prompt libraries, benchmark‑driven regression testing, and modular knowledge injection (e.g., error cases, SQL constraints, analysis methods).
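The modular knowledge injection Yan describes can be sketched as prompt assembly from pluggable sections. The module names and contents below are invented examples, not Guanyuan Data's actual prompt library.

```python
# Illustrative modular prompt assembly: a stable base instruction plus
# pluggable knowledge modules (error cases, SQL constraints, methods).
BASE_PROMPT = "You are a BI assistant. Answer using only governed datasets."

KNOWLEDGE_MODULES = {
    "sql_constraints": "Emit only SELECT statements; always include a LIMIT.",
    "error_cases": "Do not confuse 'order date' with 'payment date'.",
    "analysis_methods": "Prefer period-over-period comparison for trends.",
}

def build_prompt(question: str, modules: list[str]) -> str:
    """Compose the final prompt from the base plus selected modules."""
    sections = [BASE_PROMPT]
    sections += [KNOWLEDGE_MODULES[m] for m in modules if m in KNOWLEDGE_MODULES]
    sections.append(f"Question: {question}")
    return "\n\n".join(sections)
```

Because each module is independent, benchmark-driven regression tests can toggle modules on and off to localize which piece of injected knowledge caused a regression after a model swap.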

3. Finding the Right Table (Data Governance)

Both speakers agreed that “finding data” remains a core pain point. Yang’s approach is to first narrow the scope to a curated, high‑trust dataset per business line, then perform intent parsing, candidate matching, and semantic tagging within that bounded space.

Yan stressed that effective table selection depends on prior data governance: clear field annotations, usage metrics, and thematic data spaces enable the Agent to treat assets as trustworthy knowledge.

4. Consistency Across Definitions

When multiple definitions of “active enterprise user” exist, the Agent must defer to a centrally managed metric catalog rather than invent its own definition. Both guests highlighted the need for a unified metric platform that records official definitions, associated datasets, and routing rules.

5. Multi‑Turn Conversation Memory

Yang described a two‑tier memory system: short‑term memory relies on the current context window with compression to avoid token explosion, while long‑term memory stores summarized user habits and interests for retrieval in subsequent sessions.

Yan prefers a more conservative approach, trimming conversation history to the most relevant turns and rewriting the current query to embed essential context, avoiding full context replay.
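Yan's query-rewriting idea can be sketched as carrying forward only the filters from the last few turns into a standalone query. The turn structure and bracket notation below are assumptions for illustration; a real rewriter would use the model itself to produce the rewritten question.

```python
# Sketch of conservative context handling: instead of replaying the full
# conversation, keep only the most recent turns and fold their filters
# into a rewritten, self-contained query.
def rewrite_query(history: list[dict], query: str, keep_turns: int = 2) -> str:
    """Rewrite `query` to embed filters carried from recent turns."""
    carried: dict[str, str] = {}
    for turn in history[-keep_turns:]:
        carried.update(turn.get("filters", {}))
    if not carried:
        return query
    clause = ", ".join(f"{k}={v}" for k, v in carried.items())
    return f"{query} [carried filters: {clause}]"
```

Trimming to a fixed number of turns bounds token cost per request, at the price of occasionally dropping context the user assumed was still active.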

6. Trust and Explainability in Production

Accuracy is the primary barrier to production. Users tolerate almost no errors; a single mistake erodes trust. Beyond raw numbers, users expect the Agent to provide reasoning, data provenance, and the ability to inspect underlying dimensions, measures, and filters.

Both guests described engineering practices to support explainability: generating multiple candidate results for cross‑validation, driving scripts that fetch raw data rather than letting the model hallucinate, and exposing the reasoning chain in the final report.

7. Cost Management When Model Prices Rise

If large‑model API costs increase tenfold, Yang would first cut high‑token collaborative Agent workflows and explore distillation or on‑prem deployment for SFT models, while preserving semantic caching to reduce latency and cost.

Yan would prioritize high‑value decision‑making sessions for expensive inference and replace routine reporting with rule‑based templates, thereby minimizing reliance on costly generation.

8. Continuous Improvement Flywheels

Yang outlined an SFT flywheel: collect online questions, cluster representative samples, feed them into a training pool, perform AI‑first screening followed by human review, fine‑tune, test, and redeploy.

Yan described a three‑layer flywheel: user feedback → bad‑case repository and prompt optimization → industry‑wide knowledge assets that feed back into new customers, turning experience into reusable knowledge.

9. DSL vs. SQL Debate

Yang favors a DSL or semantic‑query layer because unrestricted SQL can lead to model drift; a constrained DSL offers better control for enterprise BI.
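A constrained DSL of the kind Yang favors might validate model output against a governed schema before anything executes. The allowed metrics, dimensions, and query shape below are hypothetical.

```python
# Sketch of a constrained query DSL: instead of emitting free-form SQL,
# the model produces a small structured object that is validated against
# the governed schema before execution.
ALLOWED_METRICS = {"gmv", "dau"}
ALLOWED_DIMENSIONS = {"date", "region", "industry"}

def validate_dsl(query: dict) -> bool:
    """Accept only queries whose fields stay inside the governed schema."""
    return (
        query.get("metric") in ALLOWED_METRICS
        and set(query.get("group_by", [])) <= ALLOWED_DIMENSIONS
        and all(f in ALLOWED_DIMENSIONS for f in query.get("filters", {}))
    )
```

Because every field is checked against an allow-list, a drifting model can at worst produce a rejected query, never an arbitrary scan or mutation, which is the control property that makes the DSL attractive for enterprise BI.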

Yan remains neutral, stating that the choice (SQL, DSL, or direct dashboard) is less important than robust intent detection, entity extraction, and contextual grounding.

10. Vision of the Data Agent

The panel concluded that the decisive shift from ChatBI to a trustworthy Data Agent is not a stronger model but a system that earns business confidence through solid semantic foundations, governance, explainability, and cost‑effective operations.

Tags: Cost Management · Semantic Layer · Explainability · Enterprise AI · Data Agent · Supervised Fine‑Tuning
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
