From ChatBI to Data Agent: Real‑World Lessons on Building Trustworthy AI Data Systems
A live discussion with experts from Xiaohongshu and Guanyuan Data reveals that the real challenge of AI‑driven data agents lies not in model strength but in semantic convergence, knowledge structuring, explainability, cost control, and gaining business trust for production‑grade deployment.
On April 9, DataFunSummit hosted a live dialogue titled “From ChatBI to DataAgent—AI‑driven data architecture and decision‑making practice,” featuring host Du Shujun and two guests: Yang Zhouzhi, technical lead of Xiaohongshu’s data analysis platform, and Yan Lingang, partner and product lead at Guanyuan Data. The conversation was structured around ten core questions covering semantic layers, prompt engineering, data discovery, metric consistency, multi‑turn dialogue, production readiness, explainability, cost management, and roadmap choices.
Key Insight: Trust Over Model Power
The panel agreed that the biggest gap between demo and production is not the model’s raw capability but whether the system can converge on semantics, structure knowledge, provide explanations, and earn business trust. Enterprises need an intelligent system that works reliably in complex data environments rather than a generic “SQL‑writing” large model.
Unified Semantic Layer
When asked whether a unified semantic layer should be built as a Cube, a View, or an API abstraction, the guests emphasized that the layer is a foundational infrastructure for accurate data retrieval. Yang highlighted the difficulty of dimension‑value mapping—aligning business terms with stored values—and described Xiaohongshu’s approach of extracting high‑frequency dimension values into an acceleration engine and using semantic understanding to map natural language back to correct data objects. Yan added that the semantic layer can take various forms (To‑SQL, To‑DSL, intent‑recognition) and that accuracy depends more on correct front‑end understanding than on the final query language.
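The dimension-value mapping Yang described can be sketched in a few lines. Everything below is hypothetical illustration, not Xiaohongshu's implementation: a small in-memory lookup stands in for the acceleration engine of high-frequency dimension values, and fuzzy string matching stands in for real semantic understanding.

```python
from difflib import get_close_matches

# Hypothetical lookup of high-frequency dimension values, standing in for
# the far richer "acceleration engine" described in the talk.
DIMENSION_VALUES = {
    "city": ["Shanghai", "Beijing", "Guangzhou", "Shenzhen"],
    "channel": ["search", "feed", "push", "share"],
}

def map_term(term: str, cutoff: float = 0.6):
    """Map a natural-language term to (dimension, stored_value), or None."""
    for dim, values in DIMENSION_VALUES.items():
        hits = get_close_matches(term, values, n=1, cutoff=cutoff)
        if hits:
            return dim, hits[0]
    return None

print(map_term("shanghai"))  # → ('city', 'Shanghai'), despite the case mismatch
```

The point is the shape of the problem: the user's wording rarely matches the stored value exactly, so some normalization layer must sit between natural language and the data objects it refers to.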
Prompt Engineering vs. SFT
Yang noted that prompt engineering is valuable early on but quickly reaches diminishing returns; once test‑set accuracy plateaued at roughly 60‑70%, further gains required fine‑tuning (SFT), which pushed accuracy to around 85%. Yan, speaking from a product‑side perspective, argued that extensive prompt tweaking is costly and that a stable prompt base combined with modular knowledge injections (benchmarking, regression tests, business rules) is more sustainable.
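Yan's "stable base plus modular injections" idea can be illustrated with a toy prompt assembler. The module names and texts below are invented for the example; the pattern is simply that per-request knowledge is composed onto an unchanging base rather than hand-tuned into one monolithic prompt.

```python
BASE_PROMPT = "You are a BI assistant. Answer using only registered metrics."

# Hypothetical knowledge modules, injected per request as needed.
MODULES = {
    "business_rules": "GMV excludes cancelled orders.",
    "regression_notes": "Date filters default to the last 7 days.",
}

def build_prompt(question: str, modules: list) -> str:
    parts = [BASE_PROMPT]
    parts += [MODULES[m] for m in modules if m in MODULES]
    parts.append(f"Question: {question}")
    return "\n".join(parts)

prompt = build_prompt("What was GMV yesterday?", ["business_rules"])
```

Because the base never changes, regression tests can pin its behavior while the injected modules evolve independently.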
“Finding Data” Remains Hard
Even with AI, locating the right dataset (“finding data”) is still a challenge. Yang’s solution is “circle first, then identify”: limit the search to a curated set of trusted tables per business line before applying semantic matching and labeling. Yan stressed that effective data governance—clear table ownership, annotations, usage metrics—provides the prerequisite for agents to treat assets as reliable knowledge.
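A minimal sketch of "circle first, then identify", with an invented table registry: first restrict the search space to the business line's curated, trusted tables, then rank the survivors by a crude relevance score (here, keyword overlap with column names, standing in for real semantic matching).

```python
# Hypothetical curated registry: trusted tables per business line,
# each mapped to a set of column keywords.
TRUSTED_TABLES = {
    "ads": {
        "ads_daily_spend": {"spend", "impressions", "campaign"},
        "ads_conversion": {"conversion", "campaign", "cost"},
    },
    "commerce": {
        "orders_fact": {"order", "gmv", "buyer"},
    },
}

def find_table(business_line: str, keywords: set):
    """Circle first (restrict to the line's trusted tables),
    then identify (rank candidates by keyword overlap)."""
    candidates = TRUSTED_TABLES.get(business_line, {})
    scored = [(len(keywords & cols), name) for name, cols in candidates.items()]
    scored = [s for s in scored if s[0] > 0]
    return max(scored)[1] if scored else None

print(find_table("ads", {"campaign", "spend"}))  # → ads_daily_spend
```

Narrowing before matching is what makes the governance work Yan described pay off: the agent never ranks tables that nobody vouches for.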
Metric Consistency
Discrepancies in metric definitions across systems require a unified metric platform. Agents should defer to officially registered definitions rather than inventing their own, and routing mechanisms must guide queries to the appropriate aggregation paths.
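The "defer to registered definitions" rule reduces to a strict lookup: the agent resolves a metric name against the registry or refuses, rather than improvising an expression. The registry entries below are invented placeholders.

```python
# Hypothetical metric registry; a real one would also carry owners,
# grain, and approved aggregation paths.
METRIC_REGISTRY = {
    "dau": {"expr": "COUNT(DISTINCT user_id)", "table": "events_daily"},
    "gmv": {"expr": "SUM(paid_amount)", "table": "orders_fact"},
}

def resolve_metric(name: str) -> dict:
    """Return the official definition, or fail loudly instead of guessing."""
    key = name.strip().lower()
    if key not in METRIC_REGISTRY:
        raise KeyError(f"'{name}' is not a registered metric")
    return METRIC_REGISTRY[key]

m = resolve_metric("GMV")
sql = f"SELECT {m['expr']} FROM {m['table']}"
# → SELECT SUM(paid_amount) FROM orders_fact
```

Failing on unregistered names is deliberate: a wrong-but-plausible number is worse for trust than an honest refusal.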
Multi‑Turn Conversation
Maintaining context across turns is less about remembering everything and more about preserving critical constraints. Yang described a two‑tier memory: short‑term context compression within the token window and long‑term summaries of user habits and focus areas. Yan prefers trimming conversation length and rewriting queries to embed only the essential information for the next turn.
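Yan's query-rewriting approach can be sketched as carrying a small dictionary of active constraints instead of the full transcript; the constraint names below are illustrative only.

```python
# Hypothetical constraint carry-over: rather than replaying the whole
# history, rewrite the next turn to embed only constraints still in force.
def rewrite_query(turn: str, constraints: dict) -> str:
    active = "; ".join(f"{k}={v}" for k, v in constraints.items())
    return f"{turn} [constraints: {active}]" if active else turn

constraints = {"region": "East", "period": "2024-Q3"}
constraints["period"] = "2024-Q4"  # the user just changed the period
print(rewrite_query("break it down by channel", constraints))
# → break it down by channel [constraints: region=East; period=2024-Q4]
```

The rewritten turn is self-contained, so the downstream model never needs the earlier turns at all, which is exactly what keeps the token window small.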
Production‑Grade Challenges
Accuracy and trust are the primary obstacles to production deployment. Users expect near‑zero tolerance for errors; a single mistake erodes confidence. Yan highlighted the need for clear acceptance criteria—accuracy thresholds and response‑time limits—while Yang warned that adding more verification steps can increase latency and hurt user experience.
Explainability
Both guests agreed that explainability is a gateway to business trust. Yang's team surfaces the reasoning chain and provides data provenance (metrics, dimensions, filters) alongside results. Yan's approach generates multiple candidate outputs, cross‑validates them, and ensures that final reports are populated by data retrieved via scripted queries rather than model‑generated text, reducing hallucination risk.
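The cross-validation step Yan described resembles a self-consistency vote. This is a generic sketch of that idea, not Guanyuan's mechanism: run several candidate queries, keep a result only when a majority agree, and escalate otherwise.

```python
from collections import Counter

# Generic self-consistency check over candidate query results.
def cross_validate(results):
    """Return the majority result, or None when no majority exists."""
    value, n = Counter(results).most_common(1)[0]
    return value if n > len(results) / 2 else None

print(cross_validate([1280, 1280, 1275]))  # → 1280 (two of three agree)
print(cross_validate([10, 20, 30]))        # → None (no majority; escalate)
```

Disagreement among candidates is itself a useful signal: it flags exactly the questions where a human should look before a number reaches the business.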
Cost Discipline
When large‑model API costs rise, the panel suggested first cutting high‑token collaborative agent scenarios, preserving semantic caching, and considering model distillation or on‑prem deployment. Yan added that high‑value decision‑making sessions may justify higher costs, whereas routine reporting for thousands of stores should rely on rule‑based or templated pipelines.
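Semantic caching, one of the cost levers the panel said to preserve, can be sketched as follows. Here "semantic" is crudely approximated by normalizing word order and case; a production cache would match on embeddings instead.

```python
# Toy semantic cache: normalization stands in for embedding similarity.
class SemanticCache:
    def __init__(self):
        self._store = {}
        self.hits = 0

    @staticmethod
    def _key(question: str) -> str:
        # Lowercase and sort words so trivially rephrased questions collide.
        return " ".join(sorted(question.lower().split()))

    def get(self, question: str):
        key = self._key(question)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        return None

    def put(self, question: str, answer) -> None:
        self._store[self._key(question)] = answer

cache = SemanticCache()
cache.put("DAU yesterday for search", "1.2M")
print(cache.get("yesterday DAU for search"))  # → 1.2M, no model call needed
```

Every cache hit is a model call that never happens, which is why the panel ranked caching above cruder levers like simply cutting scenarios.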
Continuous Improvement Flywheels
Both speakers described flywheel mechanisms for ongoing enhancement. Yang’s SFT flywheel clusters online questions, selects representative samples, runs AI pre‑screening, conducts human review, fine‑tunes the model, and redeploys. Yan’s three‑layer flywheel incorporates user feedback, prompt/Bad‑Case optimization, and accumulated industry know‑how that becomes reusable knowledge assets for future customers.
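The data-selection front of Yang's SFT flywheel—cluster online questions, then sample representatives for review—can be sketched with a deliberately crude stand-in for clustering (grouping by leading verb); the questions are invented examples.

```python
import random
from collections import defaultdict

def sample_for_review(questions, per_cluster=1, seed=7):
    """Group questions into crude intent clusters, then sample a few
    representatives from each for AI pre-screening and human review."""
    clusters = defaultdict(list)
    for q in questions:
        key = q.split()[0].lower()  # stand-in for real semantic clustering
        clusters[key].append(q)
    rng = random.Random(seed)  # seeded so review batches are reproducible
    return {k: rng.sample(v, min(per_cluster, len(v)))
            for k, v in clusters.items()}

batch = sample_for_review([
    "compare DAU across regions",
    "compare GMV month over month",
    "why did retention drop",
])
```

Sampling per cluster rather than at random keeps rare-but-important question types from being drowned out by the high-frequency ones.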
Roadmap Choices: Text‑to‑SQL vs. Text‑to‑Semantic‑Query
Yang prefers a DSL or semantic‑agent layer over raw SQL, arguing that limiting the output space improves control in enterprise BI contexts. Yan remains neutral, stating that the choice (SQL, DSL, direct dashboard) matters less than robust intent detection, entity understanding, and contextual grounding.
Final Takeaway
The discussion concluded that enterprises do not need an all‑powerful model that can answer anything; they need a trustworthy, explainable system built on a solid semantic layer, strong data governance, and product‑level safeguards that can transition from impressive demos to reliable production tools.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
