ChatBI: Leveraging Large Language Models for Intelligent Business Intelligence at Ximalaya

This article details Ximalaya’s ChatBI project, describing how large language models are integrated into a BI platform to improve data accessibility, reduce development effort, and enhance query accuracy through prompt engineering, RAG, fine‑tuning, and multi‑agent architectures.

DataFunTalk

Oct 11, 2024

Background

Ximalaya faces data-analysis challenges on two fronts. Business users deal with high barriers to entry, slow responses, inflexible dashboards, and inefficient self‑service data extraction. Data engineers struggle with limited resources, high development costs, and under‑utilized high‑quality data in the warehouse.

Goal

The goal is to build a large‑model‑driven BI application that reduces development pressure and gives business users an easy‑to‑use interface, thereby unlocking data value.

Product Architecture

ChatBI is offered in three product forms: a web portal, a DingTalk chatbot, and an open API. At a high level the architecture has two layers: the ChatBI layer (frontend interfaces) and the Data Intelligence Engine layer (backend agents). The engine contains multiple agents for intent recognition, metric definition, data query, SQL generation, data development, and governance.
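To make the idea of an engine that routes questions to specialized agents more concrete, here is a minimal sketch. It is not Ximalaya's implementation: the agent functions, the intent names, and the keyword rules (which stand in for an LLM‑based intent recognizer) are all assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple

# Hypothetical agent handlers. In a real system each would call an
# LLM-backed agent; here they only tag the question with the agent name.
def metric_definition_agent(question: str) -> str:
    return f"[metric_definition] {question}"

def data_query_agent(question: str) -> str:
    return f"[data_query] {question}"

def sql_generation_agent(question: str) -> str:
    return f"[sql_generation] {question}"

AGENTS: Dict[str, Callable[[str], str]] = {
    "metric_definition": metric_definition_agent,
    "data_query": data_query_agent,
    "sql_generation": sql_generation_agent,
}

# Keyword rules standing in for an LLM intent classifier (assumption).
# Rules are checked in order; the first match wins.
INTENT_KEYWORDS: Tuple[Tuple[str, Tuple[str, ...]], ...] = (
    ("metric_definition", ("what is", "definition", "how is", "defined")),
    ("sql_generation", ("sql", "write a query")),
)

def recognize_intent(question: str) -> str:
    """Return the intent label for a question; default to a data query."""
    lowered = question.lower()
    for intent, keywords in INTENT_KEYWORDS:
        if any(keyword in lowered for keyword in keywords):
            return intent
    return "data_query"

def route(question: str) -> str:
    """Dispatch the question to the agent registered for its intent."""
    return AGENTS[recognize_intent(question)](question)
```

Keeping the registry separate from the classifier means a new agent can be added without touching the routing logic, which is one plausible reason to split the engine into agents in the first place.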

Five‑layer Architecture

Model Integration Layer: integrates embedding, commercial text, self‑trained text, and audio models from the company’s LLM platform.

Dataset & Knowledge Management Layer: stores table schemas, business vocabularies, rules, and SQL dialects to provide rich context for prompts.

Tool Capability Layer: provides retrieval augmentation, session memory, DB query, syntax checking, permission verification, and an automated testing and logging system.

Agent Capability Layer: includes intent recognition, smart rewriting, dataset selection, NL‑to‑SQL, data visualization, metric queries, analysis summarization, and automatic SQL correction.

Product Capability Layer: delivers smart table selection, metric queries, multi‑turn dialogue, joint table queries, intelligent charting, and analysis summarization.

Product Forms (the user‑facing entry points that sit on top of the five layers): DingTalk bot, web UI, and open API.
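Two of the tools named in the Tool Capability Layer, syntax checking and permission verification, can be sketched in a few lines. This is only an illustration under stated assumptions: SQLite's parser stands in for the real target dialect, table names are extracted with a simple regular expression rather than a full SQL parser, and the function names are hypothetical.

```python
import re
import sqlite3
from typing import Iterable, Optional, Set

# Naive table extraction: the identifier following FROM or JOIN.
# Limitation: CTE names are counted as tables and quoted or
# schema-qualified names are not handled.
TABLE_PATTERN = re.compile(
    r"\b(?:from|join)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE
)

def referenced_tables(sql: str) -> Set[str]:
    """Return the lower-cased table names a statement appears to read."""
    return {name.lower() for name in TABLE_PATTERN.findall(sql)}

def check_permission(sql: str, allowed_tables: Iterable[str]) -> Set[str]:
    """Return the tables the user may NOT read (empty set means allowed)."""
    allowed = {table.lower() for table in allowed_tables}
    return referenced_tables(sql) - allowed

def check_syntax(sql: str, schema_ddl: str) -> Optional[str]:
    """Return None if SQLite can prepare the statement, else the error text.

    EXPLAIN compiles the statement without running it, so syntax errors
    and unknown tables or columns are reported but no data is touched.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_ddl)
        conn.execute("EXPLAIN " + sql)
        return None
    except sqlite3.Error as exc:
        return str(exc)
    finally:
        conn.close()
```

Returning the error text instead of a boolean is deliberate: the message can be fed back to the model when the SQL needs to be corrected.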

Implementation Details

Understanding how humans write SQL informs the design: a person identifies the tables and fields, interprets business terminology, handles time dimensions, and adapts to the target SQL dialect (e.g., MySQL vs. StarRocks). ChatBI follows the same steps: it parses the user question, rewrites it, retrieves relevant knowledge, generates SQL from natural language (NL‑to‑SQL), validates and corrects the SQL, executes the query, and selects an appropriate visualization.
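The generate, validate, correct, execute, and visualize steps can be sketched as a small loop. This is a hedged illustration, not the production pipeline: question rewriting and knowledge retrieval are left out, SQLite stands in for the real query engine, `stub_generate_sql` is a hypothetical placeholder for the LLM call, and the chart rules are invented for the example.

```python
import sqlite3
from typing import Callable, List, Optional, Sequence, Tuple

def choose_chart(columns: Sequence[str], rows: Sequence[tuple]) -> str:
    """Pick a chart type from the result shape (illustrative rules only)."""
    if len(columns) == 1 and len(rows) == 1:
        return "number"
    if any("date" in column.lower() for column in columns):
        return "line"
    return "table"

def answer(
    question: str,
    conn: sqlite3.Connection,
    generate_sql: Callable[[str, Optional[str]], str],
    max_attempts: int = 3,
) -> Tuple[str, List[tuple], str]:
    """Generate SQL, retry with the error message on failure, then execute."""
    error: Optional[str] = None
    for _ in range(max_attempts):
        sql = generate_sql(question, error)
        try:
            conn.execute("EXPLAIN " + sql)  # compile only, nothing is run
        except sqlite3.Error as exc:
            error = str(exc)  # fed back to the generator on the next attempt
            continue
        cursor = conn.execute(sql)
        columns = [description[0] for description in cursor.description]
        rows = cursor.fetchall()
        return sql, rows, choose_chart(columns, rows)
    raise RuntimeError(f"could not produce valid SQL: {error}")

def stub_generate_sql(question: str, error: Optional[str]) -> str:
    """Hypothetical stand-in for the LLM: wrong column first, fixed on retry."""
    column = "play_dat" if error is None else "play_date"
    return (
        f"SELECT {column}, COUNT(*) AS plays FROM plays "
        f"GROUP BY {column} ORDER BY {column}"
    )
```

Passing the previous error back into the generator is what turns a one‑shot NL‑to‑SQL call into automatic correction; capping the attempts keeps a persistently wrong model from looping forever.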

Model Optimization

Techniques include prompt engineering, Retrieval‑Augmented Generation (RAG), fine‑tuning, combined RAG and fine‑tuning, multi‑agent orchestration, and continuous model upgrades. The optimizations span four areas: knowledge (high‑quality tables, rules, examples), technology (prompt splitting, multi‑agent routing, vector/graph retrieval, model upgrades), product (enhanced interaction, multi‑turn context, explainability), and quality assurance (extensive unit tests, feedback loops, traceability).
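As an illustration of how retrieval can feed a prompt, the sketch below ranks knowledge snippets against the question and places the best matches into the prompt text. It rests on several assumptions: bag‑of‑words cosine similarity replaces a real embedding model and vector store, and the snippet format, prompt wording, and function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Sequence

def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question: str, snippets: Sequence[str], k: int = 2) -> List[str]:
    """Return the k snippets most similar to the question."""
    query = Counter(tokenize(question))
    ranked = sorted(
        snippets,
        key=lambda snippet: cosine(query, Counter(tokenize(snippet))),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(
    question: str, snippets: Sequence[str], dialect: str, k: int = 2
) -> str:
    """Assemble an NL-to-SQL prompt from the retrieved knowledge."""
    context = "\n".join(f"- {s}" for s in retrieve(question, snippets, k))
    return (
        f"You write {dialect} SQL.\n"
        f"Relevant knowledge:\n{context}\n"
        f"Question: {question}\n"
        "SQL:"
    )
```

Retrieving only the top few snippets keeps the prompt short and focused, which is the practical motivation for pairing a knowledge store with retrieval instead of pasting every schema and rule into each request.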

Results

Two weeks after launch, unique visitors had surpassed those of the previous self‑service tool, page views had reached half of the baseline, query latency had improved severalfold, and answer accuracy had stabilized at around 85%.

Future Outlook

Plans include strengthening intent recognition, smart rewriting, error correction, and chart generation, as well as exploring DataOps agents for SQL generation, optimization, and troubleshooting, with the ultimate aim of giving all data products natural‑language interaction capabilities.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
