Why Data Agents Are the Next AI Frontier: Insights from Volcano Engine’s Journey
In this talk, Volcano Engine’s technical expert Chen Shuo explains the evolution of the Data Agent platform, the four‑quadrant framework for AI‑driven analytics, real‑world deployment challenges, architectural upgrades from pipeline to intelligent scheduling, and key lessons for building reliable, enterprise‑grade AI agents.
Speaker: Volcano Engine technology expert Chen Shuo, presenting at the AICon Global AI & Machine Learning Conference.
This talk is organized into five parts: an overview of Data Agent, the evolution of the intelligent analysis Agent product, the technical architecture evolution, recent deployment progress, and future architectural outlook.
Welcome everyone. I’m honored to share our practical experiences, and the pitfalls we hit, building Volcano Engine’s Data Agent in the intelligent‑analysis direction.
We start with a "four‑quadrant" framework to discuss different technical paths:
Quadrant 1: Pure large models, directly calling APIs to generate text.
Quadrant 2: General agents (e.g., Deep Research) that can write reports and conduct research.
Quadrant 3: (Not mentioned explicitly) – typically hybrid solutions.
Quadrant 4: Traditional data products such as BI tools and attribution analysis systems.
General agents often struggle with data‑analysis tasks. For example, generating correct SQL can be as unreliable as a lottery draw: perhaps only two or three correct results out of ten attempts.
The key difficulty is integrating enterprise knowledge. A company’s metric platform is a complex system; generic agents cannot easily understand or connect to this specialized data knowledge base.
Why Data Agent matters: It must seamlessly connect to an enterprise knowledge base while refining processes and toolchains to improve business applicability and data accuracy.
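One concrete way to picture "connecting to an enterprise knowledge base" is resolving domain jargon into canonical data names before any SQL is generated. The sketch below is illustrative only; `GLOSSARY` and `resolve_terms` are hypothetical names, not Volcano Engine APIs, and a real system would use semantic matching rather than string replacement.

```python
# Minimal sketch: rewrite enterprise jargon in a user question into
# canonical column names before handing it to an SQL-generating model.
# GLOSSARY and resolve_terms are illustrative, not a real product API.

GLOSSARY = {
    "consumption": "ad_spend",        # advertising "consumption" = spend
    "DAU": "daily_active_users",
}

def resolve_terms(question: str, glossary: dict[str, str]) -> str:
    """Replace known business terms with their canonical field names."""
    for term, canonical in glossary.items():
        question = question.replace(term, canonical)
    return question

print(resolve_terms("What was last week's consumption by campaign?", GLOSSARY))
# → What was last week's ad_spend by campaign?
```

The point is not the string replacement itself but where it sits: terminology is resolved deterministically from the enterprise's own glossary, so the model never has to guess what "consumption" means.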
In short, the first generation of data analysis agents can be seen as “Chat BI” – conversational business‑intelligence interaction. The second generation moves closer to a general agent, capable of end‑to‑end automated analysis and producing Markdown or web reports.
Volcano Engine has built a complete product suite to support these capabilities, including Chat BI data insight reports, open data‑analysis Agent APIs, and automatic dashboard generation.
The product capabilities are layered:
Bottom layer: adapts to various model bases (internal Volcano Engine models or external OpenAI‑compatible models).
Middle layer: a data‑capability foundation that handles core data connections, permission control, and other foundational issues.
Upper layer: a configuration‑management layer that semantically processes scattered data naming and descriptions, integrating business knowledge graphs so models truly understand enterprise data.
Top layer: user‑facing data consumption products, such as multi‑turn Chat BI interfaces and the newly launched deep‑research mode. These capabilities are accessible via a native UI or open APIs that can be embedded into enterprise OA systems or workflow platforms.

Product evolution highlights the concept of “Product Model Fit”: the product form must match model capabilities. Before large‑model breakthroughs, BI products added attribution and prediction features but were too complex for ordinary users. With ChatGPT (GPT‑3.5, late 2022), a wave of Chat BI products emerged, yet their scenarios remained limited. 2024 marked the “Agent Year”, because model capabilities finally supported open‑ended Agent designs.

Our first‑generation product, “Smart Question‑Answer”, was born in the GPT‑3.5 era and focused on the analyst workflow: flexible querying, insight discovery, and solidifying conclusions into reports. Chat BI lets users ask questions, automatically performs attribution and drill‑down, and can generate daily or weekly reports. However, Chat BI cannot fully replace professional analysts for complex tasks, such as running year‑over‑year, period‑over‑period, and attribution analyses simultaneously. For front‑line staff (for example, the roughly 8,000 members of Douyin’s field‑sales team), traditional BI tools cannot support mobile, real‑time queries; Chat BI shines in these mobile, ad‑hoc scenarios.

We identified three substitution logics:
Product substitution depends on target users.
Scenario substitution depends on task complexity.
Skill substitution depends on user roles (decision‑makers and front‑line staff benefit most).
Ultimately, Chat BI is not a universal key; finding its product‑market fit is crucial.

In 2025 we launched “Deep Analysis Mode”, which moves closer to a general Agent: users pose open questions, and the system automatically creates an analysis plan, decomposes it into sub‑tasks, executes them, and outputs a Markdown or web report. We built a structured knowledge base to resolve domain‑specific terminology (e.g., advertising “consumption”). Accurate data is a hard requirement: 99% accuracy per data point compounds to roughly 82% across twenty points, making verification essential. To improve accuracy we introduced a clarifying‑question mechanism and automated validation, effectively giving the Agent a “quality‑control inspector”.

Technical architecture evolution: Version 1.0 followed a fixed pipeline: schema linking → semantic ranking → dataset selection → knowledge base + prompt → code generation → execution → visualization. After model upgrades, this rigid pipeline proved too inflexible. Version 2.0 broke the fixed modules into toolkits (dataset selector, chart‑insight tool, SQL/Python sandbox, etc.); the model dynamically plans execution like building with LEGO blocks, resembling a ReAct‑style self‑optimizing Agent. Version 3.0 introduced a “One Agent” approach: it separates the data‑exploration Agent from the insight‑generation Agent, adds a context engine for memory management, and redesigns the Agent Workspace as a more natural “workbench” for the model.

Deployment results: in e‑commerce, front‑line operators use Chat BI for ad‑hoc queries and attribution, turning frequently asked questions into automated reports; in intelligent investment advisory, Agent‑generated marketing reports boost consultant efficiency.

Key takeaways:
Errors compound exponentially; a 99% step accuracy can drop to ~82% after twenty steps, so architectures must include redundancy, multi‑stage verification, and validation.
Teams must run parallel experiments; rapid iteration beats chasing a single perfect solution.
Thank you for listening!
ByteDance Data Platform
The ByteDance Data Platform team empowers all ByteDance business lines by lowering data‑application barriers, aiming to build data‑driven intelligent enterprises, enable digital transformation across industries, and create greater social value. Internally it supports most ByteDance units; externally it delivers data‑intelligence products under the Volcano Engine brand to enterprise customers.
