How an AI‑Powered Experiment Analysis Agent Transforms Data Insights

This document outlines the background, design, architecture, workflow, and large‑model integration of an AI‑driven Experiment Analysis Agent, detailing how it consolidates data, automates analysis via modular pipelines, leverages DeepSeek models, and enhances user experience through unified front‑end forms and intelligent messaging.

JD Cloud Developers

Background

Traditional algorithm experiments focus on core metrics such as UCTR and UCVR, but solely improving positive indicators is insufficient; a deeper data analysis is required to identify hidden risk factors and evaluate whether gains come at an unacceptable cost to other key metrics. Experiments often produce diverse or contradictory results, indicating underlying causal relationships that must be understood to optimize evaluation and iteration.

Existing analysis tools are fragmented across platforms, lacking a guided workflow, which hampers the propagation of scientific analysis methodology. This motivated the development of an Experiment Analysis Agent.

Product Showcase

Report Example

Report example

*The report is anonymized for data and business security.

Product Design

3.1 Architecture Design

The analysis approach starts from a macro view, drilling down to finer granularity after detecting anomalies, mirroring the typical analyst workflow. Inspiration was drawn from the AI Agent "Manus," which performs task decomposition, structured task list generation, online information retrieval, and summarization, providing a full‑process closed‑loop that informed the design of a "summarize data – sub‑analysis – present summary" framework.

To keep costs low, existing departmental reporting tools are reused for core data engineering, while new capabilities such as FDR analysis are developed independently and later stitched together. The overall architecture is modular and layered:

Application layer integrates analysis planning, theme analysis (orchestrated by a DAG workflow execution framework), and conclusion aggregation, supported by DeepSeek R1/V3 models.

Service layer provides authentication, front‑end form services, and other common capabilities using internal solutions (JDSDK + CHO‑JSF).

Data layer leverages distributed engines (Doris‑X, ClickHouse, Spark) to build both real‑time (OLAP) and offline (BDP) gateways, reusing existing metric systems while adding independent FDR modules.
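The layered split above can be sketched as plain composition: a data layer that fronts the query gateways, pluggable analysis modules in the application layer, and an agent that wires them together. This is an illustrative sketch only; the class names, method signatures, and metric values are assumptions, not the actual internal APIs.

```python
class DataLayer:
    """Fronts the real-time (OLAP) and offline (BDP) gateways."""
    def fetch_metrics(self, experiment_id: str, source: str = "olap") -> dict:
        # In production this would query Doris/ClickHouse or Spark jobs;
        # here we return a canned record for illustration.
        return {"experiment_id": experiment_id, "source": source, "uctr": 0.031}

class ThemeAnalysis:
    """One pluggable theme-analysis module in the application layer."""
    def run(self, metrics: dict) -> str:
        return f"UCTR observed at {metrics['uctr']:.1%}"

class Agent:
    """Application layer: fetch data, run each theme analysis, collect conclusions."""
    def __init__(self, data: DataLayer, analyses: list[ThemeAnalysis]):
        self.data, self.analyses = data, analyses

    def analyze(self, experiment_id: str) -> list[str]:
        metrics = self.data.fetch_metrics(experiment_id)
        return [a.run(metrics) for a in self.analyses]

agent = Agent(DataLayer(), [ThemeAnalysis()])
print(agent.analyze("exp-123"))
```

Keeping the data layer behind a single interface is what lets the existing metric systems be reused while new modules such as FDR analysis are bolted on independently.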

Agent architecture

3.2 Product Design

The Agent is intended to be a true "assistant": a unified front-end form was designed so that product and research engineers can run experiment analysis without leaving their existing workflow, improving the user experience.

Users provide experiment ID, period, module, background, and expectations. A unified form collects these inputs and transparently forwards them to the various backend services, eliminating perceptible differences between tools.
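The unified form pattern can be sketched as a thin validation-and-dispatch layer: one payload schema, fanned out to whichever backend tools apply. The field names follow the inputs listed above; the backend callables are hypothetical stand-ins for the actual services.

```python
def build_payload(form: dict) -> dict:
    """Validate the unified form and normalize it into one payload
    that every backend tool accepts."""
    required = ["experiment_id", "period", "module", "background", "expectations"]
    missing = [k for k in required if not form.get(k)]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return {k: form[k] for k in required}

def dispatch(payload: dict, backends: list) -> list:
    """Forward the same payload to each backend service transparently,
    so the user never sees tool boundaries."""
    return [backend(payload) for backend in backends]

payload = build_payload({
    "experiment_id": "exp-1", "period": "7d", "module": "search",
    "background": "new ranking model", "expectations": "UCTR up, UCVR flat",
})
results = dispatch(payload, [lambda p: ("fdr", p["experiment_id"]),
                             lambda p: ("metrics", p["experiment_id"])])
print(results)
```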

Web form

3.3 Workflow Design

Workflow diagram

The initial planner provided few‑shot examples of historical analysis, allowing the LLM to suggest a list of analysis method calls. However, this approach suffered from inaccurate method parsing, inflexible calls, and inability to pass intermediate conclusions. It was replaced by a DAG‑based workflow execution framework that runs independent analyses in parallel and sequentially executes dependent steps, markedly improving analysis quality.
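The DAG execution model described above can be sketched in a few lines: steps with no unmet dependencies run in parallel, and each dependent step receives its predecessors' conclusions as input. This is a minimal illustration of the pattern, not the production framework; the step names and outputs are invented.

```python
from concurrent.futures import ThreadPoolExecutor

def run_dag(tasks: dict, deps: dict) -> dict:
    """Execute analysis steps as a DAG.
    tasks: name -> fn(dep_results dict); deps: name -> list of dependency names.
    Independent steps run in parallel; dependents run once inputs exist."""
    results, remaining = {}, set(tasks)
    with ThreadPoolExecutor() as pool:
        while remaining:
            ready = [t for t in remaining
                     if all(d in results for d in deps.get(t, []))]
            if not ready:
                raise ValueError("cycle detected in analysis DAG")
            futures = {t: pool.submit(tasks[t],
                                      {d: results[d] for d in deps.get(t, [])})
                       for t in ready}
            for t, f in futures.items():
                results[t] = f.result()
            remaining.difference_update(ready)
    return results

# Hypothetical steps: two independent sub-analyses feed a summary step,
# which is how intermediate conclusions get passed downstream.
steps = {
    "uctr": lambda _: "UCTR +1.2%",
    "fdr": lambda _: "no significant FDR risk",
    "summary": lambda r: f"{r['uctr']}; {r['fdr']}",
}
conclusions = run_dag(steps, {"summary": ["uctr", "fdr"]})
print(conclusions["summary"])  # → UCTR +1.2%; no significant FDR risk
```

Note how this fixes both earlier problems: there is no free-text method parsing to go wrong, and dependency edges are exactly how intermediate conclusions flow into later steps.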

Future work includes accumulating high‑quality few‑shot examples, fine‑tuning with reinforcement learning, and eventually training a dedicated model capable of chain‑of‑thought reasoning.

Engineering Technology

Front‑End Unified Form

Vue 3 Framework: The form is built on Vue 3.

Component Design: UI elements are split into reusable components.

Auto‑Complete: Integrated with the trial‑stone interface to fetch experiment name lists for form auto‑completion.

History Memory: Stored user form history in JIMDB using "erp+tool name" as the key.

Authentication: Implemented via JSSDK for JD ME client authentication and ERP retrieval.

Multi‑Auth Capability

Front‑End: JSSDK enables communication with JD ME for H5 page authentication.

Back‑End: Designed a mechanism coupling tool permissions with platform permissions to handle varied user scopes.

Auth architecture

Message Interaction via JD ME

Message Update Integration: Using JD ME messaging service and JIMDB, a single message card can be dynamically updated based on Job_id, reducing message volume.

Dynamic Routing Framework: Supports both JD ME robot default callbacks and custom card update services without conflict.
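The single-updating-card behavior is essentially an upsert keyed by Job_id: the first call creates the card, and every later call with the same id mutates it in place rather than sending a new message. A sketch under that assumption, with a dict standing in for the JIMDB-backed card state:

```python
cards: dict[str, dict] = {}  # job_id -> card; stand-in for JIMDB-backed state

def upsert_card(job_id: str, status: str, body: str) -> dict:
    """First call creates the card; later calls with the same job_id
    update it in place, so the user sees one evolving message."""
    card = cards.setdefault(job_id, {"job_id": job_id})
    card.update(status=status, body=body)
    return card

upsert_card("job-7", "running", "Analysis started")
upsert_card("job-7", "done", "Report ready")
print(len(cards), cards["job-7"]["status"])  # one card, final status "done"
```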

Message interaction diagram

Large Model

5.1 Model Selection

Due to data and information security, the selection is limited to JD’s proprietary Yanxi model and locally deployed DeepSeek R1/V3 models. R1 showed impressive reasoning but was highly unstable, with error amplification and severe hallucinations, making its conclusions unreliable. Consequently, a combined multi‑model approach (R1 + V3) was adopted, assigning tasks to the appropriate model.

Model hallucination comparison

While Gemini 2.5 Pro currently offers the best hallucination control, security constraints prevent its use; future plans include exploring open‑source models with distillation techniques to train a proprietary reasoning model.

5.2 Generation Quality

Prompt Engineering: Core to generation quality; dynamic few‑shot prompts are configured based on experiment attributes.

Instead of feeding raw structured data and metric explanations for the model to compute, the data is pre‑processed into descriptive text, reducing numeric hallucinations.
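Pre-computing the arithmetic and handing the model a sentence instead of raw numbers might look like this. A minimal sketch; the function name and sentence template are assumptions.

```python
def describe_metric(name: str, baseline: float, treatment: float) -> str:
    """Turn raw metric values into a pre-computed sentence so the LLM
    reads a conclusion instead of doing the arithmetic itself."""
    delta = (treatment - baseline) / baseline
    direction = "increased" if delta >= 0 else "decreased"
    return f"{name} {direction} by {abs(delta):.2%} (from {baseline} to {treatment})"

print(describe_metric("UCTR", 0.030, 0.033))
```

Because the relative change is computed in code, the model never has a chance to miscalculate it, which is the mechanism behind the reduced numeric hallucinations.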

Mechanism Design: Implemented timeout and output‑quality checks with retry logic to handle long token inputs that sometimes produce no results, significantly lowering the no‑result rate.

Future Improvements

Knowledge‑Distilled AB Experiment Expert Model: Build a domain‑specific corpus and apply model distillation to replace the current R1 + few‑shot solution, enabling continuous learning from historical experiments, generating complete chain‑of‑thought (CoT) analysis plans with full causal chains, and producing more logically grounded summary conclusions guided by experimental expectations.

More Flexible Data Engineering Framework: Move from a rigid all‑data‑preparation approach to on‑demand data fetching based on prior analysis conclusions and subsequent needs, e.g., integrating a minimum sample size and MDE analysis service into the workflow.

Product Interaction Enhancement: Transition from static one‑time reports to interactive, explorable analysis dialogues that reveal reasoning paths, visualized inference chains, and support follow‑up queries, thereby improving transparency, user engagement, and trust.


Written by

JD Cloud Developers

JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.
