How an AI‑Powered Experiment Analysis Agent Transforms Data Insights

This article outlines the motivation, design, architecture, and engineering of an AI-driven experiment analysis agent, detailing its modular workflow, large‑model selection, prompt engineering, front‑end form integration, and future enhancements to improve reliability, transparency, and user interaction.

JD Retail Technology

Background

Traditional algorithm experiments focus on core metrics such as UCTR and UCVR, but solely improving positive indicators is insufficient; deeper data analysis is needed to uncover hidden risk factors and assess trade‑offs. Experiments often yield diverse or contradictory results, indicating underlying causal relationships that must be understood to optimize evaluation systems and strategy iteration.

Existing analysis tools are fragmented across platforms, lacking guided processes, which inspired the development of an Experiment Analysis Agent.

Product Showcase

Report Example

Data has been anonymized for security.

Product Design

3.1 Architecture Design

A top‑down analysis approach starts from a macro view, drills down after identifying anomalies, and validates findings at finer granularity. Inspired by the AI Agent Manus, which decomposes queries into task lists, performs information retrieval, and generates structured summaries, a "summarize‑data → sub‑analysis → present‑summary" framework was devised.
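The three-stage loop can be sketched in miniature as follows. All function names, the metric values, and the 1% screening threshold are illustrative assumptions, not the agent's actual interfaces:

```python
# Minimal sketch of the "summarize-data → sub-analysis → present-summary"
# loop: screen metrics at the macro level, drill into anomalies, then
# aggregate findings into one summary. Names and thresholds are hypothetical.
from dataclasses import dataclass

@dataclass
class Finding:
    metric: str
    observation: str

def macro_summary(metrics: dict) -> list[str]:
    # Top-down pass: flag metrics whose relative change exceeds a threshold.
    return [name for name, delta in metrics.items() if abs(delta) > 0.01]

def drill_down(metric: str) -> Finding:
    # Placeholder for a finer-granularity sub-analysis of one anomaly.
    return Finding(metric, f"{metric} moved beyond the 1% screening threshold")

def present_summary(findings: list[Finding]) -> str:
    return "; ".join(f"{f.metric}: {f.observation}" for f in findings)

metrics = {"UCTR": 0.025, "UCVR": -0.003, "GMV": 0.018}
anomalies = macro_summary(metrics)
report = present_summary([drill_down(m) for m in anomalies])
print(report)
```

Here only UCTR and GMV cross the screening threshold, so only they are drilled into and summarized.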

To minimize cost and development time, existing reporting tools were reused for core data engineering, while new analysis capabilities (e.g., FDR analysis) were developed independently and later integrated.

The overall architecture adopts a modular, layered design to achieve a full‑process intelligent analysis loop:

Application layer integrates analysis planning, theme analysis (via DAG workflow execution), and conclusion aggregation, powered by DeepSeek R1/V3 models.

Service layer provides authentication and front‑end form services, using JDSDK + CHO‑JSF for high availability.

Data layer leverages Doris‑X, ClickHouse, and Spark to build both real‑time (OLAP) and offline (BDP) gateways, reusing existing metric systems while adding independent FDR modules.

Agent architecture diagram

3.2 Product Design

The agent aims to be a true "assistant" by integrating with JD's internal ME robot, allowing researchers to access analysis without leaving their workflow, thus enhancing user experience.

A unified front‑end form was created to collect experiment ID, period, module, background, and expectations, transmitting key information to various backend services while preserving a seamless experience.

Unified form illustration

3.3 Workflow Design

The initial planner used few‑shot examples to let the LLM suggest lists of analysis methods, but it suffered from inaccurate parsing, inflexible tool calls, and an inability to pass intermediate conclusions between steps.

It was upgraded to a DAG‑based workflow orchestration framework, enabling parallel execution of independent analyses and serial execution for dependent steps, markedly improving analysis quality.
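A minimal sketch of this orchestration pattern, using Python's standard-library graphlib for topological scheduling; the node names, dependency edges, and thread-pool executor are illustrative assumptions, not the agent's internal workflow engine:

```python
# DAG-based orchestration sketch: independent analyses run in parallel,
# dependent steps run serially once their predecessors finish.
from graphlib import TopologicalSorter
from concurrent.futures import ThreadPoolExecutor

# Each key depends on the nodes in its set. "significance" and "fdr"
# share no edge, so they can execute in parallel; "summary" is serialized
# after both complete.
dag = {
    "significance": {"load_data"},
    "fdr": {"load_data"},
    "summary": {"significance", "fdr"},
}

results = {}

def run(node: str) -> str:
    # Stand-in for one real analysis step; records that it executed.
    results[node] = f"{node} done"
    return node

ts = TopologicalSorter(dag)
ts.prepare()
with ThreadPoolExecutor() as pool:
    while ts.is_active():
        ready = ts.get_ready()             # all nodes whose deps are met
        for node in pool.map(run, ready):  # independent nodes run in parallel
            ts.done(node)

print(results)
```

The same scheduler handles both shapes the text describes: parallel fan-out for independent analyses and strict ordering for steps that consume upstream conclusions.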

Future plans include accumulating high‑quality few‑shot samples and fine‑tuning a specialized LLM with reinforcement learning to produce coherent chain‑of‑thought experiment plans.

Engineering

Front‑end Unified Form

Built with Vue 3 framework.

Component‑based design for reusable UI elements.

Auto‑completion from internal experiment name API.

Historical memory via JIMDB keyed by ERP and tool name.

Authentication via JSSDK.

Multi‑Level Authentication

Front‑end uses JSSDK for ME client authentication; back‑end combines tool permissions with platform permissions to ensure proper access control.
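The combined back-end check can be sketched as follows; the in-memory grant tables and permission names are assumptions for illustration, standing in for the real tool- and platform-permission services:

```python
# Two-layer authorization sketch: a request passes only if the user holds
# both the tool-level grant and the platform-level grant.
TOOL_GRANTS = {"zhangsan1": {"experiment-analysis"}}
PLATFORM_GRANTS = {"zhangsan1": {"report-platform"}}

def is_authorized(erp: str, tool: str, platform: str) -> bool:
    has_tool = tool in TOOL_GRANTS.get(erp, set())
    has_platform = platform in PLATFORM_GRANTS.get(erp, set())
    return has_tool and has_platform

print(is_authorized("zhangsan1", "experiment-analysis", "report-platform"))  # True
print(is_authorized("lisi2", "experiment-analysis", "report-platform"))      # False
```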

Authentication architecture

Large Model

5.1 Model Selection

Due to data and information security requirements, only JD's proprietary Yanxi model and locally deployed DeepSeek V3/R1 models were considered.

R1 showed occasional brilliance but was unstable and prone to hallucinations on complex tasks, so a combined R1 + V3 multi‑model approach was adopted.
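One way to realize such a combination is simple task-type routing; the division of labor shown here (R1 for exploratory planning, V3 for stable structured generation) and the `call_model` client are illustrative assumptions:

```python
# Illustrative routing for an R1 + V3 combination: the reasoning-strong
# but less stable model drafts plans, the more stable model handles
# deterministic generation. `call_model` is a stub for a locally deployed
# DeepSeek endpoint, not a real client API.
def call_model(model: str, prompt: str) -> str:
    return f"[{model}] response to: {prompt[:30]}"

def route(task_type: str, prompt: str) -> str:
    # Exploratory planning goes to R1; deterministic summarization to V3.
    model = "deepseek-r1" if task_type == "planning" else "deepseek-v3"
    return call_model(model, prompt)

print(route("planning", "Draft an analysis plan for experiment exp_001"))
print(route("summarize", "Summarize the significance results"))
```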

Model hallucination comparison

5.2 Generation Quality

Prompt Engineering: Dynamic few‑shot prompts based on experiment attributes produce tailored templates. Converting structured data and metric explanations into descriptive text before feeding the model reduces numeric hallucinations.
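The "numbers → prose" step can be sketched like this; the field names, thresholds, and example rows are illustrative, not the agent's real schema:

```python
# Serializing a metrics row into descriptive sentences before prompting,
# so the model reads stated facts instead of raw tables it might misread.
def describe_metric(name: str, lift: float, p_value: float) -> str:
    direction = "increased" if lift > 0 else "decreased"
    signif = "statistically significant" if p_value < 0.05 else "not significant"
    return (f"{name} {direction} by {abs(lift):.2%} in the treatment group "
            f"(p={p_value:.3f}, {signif}).")

rows = [("UCTR", 0.012, 0.003), ("UCVR", -0.004, 0.210)]
prompt_body = "\n".join(describe_metric(*r) for r in rows)
print(prompt_body)
```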

Mechanism Design: Implemented timeout and output‑quality checks with retry logic, significantly lowering no‑result rates for long inputs.

Future Improvements

Knowledge‑Distilled Expert Model: Using domain‑specific experiment corpora and model distillation to replace the current R1 + few‑shot solution, delivering complete causal chain‑of‑thought planning.

Continuously absorb analysis patterns from historical experiments.

Generate coherent CoT experiment plans.

Provide logical conclusion summaries guided by experimental expectations.

More Flexible Data Engineering Framework: On‑demand data retrieval based on preceding analysis conclusions, e.g., offering minimal sample size and MDE analysis when metrics are non‑significant.
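The minimum-sample-size / MDE analysis mentioned above can be sketched with the standard two-sample power formula; the baseline rate and target lift are illustrative numbers, not JD data:

```python
# Per-group sample size needed to detect a relative lift `mde_rel` on a
# baseline conversion rate, using the classic two-sample normal
# approximation: n = 2 * (z_{1-a/2} + z_{power})^2 * p(1-p) / delta^2.
from statistics import NormalDist

def min_sample_size(baseline: float, mde_rel: float,
                    alpha: float = 0.05, power: float = 0.8) -> int:
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance quantile
    z_b = NormalDist().inv_cdf(power)           # power quantile
    delta = baseline * mde_rel                  # absolute detectable difference
    variance = baseline * (1 - baseline)        # Bernoulli variance at baseline
    n = 2 * (z_a + z_b) ** 2 * variance / delta ** 2
    return int(n) + 1

# E.g., detect a 2% relative lift on a 5% baseline conversion rate.
print(min_sample_size(baseline=0.05, mde_rel=0.02))
```

A report could use this to tell the researcher how much longer a non-significant experiment would need to run before the observed effect size became detectable.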

Product Interaction Enhancements: Move from static one‑time reports to interactive, transparent analysis dialogues that expose reasoning paths, visualized inference chains, and support iterative questioning.

Tags: Architecture, AI, Product Design, Prompt Engineering, Experiment Analysis

Written by JD Retail Technology

Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.
