How DataAgent Turns AI into a Virtual Data Analyst for Enterprise Insights

DataAgent, built on Spring AI Alibaba, tackles the "last mile" of AI data analysis by combining deterministic workflow orchestration with large‑model reasoning. It offers human‑in‑the‑loop feedback, dynamic prompt configuration, hybrid retrieval, containerized Python execution, SSE streaming, multi‑model scheduling, multi‑source connectivity, and API‑key management, so that business users can get insight‑rich reports quickly.

Alibaba Cloud Developer

Background

Enterprise users often face a "data abyss": business analysts cannot write complex SQL, and developers cannot trust the output of Text‑to‑SQL tools. DataAgent, built on the Spring AI Alibaba Graph & Agent Framework, provides a virtual AI data analyst that can plan, invoke tools, self‑correct, and accept human feedback.

Architecture Overview

DataAgent composes deterministic graph‑based processes with LLM inference using the Spring AI Alibaba Graph & Agent Framework. The system is modular and extensible, supporting multi‑agent execution, tool integration, and runtime configuration.

Key Technical Innovations

Human‑In‑The‑Loop (HITL)

Execution plans that may affect production databases can be paused. Setting humanFeedback=true routes the graph through HumanFeedbackNode, which waits for user approval before continuing.
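The approval gate can be illustrated with a small, framework‑free sketch. It does not use the Spring AI Alibaba Graph API; the class HitlSketch, the RunResult record, and the reviewer policy are all hypothetical names chosen for illustration, and the only idea taken from the text is that a humanFeedback flag decides whether a plan must be approved before it runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of a plan runner with an optional human-approval gate. */
class HitlSketch {

    /** Outcome of one run: the steps that executed and whether the plan was rejected. */
    record RunResult(List<String> executedSteps, boolean rejected) {}

    /**
     * Runs the plan step by step. When humanFeedback is true, the approver is
     * consulted once before any step executes; a rejection stops the run.
     */
    static RunResult run(List<String> plan, boolean humanFeedback,
                         Predicate<List<String>> approver) {
        List<String> executed = new ArrayList<>();
        if (humanFeedback && !approver.test(plan)) {
            return new RunResult(executed, true); // reviewer rejected the plan
        }
        for (String step : plan) {
            executed.add(step); // a real node would run SQL or Python here
        }
        return new RunResult(executed, false);
    }

    public static void main(String[] args) {
        List<String> plan = List.of("generate SQL", "execute SQL", "write report");
        // Assumed reviewer policy: reject any plan that mentions DELETE.
        Predicate<List<String>> reviewer =
                p -> p.stream().noneMatch(s -> s.contains("DELETE"));
        System.out.println(run(plan, true, reviewer));
    }
}
```

In a real deployment the approver would block on a user response rather than evaluate a predicate; the predicate keeps the sketch synchronous and easy to test.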

Dynamic Prompt Configuration & Auto‑Optimization

Prompt templates (e.g., report-generator, planner, sql-generator, python-generator, rewrite) can be updated at runtime via the REST endpoints under /api/prompt-config/*. The helper method PromptHelper.buildReportGeneratorPromptWithOptimization automatically appends optimization directives based on configuration fields such as priority and display_order.
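The idea of appending ordered directives to a base template might be sketched as follows. This is not the code of PromptHelper; the Directive record, the enabled flag, the "Optimization directives:" heading, and the ordering rule (higher priority first, then ascending display order) are assumptions made for illustration.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch: append enabled optimization directives to a base prompt. */
class PromptOptimizationSketch {

    /** Assumed shape of one stored optimization entry. */
    record Directive(String text, int priority, int displayOrder, boolean enabled) {}

    static String build(String baseTemplate, List<Directive> directives) {
        String extras = directives.stream()
                .filter(Directive::enabled)
                // Assumed ordering: higher priority first, then ascending display order.
                .sorted(Comparator.<Directive>comparingInt(Directive::priority).reversed()
                        .thenComparingInt(Directive::displayOrder))
                .map(d -> "- " + d.text())
                .collect(Collectors.joining("\n"));
        return extras.isEmpty()
                ? baseTemplate
                : baseTemplate + "\n\nOptimization directives:\n" + extras;
    }

    public static void main(String[] args) {
        List<Directive> stored = List.of(
                new Directive("Keep the summary under 200 words", 1, 2, true),
                new Directive("Lead with the key metric", 5, 1, true));
        System.out.println(build("Write a report for the data below.", stored));
    }
}
```

Because the directives are read at build time, changing a stored entry takes effect on the next prompt without a restart, which is the property the text describes.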

Deep Retrieval‑Augmented Generation (RAG) & Hybrid Retrieval

The EvidenceRecallNode rewrites user queries using multi‑turn context, then performs hybrid retrieval through AbstractHybridRetrievalStrategy, which fuses vector similarity and keyword matching. Filtering is applied by DynamicFilterService based on knowledge types (business_knowledge, agent_knowledge) and metadata (agentId/type).
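The text does not say how the two result lists are fused. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than as the strategy DataAgent actually uses. The class name, the document ids, and the constant k = 60 are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: fuse a vector ranking and a keyword ranking with RRF. */
class HybridFusionSketch {

    /** RRF score of a document = sum over lists of 1 / (k + rank), with rank starting at 1. */
    static List<String> fuse(List<String> vectorRanked, List<String> keywordRanked, int k) {
        Map<String, Double> scores = new HashMap<>();
        accumulate(scores, vectorRanked, k);
        accumulate(scores, keywordRanked, k);
        List<String> ids = new ArrayList<>(scores.keySet());
        // Sort by descending score; break ties by id so the order is deterministic.
        ids.sort((a, b) -> {
            int c = Double.compare(scores.get(b), scores.get(a));
            return c != 0 ? c : a.compareTo(b);
        });
        return ids;
    }

    private static void accumulate(Map<String, Double> scores, List<String> ranked, int k) {
        for (int i = 0; i < ranked.size(); i++) {
            scores.merge(ranked.get(i), 1.0 / (k + i + 1), Double::sum);
        }
    }

    public static void main(String[] args) {
        List<String> byVector = List.of("doc-a", "doc-b", "doc-c");
        List<String> byKeyword = List.of("doc-b", "doc-d");
        System.out.println(fuse(byVector, byKeyword, 60));
    }
}
```

RRF only needs ranks, not raw scores, so it sidesteps the problem of putting cosine similarities and keyword scores on a common scale. A document that appears in both lists, such as doc-b here, rises to the top.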

Containerized Python Execution Engine

When analytical tasks require Python (e.g., charting, regression), PythonGenerateNode creates the script, and PythonExecuteNode runs it inside a Docker container (default image continuumio/anaconda3:latest) via CodePoolExecutorService. Results are written to PYTHON_EXECUTE_NODE_OUTPUT and later merged with SQL results (SQL_EXECUTE_NODE_OUTPUT) for final reporting.
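How CodePoolExecutorService talks to Docker is not described here. As a rough illustration of the isolation involved, the sketch below assembles a docker run command line that mounts a script directory read‑only and disables networking. The class name, the /work mount point, the script name, and the choice of CLI flags are assumptions; the real service may use a Docker client library and different limits.

```java
import java.nio.file.Path;
import java.util.List;

/** Hypothetical sketch: build a sandboxed `docker run` command for a Python script. */
class DockerPythonSketch {

    static List<String> buildCommand(String image, Path hostDir, String scriptName) {
        return List.of(
                "docker", "run",
                "--rm",                 // remove the container when the script exits
                "--network", "none",    // no network access from inside the sandbox
                "-v", hostDir.toAbsolutePath() + ":/work:ro", // script dir, read-only
                image,
                "python", "/work/" + scriptName);
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand(
                "continuumio/anaconda3:latest", Path.of("scripts"), "analysis.py");
        System.out.println(String.join(" ", cmd));
        // To actually run it (requires a local Docker daemon):
        // new ProcessBuilder(cmd).redirectErrorStream(true).start();
    }
}
```

The captured standard output of such a process is what would end up in a field like PYTHON_EXECUTE_NODE_OUTPUT.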

Streaming Output (SSE) & Multi‑Turn Dialogue

Partial results are streamed to the client using Server‑Sent Events through GraphController and GraphServiceImpl. Content types are marked with TextType for front‑end rendering. Conversation state is managed by MultiTurnContextManager, and the LLM service mode can be switched between STREAM and BLOCK via spring.ai.alibaba.data-agent.llm-service-type.
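On the wire, each streamed chunk is a Server‑Sent Events frame: an optional event line, one data line per line of payload, and a blank line as terminator. The sketch below formats such frames by hand. Using the event field to carry a content‑type marker is an assumption about how TextType might be conveyed, and the values "sql" and "text" are placeholders.

```java
/** Hypothetical sketch: format one Server-Sent Events frame for a streamed chunk. */
class SseFrameSketch {

    /**
     * Builds a frame with an event name and a payload. Multi-line payloads are
     * split so that every line gets its own "data:" field, as the SSE format requires.
     */
    static String frame(String eventType, String payload) {
        StringBuilder sb = new StringBuilder();
        sb.append("event: ").append(eventType).append('\n');
        for (String line : payload.split("\n", -1)) {
            sb.append("data: ").append(line).append('\n');
        }
        return sb.append('\n').toString(); // blank line ends the frame
    }

    public static void main(String[] args) {
        System.out.print(frame("sql", "SELECT region,\n       SUM(amount)\nFROM orders"));
        System.out.print(frame("text", "Revenue grew in most regions."));
    }
}
```

A browser EventSource reassembles the data lines with newline separators, so the front end receives the original multi‑line payload along with the event name it can use to pick a renderer.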

MCP Server & Multi‑Model Scheduling

The McpServerService boot starter exposes NL2SQL and agent‑tool APIs. Model configurations are defined in ModelConfig and cached in AiModelRegistry, allowing chat or embedding models to be hot‑swapped at runtime. Built‑in tools include nl2SqlToolCallback and listAgentsToolCallback.
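Hot‑swapping amounts to replacing a cached entry while requests keep reading from the cache. The sketch below shows that pattern with a concurrent map. It is not the AiModelRegistry implementation; the Role enum, the ModelSettings record, and the provider and model names are placeholders.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: a registry whose active model per role can be swapped at runtime. */
class ModelRegistrySketch {

    enum Role { CHAT, EMBEDDING }

    /** Assumed minimal shape of a model configuration. */
    record ModelSettings(String provider, String modelName) {}

    private final Map<Role, ModelSettings> active = new ConcurrentHashMap<>();

    /** Activates a new configuration and returns the one it replaced, if any. */
    Optional<ModelSettings> activate(Role role, ModelSettings next) {
        return Optional.ofNullable(active.put(role, next));
    }

    /** Looked up on every request, so a swap is visible to the next call. */
    Optional<ModelSettings> current(Role role) {
        return Optional.ofNullable(active.get(role));
    }

    public static void main(String[] args) {
        ModelRegistrySketch registry = new ModelRegistrySketch();
        registry.activate(Role.CHAT, new ModelSettings("provider-a", "chat-v1"));
        registry.activate(Role.CHAT, new ModelSettings("provider-b", "chat-v2"));
        System.out.println(registry.current(Role.CHAT));
    }
}
```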

Multi‑Data‑Source Integration

Metadata tables (datasource, agent_datasource, agent_datasource_tables, logical_relation) store connection info, agent bindings, and logical foreign keys. The AccessorFactory creates an Accessor backed by DBConnectionPool to handle dialect‑specific queries. At runtime, DatabaseUtil selects the active datasource for the current agent; AgentDatasourceService.toggleDatasourceForAgent ensures that only one datasource is active per agent.
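The "one active datasource per agent" rule can be sketched with two in‑memory maps standing in for the agent_datasource table. This is an illustration of the invariant, not the logic of AgentDatasourceService; the class and method names below are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch: keep at most one active datasource per agent. */
class DatasourceToggleSketch {

    private final Map<Long, Set<Long>> boundByAgent = new HashMap<>();
    private final Map<Long, Long> activeByAgent = new HashMap<>();

    void bind(long agentId, long datasourceId) {
        boundByAgent.computeIfAbsent(agentId, id -> new HashSet<>()).add(datasourceId);
    }

    /** Activating a datasource implicitly deactivates the agent's previous one. */
    void toggle(long agentId, long datasourceId, boolean active) {
        if (!boundByAgent.getOrDefault(agentId, Set.of()).contains(datasourceId)) {
            throw new IllegalArgumentException("datasource is not bound to this agent");
        }
        if (active) {
            activeByAgent.put(agentId, datasourceId);
        } else {
            // Only clears the entry if this datasource is the active one.
            activeByAgent.remove(agentId, datasourceId);
        }
    }

    Optional<Long> activeFor(long agentId) {
        return Optional.ofNullable(activeByAgent.get(agentId));
    }

    public static void main(String[] args) {
        DatasourceToggleSketch service = new DatasourceToggleSketch();
        service.bind(1L, 10L);
        service.bind(1L, 20L);
        service.toggle(1L, 10L, true);
        service.toggle(1L, 20L, true); // replaces datasource 10 as the active one
        System.out.println(service.activeFor(1L));
    }
}
```

Storing a single active id per agent makes the invariant structural: there is no state in which two datasources are active for the same agent.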

API Key & Permission Management

API keys are managed via AgentController, using the fields agent.api_key and agent.api_key_enabled. When spring.ai.alibaba.data-agent.api-key.enabled=true is set, requests must include the X-API-Key header, which gives production deployments per‑agent access control.
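A header check of this kind usually reduces to comparing the presented key with the stored one. The sketch below shows one way to do that with a timing‑safe comparison from the JDK. How DataAgent actually validates the header is not stated in the text, so the class name, the method signature, and the decision to allow all requests when enforcement is off are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

/** Hypothetical sketch: validate the value of an X-API-Key header. */
class ApiKeyCheckSketch {

    /**
     * @param enforcementEnabled mirrors a flag such as api-key.enabled
     * @param storedKey          the key stored for the agent, or null if none
     * @param presentedKey       the X-API-Key header value, or null if absent
     */
    static boolean isAuthorized(boolean enforcementEnabled, String storedKey, String presentedKey) {
        if (!enforcementEnabled) {
            return true; // assumed behavior: no check when enforcement is off
        }
        if (storedKey == null || presentedKey == null) {
            return false;
        }
        // MessageDigest.isEqual avoids leaking the mismatch position through timing.
        return MessageDigest.isEqual(
                storedKey.getBytes(StandardCharsets.UTF_8),
                presentedKey.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(isAuthorized(true, "example-key", "example-key"));
        System.out.println(isAuthorized(true, "example-key", null));
    }
}
```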

Outcomes

DataAgent delivers instant, insight‑driven reporting without service restarts. It combines AI‑augmented RAG, automated prompt tuning, containerized analytics, real‑time streaming, multi‑model orchestration, cross‑source querying, and enterprise‑grade security into a single framework.

Relevant Resources

GitHub repository: https://github.com/spring-ai-alibaba/DataAgent

Spring AI Alibaba library: https://github.com/alibaba/spring-ai-alibaba
