Building a Fully Autonomous AI Data Analyst: Agent Architecture & Planning
This article explains how to build a self-directed AI data analyst, covering agent fundamentals; core modules such as planning, memory, and tool scheduling; practical development steps; multi-agent collaboration; evaluation benchmarks; and a real-world example of stock backtesting.
Agent Overview
An AI Agent (or autonomous agent) is a system that can perceive its environment, make decisions, and execute tasks to achieve specific goals. In data analysis, an agent acts as a virtual analyst that can retrieve data, run queries, perform calculations, and generate reports without the user writing any code.
Core Architecture and Modules
1.1 What Is an Agent?
An agent consists of four core capabilities:
Environment perception – acquiring multimodal data through sensors or APIs.
Intelligent decision‑making – using deep learning or reinforcement learning to choose actions.
Task execution – invoking tools or APIs to perform concrete operations.
Continuous learning – online or transfer learning to improve over time.
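The four capabilities above can be sketched as a minimal perceive-decide-act loop. This is an illustrative toy, not a real agent: `decide()` stands in for an LLM or RL policy, and the "learning" is just an experience log.

```python
from dataclasses import dataclass, field

@dataclass
class SimpleAgent:
    """Toy agent: perceive -> decide -> act, with an experience log."""
    history: list = field(default_factory=list)

    def perceive(self, environment: dict) -> dict:
        # Environment perception: read whatever signals are available.
        return {"metric": environment.get("metric", 0)}

    def decide(self, observation: dict) -> str:
        # Intelligent decision-making: stand-in for an LLM/RL policy.
        return "alert" if observation["metric"] > 100 else "log"

    def act(self, action: str, observation: dict) -> str:
        # Task execution plus a crude form of continuous learning:
        # every (action, observation) pair is kept for later reuse.
        self.history.append((action, observation))
        return f"{action}:{observation['metric']}"

    def step(self, environment: dict) -> str:
        obs = self.perceive(environment)
        return self.act(self.decide(obs), obs)

agent = SimpleAgent()
print(agent.step({"metric": 150}))  # alert:150
print(agent.step({"metric": 42}))   # log:42
```

In a real system each method would be backed by the modules described below: perception by data connectors, decision-making by the LLM, and execution by the tool scheduler.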
1.2 Agent Framework
The agent’s brain is a large language model (LLM) that provides planning, reasoning, and language generation. Supporting modules include:
Planning: decomposes complex tasks, creates execution plans, and coordinates subtasks.
Memory System: short-term, mid-term, and long-term stores that overcome the limited context window of LLMs.
Tool/Function Scheduling: uses function calling or the Model Context Protocol (MCP) to invoke external services.
1.3 Agent Classification
Agents can be grouped into four paradigms:
Reflection (e.g., ReAct, Self‑Refine) – the model iteratively reflects on its actions.
Tool Use – the model calls external functions or APIs.
Planning – the model creates a hierarchical plan before execution.
Multi‑Agent Collaboration – multiple agents coordinate via protocols such as A2A.
Planning Module
Planning improves efficiency and accuracy for complex tasks. Key techniques include Hierarchical Task Networks (HTN) and Monte-Carlo Tree Search (MCTS). A typical planning loop follows the ReAct pattern: thought → action → observation → answer, similar to the PDCA (plan-do-check-act) cycle.
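The ReAct loop can be sketched with a mock model in place of a real LLM. Everything here is illustrative: `fake_llm` is a hard-coded stand-in policy, and the single `calculator` tool exists only for the demo.

```python
def calculator(expression: str) -> str:
    # Demo-only arithmetic tool; eval is unsafe for untrusted input.
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def fake_llm(transcript: str) -> str:
    # Stand-in policy: call the tool once, then answer from the observation.
    if "Observation:" not in transcript:
        return "Thought: I need to compute.\nAction: calculator[17 * 3]"
    return "Answer: " + transcript.rsplit("Observation: ", 1)[1].strip()

def react(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = fake_llm(transcript)
        if step.startswith("Answer:") or "\nAnswer:" in step:
            return step.split("Answer:", 1)[1].strip()
        # Parse "Action: tool[input]" and execute the tool (the action phase).
        action_line = [s for s in step.splitlines() if s.startswith("Action:")][0]
        name, arg = action_line[len("Action: "):].rstrip("]").split("[", 1)
        observation = TOOLS[name](arg)  # the observation phase
        transcript += f"\n{step}\nObservation: {observation}"
    return "No answer within step budget."

print(react("What is 17 * 3?"))  # 51
```

The loop's termination condition (an `Answer:` line) and the step budget are the two controls that keep the thought-action-observation cycle from running forever.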
Memory System
Memory addresses the LLM’s limited context window by storing and retrieving information at three levels:
Short-Term Memory (STM): recent dialogue embedded directly in the prompt.
Mid-Term Memory (MTM): summarized topic chunks, often built with RAG techniques.
Long-Term Memory (LTM): persistent knowledge bases or vector databases accessed via retrieval-augmented generation.
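A hedged sketch of the three tiers follows, with naive keyword overlap standing in for the embedding-based retrieval a real RAG pipeline would use; the class and its policies are invented for illustration.

```python
class TieredMemory:
    def __init__(self, stm_size: int = 4):
        self.stm = []              # short-term: recent turns, kept verbatim
        self.mtm = []              # mid-term: summarized topic chunks
        self.ltm = []              # long-term: persistent facts (vector DB in practice)
        self.stm_size = stm_size

    def add_turn(self, turn: str):
        self.stm.append(turn)
        if len(self.stm) > self.stm_size:
            # Overflowing turns are compressed into a mid-term summary.
            old = self.stm.pop(0)
            self.mtm.append("summary: " + old[:40])

    def remember(self, fact: str):
        self.ltm.append(fact)

    def build_context(self, query: str) -> str:
        # Retrieval-augmented prompt: recent turns plus any stored item
        # sharing at least one word with the query (stand-in for vector search).
        words = set(query.lower().split())
        hits = [m for m in self.mtm + self.ltm if words & set(m.lower().split())]
        return "\n".join(hits + self.stm)

mem = TieredMemory(stm_size=2)
mem.remember("user prefers csv exports")
for turn in ["hi", "show revenue", "by region"]:
    mem.add_turn(turn)
print(mem.build_context("export revenue as csv"))
```

The design choice worth noting is the demotion path: content never disappears, it only moves from verbatim STM to summarized MTM, so the prompt stays small while older context remains retrievable.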
Tool / Function Scheduling
Function calling transforms natural-language intent into executable API calls. It addresses two major limitations of a bare LLM: outdated knowledge (e.g., questions about 2025 events) and the inability to act on the real world. Typical workflow:
User request → LLM decides whether a tool is needed.
If needed, LLM outputs a function_call with name and parameters.
The host system validates permissions and executes the function.
Result is fed back to the LLM, which generates a final natural‑language answer.
Common pitfalls include missing logs, connection interruptions, performance bottlenecks, and inconsistent output schemas.
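The four-step workflow can be mocked end to end as follows. This is a hedged sketch: the `get_price` tool, its registry, and the permission set are all made up for illustration, though the tool schema follows the OpenAI-style function-calling format.

```python
import json

# Step 1 context: the OpenAI-style schema the model sees (get_price is hypothetical).
TOOL_SCHEMA = [{
    "type": "function",
    "function": {
        "name": "get_price",
        "description": "Latest price for a stock ticker",
        "parameters": {
            "type": "object",
            "properties": {"ticker": {"type": "string"}},
            "required": ["ticker"],
        },
    },
}]

REGISTRY = {"get_price": lambda ticker: {"AAPL": 190.0}.get(ticker, 0.0)}
ALLOWED = {"get_price"}  # host-side permission list

def dispatch(function_call: dict) -> str:
    name = function_call["name"]
    if name not in ALLOWED:                    # step 3: validate permissions
        raise PermissionError(name)
    args = json.loads(function_call["arguments"])
    result = REGISTRY[name](**args)            # step 3: execute the function
    return json.dumps({"result": result})      # step 4: fed back to the LLM

# Pretend the model emitted this function_call (step 2):
call = {"name": "get_price", "arguments": json.dumps({"ticker": "AAPL"})}
print(dispatch(call))  # {"result": 190.0}
```

Returning a consistent JSON envelope from `dispatch` is one way to avoid the inconsistent-output-schema pitfall mentioned above.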
Example Code (Python)
from openai import OpenAI
import os

# Reads credentials from the environment; replace "BASE_URL" with your
# provider's API endpoint.
client = OpenAI(
    api_key=os.getenv("API_KEY"),
    base_url="BASE_URL",
)

def function_calling(messages, tools):
    completion = client.chat.completions.create(
        model="qwen-plus",  # any model with function-calling support
        messages=messages,
        tools=tools,
    )
    print("Response:")
    print(completion.choices[0].message.model_dump_json())
    return completion

MCP (Model Context Protocol)
MCP standardises tool registration, invocation, and result handling. It separates the host (application) from the tool server, providing security checks and unified JSON schemas. The workflow is:
User → Host sends request + tool list to MCP client.
MCP client asks LLM to generate a tool‑call plan.
Host approves the plan (permission, risk checks).
Host forwards the call to the MCP server, which executes the tool.
Result travels back through MCP server → Host → MCP client → LLM → User.
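The five-hop flow above can be sketched with in-process stand-ins. Real MCP uses JSON-RPC messages between separate processes; the function names and the `row_count` tool here are invented for the demo.

```python
def mcp_server_execute(tool: str, args: dict) -> dict:
    # MCP server: the only component that actually runs tools.
    tools = {"row_count": lambda table: {"rows": {"sales": 1200}.get(table, 0)}}
    return tools[tool](**args)

def mcp_client_plan(request: str) -> dict:
    # MCP client asks the LLM for a tool-call plan (hard-coded stand-in).
    return {"tool": "row_count", "args": {"table": "sales"}}

def host_approve(plan: dict) -> bool:
    # Host-side permission/risk check before any call leaves the application.
    return plan["tool"] in {"row_count"}

def handle(request: str) -> dict:
    plan = mcp_client_plan(request)          # User -> Host -> MCP client -> LLM
    if not host_approve(plan):               # Host approves the plan
        return {"error": "denied"}
    return mcp_server_execute(plan["tool"], plan["args"])  # Host -> MCP server

print(handle("How many rows in sales?"))  # {'rows': 1200}
```

The separation matters: the client plans, the host approves, and only the server executes, so a compromised or hallucinating planner still cannot bypass the host's checks.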
Evaluation Benchmarks
Current evaluation of agents relies on a variety of benchmarks covering data analysis, multi‑turn dialogue, and tool use. Representative datasets include:
AgentBench (arXiv:2308.03688)
InfoQuest (arXiv:2502.12257)
MINT (2023)
ToolBench, GTA, ToolDial, WorkBench, DataSciBench, etc.
Practical Example: Stock Backtesting
The article demonstrates the agent’s capability with a stock‑backtesting scenario, where the agent automatically retrieves market data, runs a simulation, and produces a concise report—all without user‑written code.
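The kind of script such an agent might generate can be sketched as a toy moving-average-crossover backtest. The strategy, prices, and parameters are invented for illustration; a real run would pull market data through a tool call.

```python
def moving_average(prices, n, i):
    # Average of the n prices ending at index i (inclusive).
    return sum(prices[i - n + 1:i + 1]) / n

def backtest(prices, fast=3, slow=5, cash=1000.0):
    shares = 0.0
    for i in range(slow - 1, len(prices)):
        f = moving_average(prices, fast, i)
        s = moving_average(prices, slow, i)
        if f > s and shares == 0:           # golden cross: buy all-in
            shares, cash = cash / prices[i], 0.0
        elif f < s and shares > 0:          # death cross: sell everything
            cash, shares = shares * prices[i], 0.0
    return cash + shares * prices[-1]       # mark open position to market

prices = [10, 10, 10, 10, 10, 11, 12, 13, 12, 11, 10, 9]
print(f"final equity: {backtest(prices):.2f}")  # final equity: 909.09
```

On this synthetic series the strategy buys into the rally and sells after the peak, ending below its starting cash, which is exactly the kind of result the agent's report should surface rather than hide.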
Key Takeaways
Agent effectiveness depends on the synergy of planning, memory, and tool scheduling.
Context engineering (prompt design, KV‑cache optimisation) is crucial for latency and cost.
Dynamic constraints (logits masking, state‑machine management) keep tool selection tractable.
File‑system extensions and layered feedback improve scalability for long inputs.
Human‑in‑the‑loop should be triggered at well‑defined risk thresholds.
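The state-machine constraint from the takeaways can be illustrated in miniature. In a real decoder the disallowed tools' token logits would be masked to negative infinity; here a dictionary filter plays that role, and all tool names and states are hypothetical.

```python
# Which tools are selectable in each dialogue state (illustrative).
ALLOWED_BY_STATE = {
    "gathering": {"run_query", "fetch_docs"},
    "reporting": {"render_chart", "export_csv"},
}

def mask(candidates: dict, state: str) -> dict:
    # Drop (mask) any candidate tool not permitted in the current state,
    # mimicking logits masking during constrained decoding.
    allowed = ALLOWED_BY_STATE[state]
    return {tool: score for tool, score in candidates.items() if tool in allowed}

scores = {"run_query": 0.7, "export_csv": 0.9, "fetch_docs": 0.4}
masked = mask(scores, "gathering")
print(max(masked, key=masked.get))  # run_query
```

Note that `export_csv` had the highest raw score but is unreachable while gathering data, which is the point: the state machine keeps tool selection tractable regardless of what the model prefers.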
References
[1] AgentBench: Evaluating LLMs as Agents. arXiv:2308.03688.
[2] InfoQuest: Evaluating Multi‑Turn Dialogue Agents for Open‑Ended Information Seeking. arXiv:2502.12257.
[3] MINT: A New Benchmark Tailored for LLMs' Multi‑Turn Interactions. Blog post.
[4] ToolBench: An Instruction‑tuning Dataset for Tool Use. Papers With Code.
[5] GTA: A Benchmark for General Tool Agents. arXiv:2407.08713.
[6] ToolDial: Multi‑turn Dialogue Generation Method for Tool‑Augmented LMs. arXiv:2503.00564.
[7] AgentBoard: An Analytical Evaluation Board of Multi‑turn LLM Agents. NeurIPS 2024.
[8] WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting. arXiv:2405.00823.
[9] DataSciBench: An LLM Agent Benchmark for Data Science. arXiv:2502.13897.
[10] A Survey of Large Language Models.
[11] Understanding the Planning of LLM Agents: A Survey. arXiv:2402.02716.
[12] MemoryOS.
[13] AutoGen: Enabling Next‑Gen LLM Applications via Multi‑Agent Conversation.
[14] LangChain documentation.
[15] Context Engineering Survey for LLMs. arXiv:2507.13334.
[16] MetaGPT GitHub repository.
[17] CrewAI GitHub repository.
[18] BabyAGI GitHub repository.
[19] LlamaIndex documentation.
[20] Semantic Kernel GitHub repository.
[21] Auto‑GPT GitHub repository.
[22] AgentGPT website.
[23] Langroid GitHub repository.
[24] Haystack documentation.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
