Artificial Intelligence 26 min read

How AI Agents Turn Manual Data Retrieval into Fully Automated Insights

This article examines the challenges of manual data extraction in data‑driven enterprises, explains why large language models alone fall short, and details how the Cursor‑Agent framework automates end‑to‑end querying, knowledge‑base integration, and result validation to become a self‑sufficient "data master" for both technical and non‑technical users.

Youzan Coder
Youzan Coder
Youzan Coder
How AI Agents Turn Manual Data Retrieval into Fully Automated Insights

1. The Pain of Data Retrieval and LLM’s Limits

In a data‑driven era, obtaining accurate data quickly is essential, yet many teams spend excessive time writing SQL, handling permissions, and debugging queries.

LLMs such as ChatGPT can generate SQL drafts, but they lack execution ability, cannot handle runtime errors, struggle with context continuity, and have no knowledge of internal schemas.

1.1 Scheduling Needs

Pre‑release reviews require impact analysis (e.g., refund exchanges, GMV).

Operational metrics need specific transaction counts.

1.2 LLM Improves Efficiency but the "Last Mile" Remains

Lack of execution ability : SQL must still be copied and run manually.

Cannot handle real‑world feedback : permission errors, missing fields, date format issues.

Context fragmentation : iterative adjustments often require re‑phrasing prompts.

Isolation from data environment : LLMs know nothing about table structures or business logic.

What is needed is an intelligent agent that understands requirements, plans steps, interacts with the data environment, processes feedback, and completes tasks autonomously.

2. New Approach: End‑to‑End Automation with Cursor‑Agent

We built a complete Agentic Workflow using Cursor‑Agent to automate the process from requirement input to result output.

2.1 Video Demo

Demonstration using a test dataset shows the agent handling data requests.

Agent Data Permissions

Identity authentication : Strict personal account verification limits access to authorized scopes.

Multi‑layer security : Permission matrix, real‑time monitoring, audit logs, and data masking.

Zero‑privilege principle : Access only via standard APIs with no privilege escalation.

Compliance and traceability : All operations are logged for full auditability.

2.2 Core Advantages

Zero‑intervention for clear requirements : The agent parses natural‑language requests, writes and executes SQL, and even fixes minor syntax or format issues without human input.

Autonomous exploration for vague requests : By inspecting fields like pay_tool_id and pay_tool_table_name , the agent discovers payment method tables and resolves missing definitions.

Remarkable efficiency gains : Complex queries that previously took 4‑6 hours are completed in 10‑15 minutes, enabling non‑technical users to explore data directly.

3. Overall Architecture and Core Components

3.1 Architecture Overview

Highly integrated AI for full automation.

Cursor‑Agent workflow: requirement analysis → knowledge search → table discovery → SQL generation → execution → result observation → iteration.

Loop repeats until the requirement is satisfied.

3.2 Cursor‑Agent Introduction

In Agent mode, Cursor uses all available tools to fulfill complex programming tasks.

Process: analyze → plan → execute → evaluate → repeat.

Built‑in tools: web access, CLI execution, file I/O, code repository reading.

3.3 The Soul: .cursorrules (Instruction & Workflow Engine)

.cursorrules defines domain knowledge, preset rules, instructions, workflows, and best‑practice tips that guide the agent’s actions.

<code># role
# workflow
# instruction
# checkpoint-validation
# troubleshooting-guide
# performance-tips
# tool-use
# example</code>

3.4 Local Tools (Node.js Scripts)

query(sql_statement) : Executes SQL via HTTP and returns CSV or error.

table-info(table_name) : Returns field list, types, comments.

metadata(keyword) : Searches metadata for tables related to a business keyword.

3.5 Knowledge Base

Compensates for missing data definitions in the platform, providing field value domains, JSON structure explanations, and table‑keyword mappings.

Example: payment_method int → 1=WeChat, 2=Alipay, …

Example: order_extra text → JSON schema of extra fields.

4. Our Exploration and Evolution

Iterative Highlights (V0.1 → V0.4)

v0.1 : Added query tool; could generate and run simple SQL but lacked autonomous exploration.

v0.2 : Introduced cursorrules , table-info , metadata ; achieved fully automated loops, though runtime was long.

v0.3 : Added knowledge‑base files; improved efficiency and achieved zero‑intervention for clear requests.

v0.4 : Refined cursorrules for better control; stability increased, ready for broader internal use.

Core Challenge: Agent Controllability

The agent must follow prescribed rules; otherwise, random behavior can produce incorrect results or endless loops.

Solution: Use cursorrules as system prompts that enforce strict workflow and provide concrete examples.

5. Outlook: Productization Possibilities

AI agents will evolve from data‑retrieval tools to comprehensive business workflow assistants, enabling natural‑language interactions, deep chart analysis, and actionable decision recommendations.

Shift from UI clicks to conversational queries.

Enable follow‑up questions on any chart.

Provide not just insights but concrete action plans.

Future productization will require moving beyond Cursor‑Agent to generic LLM APIs or open‑source frameworks, addressing model adaptation, prompt optimization, context storage, and knowledge‑base integration.

6. Reflection: Technological Change Anxiety and Excitement

Rapid AI advances can render months of engineering effort obsolete within weeks, turning previous advantages into constraints.

Key takeaways: maintain technical sensitivity, prioritize rapid learning, and focus on adaptable problem‑solving skills rather than static tech stacks.

7. Appendix

cursorrules – workflow

<code># workflow
必须严格按顺序执行以下步骤,不得跳过任何步骤:
1. 需求解构:提取业务对象、筛选条件、时间范围、字段需求,使用表格列出。
2. 业务知识搜索:在 domain‑knowledge.md 中搜索关键表和字段。
3. 信息汇总与缺口识别:列出已知信息和未知信息并标记优先级。
4. 数据探索‑表结构:使用 `node index.js table-info "table_name"` 获取表信息。
5. 数据探索‑数据验证:执行时间范围查询并验证关键字段。
6. 查询构建‑单表筛选:逐步添加条件并验证。
7. 查询构建‑表关联:记录行数,确保小表在 FROM,大表在 JOIN。
8. 查询构建‑聚合计算:在必要时使用 GROUP BY 并验证。
9. 结果完整性验证:检查维度覆盖和异常值。
10. 最终查询与验证:为最终 SQL 添加业务注释。</code>

cursorrules – example

<code>## 1. 需求解构
|组成部分|内容|是否需要确认|
|---|---|---|
|业务对象|换货订单|需要确认|
|筛选条件|储值支付|需要确认|
|时间范围|2024年每月|已知|
|需求指标|订单数量|已知|

## 2. 业务知识搜索
执行: `grep -i "换货\|储值\|支付\|订单" domain-knowledge.md`

## 3. 信息汇总与缺口识别
已知信息: 换货订单表、订单主表、时间范围。
未知信息: 储值支付判断条件、时间字段、表关联方式。

## 4. 数据探索‑表结构
执行: `node index.js table-info "dw.order"`
执行: `node index.js table-info "dw.refund"`
执行: `node index.js table-info "ods.exchange"`

## 5. 数据探索‑数据验证
执行: `node index.js query "SELECT tc_pay_tool_id, tc_pay_tool_name FROM dw.pay_tool WHERE tc_pay_tool_name LIKE '%储值%' LIMIT 10"`

## 6. 查询构建‑单表筛选
执行: `<<SQL>>`

## 7. 查询构建‑表关联
执行: `<<SQL>>`

## 8. 查询构建‑聚合计算
执行: `<<SQL>>`

## 9. 结果完整性验证
-- 2024年每月使用储值支付的换货订单数量统计

## 10. 最终查询与验证
-- 最终查询 SQL,使用中文列名并分页。</code>

cursorrules – tool‑use

<code>## query
> 执行 SQL 并保存结果到 CSV。
* base case: `node index.js query "SELECT * FROM dw.orders LIMIT 10"`
* 中文列名安全别名: `node index.js query "SELECT field1 AS col1 FROM table" --safe-alias '{"col1":"中文名1"}'`
* key‑set分页: `node index.js query "SELECT * FROM table WHERE id > #{id} ORDER BY id" --key-set-by 'id' --max-rows 3000`

## table‑info
* 获取表结构: `node index.js table-info "dw.orders"`

## metadata
* 按业务关键词搜索表名: `node index.js metadata "订单"`

## viz‑sql
> 根据查询结果生成摘要并创建可视化图表。
* 基本用法: `node index.js viz-sql <CSV> --group-by <field> --measure <field> --aggregation sum --type bar`
* 自定义图表: `node index.js viz-sql <CSV> --group-by <field> --measure <field> --aggregation avg --type line --title "标题" --x-axis-title "X轴" --y-axis-title "Y轴"`</code>
LLMKnowledge BaseAI AgentSQL generationCursor-AgentData Automation
Youzan Coder
Written by

Youzan Coder

Official Youzan tech channel, delivering technical insights and occasional daily updates from the Youzan tech team.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.