
From Writing SQL to Speaking Requirements: Practical Guide to DataWorks Data Agent

This article walks through using DataWorks Data Agent to automate end‑to‑end data‑warehouse development without writing SQL by hand. The user prepares the source tables and a structured requirement document, uploads the document, writes a task command, and selects an execution mode and model. The agent then generates the SQL, builds and publishes the workflow, and produces a final report.

DataFunTalk

In 2026, large‑model applications have moved from a trial phase into hard production engineering, and enterprises now ask how to embed AI into their data pipelines. DataWorks Data Agent is presented as an AI‑native data‑warehouse development assistant that can understand requirements, generate code, configure scheduling, and deliver production‑ready artifacts.

1. Data Preparation

If the user already has an ODS table with the same name or structure, this step can be skipped; otherwise, an ODS table containing real business data must be created.

2. Requirement Document Preparation

A detailed requirement document is required, covering business background, target metrics, data sources, definitions, update frequency, usage scenarios, and permission boundaries. This document becomes the single source of truth for data integration, modeling, development, monitoring, and application.
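Before uploading, it can help to check that the document actually covers every item listed above. The following is a minimal sketch of such a check; the heading names and the `missing_sections` helper are assumptions made for illustration, not a format that DataWorks prescribes.

```python
# Hypothetical pre-upload check: the section names mirror the list above,
# but the exact heading text is an assumption, not a DataWorks requirement.
REQUIRED_SECTIONS = [
    "Business Background",
    "Target Metrics",
    "Data Sources",
    "Definitions",
    "Update Frequency",
    "Usage Scenarios",
    "Permission Boundaries",
]


def missing_sections(markdown_text: str) -> list[str]:
    """Return the required sections that have no matching markdown heading."""
    headings = {
        line.lstrip("#").strip().lower()
        for line in markdown_text.splitlines()
        if line.startswith("#")
    }
    return [s for s in REQUIRED_SECTIONS if s.lower() not in headings]


draft = """# Live-Stream Product Sales
## Business Background
## Target Metrics
## Data Sources
"""

# Lists the sections the draft still lacks.
print(missing_sections(draft))
```

A draft that returns an empty list covers all seven items by heading; whether each section is detailed enough still needs a human read.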

3. Uploading the Requirement Document

Upload the document by selecting the local file or dragging it into the dialog. After the upload, the system automatically creates a reference tag, @upload/需求文档.md (需求文档 means "requirement document").

4. Task Command Input

In the CLI or chat input, reference the uploaded document with @需求文档.md and then specify the five essential elements: target output, standards, scope, delivery format, and publish action. Example command:

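Following the requirement document, develop the application‑layer (ADS) table for live‑stream room product transaction data; strictly follow the data‑warehouse construction standards, and also complete table creation and ETL development for the DWD detail layer and the DWS summary layer; orchestrate all development tasks in a single newly created workflow (Workflow). Use MaxCompute for all development in this task.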
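One way to make sure no element is forgotten is to assemble the command from named parts. The sketch below is only an illustration: the `TaskCommand` class, its field names, and the sample wording (including the publish sentence) are hypothetical, and the agent itself accepts free‑form natural language.

```python
from dataclasses import dataclass, fields


@dataclass
class TaskCommand:
    """Hypothetical container for the document reference and the five elements."""

    document: str          # uploaded file name used in the @ reference
    target_output: str
    standards: str
    scope: str
    delivery_format: str
    publish_action: str

    def render(self) -> str:
        # Refuse to build a command when any element is blank.
        empty = [f.name for f in fields(self) if not getattr(self, f.name).strip()]
        if empty:
            raise ValueError(f"missing elements: {', '.join(empty)}")
        return (
            f"@{self.document} Following the requirement document, "
            f"develop {self.target_output}. {self.standards} {self.scope} "
            f"{self.delivery_format} {self.publish_action}"
        )


cmd = TaskCommand(
    document="需求文档.md",
    target_output="the ADS table for live-stream room product transaction data",
    standards="Strictly follow the data-warehouse construction standards.",
    scope="Also build the DWD and DWS tables and their ETL, all on MaxCompute.",
    delivery_format="Orchestrate every task in one newly created workflow.",
    publish_action="Publish the workflow once development is complete.",
)
print(cmd.render())
```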

5. Execution Mode Selection

Plan – only propose a solution, no file changes or execution.

Default – each file change or command requires manual confirmation.

Auto‑Edit – file changes are auto‑approved, shell commands still need confirmation.

YOLO – all operations, including shell commands, are auto‑approved.

Shortcut Shift + Tab cycles through the modes.
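The four modes amount to a small approval policy. The sketch below models that policy as described above; the `Mode` enum, the `decide` function, and the cycling order (taken to be the order in which the modes are listed) are assumptions for illustration, not the agent's actual implementation.

```python
from enum import Enum


class Mode(Enum):
    PLAN = "Plan"
    DEFAULT = "Default"
    AUTO_EDIT = "Auto-Edit"
    YOLO = "YOLO"


def decide(mode: Mode, action: str) -> str:
    """Return 'blocked', 'confirm', or 'auto' for an 'edit' or 'shell' action."""
    if action not in ("edit", "shell"):
        raise ValueError(f"unknown action: {action}")
    if mode is Mode.PLAN:
        return "blocked"            # Plan only proposes; nothing is changed or run
    if mode is Mode.DEFAULT:
        return "confirm"            # every change or command needs confirmation
    if mode is Mode.AUTO_EDIT:
        return "auto" if action == "edit" else "confirm"
    return "auto"                   # YOLO approves everything


def next_mode(mode: Mode) -> Mode:
    """Cycle to the next mode, as Shift + Tab does (order is an assumption)."""
    modes = list(Mode)
    return modes[(modes.index(mode) + 1) % len(modes)]


for m in Mode:
    print(m.value, decide(m, "edit"), decide(m, "shell"))
```

Reading the policy this way makes the trade‑off explicit: each step toward YOLO removes one class of manual confirmation.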

6. Model Selection

In chat mode, click the model name (default qwen3.7-max) to choose a model from the Qwen or DeepSeek series. In CLI mode, use the /model slash command to list the available models.

7. Agent Task Execution

After confirming the requirement document, task command, execution mode (YOLO), and model (qwen3.7-max), the user sends the command. In YOLO mode the agent immediately starts autonomous execution, breaking the work into eight sub‑tasks displayed in the "待办事项" (To‑Do) panel:

Inspect ODS table structures and fields.

Generate a complete development plan.

Create a new data‑development workflow.

Develop DWD layer nodes.

Develop DWS layer nodes.

Develop ADS layer nodes.

Configure scheduling dependencies.

Submit a release request.

The right‑hand "环境" (Environment) panel shows the default workspace (E‑Commerce Analytics), while the "上下文" (Context) panel reports token usage and the rules applied.

8. Detailed Sub‑Tasks

Task 1 – ODS Inspection: The agent invokes the dataworks skill to read two ODS source tables (product dimension and transaction fact) and retrieve full field metadata.

Task 2 – Development Plan: Using the user‑defined rules (e.g., data‑warehouse standards), the agent produces a document titled "直播间商品售卖 - 完整开发计划" (Live‑Stream Room Product Sales - Complete Development Plan).

Task 3 – Create Nodes: The agent calls the dataworks-datastudio skill to generate four MaxCompute SQL nodes at once.

Task 4 – DWD Nodes: The agent calls the write_file tool to create DWD‑商品信息表.sql (product information table) and DWD‑交易订单表.sql (trade order table). Each file contains a CREATE TABLE statement with a COMMENT on every field, plus ETL logic that joins the product and order tables and keeps only paid orders.
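The join‑and‑filter logic described for the DWD layer can be sketched as follows. This is an illustration run against an in‑memory SQLite database, not the MaxCompute SQL the agent generated; the table names, column names, and the 'PAID' status value are all hypothetical.

```python
import sqlite3

# In-memory stand-ins for the two ODS source tables (names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ods_product (product_id TEXT, product_name TEXT);
    CREATE TABLE ods_trade_order (
        order_id TEXT, product_id TEXT, buyer_id TEXT,
        pay_amount REAL, pay_status TEXT
    );
    INSERT INTO ods_product VALUES ('p1', 'Lamp'), ('p2', 'Desk');
    INSERT INTO ods_trade_order VALUES
        ('o1', 'p1', 'b1', 20.0, 'PAID'),
        ('o2', 'p1', 'b2', 20.0, 'UNPAID'),
        ('o3', 'p2', 'b1', 150.0, 'PAID');
    """
)

# Join orders to the product dimension and keep only paid orders.
rows = conn.execute(
    """
    SELECT o.order_id, o.product_id, p.product_name, o.buyer_id, o.pay_amount
    FROM ods_trade_order AS o
    JOIN ods_product AS p ON p.product_id = o.product_id
    WHERE o.pay_status = 'PAID'
    ORDER BY o.order_id
    """
).fetchall()

for row in rows:
    print(row)
```

The unpaid order o2 is dropped, so downstream layers only ever aggregate completed transactions.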

Task 5 – DWS Node: The agent generates DWS‑商品成交日汇总.sql (daily product transaction summary) with aggregation windows (1d, 7d, 30d, MTD) and metrics such as GMV, order count, and buyer count.
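To make the four aggregation windows concrete, here is a minimal sketch that computes GMV, order count, and buyer count per window. It assumes trailing windows that include the as‑of date and a month‑to‑date window that starts on the first of the month; the function names and sample orders are made up, and the agent's actual SQL may define the windows differently.

```python
from datetime import date, timedelta


def window_start(as_of: date, window: str) -> date:
    """First day covered by a window ending on as_of (inclusive)."""
    if window == "mtd":
        return as_of.replace(day=1)
    days = {"1d": 1, "7d": 7, "30d": 30}[window]
    return as_of - timedelta(days=days - 1)


def summarize(orders: list[dict], as_of: date, window: str) -> dict:
    """Aggregate paid orders whose pay_date falls inside the window."""
    start = window_start(as_of, window)
    hits = [o for o in orders if start <= o["pay_date"] <= as_of]
    return {
        "gmv": sum(o["pay_amount"] for o in hits),
        "order_count": len(hits),
        "buyer_count": len({o["buyer_id"] for o in hits}),
    }


orders = [
    {"pay_date": date(2026, 3, 10), "buyer_id": "b1", "pay_amount": 20.0},
    {"pay_date": date(2026, 3, 8), "buyer_id": "b2", "pay_amount": 30.0},
    {"pay_date": date(2026, 3, 1), "buyer_id": "b1", "pay_amount": 50.0},
    {"pay_date": date(2026, 2, 20), "buyer_id": "b3", "pay_amount": 100.0},
]

for w in ("1d", "7d", "30d", "mtd"):
    print(w, summarize(orders, date(2026, 3, 10), w))
```

Note that buyer count is a distinct count, so it cannot be obtained by summing daily buyer counts across a window.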

Task 6 – ADS Node: The agent generates ADS‑商品分析应用表.sql (product analysis application table) with lifecycle configuration and windowed metrics for daily, 7‑day, 30‑day, and MTD periods.

Task 7 – Workflow Creation: After the four SQL files are ready, the agent creates a workflow, adds each node in turn, resolves dependencies (e.g., DWD product info → DWD trade order → DWS → ADS), and validates the structure.
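Resolving and validating dependencies of this kind is essentially a topological sort over the node graph. The sketch below uses Python's standard graphlib module with the four table names from the final delivery list as node names; it illustrates the idea and is not how DataWorks performs the validation internally.

```python
from graphlib import CycleError, TopologicalSorter

# Each node maps to the set of nodes it depends on (its upstream nodes).
dependencies = {
    "dwd_ctlive_product_info": set(),
    "dwd_ctlive_trade_order": {"dwd_ctlive_product_info"},
    "dws_ctlive_product_trade_1d": {"dwd_ctlive_trade_order"},
    "ads_ctlive_product_analysis": {"dws_ctlive_product_trade_1d"},
}


def execution_order(deps: dict[str, set[str]]) -> list[str]:
    """Return a valid run order, or raise ValueError if the graph has a cycle."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as exc:
        raise ValueError(f"workflow has a dependency cycle: {exc}") from exc


print(" -> ".join(execution_order(dependencies)))
```

A cycle, for example an ADS node feeding back into a DWD node, would make the workflow unschedulable, which is why the structure is validated before publishing.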

Task 8 – Workflow Publishing: In YOLO mode the agent automatically submits the release request, publishes the workflow with a daily 02:00 schedule, and outputs a final delivery list showing all tables, fields, and lineage (dwd_ctlive_product_info → dwd_ctlive_trade_order → dws_ctlive_product_trade_1d → ads_ctlive_product_analysis).
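As a small illustration of what a daily 02:00 schedule means in practice, the sketch below computes the next trigger time from a given moment. The `next_run` helper is hypothetical and ignores time zones; the scheduler itself is configured inside DataWorks, not with code like this.

```python
from datetime import datetime, time, timedelta


def next_run(now: datetime, at: time = time(2, 0)) -> datetime:
    """Next daily trigger strictly after `now` (naive datetimes, no time zone)."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


print(next_run(datetime(2026, 3, 10, 1, 30)))
print(next_run(datetime(2026, 3, 10, 9, 0)))
```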

9. Report Generation and Session Info

The user can ask the agent to "turn the process above into an HTML report"; the agent analyzes the conversation and produces an HTML report, which can be saved as a PDF. Additional commands such as /stats display session statistics (token usage, success rates, etc.).

Overall, DataWorks Data Agent transforms a traditionally multi‑person, multi‑day data‑warehouse development effort into a single‑person, AI‑driven workflow that delivers enterprise‑grade, production‑ready artifacts from natural‑language requirements.
