
How AI‑Powered Skills Cut 70% of Repetitive Data Development Work

In a real‑world incident, an ADS table stopped updating. A Claude‑based Skill surfaced the root cause in about three seconds and supported a three‑hour rebuild of the data‑warehouse pipeline, eliminating about 70% of the manual, repetitive steps traditionally required in data development, testing, deployment, and operations.

dbaplus Community

Last week an ADS table in the data warehouse was found to have gone seven days without an update, leaving the downstream search service without a data source. The traditional troubleshooting path means navigating the DataWorks console, checking logs, and tracing upstream failures across three systems and seven pages.

Using a Claude Code Skill, the author inspected the schedule status, found that the daily partition of the ADS report table had been stale for seven days, traced the problem upstream to ODS tables with zero partitions, and discovered the root cause: the data source type was registered as rds instead of postgresql. This dialog‑based investigation took only three seconds.

With the Skill, the entire data‑warehouse pipeline was rebuilt in three hours, eliminating roughly 70% of repetitive development work.

1. Where does data development time go?

Checking table schema, data, and schedule requires opening five separate tabs.

After writing ETL, developers must create tables, nodes, and dependencies via console forms.

Deploying a node involves eight clicks; three nodes mean 24 steps.

Post‑ETL validation requires writing COUNT, COUNT DISTINCT, null‑rate, and cross‑layer comparison SQL each time.

Changing a cron expression for 35 nodes takes over an hour in the console.

Consequently, less than 30% of a developer’s time is spent writing SQL.

2. Using the Skill in the Development Phase

Schema inspection is now a single command:

/mc-schema ods_order_main
→ ods_order_main (28 cols, partition: ds, latest: ds=20250406)
order_id: STRING
user_id: STRING
status: STRING
channel: STRING
pay_amount: DECIMAL
created_at: DATETIME

Cross‑database source verification can be done without a client:

/pg-query biz_db SELECT channel, COUNT(1) FROM orders WHERE status='paid' GROUP BY channel ORDER BY 2 DESC
→ channel | count
app   | 328,106
h5    | 215,773
mini  | 134,529

Running local SQL directly on MaxCompute eliminates the need to copy ETL scripts to the console:

/mc-run-sql etl_dwd_order_detail.sql 20250405
→ Executing... Instance: 2025040502xxxx
→ Completed! Duration 47 seconds
→ Total rows: 2,847,563
→ channel_app: 1,203,421 | channel_h5: 876,330 | channel_mini: 512,108
→ channel_pc: 255,704

The Skill’s prompt core includes:

Read local SQL file {file_path}, replace ${bizdate} with {bizdate},
remove SET statements (passed via hints),
call PyODPS execute_sql and wait for completion,
automatically run SELECT source, COUNT(1) ... GROUP BY source to output row distribution.
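
The first two steps of that prompt can be sketched in plain Python. This is a minimal illustration, not the Skill's actual code: the helper name prepare_sql, the regular expression, and the sample table name dwd_order_detail are assumptions made here, and the sketch only recognizes SET statements that sit on a line of their own.

```python
import re
from typing import Dict, Tuple

# One "SET key=value;" statement on its own line (case-insensitive).
# Assumption: SET statements are never split across lines.
_SET_STMT = re.compile(
    r"^[ \t]*SET[ \t]+([\w.]+)[ \t]*=[ \t]*([^;\n]+?)[ \t]*;[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def prepare_sql(sql_text: str, bizdate: str) -> Tuple[str, Dict[str, str]]:
    """Return (sql, hints): SET statements become hints, ${bizdate} is filled in."""
    hints = {key: value for key, value in _SET_STMT.findall(sql_text)}
    body = _SET_STMT.sub("", sql_text)          # drop the SET lines from the script
    body = body.replace("${bizdate}", bizdate)  # substitute the business date
    return body.strip(), hints


# Hypothetical ETL script used only to exercise the helper.
example = """SET odps.sql.type.system.odps2=true;
INSERT OVERWRITE TABLE dwd_order_detail PARTITION (ds='${bizdate}')
SELECT order_id, channel FROM ods_order_main WHERE ds='${bizdate}';
"""

sql, hints = prepare_sql(example, "20250405")
# With PyODPS, the pair could then be passed on as o.execute_sql(sql, hints=hints).
```

Keeping the settings in a dictionary rather than in the SQL text is what lets them travel as hints, as the prompt describes.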

3. Testing Phase

A single command runs comprehensive data‑quality checks:

/mc-dq ads_report_daily 20250405
→ Total rows: 2,847,563
→ Primary key report_key: 2,847,563 unique, 0 duplicates ✅
→ Compared to previous period (2,831,207): +0.6% ✅
→ Field non‑null rates:
   user_id: 100.0% ✅
   channel: 100.0% ✅
   pay_amount: 99.8% ✅
   city: 97.2% ✅
   device_type: 85.3%
   coupon_id: 32.1%
→ Cross‑layer consistency: DWS = ADS = 2,847,563 ✅

The Skill also supports root‑cause analysis within the same dialog. For example, asking why the coupon_id non‑null rate is only 32% yields the explanation that only orders participating in promotions carry a coupon ID, which matches the operational coverage of those promotions.
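
The checks in that report boil down to a few aggregates. The sketch below computes them against an in‑memory SQLite table as a stand‑in for MaxCompute, so it runs anywhere; the function names dq_report and pct_change and the four sample rows are illustrative assumptions, and the per‑column figure is the share of non‑null values, which is how the percentages in the sample output read.

```python
import sqlite3
from typing import Dict, List


def dq_report(conn: sqlite3.Connection, table: str, key: str,
              columns: List[str]) -> Dict[str, object]:
    """Row count, duplicate keys, and per-column non-null rates in one scan.

    Table and column names are interpolated directly, so they must be trusted.
    NULL keys are not counted by COUNT(DISTINCT ...) and show up as duplicates.
    """
    rate_exprs = ", ".join(f"COUNT({col}) * 100.0 / COUNT(*)" for col in columns)
    query = f"SELECT COUNT(*), COUNT(DISTINCT {key}), {rate_exprs} FROM {table}"
    total, distinct_keys, *rates = conn.execute(query).fetchone()
    return {
        "total_rows": total,
        "duplicate_keys": total - distinct_keys,
        # An empty table yields NULL rates; pass them through as None.
        "non_null_rate": {
            col: (round(rate, 1) if rate is not None else None)
            for col, rate in zip(columns, rates)
        },
    }


def pct_change(current: int, previous: int) -> float:
    """Period-over-period change in percent, rounded to one decimal."""
    return round((current - previous) / previous * 100, 1)


# Tiny hypothetical dataset: four orders, one of them with a coupon.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ads_report_daily (report_key TEXT, user_id TEXT, coupon_id TEXT)")
conn.executemany(
    "INSERT INTO ads_report_daily VALUES (?, ?, ?)",
    [("k1", "u1", "c1"), ("k2", "u2", None), ("k3", "u3", None), ("k4", "u4", None)],
)
report = dq_report(conn, "ads_report_daily", "report_key", ["user_id", "coupon_id"])
```

Applied to the row counts quoted in the report, pct_change(2847563, 2831207) gives 0.6, matching the +0.6% shown above.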

4. Deployment Phase

Multiple nodes can be deployed with a single command:

/dw-deploy etl_dwd_order_detail etl_dws_order_agg etl_ads_report_daily
→ Updating 3 node SQL...
→ Submitting...
→ Deploying... 3/3 ✅

The underlying API chain executed by the Skill is:

# Skill actual API calls
import time

# node_sql maps each file ID to its updated SQL text
for file_id, sql in node_sql.items():
    client.update_file(file_id, content=sql)   # update SQL
    client.submit_file(file_id)                # submit
    time.sleep(8)                              # wait for pipeline
    client.deploy_file(file_id)                # deploy

Eight manual clicks per node (24 clicks for three nodes) are reduced to a single line command.
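
The fixed eight‑second wait in that loop assumes every submit finishes within the pause. One possible variant, shown here as a sketch rather than as what the Skill does, polls for completion with a timeout. The client method is_submitted is hypothetical, as is the FakeClient used to run the example without any cloud access.

```python
import time
from typing import Callable, Dict, List


def wait_until(check: Callable[[], bool], timeout: float = 60.0, interval: float = 2.0,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll check() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


def deploy_nodes(client, node_sql: Dict[str, str], **wait_kwargs) -> List[str]:
    """Update, submit, and deploy each node; stop at the first submit that stalls."""
    deployed: List[str] = []
    for file_id, sql in node_sql.items():
        client.update_file(file_id, content=sql)
        client.submit_file(file_id)
        # is_submitted is a hypothetical status call standing in for a real API.
        if not wait_until(lambda: client.is_submitted(file_id), **wait_kwargs):
            raise TimeoutError(f"submit of {file_id} stalled; deployed so far: {deployed}")
        client.deploy_file(file_id)
        deployed.append(file_id)
    return deployed


class FakeClient:
    """In-memory stand-in that reports a submit as finished on the second poll."""

    def __init__(self) -> None:
        self.calls: List[tuple] = []
        self._polls: Dict[str, int] = {}

    def update_file(self, file_id: str, content: str) -> None:
        self.calls.append(("update", file_id))

    def submit_file(self, file_id: str) -> None:
        self.calls.append(("submit", file_id))
        self._polls[file_id] = 0

    def is_submitted(self, file_id: str) -> bool:
        self._polls[file_id] += 1
        return self._polls[file_id] >= 2

    def deploy_file(self, file_id: str) -> None:
        self.calls.append(("deploy", file_id))


client = FakeClient()
result = deploy_nodes(client, {"n1": "SELECT 1", "n2": "SELECT 2"},
                      interval=0, sleep=lambda seconds: None)
```

Raising on a stalled submit keeps a half‑finished batch visible instead of deploying a node whose submit never completed.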

5. Operations Phase

Daily status check:

/dw-status
→ === DI Sync (9) ===
   All Running ✅
→ === Link A (Rule Engine) ===
   DWD → DWM → DWS → ADS: SUCCESS ✅
→ === Link B (Search Service) ===
   DWD(03:31) → DWS(04:01) → ADS(04:31): SUCCESS ✅
→ Data freshness: ds=20250406 ✅
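
The freshness line is the check that would have caught the seven‑day stale partition from the opening incident. A minimal sketch of that comparison follows; the function names and the one‑day threshold are assumptions made here, not part of the Skill.

```python
from datetime import date, datetime


def partition_lag_days(latest_ds: str, today: date) -> int:
    """Days between the newest partition (ds in YYYYMMDD form) and today."""
    latest = datetime.strptime(latest_ds, "%Y%m%d").date()
    return (today - latest).days


def is_stale(latest_ds: str, today: date, max_lag_days: int = 1) -> bool:
    """Flag a table whose newest partition is older than the allowed lag.

    Assumption: a daily job writes the previous day's partition, so a lag of
    one day is normal and anything beyond that is worth highlighting.
    """
    return partition_lag_days(latest_ds, today) > max_lag_days


fresh_lag = partition_lag_days("20250406", date(2025, 4, 7))
stale_lag = partition_lag_days("20250330", date(2025, 4, 7))
```

Passing today explicitly, rather than calling date.today() inside the function, keeps the check reproducible when rerunning it for a past date.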

When a failure occurs, logs are inspected and the job is rerun in a single window:

/dw-log etl_ads_report_daily 20250405
→ FAILED: column pay_time cannot be resolved
→ Reason: upstream source field renamed from pay_time to paid_at
… (fix SQL) …
/dw-deploy etl_ads_report_daily
/dw-rerun etl_ads_report_daily 20250405
→ SUCCESS (32 seconds) ✅

6. Three‑Hour End‑to‑End Rebuild

In a real business scenario where the search service’s data‑warehouse link was flawed, the author carried out the rebuild in four stages:

Analysis: /pg-query + /mc-schema to quickly scan table structures and data distribution.

Development: wrote three ETL SQL files (~900 lines) and used /mc-ddl to create 12 tables.

Verification: /mc-run-sql layered execution and /mc-dq full data‑quality check, achieving zero duplication and zero inflation on million‑row data.

Deployment: /dw-deploy batch deployment, /dw-set-deps for dependency chains, /dw-update-cron to unify schedules, /dw-create-dijob for sync tasks, /dw-offline to clean obsolete nodes.
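
When dependency chains are set in bulk, it helps to process nodes in an order where every upstream node comes first. The sketch below derives such an order with Python's standard graphlib module. The dependency map is an assumption based on the DWD → DWS → ADS flow shown earlier, and the helper names are illustrative.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def deploy_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return node names so that each node appears after all of its upstreams.

    deps maps a node to the set of nodes it depends on.
    """
    return list(TopologicalSorter(deps).static_order())


def has_cycle(deps: Dict[str, Set[str]]) -> bool:
    """True when the dependency map contains a loop and cannot be scheduled."""
    try:
        deploy_order(deps)
    except CycleError:
        return True
    return False


# Assumed chain for the three nodes deployed earlier in the article.
deps = {
    "etl_dwd_order_detail": set(),
    "etl_dws_order_agg": {"etl_dwd_order_detail"},
    "etl_ads_report_daily": {"etl_dws_order_agg"},
}
order = deploy_order(deps)
```

Checking for cycles before touching the scheduler is cheap and avoids registering a dependency loop that would never run.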

7. How to Build Your Own Skill

Three steps:

Identify high‑frequency manual operations (e.g., schema lookup, schedule status, node deployment, validation SQL).

Find the corresponding OpenAPI (DataWorks, MaxCompute) – ListInstances, CreateDIJob, SubmitFile, DeployFile, etc.

Write a Skill prompt that defines what to do, which APIs to call, and how to format the output.

Example prompt for a status check:

/dw-status
Check PROD schedule status:
1. ListDIJobs for sync task status
2. ListInstances for recent two‑day instances (grouped by link)
3. Verify latest MaxCompute table partition
4. Output grouped by link with anomalies highlighted

The AI handles parameter assembly, pagination, retries, and result formatting; the user only specifies the desired outcome.

8. Final Thoughts

The core value of data development lies in understanding business, designing models, and ensuring quality, not in endless console clicks. When schema lookup, validation, and deployment are reduced to single commands like /mc-schema, /mc-dq, and /dw-deploy, developers can focus on questions such as whether the table granularity is appropriate or if the pipeline can sustain future demand.

The evolution of tools should always bring us back to thinking, not clicking.
