Quickly Analyze Public Big Data Sets with Alibaba DataWorks & MaxCompute (Free Trial)
This step‑by‑step tutorial shows how to set up Alibaba Cloud DataWorks and MaxCompute, bind them together, and use free trial resources to explore public datasets such as Alibaba e‑commerce data and GitHub events, as well as your own custom queries, with SQL and visualizations.
Environment Preparation
This tutorial uses the public big‑data AI datasets (Taobao, Feizhu, Alibaba Music, GitHub, TPC, etc.) and demonstrates rapid analysis with the Alibaba Cloud DataWorks and MaxCompute free trials.
Enable DataWorks: Choose the Shanghai region and activate the DataWorks free trial (https://free.aliyun.com/?pipCode=dide). If the free trial is unavailable, switch to pay‑as‑you‑go (https://common-buy.aliyun.com/?commodityCode=dide_create_post).
Enable MaxCompute: Choose the Shanghai region and activate the MaxCompute free trial (https://free.aliyun.com/?crowd=personal). If the free trial is unavailable, switch to pay‑as‑you‑go (https://common-buy.aliyun.com/?commodityCode=odps).
Create a DataWorks workspace and bind MaxCompute: Visit the DataWorks console (https://dataworks.console.aliyun.com/welcome), create a workspace, and bind the MaxCompute compute engine to it.
Start Analysis
Go to the DataWorks data analysis page (https://da-cn-shanghai.data.aliyun.com/#/query). If the left‑hand catalog shows no datasets, delete the directory and add it again.
Alibaba E‑Commerce Dataset Analysis
This dataset comes from the Tianchi Alibaba Mobile Recommendation Challenge, containing 1 million anonymized real product records (over 1.2 billion rows). It allows you to experience Alibaba Cloud big‑data analysis capabilities across product, operation, and time dimensions.
Open the default SQL file on the welcome page, select MaxCompute as the execution engine, run the query, and view the results and automatically generated charts.
GitHub Event Dataset Analysis
GitHub records a large volume of events that developers generate while working on open‑source projects, such as starring a repository or committing code. The public event dataset includes the event type, event details, developer, repository, and more.
Open the GitHub event dataset, view the SQL example file, select MaxCompute as the execution engine, run the query, and view the results.
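As a rough illustration of the kind of query the example file might contain, the Python sketch below assembles a MaxCompute SQL script that counts events per type for a single day. The project, schema, table, and column names (bigdata_public_dataset, github_events, dwd_github_events_odps, type, ds) are assumptions made for this sketch and are not confirmed by this tutorial, so check the catalog in the left‑hand pane for the actual identifiers. The helper name event_type_counts_sql is hypothetical as well.

```python
from datetime import datetime

# Assumed identifiers, for illustration only; verify them in the DataWorks catalog.
PROJECT = "bigdata_public_dataset"
SCHEMA = "github_events"
TABLE = "dwd_github_events_odps"
EVENT_TYPE_COLUMN = "type"   # assumed column holding the event type
PARTITION_COLUMN = "ds"      # assumed daily partition column (YYYY-MM-DD)


def event_type_counts_sql(day: str, top_n: int = 10) -> str:
    """Build a SQL script that counts events per type for one day."""
    # Parse and re-format the date so only a well-formed value reaches the SQL text.
    normalized_day = datetime.strptime(day, "%Y-%m-%d").strftime("%Y-%m-%d")
    if top_n <= 0:
        raise ValueError("top_n must be positive")
    return "\n".join([
        # Session flag so the three-part table name below can be resolved.
        "SET odps.namespace.schema = true;",
        f"SELECT {EVENT_TYPE_COLUMN}, COUNT(*) AS event_count",
        f"FROM {PROJECT}.{SCHEMA}.{TABLE}",
        f"WHERE {PARTITION_COLUMN} = '{normalized_day}'",
        f"GROUP BY {EVENT_TYPE_COLUMN}",
        "ORDER BY event_count DESC",
        f"LIMIT {top_n};",
    ])


if __name__ == "__main__":
    # Print the script so it can be pasted into a SQL file on the analysis page.
    print(event_type_counts_sql("2024-01-01"))
```

Filtering on a single partition keeps the scan small, which matters when running on free trial resources. The printed script can be pasted into a new SQL file and run with MaxCompute selected as the execution engine.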
Custom Dataset Analysis
Click any table to open its detail page and view field information.
Generate SQL statements, run them for data preview, then create a new SQL file to write custom queries for free analysis.
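A data preview statement of the sort generated in this step can be sketched in a few lines of Python. The helper names qualified_name and preview_sql, and the identifiers my_project, my_schema, and my_table in the usage line, are hypothetical and stand in for whatever table you selected in the catalog. This is a minimal sketch, not the statement DataWorks itself generates.

```python
import re

# Plain identifiers only: letters, digits, underscores, not starting with a digit.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def qualified_name(project: str, schema: str, table: str) -> str:
    """Join project, schema, and table into a three-part name."""
    parts = (project, schema, table)
    for part in parts:
        if not _IDENTIFIER.fullmatch(part):
            raise ValueError(f"not a plain identifier: {part!r}")
    return ".".join(parts)


def preview_sql(project: str, schema: str, table: str, limit: int = 100) -> str:
    """Build a preview query that returns at most `limit` rows."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return f"SELECT * FROM {qualified_name(project, schema, table)} LIMIT {limit};"


if __name__ == "__main__":
    # Hypothetical identifiers; replace them with the table chosen in the catalog.
    print(preview_sql("my_project", "my_schema", "my_table", limit=20))
```

Validating each part before joining keeps stray characters out of the SQL text. For partitioned tables, a preview may also need a WHERE clause on the partition column, depending on the project's full‑scan settings.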
When analyzing with MaxCompute, enable the three‑layer model (project.schema.table naming) at the top of the SQL file:
SET odps.namespace.schema = true; -- Enable the MaxCompute three‑layer model
Further Experience
Beyond data analysis, DataWorks also offers data modeling, integration, development, scheduling, operations, mapping, quality, governance, security, and services, helping enterprises quickly build big‑data platforms. See the documentation "Retail E‑Commerce Data Warehouse Construction" at https://help.aliyun.com/document_detail/461446.html.
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
