Master DataWorks Notebook: Interactive SQL & Python for Big Data Development

This guide walks you through setting up a personal DataWorks Notebook, performing interactive SQL development with engines like MaxCompute, creating Python visualizations, building ipywidgets for dynamic queries, and leveraging the AI‑powered Copilot to rewrite, explain, and comment code, all within a unified big‑data platform.

Alibaba Cloud Developer

Introduction

DataWorks is an all‑in‑one intelligent big‑data development and governance platform that integrates with Alibaba Cloud services such as MaxCompute, EMR, Hologres, Flink and PAI, providing end‑to‑end data integration, AI‑enabled development, analysis and proactive data‑asset governance for the full Data+AI lifecycle.

Operation Tutorial

Product Activation

1. Make sure you have a DataWorks subscription. If you do not, purchase one and select a region (e.g., East China 2 (Shanghai)).

2. Log in to the DataWorks console, choose “DataWorks Gallery” under “Big Data Experience”, and click the “Notebook Quick Start” case to load it.

Create Personal Development Environment

1. In the case‑loading dialog, click “Create Workspace”, provide a workspace name, and create a resource group if required.

2. Create a personal development instance, name it, bind it to the resource group, and allocate at least 2 CU.

3. After the instance status becomes “Running”, open it and create a new Notebook.

Application Experience

(1) Interactive development with MaxCompute

In a SQL Cell, select the desired big‑data engine, bind the compute resource (e.g., MaxCompute), create a project, and run the following example query:

SELECT 'James' AS name, '25' AS age, 'Hangzhou' AS city;

Execute the cell to view the result. The same pattern works with other engines such as Hologres, EMR Spark, StarRocks, and Flink. The Flink example below sets job options in "-- @conf" comment lines and then copies random strings from a datagen source table to a print sink:

-- @conf name = flink_vvp_job_quick_start
-- @conf engineVersion=vvr-8.0.8-flink-1.17
-- @conf flinkConf."execution.checkpointing.interval"=10second
-- @conf flinkConf."taskmanager.numberOfTaskSlots"=4
-- @conf flinkConf."table.exec.state.ttl"=1hour
-- @conf streamingResourceSetting.resourceSettingMode=BASIC
-- @conf streamingResourceSetting.basicResourceSetting.parallelism=4
-- @conf streamingResourceSetting.basicResourceSetting.taskmanagerResourceSettingSpec.memory=4GiB
-- @conf streamingResourceSetting.basicResourceSetting.taskmanagerResourceSettingSpec.cpu=1
-- @conf streamingResourceSetting.basicResourceSetting.jobmanagerResourceSettingSpec.memory=4GiB
-- @conf streamingResourceSetting.basicResourceSetting.jobmanagerResourceSettingSpec.cpu=1
CREATE TEMPORARY TABLE datagen_source(
  randstr VARCHAR
) WITH (
  'connector' = 'datagen'
);
CREATE TEMPORARY TABLE print_table(
  randstr VARCHAR
) WITH (
  'connector' = 'print',
  'logger' = 'true'
);
INSERT INTO print_table
SELECT SUBSTRING(randstr,0,8) FROM datagen_source;
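
The "-- @conf" comment lines at the top of the Flink cell read as key/value settings (job name, engine version, Flink options, resource sizes). As a rough illustration only, the following Python sketch collects such lines into a dictionary so the settings can be inspected. Note that parse_conf_header is a hypothetical helper written for this example, not a DataWorks API; DataWorks interprets these lines itself.

```python
def parse_conf_header(sql_text):
    """Collect `-- @conf key=value` comment lines into a dict (illustrative helper)."""
    settings = {}
    prefix = "-- @conf"
    for raw_line in sql_text.splitlines():
        line = raw_line.strip()
        if not line.startswith(prefix):
            continue
        # Split on the first "=" only, so a value may itself contain "=".
        key, _, value = line[len(prefix):].partition("=")
        settings[key.strip()] = value.strip()
    return settings


# A shortened copy of the cell above, used as sample input.
sample_cell = """\
-- @conf name = flink_vvp_job_quick_start
-- @conf engineVersion=vvr-8.0.8-flink-1.17
-- @conf flinkConf."taskmanager.numberOfTaskSlots"=4
CREATE TEMPORARY TABLE datagen_source(randstr VARCHAR) WITH ('connector' = 'datagen');
"""

conf = parse_conf_header(sample_cell)
for key, value in conf.items():
    print(f"{key} -> {value}")
```

Values are kept as plain strings here. Converting entries such as 10second or 4GiB into typed durations and sizes is left out on purpose.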

(2) Data analysis with Python

In a Python Cell, you can write standard Python code. The example below creates a bar chart:

import matplotlib.pyplot as plt
categories = ['Category A', 'Category B', 'Category C', 'Category D']
values = [23, 45, 17, 56]
plt.figure(figsize=(10, 6))
plt.bar(categories, values, color=['blue','green','red','purple'])
plt.title('Sample Bar Chart')
plt.xlabel('Categories')
plt.ylabel('Values')
# Label each bar with its value.
for i, value in enumerate(values):
    plt.text(i, value, str(value), ha='center', va='bottom')
plt.show()

Running the cell renders the bar chart in the cell output.

(3) Interactive analysis with ipywidgets

By using ipywidgets, you can build interactive sliders in a Python Cell and reference the generated variable in SQL cells, enabling dynamic queries such as:

SELECT '${query_age}' AS age;
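
The ${query_age} placeholder is filled with the current value of the Python variable query_age before the SQL runs. DataWorks performs this substitution itself. The sketch below only mimics the effect with Python's standard string.Template, which happens to use the same ${name} syntax. The fixed value 25 stands in for a slider value and is an assumption made for illustration.

```python
from string import Template

# Stand-in for the value produced by a slider widget in a Python cell (assumed).
query_age = 25

# Same placeholder syntax as the SQL cell above.
sql_template = Template("SELECT '${query_age}' AS age;")
rendered_sql = sql_template.substitute(query_age=query_age)
print(rendered_sql)  # SELECT '25' AS age;
```

Moving the slider changes query_age, so re-running the SQL cell picks up the new value without editing the query text.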

Intelligent Assistant – Copilot

DataWorks Copilot can rewrite, explain, and comment SQL code, as well as generate table definitions. Example commands include:

Rewrite a SELECT statement using UNPIVOT.

Explain a PIVOT query.

Generate comments for each column in a CREATE TABLE statement.
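
For readers unfamiliar with the operation named in the first command, UNPIVOT turns columns into rows (wide to long). The following pure-Python sketch shows that reshaping on made-up data. The column names and values are hypothetical, and the code illustrates the concept rather than Copilot's output or any engine's implementation.

```python
def unpivot(rows, id_column, value_columns, name_column="metric", value_column="value"):
    """Turn each listed column of every row into its own (name, value) row."""
    long_rows = []
    for row in rows:
        for column in value_columns:
            long_rows.append({
                id_column: row[id_column],
                name_column: column,
                value_column: row[column],
            })
    return long_rows


# Hypothetical wide table: one row per city, one column per quarter.
wide_rows = [
    {"city": "Hangzhou", "q1": 10, "q2": 12},
    {"city": "Shanghai", "q1": 7, "q2": 9},
]

long_rows = unpivot(wide_rows, "city", ["q1", "q2"],
                    name_column="quarter", value_column="sales")
for row in long_rows:
    print(row)
```

Each input row yields one output row per unpivoted column, so two cities with two quarters produce four rows. PIVOT is the reverse reshaping, from long back to wide.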

Copilot also supports “smart table creation” directly from the Data Studio interface.
