Sync MaxCompute Tables to Milvus with DataWorks: Step‑by‑Step Guide
This guide explains how to use Alibaba Cloud DataWorks to create the necessary resources, configure Milvus and MaxCompute data sources, set up an offline single‑table synchronization task, and verify the imported vectors, enabling efficient AI‑driven vector search on large structured datasets.
01 Background Introduction
In modern big-data and AI scenarios, enterprises often need to vectorize large-scale structured data stored in cloud data warehouses such as MaxCompute so that it can be used for vector retrieval and similarity analysis. Alibaba Cloud's Milvus-based vector search service is a fully managed engine, compatible with open-source Milvus, that scales to massive AI vector workloads. Integrating MaxCompute with Milvus enables use cases such as e-commerce behavior analysis, medical knowledge bases, and game content recommendation.
02 Prerequisites
Create a Milvus instance – see the provided link.
Create a MaxCompute project – see the provided link.
Prepare the DataWorks environment: create a workspace, bind the required DataWorks resource group to it, and make sure an exclusive resource group for Data Integration is available.
03 Operation Process
Step 1: Data Preparation
Prepare test data in MaxCompute. Example table creation and data insertion:
CREATE TABLE dl_1216.`default`.mc_table (
    id INT,
    namespace STRING,
    vector ARRAY<DOUBLE>
);
INSERT INTO dl_1216.`default`.mc_table VALUES(100, 'aaa', array(1554047123.0, 1554047123.0));
INSERT INTO dl_1216.`default`.mc_table VALUES(200, 'bbb', array(1554047999.0, 1554047999.0));
SELECT * FROM dl_1216.`default`.mc_table;
Step 2: Add Data Sources
Create Milvus data source – select Milvus as the source type and configure basic information.
Create MaxCompute data source – select MaxCompute as the source type and configure basic information.
Test connectivity for each data source; a "connected" status indicates successful creation.
Step 3: Configure Synchronization Task
Select Synchronization Task in the left navigation.
Click New Sync Task, choose Single Table Offline as the sync type, and confirm.
In the task wizard, set source (MaxCompute) and destination (Milvus) data sources, configure field mappings (source fields must match target fields), and adjust channel parameters.
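Because the field mapping requires source fields to line up with target fields, it can help to sanity-check a few source rows before running the task. The following is a minimal sketch, not part of DataWorks: the `TargetField` class, the `TARGET_SCHEMA` list, and the `check_row` helper are hypothetical names, and the target schema (an integer `id`, a string `namespace`, and a 2-dimensional float vector) is an assumption that mirrors the sample `mc_table` rather than anything the wizard reports.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TargetField:
    name: str
    kind: str                  # "int", "str", or "float_vector"
    dim: Optional[int] = None  # only meaningful for float_vector


# Hypothetical target schema mirroring mc_table; dim=2 matches the sample rows.
TARGET_SCHEMA = [
    TargetField("id", "int"),
    TargetField("namespace", "str"),
    TargetField("vector", "float_vector", dim=2),
]


def check_row(row: dict, schema: List[TargetField] = TARGET_SCHEMA) -> List[str]:
    """Return a list of mapping problems for one source row (empty if none)."""
    problems = []
    known = {f.name for f in schema}
    # Source columns with no counterpart in the target schema.
    for name in row:
        if name not in known:
            problems.append(f"unmapped source field: {name}")
    # Target fields that are absent or have an unexpected shape.
    for field in schema:
        if field.name not in row:
            problems.append(f"missing field: {field.name}")
            continue
        value = row[field.name]
        if field.kind == "int" and not isinstance(value, int):
            problems.append(f"{field.name}: expected an integer")
        elif field.kind == "str" and not isinstance(value, str):
            problems.append(f"{field.name}: expected a string")
        elif field.kind == "float_vector":
            if not isinstance(value, (list, tuple)):
                problems.append(f"{field.name}: expected an array of numbers")
            elif len(value) != field.dim:
                problems.append(
                    f"{field.name}: expected {field.dim} dimensions, got {len(value)}"
                )
    return problems


# The two sample rows inserted in Step 1.
sample_rows = [
    {"id": 100, "namespace": "aaa", "vector": [1554047123.0, 1554047123.0]},
    {"id": 200, "namespace": "bbb", "vector": [1554047999.0, 1554047999.0]},
]
for row in sample_rows:
    print(row["id"], check_row(row) or "ok")
```

A vector whose length differs from the collection's dimension is a likely cause of write failures, so checking it on a sample of rows is cheaper than reading it back from the task log.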
After configuration, run the task and monitor logs; a "Shell run successfully" message indicates success.
Step 4: Query Test
Open the Attu UI, navigate to the target collection, and verify that the synchronized data appears in the Data tab.
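The same check can be scripted instead of done by eye in Attu. The sketch below assumes the `pymilvus` package is installed and that the target collection stores `vector` as a 32-bit float vector; the helper names (`to_float32`, `build_id_filter`, `fetch_rows`, `diff_rows`) are hypothetical, and the endpoint URI, token, and collection name must be supplied by you because the guide does not name them. Since the source column is `ARRAY<DOUBLE>`, values as large as the sample ones would lose precision when stored as 32-bit floats, so the comparison rounds the expected values first.

```python
import struct


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def build_id_filter(ids) -> str:
    """Build a Milvus boolean expression selecting the given primary keys."""
    return "id in [" + ", ".join(str(int(i)) for i in ids) + "]"


def fetch_rows(uri: str, token: str, collection: str, ids):
    """Query the collection for the given ids (needs a reachable instance)."""
    from pymilvus import MilvusClient  # imported lazily; requires pymilvus

    client = MilvusClient(uri=uri, token=token)
    return client.query(
        collection_name=collection,
        filter=build_id_filter(ids),
        output_fields=["id", "namespace", "vector"],
    )


def diff_rows(expected, actual):
    """Compare source rows with rows read back from Milvus; return problems."""
    by_id = {row["id"]: row for row in actual}
    problems = []
    for want in expected:
        got = by_id.get(want["id"])
        if got is None:
            problems.append(f"id {want['id']}: missing from collection")
            continue
        if got["namespace"] != want["namespace"]:
            problems.append(f"id {want['id']}: namespace differs")
        # Compare at float32 precision, assuming that is how the vector is stored.
        want_vec = [to_float32(x) for x in want["vector"]]
        got_vec = [float(x) for x in got["vector"]]
        if want_vec != got_vec:
            problems.append(f"id {want['id']}: vector differs")
    return problems
```

In use, `diff_rows(source_rows, fetch_rows(uri, token, collection, [100, 200]))` would return an empty list when both sample rows arrived intact.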
