
Sync MaxCompute Tables to Milvus with DataWorks: Step‑by‑Step Guide

This guide explains how to use Alibaba Cloud DataWorks to create the necessary resources, configure Milvus and MaxCompute data sources, set up an offline single‑table synchronization task, and verify the imported vectors, enabling efficient AI‑driven vector search on large structured datasets.

Alibaba Cloud Big Data AI Platform

01 Background Introduction

In big‑data and AI scenarios, enterprises often need to vectorize large‑scale structured data stored in cloud data warehouses such as MaxCompute so that it can be used for vector retrieval and similarity analysis. Alibaba Cloud's Milvus‑based vector search service is a fully managed engine, compatible with open‑source Milvus, that scales to massive volumes of AI vector data. Integrating MaxCompute with Milvus enables use cases such as e‑commerce behavior analysis, medical knowledge bases, and game content recommendation.

02 Prerequisites

Create a Milvus instance – see the provided link.

Create a MaxCompute project – see the provided link.

Prepare the DataWorks environment: create a workspace, bind the required DataWorks resource group to it, and make sure an exclusive resource group for Data Integration is available.

03 Operation Process

Step 1: Data Preparation

Prepare test data in MaxCompute. Example table creation and data insertion:

CREATE TABLE dl_1216.`default`.mc_table (
    id INT,
    namespace STRING,
    vector ARRAY<DOUBLE>
);
INSERT INTO dl_1216.`default`.mc_table VALUES(100, 'aaa', array(1554047123.0, 1554047123.0));
INSERT INTO dl_1216.`default`.mc_table VALUES(200, 'bbb', array(1554047999.0, 1554047999.0));
SELECT * FROM dl_1216.`default`.mc_table;
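The sample vectors above are stored as DOUBLE (64‑bit) values. If the target Milvus field is a FLOAT_VECTOR, which holds 32‑bit floats, the values will most likely not round‑trip exactly. The following Python sketch is not part of the original workflow; it only illustrates, under that assumption, what the two test rows would look like after narrowing to 32 bits, and checks that all rows share one vector dimension. The names `ROWS`, `to_float32`, and `vector_dimension` are made up for this illustration.

```python
import struct

# Sample rows mirroring the INSERT statements above: (id, namespace, vector).
ROWS = [
    (100, "aaa", [1554047123.0, 1554047123.0]),
    (200, "bbb", [1554047999.0, 1554047999.0]),
]


def to_float32(value: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def vector_dimension(rows) -> int:
    """Return the vector length shared by all rows, or raise if they differ."""
    dims = {len(vector) for _, _, vector in rows}
    if len(dims) != 1:
        raise ValueError(f"inconsistent vector dimensions: {sorted(dims)}")
    return dims.pop()


if __name__ == "__main__":
    print("dimension:", vector_dimension(ROWS))  # dimension: 2
    for row_id, _, vector in ROWS:
        # 100 -> [1554047104.0, 1554047104.0]
        # 200 -> [1554048000.0, 1554048000.0]
        print(row_id, [to_float32(v) for v in vector])
```

At this magnitude, adjacent 32‑bit floats are 128 apart, so seeing 1554047104.0 instead of 1554047123.0 after the sync would be a rounding effect rather than a sign of data corruption.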

Step 2: Add Data Sources

Create Milvus data source – select Milvus as the source type and configure basic information.

Create MaxCompute data source – select MaxCompute as the source type and configure basic information.

Test connectivity for each data source; a "connected" status indicates successful creation.

Step 3: Configure Synchronization Task

Select Synchronization Task in the left navigation pane.

Click New Sync Task, choose Single Table Offline as the sync type, and confirm.

In the task wizard, set MaxCompute as the source and Milvus as the destination, configure the field mappings (every source column must map to a target collection field of a compatible type), and adjust the channel parameters as needed.
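The field‑mapping rule can be checked on paper before running the task. The sketch below is a hypothetical pre‑flight check in Python, not a DataWorks feature: the Milvus collection layout (`TARGET_FIELDS`) and the type compatibility table (`COMPATIBLE`) are assumptions made for illustration, and the conversions the Milvus writer actually supports should be confirmed in the DataWorks documentation.

```python
# Illustrative schemas. The MaxCompute side mirrors the test table from Step 1;
# the Milvus side is an ASSUMED collection layout, not taken from the article.
SOURCE_COLUMNS = {"id": "INT", "namespace": "STRING", "vector": "ARRAY<DOUBLE>"}
TARGET_FIELDS = {"id": "INT64", "namespace": "VARCHAR", "vector": "FLOAT_VECTOR"}

# ASSUMED compatibility table for this sketch only.
COMPATIBLE = {
    "INT": {"INT32", "INT64"},
    "BIGINT": {"INT64"},
    "STRING": {"VARCHAR"},
    "ARRAY<FLOAT>": {"FLOAT_VECTOR"},
    "ARRAY<DOUBLE>": {"FLOAT_VECTOR"},
}


def check_mapping(source, target, mapping):
    """Return a list of problems found in a source -> target field mapping."""
    problems = []
    for src, dst in mapping.items():
        if src not in source:
            problems.append(f"unknown source column: {src}")
        elif dst not in target:
            problems.append(f"unknown target field: {dst}")
        elif target[dst] not in COMPATIBLE.get(source[src], set()):
            problems.append(
                f"{src} ({source[src]}) cannot feed {dst} ({target[dst]})"
            )
    # Every target field should receive data from some source column.
    for field in sorted(set(target) - set(mapping.values())):
        problems.append(f"target field has no source: {field}")
    return problems


# Same-name mapping, as used for the test table.
same_name = {name: name for name in SOURCE_COLUMNS}
print(check_mapping(SOURCE_COLUMNS, TARGET_FIELDS, same_name))  # prints []
```

An empty list means every source column maps to a target field the table considers compatible. A mistaken mapping such as `{"id": "vector"}` would instead return a message for each mismatch and for each target field left without a source.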

After configuration, run the task and monitor logs; a "Shell run successfully" message indicates success.

Step 4: Query Test

Open Attu, the web‑based management UI for Milvus, navigate to the target collection, and verify that the synchronized rows appear in the Data tab.
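The same check can be scripted instead of done by eye. The sketch below assumes the `pymilvus` package is installed, that the collection's fields are named `id`, `namespace`, and `vector` like the MaxCompute columns, and that `id` is the primary key; none of this is stated in the original steps. The endpoint, credentials, and collection name are placeholders to fill in, and `fetch_rows`, `missing_ids`, and `main` are names invented for this example.

```python
def fetch_rows(client, collection_name, ids):
    """Query rows by id through a MilvusClient-like object."""
    id_list = ", ".join(str(i) for i in ids)
    return client.query(
        collection_name=collection_name,
        filter=f"id in [{id_list}]",
        output_fields=["id", "namespace", "vector"],
    )


def missing_ids(rows, expected_ids):
    """Return the expected ids that are absent from the query result."""
    found = {row["id"] for row in rows}
    return sorted(set(expected_ids) - found)


def main():
    # Imported here so the helpers above can be reused without pymilvus.
    from pymilvus import MilvusClient

    # Placeholders: replace with your instance endpoint and credentials.
    client = MilvusClient(
        uri="http://<milvus-endpoint>:19530",
        token="<user>:<password>",
    )
    expected = [100, 200]  # ids inserted in Step 1
    rows = fetch_rows(client, "<collection-name>", expected)
    for row in rows:
        print(row)
    print("missing ids:", missing_ids(rows, expected))
```

Calling `main()` after filling in the placeholders should print both test rows and an empty list of missing ids if the sync succeeded. The printed vector values may differ slightly from the MaxCompute source if the field stores 32‑bit floats.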

Written by

Alibaba Cloud Big Data AI Platform

The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
