
How MaxCompute’s Append DeltaTable Transforms BigQuery Migration

This article describes how a leading Southeast Asian tech group migrated its data warehouse from Google BigQuery to Alibaba Cloud MaxCompute. It outlines the main challenges, including storage format differences, SQL compatibility and performance tuning, and explains how the new Append DeltaTable format, with dynamic bucketing and incremental reclustering, resolves them.

Alibaba Cloud Big Data AI Platform

Background

A leading Southeast Asian tech group (referred to here as GoTerra) decided to migrate its enterprise data warehouse from Google BigQuery to Alibaba Cloud MaxCompute. The decision was driven by regional compliance requirements, cost‑optimised deployment in the Asia‑Pacific market and the need for petabyte‑scale data processing.

Why BigQuery?

BigQuery is a globally leading cloud data‑warehouse product that offers a serverless architecture, elastic scaling and high‑concurrency performance. Its core advantages include fully managed services, standard SQL support, low‑latency queries and a pay‑as‑you‑go model.

Key Challenges of the Migration

Underlying storage format differences: BigQuery and MaxCompute use fundamentally different storage architectures, requiring extensive redesign and optimisation.

SQL compatibility: MaxCompute SQL differs from BigQuery’s standard SQL in syntax, functions and execution engine, necessitating automated conversion tools.

Data consistency: Preventing data loss, version conflicts and ETL interruptions during cross‑platform migration is critical.

Performance tuning: MaxCompute’s partitioned tables and resource‑group scheduling must be adapted to existing workloads.

Organisational coordination: Balancing system availability and gray‑release strategies across multinational teams.
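One common way to check data consistency after copying a partition across platforms is to compare a row count plus an order‑independent checksum computed on each side. The sketch below is a minimal illustration under stated assumptions, not the project's actual validation tooling: the function names are hypothetical, and it assumes rows from both systems have already been normalised to the same Python values (same types, time zones and decimal precision) before hashing.

```python
import hashlib
from typing import Iterable, Sequence, Tuple

Row = Sequence[object]


def row_digest(row: Row) -> int:
    """Hash one row into a 64-bit integer using a hypothetical canonical encoding."""
    # Join the repr of each column with a unit-separator character; NULLs get a fixed token.
    encoded = "\x1f".join("NULL" if v is None else repr(v) for v in row).encode("utf-8")
    return int.from_bytes(hashlib.sha256(encoded).digest()[:8], "big")


def partition_fingerprint(rows: Iterable[Row]) -> Tuple[int, int]:
    """Return (row_count, order-independent checksum) for one partition."""
    count, checksum = 0, 0
    for row in rows:
        count += 1
        # Summing modulo 2**64 ignores row order but, unlike XOR,
        # still changes when a row is duplicated.
        checksum = (checksum + row_digest(row)) % (1 << 64)
    return count, checksum


def partitions_match(source_rows: Iterable[Row], target_rows: Iterable[Row]) -> bool:
    """True when both sides have the same row count and checksum."""
    return partition_fingerprint(source_rows) == partition_fingerprint(target_rows)
```

Because the checksum is order‑independent, the two engines can scan the partition in any order, which matters when the source and target storage layouts differ.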

Storage Technical Solution – Append DeltaTable

Based on an analysis of MaxCompute’s existing storage capabilities and its future format roadmap, a new table format, Append DeltaTable, was introduced. It delivers:

Unified table structure that supports dynamic clustering, ACID transactions, data appends, streaming writes, time‑travel and incremental reads.

On‑demand adjustment of data organisation and functionality to match evolving use cases.

Compatibility with existing data‑access paths, reducing migration effort.

Maintained or improved cost‑performance characteristics.

The format enables a single table type to combine the advantages of Standard, Range/Hash Cluster, Transactional and Delta tables, simplifying user learning and operational overhead.
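Time travel and incremental reads both depend on the table keeping an ordered history of commits. The toy model below is a minimal sketch under that assumption; the `AppendLog` class and its methods are hypothetical and do not describe MaxCompute's actual metadata layout.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class AppendLog:
    """Toy append-only table: each commit gets a monotonically increasing version."""
    commits: List[Tuple[int, List[dict]]] = field(default_factory=list)

    def append(self, rows: List[dict]) -> int:
        """Commit a batch of rows and return its version number."""
        version = len(self.commits) + 1
        self.commits.append((version, list(rows)))
        return version

    def read_as_of(self, version: int) -> List[dict]:
        """Time travel: all rows committed at or before `version`."""
        return [r for v, rows in self.commits if v <= version for r in rows]

    def read_incremental(self, start_exclusive: int, end_inclusive: int) -> List[dict]:
        """Incremental read: only rows committed in (start_exclusive, end_inclusive]."""
        return [r for v, rows in self.commits
                if start_exclusive < v <= end_inclusive for r in rows]
```

A downstream job that remembers the last version it consumed can call `read_incremental(last, current)` instead of rescanning the table, which is the behaviour that incremental reads make possible.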

Storage Service – Autonomous Data Governance

Storage Service is MaxCompute’s core distributed storage engine. It provides highly reliable, high‑throughput storage and runs autonomous data‑governance tasks such as file merging, tiered storage, minor/major compaction, index building, streaming compaction, data reclustering and cross‑region backup.
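File merging is the simplest of these governance tasks to picture: find files below a size threshold and group them into batches that can each be rewritten as one larger file. The planner below is a hypothetical sketch; the `small_mb` and `target_mb` thresholds are illustrative assumptions, not MaxCompute settings.

```python
from typing import List


def plan_merges(file_sizes_mb: List[int], small_mb: int = 64, target_mb: int = 512) -> List[List[int]]:
    """Group small files into merge batches whose total size stays within target_mb."""
    small = sorted(s for s in file_sizes_mb if s < small_mb)
    batches: List[List[int]] = []
    current: List[int] = []
    total = 0
    for size in small:
        # Close the current batch once adding this file would exceed the target.
        if current and total + size > target_mb:
            batches.append(current)
            current, total = [], 0
        current.append(size)
        total += size
    if current:
        batches.append(current)
    # Rewriting a single file gains nothing, so keep only batches of two or more.
    return [b for b in batches if len(b) > 1]
```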

Dynamic Bucketing

Traditional static bucket configuration requires users to estimate data volume per table, which is impractical for thousands of tables and for rapidly changing workloads. Append DeltaTable introduces automatic bucket allocation, creating ~500 MB logical buckets on demand, eliminating the need for manual bucket sizing and avoiding data skew or fragmentation.
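The idea behind on‑demand bucketing can be sketched as a writer that keeps filling the newest bucket and opens another only when the current one reaches the target size, so a tiny table ends up with one bucket and a large one grows as needed, with no upfront estimate. This is a simplified, hypothetical model that counts bytes rather than rows; the ~500 MB target is taken from the text and treated as 500 MiB here, which is an assumption.

```python
from dataclasses import dataclass, field
from typing import List

# Target size per logical bucket; the article says ~500 MB, the MiB unit is assumed.
BUCKET_TARGET_BYTES = 500 * 1024 * 1024


@dataclass
class DynamicBucketWriter:
    target_bytes: int = BUCKET_TARGET_BYTES
    buckets: List[int] = field(default_factory=list)  # bytes stored in each bucket

    def write(self, nbytes: int) -> None:
        """Fill the newest bucket and allocate a new one only when it is full."""
        if nbytes < 0:
            raise ValueError("nbytes must be non-negative")
        while nbytes > 0:
            if not self.buckets or self.buckets[-1] >= self.target_bytes:
                self.buckets.append(0)  # allocate a bucket on demand
            room = self.target_bytes - self.buckets[-1]
            chunk = min(room, nbytes)
            self.buckets[-1] += chunk
            nbytes -= chunk
```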

Incremental Reclustering

Instead of requiring a full overwrite of the data to re‑cluster it, Incremental Reclustering asynchronously processes only newly written buckets. This maintains query performance on ODS (operational data store) tables while supporting millisecond‑level data freshness.
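The core of the incremental approach is that a background job touches only buckets not yet clustered and never rewrites data it has already organised; new rows stay queryable in the meantime. The sketch below is a hypothetical simplification that only sorts rows within each new bucket by a cluster key; a real reclustering service would also rebalance key ranges across buckets.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bucket:
    rows: List[dict]
    clustered: bool = False  # True once rows are sorted by the cluster key


def incremental_recluster(buckets: List[Bucket], key: str) -> int:
    """Sort only buckets that are not clustered yet; return how many were processed."""
    processed = 0
    for bucket in buckets:
        if bucket.clustered:
            continue  # already-organised data is never rewritten
        bucket.rows.sort(key=lambda r: r[key])
        bucket.clustered = True
        processed += 1
    return processed
```

Running the job repeatedly is cheap: once every bucket is marked clustered, it does no work until new data arrives.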

Performance Impact

Data autonomy: Merge, compaction and reclustering tasks balance storage efficiency and query speed.

Elastic scaling: Dynamic bucketing and auto‑split/merge handle data from terabytes to exabytes.

Real‑time clustering: Incremental reclustering delivers sub‑second query acceleration on fresh data.
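Auto‑split/merge can be pictured as a periodic pass that cuts oversized buckets into target‑sized pieces and folds adjacent undersized buckets together. The policy below is a hypothetical sketch; the "split above twice the target" rule is an illustrative assumption, not MaxCompute's documented behaviour.

```python
from typing import List


def rebalance(bucket_sizes: List[int], target: int) -> List[int]:
    """Split oversized buckets and merge adjacent undersized ones (hypothetical policy)."""
    # Split: any bucket larger than 2x target is cut into target-sized pieces.
    split: List[int] = []
    for size in bucket_sizes:
        if size > 2 * target:
            full, rest = divmod(size, target)
            split.extend([target] * full)
            if rest:
                split.append(rest)
        else:
            split.append(size)
    # Merge: combine neighbours while the combined size stays within target.
    merged: List[int] = []
    for size in split:
        if merged and merged[-1] + size <= target:
            merged[-1] += size
        else:
            merged.append(size)
    return merged
```

The total volume is preserved; only the number and size of buckets change, which is what lets the same table layout scale from small tables to very large ones.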

Practice Summary

Append DeltaTable eliminates functional fragmentation in MaxCompute, lowers the learning curve, and improves flexibility, timeliness and scenario coverage. In the GoTerra migration, it handled over 550,000 tables and 60 PB of data, matching the capabilities of leading international vendors.
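For a sense of scale, a back‑of‑the‑envelope calculation gives the average table size and how many ~500 MB buckets it would occupy. Decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes, 1 MB = 10^6 bytes) are an assumption here:

```latex
\frac{60\ \text{PB}}{550{,}000\ \text{tables}}
  = \frac{60 \times 10^{15}\ \text{B}}{5.5 \times 10^{5}}
  \approx 1.09 \times 10^{11}\ \text{B}
  \approx 109\ \text{GB per table}

\frac{1.09 \times 10^{11}\ \text{B}}{5 \times 10^{8}\ \text{B per bucket}} \approx 218\ \text{buckets}
```

Since the table count is "over" 550,000, the true average is at most roughly this figure, and real tables are heavily skewed around it. That spread illustrates why estimating a static bucket count for every table is impractical at this scale.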

Future Technical Planning

The format aligns with the Data + AI fusion architecture, providing columnar storage and vectorised engines for machine‑learning feature engineering, and supports multimodal data storage. Ongoing plans include deeper integration with MaxCompute’s real‑time compute components and the rollout of Delta Live MV, further unlocking the full lifecycle value of data assets.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Alibaba Cloud Big Data AI Platform

The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
