How MaxCompute’s Append DeltaTable Transforms BigQuery Migration
This article details the complex migration of a leading Southeast Asian tech group's data warehouse from Google BigQuery to Alibaba Cloud MaxCompute, outlining challenges such as storage format differences, SQL compatibility, and performance tuning, and explains how the new Append DeltaTable format with dynamic bucketing and incremental reclustering resolves these issues.
Background
A leading Southeast Asian tech group (referred to here as GoTerra) decided to migrate its enterprise data warehouse from Google BigQuery to Alibaba Cloud MaxCompute. The decision was driven by regional compliance requirements, cost‑optimised deployment in the Asia‑Pacific market, and the need for petabyte‑scale data processing capabilities.
Why BigQuery?
BigQuery is a globally leading cloud data‑warehouse product that offers a serverless architecture, elastic scaling and high‑concurrency performance. Its core advantages include fully managed services, standard SQL support, low‑latency queries and a pay‑as‑you‑go model.
Key Challenges of the Migration
Underlying storage format differences: BigQuery and MaxCompute use fundamentally different storage architectures, requiring extensive redesign and optimisation.
SQL compatibility: MaxCompute SQL differs from BigQuery’s standard SQL in syntax, functions and execution engine, necessitating automated conversion tools.
Data consistency: Preventing data loss, version conflicts and ETL interruptions during cross‑platform migration is critical.
Performance tuning: MaxCompute’s partitioned tables and resource‑group scheduling must be adapted to existing workloads.
Organisational coordination: System availability and gray‑release strategies must be balanced across multinational teams.
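To make the data‑consistency challenge concrete, here is a minimal sketch of a per‑partition validation step that compares row counts and an order‑independent checksum between the source and the target. It is not the tooling used in the GoTerra migration. The function names and the in‑memory dictionaries are hypothetical, and the sketch assumes rows have already been fetched from both systems and normalised to the same Python types.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

Row = Tuple  # a row is modelled as a tuple of column values


def partition_fingerprint(rows: Iterable[Row]) -> Tuple[int, str]:
    """Return (row_count, checksum) for one partition.

    The checksum sums per-row hashes modulo 2**64, so it does not depend
    on row order and duplicate rows do not cancel each other out.
    """
    count = 0
    acc = 0
    for row in rows:
        digest = hashlib.sha256(repr(row).encode("utf-8")).digest()
        acc = (acc + int.from_bytes(digest[:8], "big")) % (1 << 64)
        count += 1
    return count, f"{acc:016x}"


def find_mismatches(
    source: Dict[str, List[Row]], target: Dict[str, List[Row]]
) -> List[str]:
    """List partitions whose fingerprints differ between the two systems."""
    mismatched = []
    for part in sorted(set(source) | set(target)):
        src_fp = partition_fingerprint(source.get(part, []))
        tgt_fp = partition_fingerprint(target.get(part, []))
        if src_fp != tgt_fp:
            mismatched.append(part)
    return mismatched


# Hypothetical sample data: partition "ds=2" lost a row on the target side.
source_data = {"ds=1": [(1, "a"), (2, "b")], "ds=2": [(3, "c")]}
target_data = {"ds=1": [(2, "b"), (1, "a")], "ds=2": []}
print(find_mismatches(source_data, target_data))
```

Comparing fingerprints per partition keeps a re‑copy limited to the partitions that actually differ, rather than the whole table.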
Storage Technical Solution – Append DeltaTable
Based on an analysis of MaxCompute’s existing storage capabilities and future format roadmap, a new Append DeltaTable format was introduced. It delivers:
Unified table structure that supports dynamic clustering, ACID transactions, data appends, streaming writes, time‑travel and incremental reads.
On‑demand adjustment of data organisation and functionality to match evolving use cases.
Compatibility with existing data‑access paths, reducing migration effort.
Maintained or improved cost‑performance characteristics.
The format lets a single table type combine the advantages of Standard, Range/Hash Cluster, Transactional and Delta tables, reducing both the learning curve and operational overhead.
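The relationship between appends, time‑travel and incremental reads can be illustrated with a toy in‑memory model. This is a conceptual sketch only: `ToyAppendTable` and its methods are hypothetical names, not MaxCompute APIs, and real commits are persisted files rather than Python lists.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class ToyAppendTable:
    """Toy model of an append-only table with versioned commits."""

    _commits: List[List[Any]] = field(default_factory=list)

    def append(self, rows: List[Any]) -> int:
        """Add one batch as a single commit; return the new version (1-based)."""
        self._commits.append(list(rows))
        return len(self._commits)

    def read_as_of(self, version: int) -> List[Any]:
        """Time travel: every row visible at the given version."""
        return [row for batch in self._commits[:version] for row in batch]

    def read_incremental(self, after_version: int) -> List[Any]:
        """Incremental read: only rows committed after the given version."""
        return [row for batch in self._commits[after_version:] for row in batch]


table = ToyAppendTable()
v1 = table.append(["order-1", "order-2"])
v2 = table.append(["order-3"])
print(table.read_as_of(v1))        # snapshot as of the first commit
print(table.read_incremental(v1))  # rows added since the first commit
```

Because each commit is immutable and ordered, a snapshot read and an incremental read are simply two different slices of the same commit history.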
Storage Service – Autonomous Data Governance
Storage Service is MaxCompute’s core distributed storage engine that provides high‑reliability, high‑throughput storage while supporting autonomous data‑governance tasks such as file merging, tiered storage, minor/major compaction, index building, streaming compaction, data reclustering and cross‑region backup.
Dynamic Bucketing
Traditional static bucket configuration requires users to estimate data volume per table, which is impractical for thousands of tables and for rapidly changing workloads. Append DeltaTable introduces automatic bucket allocation, creating ~500 MB logical buckets on demand, eliminating the need for manual bucket sizing and avoiding data skew or fragmentation.
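As a rough illustration of on‑demand bucket allocation, the sketch below opens a new logical bucket whenever the next write would push the current one past a target size. The roughly 500 MB target comes from the text above. The function name and the simple fill‑then‑roll‑over policy are assumptions made for illustration, not MaxCompute’s actual allocation algorithm.

```python
from typing import List

TARGET_BUCKET_BYTES = 500 * 1024 * 1024  # ~500 MB, the figure quoted above


def assign_buckets(
    batch_sizes: List[int], target: int = TARGET_BUCKET_BYTES
) -> List[int]:
    """Return the bucket id chosen for each incoming batch (sizes in bytes)."""
    bucket_ids = []
    current_id = 0
    current_fill = 0
    for size in batch_sizes:
        # Roll over to a new bucket when this batch would overflow the
        # current one. An empty bucket always accepts the batch, so an
        # oversized batch still gets a bucket of its own.
        if current_fill > 0 and current_fill + size > target:
            current_id += 1
            current_fill = 0
        bucket_ids.append(current_id)
        current_fill += size
    return bucket_ids


MB = 1024 * 1024
print(assign_buckets([200 * MB, 200 * MB, 200 * MB, 600 * MB, 10 * MB]))
```

The caller never states a bucket count up front. The number of buckets follows from the data actually written, which is the property that removes the need for per‑table estimates.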
Incremental Reclustering
Instead of requiring a full overwrite for clustering, Incremental Reclustering processes newly written buckets asynchronously, maintaining query performance while supporting millisecond‑level data freshness on ODS tables.
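The core idea, reorganising only data that has not yet been clustered, can be sketched as follows. The `Bucket` class, the `clustered` flag and the function name are hypothetical. The real service runs this work asynchronously in the background, whereas this sketch is a plain synchronous pass.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Bucket:
    rows: List[Dict]
    clustered: bool = False  # True once the rows are sorted by the cluster key


def recluster_incrementally(buckets: List[Bucket], key: str) -> int:
    """Sort only the buckets not yet clustered; return how many were processed."""
    processed = 0
    for bucket in buckets:
        if bucket.clustered:
            continue  # already-organised data is left untouched
        bucket.rows.sort(key=lambda row: row[key])
        bucket.clustered = True
        processed += 1
    return processed


old_bucket = Bucket(rows=[{"id": 1}, {"id": 2}], clustered=True)
new_bucket = Bucket(rows=[{"id": 9}, {"id": 4}])
print(recluster_incrementally([old_bucket, new_bucket], key="id"))
print(new_bucket.rows)
```

The cost of each pass scales with the volume of newly written data rather than with the full table size, which is what distinguishes this approach from a full overwrite.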
Performance Impact
Data autonomy: Merge, compaction and reclustering tasks balance storage efficiency and query speed.
Elastic scaling: Dynamic bucketing and auto‑split/merge handle data from terabytes to exabytes.
Real‑time clustering: Incremental reclustering delivers sub‑second query acceleration on fresh data.
Practice Summary
Append DeltaTable eliminates functional fragmentation in MaxCompute, lowers the learning curve, and improves flexibility, timeliness and scenario coverage. In the GoTerra migration, it handled over 550,000 tables and 60 PB of data, matching the capabilities of leading international vendors.
Future Technical Planning
The format aligns with the Data + AI fusion architecture, providing columnar storage and vectorised engines for machine‑learning feature engineering, and supports multimodal data storage. Ongoing plans include deeper integration with MaxCompute’s real‑time compute components and the rollout of Delta Live MV, further unlocking the full lifecycle value of data assets.
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
