
Unlock Seamless BigQuery to MaxCompute Migration with dbt‑maxcompute

This article describes how GoTerra, a Southeast Asian technology group, migrated from BigQuery to MaxCompute, and how the open‑source dbt‑maxcompute adapter supported that move: ELT modeling, incremental strategies, performance gains, ecosystem compatibility, and best practices for large‑scale data pipelines.

Alibaba Cloud Big Data AI Platform

Background and Challenge

GoTerra, a leading Southeast Asian tech group, processes petabyte‑scale data daily. Its data modeling was originally built on BigQuery and dbt.

The open‑source dbt‑maxcompute adapter was created so that the team could keep this agile, dbt‑based development model after moving to MaxCompute.

dbt Philosophy: ELT Replaces ETL

Traditional ETL separates transformation logic from storage and creates performance bottlenecks. dbt promotes an ELT approach: load raw data into the warehouse first, then transform it there using the warehouse’s own compute power. The result is a simpler architecture and better performance, with transformations that can be tested and documented alongside the code.
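As a minimal sketch of the ELT idea, the dbt model below transforms data that has already been loaded into the warehouse, so the work runs on the warehouse's compute. It reads the jaffle_shop orders source that appears in this article's freshness configuration. The column names (id, user_id, order_date, status) and the file path are assumptions for illustration.

```sql
{# models/staging/stg_orders.sql (hypothetical path) #}
{{ config(materialized='table') }}

-- The raw table is already in the warehouse; dbt only issues SQL against it.
select
    id        as order_id,     -- assumed column
    user_id   as customer_id,  -- assumed column
    order_date,                -- assumed column
    status                     -- assumed column
from {{ source('jaffle_shop', 'orders') }}
where status is not null
```

Running dbt compiles this template to plain SQL and executes it in the warehouse; no data leaves the platform for transformation.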

Incremental Strategies in dbt‑maxcompute

dbt‑maxcompute supports every incremental strategy available in dbt‑bigquery, providing flexible, high‑performance incremental processing.

Merge Strategy (default)

Implemented via a single atomic MERGE INTO statement, ideal for SCD Type 1 tables and deduplication.

MERGE INTO target_table AS DBT_INTERNAL_DEST
USING temp_table AS DBT_INTERNAL_SOURCE
ON (DBT_INTERNAL_SOURCE.id = DBT_INTERNAL_DEST.id)
WHEN MATCHED THEN UPDATE SET ...
WHEN NOT MATCHED THEN INSERT ...;
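On the model side, a merge like the one above is usually driven by a few lines of configuration. The sketch below uses standard dbt constructs (incremental_strategy, unique_key, is_incremental(), this). The source table and the updated_at column are assumptions, not taken from GoTerra's project.

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='id'
) }}

select
    id,
    status,        -- assumed column
    updated_at     -- assumed column used as the change watermark
from {{ source('jaffle_shop', 'orders') }}

{% if is_incremental() %}
-- On incremental runs, only pick up rows changed since the last load.
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

On the first run the model is built in full. On later runs the filtered rows are matched against the existing table on id, which corresponds to the ON clause in the MERGE statement.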

Insert Overwrite Strategy

Uses INSERT OVERWRITE to efficiently replace partition data, suitable for large partitioned fact tables.

INSERT OVERWRITE TABLE target_table PARTITION(date_col)
SELECT * FROM temp_table WHERE date_col IN ('...');
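A partition‑replacing model might be configured as in the sketch below. The partition_by shape is copied from the config example later in this article. Whether the same shape applies unchanged to incremental models in a given adapter version is an assumption, as are the source and column names.

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={'fields': 'dt', 'data_types': 'timestamp'}
) }}

select
    event_id,   -- assumed column
    user_id,    -- assumed column
    dt          -- partition column
from {{ source('raw_events', 'events') }}  -- hypothetical source

{% if is_incremental() %}
-- Rebuild the latest partition and anything newer; older partitions are untouched.
where dt >= (select max(dt) from {{ this }})
{% endif %}
```

Because whole partitions are replaced, no row‑level key is needed, which is why this strategy suits large fact tables.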

Other Strategies

Delete + Insert – classic fallback for non‑transactional tables.

Append – high‑performance append‑only mode for immutable event logs.
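For the append‑only case, a minimal sketch could look like the following. The strategy name append follows dbt's usual naming, and the source and columns are hypothetical.

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='append'
) }}

select
    event_id,     -- assumed column
    event_type,   -- assumed column
    event_time    -- assumed column used as the load watermark
from {{ source('raw_events', 'click_events') }}  -- hypothetical source

{% if is_incremental() %}
-- Only rows newer than what is already stored are inserted.
where event_time > (select max(event_time) from {{ this }})
{% endif %}
```

No unique_key is set, so nothing is updated or deduplicated. That is acceptable for immutable logs, but replayed input would produce duplicate rows.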

Core Practice 1: Flexible Incremental Strategy

Supporting several strategies lets each model use the one that fits it. This covers diverse incremental requirements, partitioned‑table optimization, performance‑cost trade‑offs, and ecosystem compatibility.

Core Practice 2: Enhanced Table Materialization

Leverages MaxCompute native table types such as Append Delta Table, Auto Partition Table, and Transactional Table. Configuration is expressed via the config macro:

{{ config(
    materialized='table',
    partition_by={'fields':'dt','data_types':'timestamp'},
    tblproperties={'append2.enable':'true'},
    lifecycle=90
) }}
SELECT ...

Core Practice 3: Optimized Seed Loading

Seed loading uses MaxCompute Tunnel for bulk upload, with automatic type inference, and achieves a several‑fold speedup over row‑by‑row INSERT.

Core Practice 4: Data Freshness Monitoring

dbt‑maxcompute implements source freshness by reading the last_data_modified_time metadata, providing near‑zero‑cost freshness checks. The thresholds are declared on the source:

version: 2
sources:
  - name: jaffle_shop
    database: raw
    tables:
      - name: orders
        freshness:
          warn_after: {count: 6, period: hour}
          error_after: {count: 12, period: hour}

Core Practice 5: Third‑Party dbt Package Adaptation

Key macros from popular packages (dbt‑utils, dbt‑date, dbt‑expectations, dbt‑codegen) are rewritten to use MaxCompute‑compatible functions such as listagg, datetrunc, and datediff, allowing existing projects to migrate without code changes.
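dbt's dispatch mechanism is what makes this kind of adaptation possible: a macro resolves to an adapter‑specific implementation at compile time. The sketch below is a hypothetical project‑level macro, not code from any of the packages named above. The macro name days_between and the adapter type name maxcompute are assumptions.

```sql
{# macros/days_between.sql (hypothetical project-level macro) #}
{% macro days_between(start_col, end_col) %}
    {{ return(adapter.dispatch('days_between')(start_col, end_col)) }}
{% endmacro %}

{# Fallback used by adapters without a specific implementation. #}
{% macro default__days_between(start_col, end_col) %}
    {{ dbt.datediff(start_col, end_col, 'day') }}
{% endmacro %}

{# Chosen when the active adapter type is 'maxcompute' (assumed type name). #}
{# MaxCompute's DATEDIFF(date1, date2, datepart) returns date1 minus date2. #}
{% macro maxcompute__days_between(start_col, end_col) %}
    datediff({{ end_col }}, {{ start_col }}, 'dd')
{% endmacro %}
```

A model calls `{{ days_between('order_date', 'shipped_date') }}` without knowing which warehouse it runs on. This is why models that rely on adapted package macros can stay unchanged.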

Summary and Future Outlook

dbt‑maxcompute has proven its ability to enable seamless migration, improve performance, and maintain ecosystem compatibility in petabyte‑scale workloads. Future work includes a general availability (GA) release, richer MaxCompute feature support, and collaboration with the open‑source community.

[Figure: Migration architecture diagram]
[Figure: Incremental strategy comparison]
[Figure: Insert Overwrite performance]
[Figure: Transactional vs non‑transactional tables]
[Figure: Seed loading performance chart]