How MaxCompute Extends SQL to Match BigQuery – New Features & Compatibility
This article details the challenges of migrating 100,000+ SQL statements from BigQuery to MaxCompute for a leading Southeast Asian tech group, explains the new MaxCompute SQL syntax, auto‑partition and ingestion‑time tables, enhanced built‑in functions, and the BigQuery compatibility mode that ensures seamless query behavior.
Solution Overview
To enable a smooth migration from BigQuery to MaxCompute, the team first analyzed the syntax differences between the two platforms and designed enhancements to MaxCompute that replicate BigQuery’s behavior.
Key Enhancements in MaxCompute SQL
Auto partition tables
UNNEST syntax
Delta table capabilities (multiple UPDATE/INSERT in MERGE, DELETE with alias)
30+ new built‑in functions
Auto Partition Table
MaxCompute introduces
AUTO PARTITIONED BY (trunc_time(<col>, <datePart>) [as alias])to emulate BigQuery’s time‑unit column partitioning.
CREATE TABLE newtable (id INT64, d DATE) AUTO PARTITIONED BY (trunc_time(d, 'month') as ds);The generated table contains a pseudo‑column ds that stores the truncated date value.
Ingestion‑Time Partition Table
By adding a table property 'ingestion_time_partition'='true', MaxCompute can create tables that automatically partition by ingestion time.
-- hourly ingestion partition
create table foo_ingestion_hourly(_partitiontime timestamp_ntz, a string)
auto partitioned by (trunc_time(_partitiontime, 'hour'))
tblproperties('ingestion_time_partition'='true');
-- daily ingestion partition
create table foo_ingestion_daily(_partitiontime timestamp_ntz, a bigint)
auto partitioned by (trunc_time(_partitiontime, 'day'))
tblproperties('ingestion_time_partition'='true');Partition Pruning
Auto‑partitioned tables support partition pruning through various query patterns, such as filtering on the pseudo‑column, the original timestamp column, or using trunc_time / datetrunc functions that match the table’s partition granularity.
-- filter by pseudo column
select * from table_daily where pt >= '2024-09-14';
-- filter by original timestamp range
select * from table_daily where ts between timestamp '2024-09-14 00:00:00' and timestamp '2024-09-15 00:00:00';
-- filter using trunc_time
select * from table_daily where trunc_time(ts, 'day') = '2024-09-14';
-- filter using datetrunc
select * from table_daily where datetrunc(ts, 'day') = timestamp '2024-09-14 00:00:00';Built‑in Function Enhancements
More than 30 new functions were added and many existing ones were extended to support time zones, additional formats, and richer semantics. Highlights include:
New time functions: TO_TIME, TO_TIMESTAMP, TIME_ADD, TIME_SUB, TIME_DIFF, TIME_TRUNC, FORMAT_TIME, CURRENT_DATE, CURRENT_TIMESTAMP_NTZ, CURRENT_MICROS.
Extended DATETRUNC, DATEDIFF, DATEADD, LAST_DAY, WEEKOFYEAR with quarter, week, isoweek, etc.
New IP‑related functions: NET_IP_NET_MASK, NET_IP_FROM_STRING, NET_SAFE_IP_FROM_STRING, NET_IP_TO_STRING, NET_IPV4_TO_INT64, NET_HOST, NET_PUBLIC_SUFFIX, NET_REG_DOMAIN.
String/binary utilities: BASE32, CODEPOINT_ARRAY, SAFE_CONVERT_BYTES_TO_STRING, FORMAT_STRING, enhanced REGEXP_EXTRACT, REVERSE for binary.
Regex functions: REGEXP_CONTAINS, enhanced REGEXP_EXTRACT / REGEXP_EXTRACT_ALL.
JSON functions: JSON_STRIP_NULLS, enhanced JSON_EXTRACT, TO_JSON null‑handling.
Aggregations: HLL++ family ( HLL_COUNT_INIT, HLL_COUNT_MERGE, …), PERCENTILE_CONT, PERCENTILE_DISC, STRING_AGG, ARRAY_AGG, APPROX_QUANTILES.
Geospatial: ST_S2CELLIDFROMPOINT, ST_S2CELLIDNUMFROMPOINT, ST_ARRAY (array input).
These additions ensure that most BigQuery functions have a one‑to‑one mapping on MaxCompute.
BigQuery Compatibility Mode
Setting the parameter odps.sql.bigquery.compatible=true makes MaxCompute behave like BigQuery for several language features:
Alias resolution in GROUP BY – ambiguous column names are resolved to the alias defined in the SELECT clause.
CTE output with duplicate column names is allowed.
Implicit type conversion (e.g., string + interval) follows BigQuery rules.
-- Enable compatibility
set odps.sql.bigquery.compatible=true;
-- Example: alias resolution
with t1 as (select 1 a, 2 b), t2 as (select 1 a, 2 b)
select t1.a as a from t1 join t2 on t1.a=t2.a group by a;With the compatibility mode enabled, the above queries run successfully on MaxCompute, producing the same results as in BigQuery.
Business Value
After applying the syntax enhancements and using the conversion tools, the GoTerra team successfully rewrote all 100,000+ SQL statements, enabling a seamless migration from GCP to MaxCompute. The migration preserved query correctness, improved performance, and delivered a stable, high‑throughput data platform for the customer.
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
