
How MaxCompute Streaming Insert Revolutionized Real‑Time Data Migration from BigQuery

This article details how a leading Southeast Asian tech group migrated its real‑time write workloads from Google BigQuery to MaxCompute using MaxCompute Streaming Insert, covering architecture, core features, migration challenges, optimization strategies, business impact, and future enhancements.

Alibaba Cloud Big Data AI Platform

This article is part of a series that explores the real‑time data migration journey of a major Southeast Asian technology group, focusing on the use of MaxCompute Streaming Insert to replace BigQuery streaming writes.

Architecture Overview

MaxCompute Streaming Insert is a solution for large‑scale real‑time data ingestion that offers high throughput and low latency. Its design emphasizes stability, scalability, and ease of use, making it suitable for log collection, behavior tracking, IoT data uploads, and other streaming sources.

Core Features

Real‑time visibility of streamed data – unlike batch imports, newly inserted data can be read immediately by downstream tasks.

Scalable performance via client concurrency – a distributed client write model automatically adjusts concurrency based on traffic, ensuring stable performance under bursty or sustained loads.

Avoidance of storage fragmentation – writes use a row‑store file format that prevents the generation of numerous small files, reducing storage pressure.

Background compaction ensures query performance – periodic compaction operations reduce storage layer load and improve read efficiency, especially for continuously written streaming data.
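
As a rough illustration of what background compaction does, the Python sketch below plans how many small streamed segments could be grouped into fewer, larger files. The 128 MiB target, the greedy grouping of adjacent segments, and the name plan_compaction are assumptions made for this example; the article does not describe MaxCompute's actual compaction policy.

```python
"""Sketch: plan compaction of small streaming segments into larger files.

Assumption: segments are described only by their size in bytes; the target
size and greedy grouping are illustrative, not MaxCompute's real policy.
"""
from typing import List

TARGET_FILE_BYTES = 128 * 1024 * 1024  # hypothetical target size per compacted file


def plan_compaction(segment_sizes: List[int], target: int = TARGET_FILE_BYTES) -> List[List[int]]:
    """Group adjacent segment indices so each group's total size stays <= target.

    Segments already at or above the target are left alone. Only groups with
    at least two segments are returned, since "merging" one segment would not
    reduce the file count.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for idx, size in enumerate(segment_sizes):
        if size >= target:
            # Large enough already: close any open group and skip this segment.
            if len(current) > 1:
                groups.append(current)
            current, current_size = [], 0
            continue
        if current_size + size > target:
            # Adding this segment would overshoot the target: close the group.
            if len(current) > 1:
                groups.append(current)
            current, current_size = [], 0
        current.append(idx)
        current_size += size
    if len(current) > 1:
        groups.append(current)
    return groups
```

For example, with a target of 128 and segment sizes [10, 20, 30, 200, 50, 60, 70], the plan is [[0, 1, 2], [4, 5]]: the 200‑unit segment is already large enough, and the last segment stays on its own because adding it to segments 4 and 5 would exceed the target.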

Challenges During the GoTerra Migration

The migration faced several complex challenges: support for deeply nested types, automatic detection of schema evolution, and stability and performance issues.

1. Multi‑level Nested Type Support

GoTerra extensively used nested data types (ARRAY, RECORD) in BigQuery. Early MaxCompute versions exhibited severe performance bottlenecks when handling deep nesting.

Root causes: inefficient parsing and serialization of nested types, and bottlenecks in the client SDK.

Optimizations: the storage team refactored and optimized the nested‑type handling logic, and the SDK team added performance‑optimized APIs for complex structures. After multiple iterations, MaxCompute supported up to 50 nesting levels, far exceeding BigQuery's 15‑level limit.
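
A client migrating deeply nested BigQuery schemas may want to validate nesting depth before writing. The sketch below assumes a simple, hypothetical type model (primitives as strings, ARRAY and STRUCT as tuples) and counts each ARRAY or STRUCT as one level; MaxCompute and BigQuery may count levels differently, so treat the depth numbers as illustrative.

```python
from typing import Any, Dict, Tuple, Union

# Hypothetical type model: a primitive is a string such as "INT64";
# a container is ("ARRAY", element_type) or ("STRUCT", {field_name: field_type}).
FieldType = Union[str, Tuple[str, Any]]

MAX_NESTING_DEPTH = 50  # limit reported in the article for MaxCompute


def nesting_depth(t: FieldType) -> int:
    """Count container levels; each ARRAY or STRUCT adds one (assumed convention)."""
    if isinstance(t, str):
        return 0  # primitive types add no nesting
    kind, inner = t
    if kind == "ARRAY":
        return 1 + nesting_depth(inner)
    if kind == "STRUCT":
        # The deepest field determines the struct's depth; empty structs count as one level.
        return 1 + max((nesting_depth(f) for f in inner.values()), default=0)
    raise ValueError(f"unknown container kind: {kind}")


def check_schema(columns: Dict[str, FieldType], limit: int = MAX_NESTING_DEPTH) -> None:
    """Raise ValueError if any top-level column nests deeper than `limit`."""
    for name, t in columns.items():
        depth = nesting_depth(t)
        if depth > limit:
            raise ValueError(f"column {name!r} nests {depth} levels, limit is {limit}")
```

With this convention, ("ARRAY", ("STRUCT", {"a": "STRING", "b": ("ARRAY", "INT64")})) has depth 3, and a column wrapped in 20 STRUCT levels passes a limit of 50 but fails a limit of 15.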

2. Automatic Schema Evolution

Frequent schema changes (adding fields, modifying types) required a mechanism to automatically detect and apply updates without downtime.

Implementation steps: the data channel service added listeners for schema change events and broadcast those events to writers; the SDK exposed callback interfaces for schema change notifications; the SDK read the latest schema state from the responses to its write requests; and the storage layer ran compatibility checks to ensure that each change was safe.
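
The flow above (the server reports the current schema, the SDK detects a newer version, checks compatibility, and notifies callbacks) can be sketched as follows. Everything here is hypothetical: the Schema and StreamWriter classes, the handle_response hook, and especially the compatibility rules (new columns allowed, only INT to BIGINT and FLOAT to DOUBLE widenings allowed, dropped columns rejected) are assumptions for illustration, not MaxCompute's documented behavior.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumed set of safe type widenings; not MaxCompute's documented rules.
SAFE_WIDENINGS = {("INT", "BIGINT"), ("FLOAT", "DOUBLE")}


@dataclass
class Schema:
    version: int
    columns: Dict[str, str]  # column name -> type name


def is_compatible(old: Schema, new: Schema) -> bool:
    """Allow added columns and safe widenings; reject dropped columns or other type changes."""
    for name, old_type in old.columns.items():
        if name not in new.columns:
            return False  # dropping a column could break existing readers
        new_type = new.columns[name]
        if new_type != old_type and (old_type, new_type) not in SAFE_WIDENINGS:
            return False
    return True


@dataclass
class StreamWriter:
    """Hypothetical writer that learns about schema changes from write responses."""
    schema: Schema
    listeners: List[Callable[[Schema, Schema], None]] = field(default_factory=list)

    def on_schema_change(self, callback: Callable[[Schema, Schema], None]) -> None:
        self.listeners.append(callback)

    def handle_response(self, server_schema: Schema) -> None:
        """Process the schema the server reports alongside a write response."""
        if server_schema.version <= self.schema.version:
            return  # nothing new
        if not is_compatible(self.schema, server_schema):
            raise RuntimeError(
                f"incompatible schema change {self.schema.version} -> {server_schema.version}"
            )
        old, self.schema = self.schema, server_schema
        for callback in self.listeners:
            callback(old, server_schema)  # e.g. rebuild the record serializer
```

A callback registered through on_schema_change is where a job would rebuild its serializer, so the writer can continue without a restart.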

Result: all ODS‑layer (Operational Data Store) streaming jobs picked up schema changes through seamless hot updates, dramatically reducing manual intervention and operational risk.

3. Stability and Performance Tuning

Initial writes to Append Table 2.0 showed high failure rates, latency spikes, and hotspot nodes due to load‑balancing issues.

Mitigation measures: enhanced client retry logic with exponential backoff and checkpointing; QoS‑based channel prioritization to protect core business data; refined compaction scheduling using time‑window and data‑size triggers; comprehensive monitoring metrics and alerting for real‑time write status.
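
The first mitigation, retries with exponential backoff plus checkpointing, can be sketched in a few lines. This is a generic illustration, not the GoTerra client: the TransientError type, the send callback, the full‑jitter backoff variant, and the base and cap values are all assumptions.

```python
import random
import time
from typing import Callable, List, Optional


class TransientError(Exception):
    """Hypothetical error type for retryable write failures (e.g. timeouts)."""


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 10.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    rng = rng or random.Random()
    return rng.uniform(0.0, min(cap, base * (2 ** attempt)))


def write_with_checkpoint(batches: List[list], send: Callable[[list], None],
                          checkpoint: int = 0, max_attempts: int = 5,
                          sleep: Callable[[float], None] = time.sleep) -> int:
    """Send batches starting at `checkpoint` and return the new checkpoint.

    The checkpoint advances only after a batch is acknowledged, so a restarted
    job resumes from the first unacknowledged batch rather than from the start.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for i in range(checkpoint, len(batches)):
        for attempt in range(max_attempts):
            try:
                send(batches[i])
                break  # acknowledged
            except TransientError:
                if attempt == max_attempts - 1:
                    return i  # give up for now; caller persists i and resumes later
                sleep(backoff_delay(attempt))
        checkpoint = i + 1  # a real job would persist this durably
    return checkpoint
```

Because a batch may be applied on the server even when its acknowledgement is lost, this loop provides at‑least‑once delivery: a retried batch can be written twice.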

Outcome: the system reached a per‑minute success rate above 99.9% and a P99 request latency under 1 second, matching BigQuery's performance.
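
To make the two headline metrics concrete, the sketch below computes a per‑minute success rate and a P99 latency from request records. The (timestamp, succeeded, latency) record format is hypothetical, and nearest‑rank is only one of several common percentile definitions; the article does not say which one the monitoring system uses.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# Each record: (timestamp_seconds, succeeded, latency). Hypothetical log format.
Record = Tuple[float, bool, float]


def per_minute_success_rate(records: List[Record]) -> Dict[int, float]:
    """Fraction of successful requests in each one-minute bucket, keyed by minute index."""
    totals: Dict[int, int] = defaultdict(int)
    successes: Dict[int, int] = defaultdict(int)
    for ts, ok, _ in records:
        minute = int(ts // 60)
        totals[minute] += 1
        successes[minute] += int(ok)
    return {m: successes[m] / totals[m] for m in totals}


def p99_latency(records: List[Record]) -> float:
    """Nearest-rank 99th percentile of request latency."""
    latencies = sorted(lat for _, _, lat in records)
    if not latencies:
        raise ValueError("no records")
    rank = math.ceil(0.99 * len(latencies))  # 1-based nearest rank
    return latencies[rank - 1]
```

Under this definition, "P99 under 1 second" means at least 99% of requests in the window completed within 1 second.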

Business Value and Future Outlook

MaxCompute Streaming Insert became the unified write entry point for GoTerra's ODS layer. It simplified the architecture, supports about 60 TB of real‑time data per day, and delivers high throughput, high availability, and automatic schema evolution, which together lowered maintenance costs.
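
To put the daily volume in perspective, a back‑of‑the‑envelope calculation converts it to an average ingest rate. This assumes decimal units (1 TB = 10^12 bytes) and a perfectly even load over 24 hours; real traffic is bursty, so peak rates would be higher than this average.

```latex
\[
\frac{60\ \text{TB/day}}{86\,400\ \text{s/day}}
= \frac{60 \times 10^{12}\ \text{B}}{86\,400\ \text{s}}
\approx 6.94 \times 10^{8}\ \text{B/s}
\approx 694\ \text{MB/s}
\]
```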

Future plans include providing exactly‑once semantics and dynamic partition write support to further reduce client development complexity.
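
The article does not say how exactly‑once semantics will be implemented. One common technique, shown here purely as an illustration, is idempotent writes: each writer tags batches with an increasing sequence number, and the receiving side ignores any batch it has already committed. The IdempotentSink class and its write method are hypothetical names for this sketch.

```python
from typing import Dict, List, Tuple


class IdempotentSink:
    """Hypothetical sink that drops replayed batches using per-writer sequence numbers.

    A batch resent after a lost acknowledgement is acknowledged again but not
    stored twice. In a real system, storing rows and recording the sequence
    number would have to happen atomically.
    """

    def __init__(self) -> None:
        self.committed: Dict[str, int] = {}  # writer id -> highest committed sequence
        self.rows: List[Tuple[str, object]] = []

    def write(self, writer_id: str, seq: int, batch: List[object]) -> bool:
        """Return True if the batch was applied, False if it was a duplicate."""
        last = self.committed.get(writer_id, -1)
        if seq <= last:
            return False  # already committed: acknowledge without re-applying
        if seq != last + 1:
            raise ValueError(f"gap in sequence for {writer_id}: expected {last + 1}, got {seq}")
        self.rows.extend((writer_id, row) for row in batch)
        self.committed[writer_id] = seq
        return True
```

Combined with client‑side retries, this turns at‑least‑once delivery into effectively exactly‑once storage, because duplicates are filtered where the data lands.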

Conclusion

With its advanced architecture, strong performance, and flexible extensibility, MaxCompute Streaming Insert enabled a smooth migration from BigQuery to MaxCompute, establishing a solid foundation for GoTerra’s future data‑lake‑warehouse integration and real‑time analytics.

Written by Alibaba Cloud Big Data AI Platform

The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
