
Tencent SuperSQL: Architecture, Adaptive Compute Engine, Real‑time Lakehouse Fusion and Future Outlook

The article introduces Tencent's self‑developed SuperSQL big‑data platform, detailing its four‑layer architecture, adaptive compute engine with multi‑engine selection, real‑time lake‑warehouse integration, performance optimizations and future directions for intelligent, cloud‑native analytics.

DataFunTalk

SuperSQL is Tencent's next‑generation big‑data adaptive compute platform that provides a unified, cloud‑agnostic experience across public cloud, private cloud and on‑premise environments.

The platform consists of four independent layers:

Core Engine Layer: unified entry point, intelligent SQL optimization, materialized view construction, and engine selection based on metadata, query history and cluster status.

Compute Layer: selects the best engine for each SQL query (Spark for ETL, Presto for interactive queries, Hermes for log analytics, StarRocks for data queries, PowerFL for security scenarios) and offers a Remote Shuffle Service for data shuffling.

Resource Layer: aggregates cloud and on‑premise resources into a unified pool with elastic scheduling and resource borrowing.

Data Orchestration Layer: abstracts heterogeneous storage, decouples compute and storage, and provides adaptive caching and self‑learning data access.

The adaptive compute engine supports plugin‑based SQL parsing for multi‑engine compatibility and employs a multi‑stage engine selection framework (RBO, CBO, HBO, AI‑based prediction) to choose the optimal engine for each query.
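As a rough illustration of how a multi‑stage selection framework can be chained, the sketch below tries a rule stage, then a cost stage, then a history stage, and falls back to a default engine if none decides. It is not SuperSQL's implementation: the `QueryInfo` fields, the 10 GiB threshold, the `HISTORY` table and the stage functions are all hypothetical, and the AI‑based prediction stage is left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class QueryInfo:
    sql: str
    is_etl: bool                          # hypothetical flag, e.g. INSERT ... SELECT jobs
    estimated_scan_bytes: Optional[int]   # None when no table statistics exist
    fingerprint: str                      # normalized SQL used as a history key


# Hypothetical history: fingerprint -> engine that ran fastest in past executions.
HISTORY = {"q_daily_report": "presto"}

# Assumed cutoff between "interactive" and "batch" scan sizes.
SMALL_SCAN_BYTES = 10 * 1024**3


def rbo_stage(q: QueryInfo) -> Optional[str]:
    """Rule-based: ETL statements always go to Spark."""
    return "spark" if q.is_etl else None


def cbo_stage(q: QueryInfo) -> Optional[str]:
    """Cost-based: decide from the estimated scan size, if statistics exist."""
    if q.estimated_scan_bytes is None:
        return None
    return "presto" if q.estimated_scan_bytes < SMALL_SCAN_BYTES else "spark"


def hbo_stage(q: QueryInfo) -> Optional[str]:
    """History-based: reuse the engine that worked best for this fingerprint."""
    return HISTORY.get(q.fingerprint)


STAGES: List[Tuple[str, Callable[[QueryInfo], Optional[str]]]] = [
    ("RBO", rbo_stage),
    ("CBO", cbo_stage),
    ("HBO", hbo_stage),
]


def select_engine(q: QueryInfo) -> Tuple[str, str]:
    """Return (engine, deciding_stage); the first stage with an answer wins."""
    for name, stage in STAGES:
        engine = stage(q)
        if engine is not None:
            return engine, name
    return "spark", "fallback"
```

Each stage returns `None` when it has no opinion, so a new stage (for example a learned predictor) could be appended to `STAGES` without touching the others.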

SuperSQL also delivers a real‑time lakehouse solution that integrates streaming and batch workloads. Data can be ingested via MQ or Flink into a real‑time warehouse (StarRocks) and later cooled into a data lake (Iceberg/Hudi). Fusion queries combine hot data from the warehouse with cold data from the lake, with metadata mapping enabling seamless cross‑source access.
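One way to picture a fusion query is as a `UNION ALL` over the hot and cold copies of a table, split at the point where data was cooled. The source does not say how SuperSQL performs the split, so the time‑based boundary below is an assumption, and the function, table and column names are hypothetical.

```python
from datetime import datetime
from typing import Sequence


def build_fusion_query(
    columns: Sequence[str],
    hot_table: str,
    cold_table: str,
    time_column: str,
    boundary: datetime,
) -> str:
    """Build SQL that reads rows at or after `boundary` from the warehouse
    table and older rows from the lake table.

    Identifiers are interpolated directly, so they must come from trusted
    metadata, not from user input.
    """
    cols = ", ".join(columns)
    ts = boundary.strftime("%Y-%m-%d %H:%M:%S")
    return (
        f"SELECT {cols} FROM {hot_table} WHERE {time_column} >= '{ts}'\n"
        f"UNION ALL\n"
        f"SELECT {cols} FROM {cold_table} WHERE {time_column} < '{ts}'"
    )


if __name__ == "__main__":
    print(
        build_fusion_query(
            ["user_id", "event_time"],
            "starrocks.db.events",      # hypothetical hot (warehouse) table
            "iceberg.db.events",        # hypothetical cold (lake) table
            "event_time",
            datetime(2024, 1, 1),
        )
    )
```

Because the two predicates are complementary, each row is read from exactly one side, provided the cooling job moves data at the same boundary.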

Performance optimizations include caching Iceberg source files in Alluxio. A TPC‑H benchmark shows up to a 65× speedup for StarRocks internal tables and a 3× improvement for fused queries.
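The general idea behind caching source files close to the compute engine can be sketched as a read‑through cache with least‑recently‑used eviction. This is a generic in‑memory illustration, not Alluxio's API; the `ReadThroughCache` class and the `fetch` callback are hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Serve file reads from memory when possible; otherwise fetch and keep."""

    def __init__(self, fetch: Callable[[str], bytes], capacity_bytes: int):
        self._fetch = fetch                  # reads a file from remote storage
        self._capacity = capacity_bytes
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self._size = 0
        self.hits = 0
        self.misses = 0

    def read(self, path: str) -> bytes:
        if path in self._entries:
            self._entries.move_to_end(path)  # mark as most recently used
            self.hits += 1
            return self._entries[path]

        self.misses += 1
        data = self._fetch(path)
        if len(data) <= self._capacity:      # skip files larger than the cache
            self._entries[path] = data
            self._size += len(data)
            while self._size > self._capacity:
                # Evict from the least recently used end.
                _, evicted = self._entries.popitem(last=False)
                self._size -= len(evicted)
        return data
```

Repeated scans of the same data files then hit local memory instead of remote storage, which is where the benefit for lake queries would come from.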

Future work will focus on enhancing SuperSQL's self‑adaptive capabilities, expanding lakehouse features to match native Iceberg performance, improving data‑lake query efficiency, and adding advanced indexing to lake formats.
