
How ByteDance’s LAS Team Unified Real‑Time and Offline Warehousing with a Lakehouse Solution

This article analyzes the shortcomings of mainstream Lambda‑style data warehouse architectures, introduces Hudi‑based lakehouse design principles, details the three‑layer unified storage architecture, data distribution, model and read/write mechanisms, and showcases real‑time streaming, multidimensional analysis, and stream‑batch reuse scenarios along with future roadmap plans.

Data Thinking Notes

Mainstream Data Warehouse Architecture (Lambda)

The Lambda architecture combines separate real‑time and batch pipelines to provide low‑latency and comprehensive data, merging their results for ad‑hoc queries. Its advantages are clear responsibility boundaries, high fault tolerance, and complexity isolation, but it suffers from data alignment issues, duplicated development effort, and doubled resource costs.

Clear responsibility boundaries: streaming handles incremental data, batch handles full data.

Fault tolerance: batch (T+1) can overwrite and correct streaming errors.

Complexity isolation: batch processes ready data, streaming deals with more complex incremental logic.

However, Lambda faces challenges in computation, operations, and cost, such as inconsistent results between streams and batches, duplicated code bases, and doubled storage/computation resources.
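The merge step that Lambda requires at query time can be sketched as follows. This is a minimal illustration, not ByteDance's implementation: the view structures, the `batch_cutoff` timestamp, and the function name are all assumptions for the example.

```python
# Hypothetical sketch of the Lambda merge step: the batch view covers data up
# to the last T+1 run, the speed view covers events after the batch cutoff.

def merge_views(batch_view: dict, speed_view: dict, batch_cutoff: int) -> dict:
    """Serve a query by combining the batch view with fresher streaming counts."""
    merged = dict(batch_view)
    for key, (count, event_time) in speed_view.items():
        if event_time > batch_cutoff:  # only events the batch run has not seen yet
            merged[key] = merged.get(key, 0) + count
    return merged

batch_view = {"page_a": 100, "page_b": 40}               # from the T+1 batch job
speed_view = {"page_a": (3, 1700), "page_c": (5, 1600)}  # (count, event_time)
result = merge_views(batch_view, speed_view, batch_cutoff=1650)
# page_a gains the 3 post-cutoff events; page_c's events predate the cutoff
```

Note that this glue logic must be written, tested, and kept consistent for every query path, which is exactly the duplicated effort the article criticizes.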

Data Lake Solution (Hudi)

Hudi provides a unified real‑time/offline storage layer with streaming source/sink capabilities, minute‑level data visibility, support for upserts/deletes, and seamless integration with Spark, Flink, and Presto.
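The upsert/delete semantics mentioned above can be illustrated with a toy keyed table. The dict-based "table" and the change-record shape are illustrative assumptions, not Hudi's actual API.

```python
# Minimal sketch of upsert/delete semantics on a primary-key table, as a
# lakehouse storage layer like Hudi exposes them.

def apply_changes(table: dict, changes: list) -> dict:
    """Apply a micro-batch of change records keyed by primary key."""
    for rec in changes:
        if rec["op"] == "delete":
            table.pop(rec["key"], None)   # idempotent delete
        else:                             # upsert: insert or overwrite by key
            table[rec["key"]] = rec["value"]
    return table

table = {"u1": {"name": "alice"}}
changes = [
    {"op": "upsert", "key": "u1", "value": {"name": "alice_v2"}},
    {"op": "upsert", "key": "u2", "value": {"name": "bob"}},
    {"op": "delete", "key": "u1"},
]
apply_changes(table, changes)   # table is now {"u2": {"name": "bob"}}
```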

Lakehouse Unified Storage Requirements

The unified storage must support high‑throughput batch reads/writes comparable to Hive, second‑level streaming latency with millions of RPS, exactly‑once and at‑least‑once semantics, and integration with multiple compute engines.

Overall Architecture

The architecture consists of three layers:

Storage Layer: Reuses Hudi’s columnar base files and row‑based log files, grouped by file‑group and versioned by timestamps, with indexing to boost ingestion and query performance.

Metadata Layer: Manages tables, partitions, instants, timelines, and snapshots, and handles multi‑node write conflict detection to guarantee ACID properties.

Service Layer: Includes BTS (memory‑accelerated service) for low‑latency reads/writes and TMS for asynchronous compaction and optimization.

Data Distribution

Table – a Hudi table.

Partition – a storage directory analogous to Hive partitions.

FileGroup – a group containing a base file and its associated log files; records with the same primary key always reside in the same file group.

Block – an in‑memory data segment; for primary‑key tables it holds records sorted by timestamp, while for non‑key tables it holds records in offset order.

WAL Log – persistent storage for evicted blocks, enabling stream replay.

Task ↔ Block – a one‑to‑many relationship between compute tasks and blocks.
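The file-group invariant above (same primary key, same group) is typically achieved by deterministic key routing. A minimal sketch, assuming a CRC32 hash and a fixed group count; both are illustrative choices, not the team's actual indexing scheme:

```python
# Illustrative routing of records to file groups: hash the primary key so the
# same key always lands in the same group, letting the base file and its log
# files be merged locally at read/compaction time.
import zlib

NUM_FILE_GROUPS = 4  # assumed fixed for the sketch

def file_group_for(primary_key: str) -> int:
    """Deterministically map a primary key to a file group id."""
    return zlib.crc32(primary_key.encode()) % NUM_FILE_GROUPS

# Same key -> same group, every time.
assert file_group_for("user_42") == file_group_for("user_42")
```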

Data Model

Each lakehouse table provides two views:

Incremental View: Append‑only table capturing every change (ordered by CommitId+Offset; for keyed tables multiple rows per key may exist).

Snapshot View: Time‑dynamic snapshot table used for batch processing (unique key per partition, storing only the latest record for keyed tables).

The snapshot view can be derived from the incremental view, but not vice‑versa.
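That derivation is mechanical: replay the incremental view in (CommitId, Offset) order and keep only the latest row per key. The record shape below is an illustrative assumption.

```python
# Sketch of deriving the snapshot view from the incremental view: later
# changes to the same key overwrite earlier ones, so only the newest record
# per key survives.

def snapshot_from_incremental(incremental: list) -> dict:
    """Collapse an ordered change log into the latest record per key."""
    snapshot = {}
    for row in sorted(incremental, key=lambda r: (r["commit_id"], r["offset"])):
        snapshot[row["key"]] = row["value"]
    return snapshot

incremental = [
    {"commit_id": 1, "offset": 0, "key": "k1", "value": "v1"},
    {"commit_id": 1, "offset": 1, "key": "k2", "value": "v2"},
    {"commit_id": 2, "offset": 0, "key": "k1", "value": "v1b"},  # later change wins
]
snap = snapshot_from_incremental(incremental)  # {"k1": "v1b", "k2": "v2"}
```

The reverse is impossible because collapsing to the latest record discards the intermediate history: nothing in `snap` records that `k1` was ever `v1`.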

Data Read/Write

Workloads are split between streaming jobs (latency‑sensitive, accelerated by BTS) and batch jobs (throughput‑oriented, reading directly from storage). Concurrent writes from both paths proceed without blocking each other, which preserves consistency while keeping latency low for streaming jobs.

BTS Architecture

BTS comprises two components:

BTS Master: Block Load Balancer (assigns blocks to clients), Block Metadata Manager (maintains block‑to‑server metadata), and Transaction Manager (creates and commits distributed transactions).

BTS Table Server: Session Manager (client session info), DataService (RPC read/write with column pruning and predicate push‑down), Transaction Manager (pre‑commit details), MemStore (shared in‑memory area for fast lookup, column pruning, filtering, sorting), and WAL (persistent log for recovery and read‑through after eviction).
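The MemStore/WAL interplay described above can be sketched with a toy store: hot blocks live in memory, the oldest block is evicted to a write‑ahead log when capacity is exceeded, and reads fall through to the WAL. All names and the eviction policy here are illustrative assumptions, not BTS's actual API.

```python
# Hedged sketch of MemStore + WAL: in-memory fast path, persistent
# read-through after eviction.
from collections import OrderedDict

class MemStore:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks = OrderedDict()   # block_id -> data, insertion-ordered
        self.wal = {}                 # stand-in for persisted (evicted) blocks

    def put(self, block_id, data):
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            old_id, old_data = self.blocks.popitem(last=False)  # evict oldest
            self.wal[old_id] = old_data                         # persist to WAL

    def get(self, block_id):
        if block_id in self.blocks:       # fast in-memory path
            return self.blocks[block_id]
        return self.wal.get(block_id)     # read-through after eviction

store = MemStore(capacity=2)
store.put("b1", [1]); store.put("b2", [2]); store.put("b3", [3])
store.get("b1")   # b1 was evicted, so this read is served from the WAL
```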

Application Scenarios

The lakehouse shines in three classic use cases:

Streaming Data Computation: Replaces complex Kafka‑plus‑multiple‑store pipelines with a single Hudi table, reducing component dependencies, simplifying debugging, and enabling low‑cost historical replay.

Real‑Time Multidimensional Analysis: Eliminates the need for ClickHouse by storing all data in Hudi and serving queries via Presto, cutting expensive OLAP resources and supporting full‑history queries.

Stream‑Batch Data Reuse: Shares the same Hudi DWD layer between real‑time and offline warehouses, saving duplicate compute/storage and accelerating data readiness.

Future Planning

Roadmap focuses on three aspects:

Engine Performance: Multi‑task concurrent writes via WAL merging, asynchronous flush, and memory optimizations.

Stability: Better load‑balancing, multi‑region deployment, and cross‑region backup for disaster recovery.

Business Features: Support for Kafka‑like partitioning and consumer groups.

Tags: big data, real-time analytics, streaming, data warehouse, lakehouse, Hudi
Written by

Data Thinking Notes

Sharing insights on data architecture, governance, and middle platforms, exploring AI in data, and linking data with business scenarios.
