
BiFang: A Unified Lake‑Stream Storage Engine for Real‑Time and Batch Data Processing

BiFang is a lake‑stream integrated storage engine that merges Apache Pulsar message‑queue capabilities with Iceberg data‑lake features, providing a single unified data store with full‑incremental queries, sub‑second visibility, exactly‑once semantics, and seamless integration with Flink, Spark, and StarRocks for both real‑time analytics and batch processing.

DataFunSummit

1. System Overview

BiFang is a lake‑stream integrated storage engine that unifies message‑queue and data‑lake functionality. It supports full‑incremental queries (a full read of existing data followed by incremental reads of newly arriving data) and end‑to‑end real‑time data visibility. It is built on Tencent Tianqiong Pulsar and integrates with Iceberg for lake storage.

1.1 System Positioning

BiFang provides a single entry point for both streaming and batch data. It is compatible with mainstream batch and stream engines and is designed to meet differing requirements for latency, consistency, and flexibility.

1.2 Applicable Scenarios

Full‑incremental query of message‑queue data using Pulsar manifests.

Real‑time visibility of Iceberg lake data, cutting latency from minutes to under a second.

Unified storage for stream and batch, reducing cost and operational complexity.

Real‑time multidimensional reporting via StarRocks integration.

Efficient low‑cost multi‑stream stitching with KV/Value support.
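
The first scenario, a full‑incremental query, can be pictured as reading a bounded historical part and then continuing on the live stream from exactly where the historical part ended. The Python sketch below is a toy model, not BiFang code: `LakeSnapshot`, `end_offset`, and `full_incremental_read` are hypothetical names chosen for illustration. It only shows the offset hand‑off that avoids gaps and duplicates between the two phases.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple

Record = Tuple[int, str]  # (offset, payload)


@dataclass
class LakeSnapshot:
    """Hypothetical view of data already persisted in the lake."""
    records: List[Record]
    end_offset: int  # first offset NOT covered by this snapshot


def full_incremental_read(snapshot: LakeSnapshot,
                          stream: List[Record]) -> Iterator[Record]:
    """Yield the full historical part, then the incremental tail."""
    # Full phase: everything the snapshot covers.
    yield from snapshot.records
    # Incremental phase: only stream records at or past the hand-off offset.
    for offset, payload in stream:
        if offset >= snapshot.end_offset:
            yield (offset, payload)


snapshot = LakeSnapshot(records=[(0, "a"), (1, "b"), (2, "c")], end_offset=3)
# The stream still retains offsets 2..4, so it overlaps the snapshot at offset 2.
stream = [(2, "c"), (3, "d"), (4, "e")]
result = list(full_incremental_read(snapshot, stream))
offsets = [offset for offset, _ in result]
print(offsets)
```

Because the incremental phase filters on the snapshot's end offset, the overlapping record at offset 2 is returned once, from the snapshot side only.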

1.3 Industry Comparison

Compared with Alibaba Fluss and Douyin BTS, BiFang offers a unified storage engine that supports exactly‑once semantics and sub‑second data visibility, and it has been deployed in production for video, gaming, and AI pipelines.

2. Architecture Principles

BiFang consists of three main components: BiFang Client, BiFang Server, and Lakehouse Storage (currently Iceberg). The server extends Pulsar Broker with modules such as Log Writer, Offload Service, Transaction Manager, Manifest Store, Manifest Service, and File Service.

2.1 Overall Architecture

The architecture integrates Pulsar and Iceberg, using a unified metadata catalog to manage both streaming and batch data.

2.2 Core Process

Log Writer writes data as row‑format batches and generates Delta Manifests, which are stored in Manifest Store.

Manifest Service consumes Delta Manifests, creates BiFang logical files, and builds Manifest Files for Iceberg.

Auto Optimizer merges Manifest Files and converts logical files to columnar Parquet files.

Offload Service moves data to long‑term HDFS storage, enabling seamless reads from historical files.
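
As a rough illustration of the first three steps above, the following Python sketch models the write path in memory. All class and function names (`DeltaManifest`, `LogWriter`, `build_logical_file`, `to_columnar`) are hypothetical stand‑ins, not BiFang's actual interfaces, and a plain dictionary of lists stands in for a Parquet file. The offload step to HDFS is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List

Row = Dict[str, object]


@dataclass
class DeltaManifest:
    """Hypothetical descriptor of one written row batch."""
    start_offset: int
    row_count: int


class LogWriter:
    """Appends row-format batches and emits one Delta Manifest per batch."""

    def __init__(self) -> None:
        self.log: List[Row] = []

    def write(self, batch: List[Row]) -> DeltaManifest:
        manifest = DeltaManifest(start_offset=len(self.log), row_count=len(batch))
        self.log.extend(batch)
        return manifest


def build_logical_file(log: List[Row], deltas: List[DeltaManifest]) -> List[Row]:
    """Manifest Service step: resolve Delta Manifests into one logical file."""
    rows: List[Row] = []
    for delta in deltas:
        rows.extend(log[delta.start_offset:delta.start_offset + delta.row_count])
    return rows


def to_columnar(rows: List[Row]) -> Dict[str, List[object]]:
    """Optimizer step: pivot rows into columns (assumes a uniform schema)."""
    columns: Dict[str, List[object]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


writer = LogWriter()
deltas = [
    writer.write([{"user": "u1", "clicks": 3}, {"user": "u2", "clicks": 5}]),
    writer.write([{"user": "u3", "clicks": 1}]),
]
logical_file = build_logical_file(writer.log, deltas)
columnar = to_columnar(logical_file)
print(columnar)
```

In this sketch a logical file is only a range of offsets resolved against the row log, so rows become readable as soon as their Delta Manifest exists, and the columnar rewrite can happen later without changing what a reader sees.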

2.3 Technical Advantages

Unified table management and metadata governance via Iceberg.

End‑to‑end real‑time data visibility through real‑time Manifest queries.

Hybrid row‑column storage reduces storage redundancy and improves query performance.

Exactly‑once semantics with Pulsar transactions and Read‑Committed isolation.

Broad engine compatibility (Flink, Spark, StarRocks) and ecosystem integration.
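
To make the Read‑Committed point concrete, here is a minimal Python sketch of one common way to give readers that isolation level on an append‑only log: skip entries from aborted transactions and stop at the first entry whose transaction is still open. `TransactionalLog` and its methods are hypothetical and are not the Pulsar client API; the sketch makes no claim about how BiFang or Pulsar implement this internally.

```python
from enum import Enum
from typing import Dict, List, Tuple


class TxnState(Enum):
    OPEN = "open"
    COMMITTED = "committed"
    ABORTED = "aborted"


class TransactionalLog:
    """Toy append-only log with transactional writes (not the Pulsar API)."""

    def __init__(self) -> None:
        self.entries: List[Tuple[int, str]] = []  # (txn_id, payload)
        self.states: Dict[int, TxnState] = {}

    def begin(self) -> int:
        txn_id = len(self.states) + 1
        self.states[txn_id] = TxnState.OPEN
        return txn_id

    def append(self, txn_id: int, payload: str) -> None:
        if self.states[txn_id] is not TxnState.OPEN:
            raise ValueError("transaction is already finished")
        self.entries.append((txn_id, payload))

    def commit(self, txn_id: int) -> None:
        self.states[txn_id] = TxnState.COMMITTED

    def abort(self, txn_id: int) -> None:
        self.states[txn_id] = TxnState.ABORTED

    def read_committed(self) -> List[str]:
        """Return payloads in log order, stopping at the first open transaction."""
        visible: List[str] = []
        for txn_id, payload in self.entries:
            state = self.states[txn_id]
            if state is TxnState.OPEN:
                break  # later entries stay hidden to preserve log order
            if state is TxnState.COMMITTED:
                visible.append(payload)
        return visible


log = TransactionalLog()
t1, t2, t3 = log.begin(), log.begin(), log.begin()
log.append(t1, "order-1")
log.append(t2, "order-2")
log.append(t3, "order-3")
log.commit(t1)
log.abort(t2)
before_commit = log.read_committed()  # t3 is still open here
log.commit(t3)
after_commit = log.read_committed()
print(before_commit, after_commit)
```

Stopping at the first open transaction trades a little latency for ordering: a reader never sees a later committed entry before an earlier one whose outcome is still undecided.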

3. Business Practice

At Tencent Video, BiFang replaces the traditional Lambda architecture. It collapses the message queue, Flink real‑time jobs, and Iceberg ingestion into a single step, which delivers sub‑second data visibility and exactly‑once guarantees and removes the need for separate reconciliation pipelines.

4. Future Roadmap

Architecture optimization for higher read/write performance and stability.

Enhanced core capabilities: unified lakehouse lifecycle, KV/Changelog support, Arrow columnar format.

Ecosystem enrichment: integration with InLong, StarRocks, Oceanus, and WeData governance platform.

Tags: Big Data, stream processing, Real-time Analytics, data storage, Apache Pulsar, Apache Iceberg, Lakehouse
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
