Bigo Real‑Time Computing Platform: Architecture, Features, and Performance Improvements
This article presents the evolution, architecture, and key innovations of Bigo's real‑time computing platform—covering its migration from Spark Streaming to Flink, unified platform design, development tools, operational enhancements, and the efficiency gains achieved in business scenarios such as ETL and AB‑testing.
Introduction
Guest speaker Xu Shuai from Bigo shares how Bigo built its real‑time computing platform. This write‑up was edited by the Flink Chinese community.
Overview
The presentation introduces the development history, distinctive features, business scenarios, efficiency improvements, and future outlook of Bigo's real‑time computing platform.
Development History
Before 2018, Bigo ran only a few real‑time jobs, all on Spark Streaming. From 2018 to 2019, as Flink gained traction, individual business lines deployed their own Flink clusters. Starting in 2019, all Flink‑based workloads were consolidated onto the Bigo real‑time computing platform. After two years of development, every real‑time scenario now runs on it.
Current Architecture
Bigo's three main apps (Live, Likee, and Imo) generate user behavior logs that serve as the main data sources. Some user information resides in MySQL. All data passes through message queues (primarily Kafka, with Pulsar being gradually adopted), and MySQL logs are ingested via BDP. The platform relies on the Hadoop ecosystem for dynamic resource management, and the compute layer is standardized on Flink. Users develop, debug, and monitor jobs through the BigoFlow management console. Processed data is written to Hive, ClickHouse, HBase, and other sinks.
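Assume the MySQL logs arrive as row‑level change records (inserts, updates, deletes). A downstream consumer then has to fold them into a current snapshot of each table. The minimal sketch below shows that folding step in plain Python. The record shape (`op`, `key`, `row`) and the names `ChangeEvent` and `apply_changes` are hypothetical and do not describe BDP's actual format.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class ChangeEvent:
    """Hypothetical change record: op is "insert", "update" or "delete"."""
    op: str
    key: int
    row: Optional[dict] = None


def apply_changes(table: Dict[int, dict], events: Iterable[ChangeEvent]) -> Dict[int, dict]:
    """Fold change events, in log order, into a key -> row snapshot (mutates and returns table)."""
    for ev in events:
        if ev.op in ("insert", "update"):
            # Upsert: the most recent image of the row wins.
            table[ev.key] = dict(ev.row or {})
        elif ev.op == "delete":
            # Deleting a key that was never seen is a no-op.
            table.pop(ev.key, None)
        else:
            raise ValueError(f"unknown op: {ev.op!r}")
    return table


if __name__ == "__main__":
    events = [
        ChangeEvent("insert", 1, {"name": "alice", "country": "SG"}),
        ChangeEvent("insert", 2, {"name": "bob", "country": "IN"}),
        ChangeEvent("update", 1, {"name": "alice", "country": "ID"}),
        ChangeEvent("delete", 2),
    ]
    print(apply_changes({}, events))
```

The order of events matters. Replaying the same events out of order can resurrect deleted rows. This is one reason change streams are commonly partitioned by primary key, which preserves per‑key order.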
Features and Improvements
Powerful SQL editor.
Graphical topology adjustment and configuration.
One‑click multi‑cluster deployment.
Unified version management to limit version divergence.
Robust savepoint management.
Automatic log collection to Elasticsearch with built‑in error‑analysis rules.
Task history preservation for comparison and troubleshooting.
Automatic addition of monitoring rules, plus a resource‑recommendation engine.
Metadata is fully integrated across Kafka, Hive, and ClickHouse, allowing DDL‑free access to streams and tables.
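The built‑in error‑analysis rules mentioned above amount to matching known failure signatures in collected job logs. The sketch below illustrates the idea on plain strings and leaves out the Elasticsearch query side. The rule set, rule IDs, and hint texts are hypothetical and are not Bigo's actual rules. Only the exception class names are real Java and Kafka identifiers.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Hypothetical rule set: (rule id, compiled pattern, suggested action).
RULES: List[Tuple[str, re.Pattern, str]] = [
    ("out-of-memory", re.compile(r"java\.lang\.OutOfMemoryError"),
     "Increase memory or reduce state size"),
    ("kafka-timeout", re.compile(r"org\.apache\.kafka\.common\.errors\.TimeoutException"),
     "Check broker availability and network"),
    ("checkpoint-expired", re.compile(r"[Cc]heckpoint.*expired"),
     "Checkpoints take too long; check backpressure and state size"),
]


def classify(line: str) -> Optional[Tuple[str, str]]:
    """Return (rule id, hint) for the first matching rule, or None."""
    for rule_id, pattern, hint in RULES:
        if pattern.search(line):
            return rule_id, hint
    return None


def summarize(lines: Iterable[str]) -> Counter:
    """Count log lines per matched rule; unmatched lines go to 'unclassified'."""
    counts: Counter = Counter()
    for line in lines:
        hit = classify(line)
        counts[hit[0] if hit else "unclassified"] += 1
    return counts
```

Ordering the rules from most to least specific keeps the first‑match semantics of `classify` predictable when several patterns could apply.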
ETL Scenario
Fully automated collection of tracking‑point (event‑tracking) data.
No code required for users.
Data lands in Hive.
Metadata updates automatically.
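One way "metadata updates automatically" can work is to compare the fields of incoming events with the Hive table's known columns and emit DDL for any new ones. The sketch below takes that approach. The table name `ods.user_events` and the type‑mapping policy are assumptions for illustration. The generated statement uses standard HiveQL `ALTER TABLE ... ADD COLUMNS` syntax.

```python
import json
from typing import Dict, Iterable, Optional


def hive_type(value) -> str:
    """Map a JSON value to a Hive column type (assumed policy)."""
    # bool must be checked before int: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    return "STRING"  # strings, nulls and nested values fall back to STRING


def new_columns(known: Dict[str, str], events: Iterable[dict]) -> Dict[str, str]:
    """Fields present in events but missing from the schema, in first-seen order."""
    found: Dict[str, str] = {}
    for ev in events:
        for name, value in ev.items():
            if name not in known and name not in found:
                found[name] = hive_type(value)
    return found


def add_columns_ddl(table: str, cols: Dict[str, str]) -> Optional[str]:
    """Build a HiveQL ADD COLUMNS statement, or None if nothing is new."""
    if not cols:
        return None
    spec = ", ".join(f"`{name}` {typ}" for name, typ in cols.items())
    return f"ALTER TABLE {table} ADD COLUMNS ({spec})"


if __name__ == "__main__":
    schema = {"uid": "BIGINT", "event": "STRING"}
    batch = [json.loads('{"uid": 1, "event": "click", "duration_ms": 350}'),
             {"uid": 2, "event": "view", "is_new": True}]
    print(add_columns_ddl("ods.user_events", new_columns(schema, batch)))
```

In this sketch, the first type seen for a field wins, and a field that first appears as `null` becomes `STRING`. A production version would need a rule for conflicting types across events.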
AB‑Testing Scenario
Traditional offline AB‑testing delivered results with a delay of about a day. After the workflow moved to Flink, results are ready in the morning. The first version ran into out‑of‑memory (OOM) errors caused by large state. This was resolved by splitting the job into several smaller Flink jobs, joining their intermediate results in HBase, and writing the final output to ClickHouse. A later iteration removed the HBase dependency. A single Flink pipeline now writes directly to ClickHouse when each daily window closes.
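The daily‑window aggregation at the core of this pipeline can be sketched in plain Python. The metrics are assumptions for illustration: distinct users and a summed value per experiment group. Windows are also assumed to be aligned to UTC days. The per‑window sets of user IDs show one plausible way state grows with traffic. In Flink, this data would live in keyed window state rather than in a dict.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple

DAY_MS = 24 * 60 * 60 * 1000
Key = Tuple[int, str, str]  # (window start in ms, experiment, group)


@dataclass(frozen=True)
class Event:
    ts_ms: int        # event time, epoch milliseconds (UTC assumed)
    experiment: str
    group: str
    user_id: int
    value: float      # hypothetical per-event metric


def window_start(ts_ms: int) -> int:
    """Start of the one-day tumbling window (UTC) that contains ts_ms."""
    return ts_ms - ts_ms % DAY_MS


def aggregate_daily(events: Iterable[Event]) -> Dict[Key, Tuple[int, float]]:
    """Per (day, experiment, group): (distinct users, summed value)."""
    users: Dict[Key, Set[int]] = defaultdict(set)   # grows with distinct users
    totals: Dict[Key, float] = defaultdict(float)
    for ev in events:
        key = (window_start(ev.ts_ms), ev.experiment, ev.group)
        users[key].add(ev.user_id)
        totals[key] += ev.value
    return {key: (len(users[key]), totals[key]) for key in users}
```

Splitting the work, as the team first did, bounds how many keys a single job holds at once. That reduces the memory needed per job at the cost of an extra join step.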
Platform Optimizations
SSD support for Flink state storage.
Streaming reads from Hive with EventTime support.
Hive dimension‑table joins with partition‑load capability.
Enhanced ClickHouse sink.
These optimizations eliminated delays in hourly jobs and moved the completion of daily jobs from the afternoon to before the start of the workday. This greatly shortened iteration cycles.
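The idea behind a partition‑aware Hive dimension‑table join is that only the newest partition of the dimension table needs to be loaded, not the whole table. The stream is then enriched against that snapshot. The sketch below assumes a hypothetical `dt=YYYY-MM-DD` partition layout held in memory and a left‑join policy. Recent open‑source Flink releases expose a comparable option for Hive temporal joins (`streaming-source.partition.include` set to `latest`). The talk does not say whether Bigo's implementation is related to it.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical layout: partition name ("dt=YYYY-MM-DD") -> rows in that partition.
Partitions = Dict[str, List[dict]]


def latest_partition(parts: Partitions) -> str:
    """Newest partition name; ISO dates sort lexicographically (parts must be non-empty)."""
    return max(parts)


def load_dimension(parts: Partitions, key: str) -> Dict[object, dict]:
    """Build a lookup table from the newest partition only."""
    return {row[key]: row for row in parts[latest_partition(parts)]}


def enrich(stream: Iterable[dict], dim: Dict[object, dict], key: str) -> Iterator[dict]:
    """Left-join each record with its dimension row; record fields win on conflict."""
    for rec in stream:
        yield {**dim.get(rec[key], {}), **rec}
```

In a long‑running job, the lookup table would be rebuilt whenever a newer partition appears. Records with no matching dimension row pass through unchanged.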
Summary and Outlook
The platform is closely aligned with business needs, integrates seamlessly with the company's data ecosystem, and provides the foundation of a real‑time data warehouse. Future work includes expanding to more scenarios, such as real‑time machine learning, advertising, risk control, and dashboards. It also includes strengthening Flink capabilities: large Hive joins, automated resource allocation, and cgroup isolation.
Recruitment & Community
BIGO's big‑data team is hiring OLAP engineers (C++/Java). Interested candidates can email [email protected]. The DataFunTalk community promotes big‑data and AI knowledge sharing.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.