Design and Implementation of Bilibili's Offline Multi‑Datacenter Solution
This article describes Bilibili's offline multi‑datacenter architecture, explaining why a scale‑out approach was chosen over scale‑up, and detailing the unit‑based design, job placement, data replication, routing, versioning, bandwidth throttling, traffic analysis, and the operational results and future directions.
Bilibili's rapid business growth caused offline cluster capacity to approach data‑center limits, prompting the need for a solution beyond simply moving clusters to larger facilities (scale‑up).
Two main options were evaluated: (1) scale-up, which requires migrating the full cluster to a larger facility without downtime, carries high economic cost, and risks hitting capacity limits again; (2) scale-out, which adds multiple data centers while preserving the user's view of a single cluster, but introduces network bandwidth and stability challenges. Scale-out was selected as the preferred strategy.
The design adopts a unit‑based architecture where each data‑center acts as an independent unit containing all services and data needed for job execution, thereby isolating failures and reducing cross‑datacenter traffic. Job placement is driven by dependency analysis using DAG community detection to group tightly coupled jobs into the same unit, and a DataManager service stores placement and replication metadata.
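The placement idea can be sketched with a toy community-detection pass over the job dependency graph: jobs that exchange the most data end up labeled with the same unit. The job names, edge weights, and the greedy label-propagation scheme below are illustrative assumptions, not Bilibili's actual algorithm.

```python
from collections import defaultdict

def group_jobs(edges, passes=5):
    """Group tightly coupled jobs by repeatedly adopting the neighbour
    label with the highest total edge weight (crude community detection)."""
    nbrs = defaultdict(dict)
    for a, b, w in edges:                    # undirected weighted dependency graph
        nbrs[a][b] = nbrs[a].get(b, 0) + w
        nbrs[b][a] = nbrs[b].get(a, 0) + w
    label = {j: j for j in nbrs}             # each job starts in its own unit
    for _ in range(passes):
        for j in sorted(nbrs):
            score = defaultdict(int)
            for k, w in nbrs[j].items():
                score[label[k]] += w         # weight of data shared with each unit
            label[j] = max(score, key=lambda l: (score[l], l))
    units = defaultdict(set)
    for j, l in label.items():
        units[l].add(j)
    return sorted(map(sorted, units.values()))

# Two tightly coupled pipelines joined by one thin (10-unit) edge:
edges = [("etl_a", "agg_b", 900), ("agg_b", "rpt_c", 800),
         ("etl_x", "agg_y", 700), ("etl_a", "agg_y", 10)]
print(group_jobs(edges))  # → [['agg_b', 'etl_a', 'rpt_c'], ['agg_y', 'etl_x']]
```

The thin cross-pipeline edge is the only traffic that would cross data centers once each group is pinned to its own unit.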
Data replication is built on an enhanced DistCp tool with atomicity, idempotence, flow‑control, multi‑tenant prioritization, and lifecycle management. Rules extracted from job histories guide automatic replication of hot Hive tables, while snapshot‑based copying handles initial bulk data.
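The atomicity and idempotence properties can be sketched as a copy-verify-rename pattern: write to a staging path, verify, then atomically publish; re-running a finished copy is a no-op. The local filesystem stands in for HDFS here, and the staging suffix and checksum check are illustrative assumptions, not the enhanced DistCp's actual mechanics.

```python
import hashlib
import os
import shutil

def checksum(path):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def replicate(src, dst):
    # Idempotence: if the destination already matches, do nothing.
    if os.path.exists(dst) and checksum(dst) == checksum(src):
        return "skipped"
    staging = dst + "._COPYING_"             # hypothetical staging suffix
    shutil.copyfile(src, staging)            # write to a temp path first
    if checksum(staging) != checksum(src):   # verify before publishing
        os.remove(staging)
        raise IOError("checksum mismatch during replication")
    os.replace(staging, dst)                 # atomic rename into place
    return "copied"
```

Readers never observe a half-written destination file, and a failed copy leaves only a staging file that the next run overwrites.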
Data routing leverages an HDFS Router with multi‑mount points; client IP determines the nearest data‑center replica, and a special IDC_FOLLOW mount handles temporary tables with dynamic paths.
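A minimal sketch of that routing logic, assuming a mount table with one physical target per data center and client subnets mapped to IDCs; the paths, subnets, and the way IDC_FOLLOW pins temporary paths to the job's own unit are simplified assumptions, not the actual HDFS Router configuration.

```python
import ipaddress

MOUNTS = {
    "/warehouse/dwd": {                      # replicated table: one target per IDC
        "idc_a": "hdfs://idc-a/warehouse/dwd",
        "idc_b": "hdfs://idc-b/warehouse/dwd",
    },
}
SUBNETS = {"10.1.0.0/16": "idc_a", "10.2.0.0/16": "idc_b"}

def resolve(logical_path, client_ip, job_idc=None):
    """Map a logical path to the nearest physical replica."""
    for mount, targets in MOUNTS.items():
        if logical_path.startswith(mount):
            suffix = logical_path[len(mount):]
            if job_idc:                      # IDC_FOLLOW-style: pin to the job's unit
                return targets[job_idc] + suffix
            ip = ipaddress.ip_address(client_ip)
            for net, idc in SUBNETS.items():
                if ip in ipaddress.ip_network(net):
                    return targets[idc] + suffix  # nearest replica by client subnet
    raise ValueError("no mount for " + logical_path)

print(resolve("/warehouse/dwd/t1", "10.2.3.4"))  # → hdfs://idc-b/warehouse/dwd/t1
```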
To maintain consistency across replicas, a version service monitors HDFS edit logs and updates replica versions, allowing jobs to use ready replicas or block until consistency is achieved.
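The gating behaviour can be sketched as follows, assuming the version service tracks the latest edit-log transaction applied to each replica; the class shape and polling loop are illustrative, not the production design.

```python
import time

class VersionService:
    def __init__(self):
        self.versions = {}                   # (table, idc) -> latest applied txid

    def on_edit_applied(self, table, idc, txid):
        """Called as replicated edits land; versions only move forward."""
        key = (table, idc)
        self.versions[key] = max(self.versions.get(key, 0), txid)

    def ready(self, table, idc, required_txid):
        """A replica is usable once it has caught up to the required version."""
        return self.versions.get((table, idc), 0) >= required_txid

    def wait_ready(self, table, idc, required_txid, timeout=60, poll=0.01):
        """Block a job until its input replica is consistent, or time out."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if self.ready(table, idc, required_txid):
                return True
            time.sleep(poll)
        return False
```

A job reading a remote replica either proceeds immediately (replica already at the required version) or blocks in `wait_ready` until replication catches up.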
Because cross-datacenter bandwidth is limited (~4 Tbps) and shared with latency-sensitive services, a token-bucket-based throttling service (ThrottleService) enforces read/write limits, prioritizes high-importance jobs, and degrades gracefully if the throttling service itself fails.
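A token-bucket sketch of that flow control: tokens refill at a fixed rate, and a request for bandwidth succeeds only if enough tokens are available. The rates, the per-priority buckets, and the fail-open fallback are assumptions for illustration, not ThrottleService's real parameters.

```python
import time

class TokenBucket:
    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps                 # refill rate, bytes per second
        self.capacity = burst_bytes          # maximum burst size
        self.tokens = burst_bytes
        self.stamp = time.monotonic()

    def try_acquire(self, nbytes):
        """Non-blocking: succeed only if the bucket holds enough tokens."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.stamp) * self.rate)
        self.stamp = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False

class ThrottleService:
    def __init__(self):
        # Hypothetical split: high-priority jobs get a larger bandwidth share.
        self.buckets = {"high": TokenBucket(3e9, 3e9),
                        "low":  TokenBucket(1e9, 1e9)}

    def permit(self, priority, nbytes):
        bucket = self.buckets.get(priority)
        if bucket is None:
            return True  # degrade open: never block traffic if throttling breaks
        return bucket.try_acquire(nbytes)
```

Failing open on errors matches the article's graceful-degradation goal: a broken throttler must not stall the offline pipeline, only stop shaping it.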
Cross‑datacenter traffic analysis collects per‑job, per‑IP flow logs via instrumented DataNode and engine components, aggregates them in ClickHouse, and visualizes top‑traffic jobs to guide placement optimization and ad‑hoc traffic mitigation.
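A toy version of the rollup that the ClickHouse pipeline performs: sum bytes per job over flow-log records whose source and destination IDCs differ, then rank. The record fields and sample data are invented for illustration.

```python
from collections import Counter

def top_cross_idc_jobs(flow_logs, n=3):
    """Rank jobs by total bytes moved across data centers."""
    totals = Counter()
    for rec in flow_logs:
        if rec["src_idc"] != rec["dst_idc"]:  # only cross-IDC traffic counts
            totals[rec["job_id"]] += rec["bytes"]
    return totals.most_common(n)

logs = [
    {"job_id": "job_1", "src_idc": "a", "dst_idc": "b", "bytes": 500},
    {"job_id": "job_2", "src_idc": "a", "dst_idc": "a", "bytes": 900},
    {"job_id": "job_1", "src_idc": "b", "dst_idc": "a", "bytes": 300},
    {"job_id": "job_3", "src_idc": "a", "dst_idc": "b", "bytes": 200},
]
print(top_cross_idc_jobs(logs))  # → [('job_1', 800), ('job_3', 200)]
```

Note that job_2 moves the most bytes overall but never appears: intra-IDC traffic is free from the placement optimizer's point of view.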
After six months of production, the solution has migrated ~300 PB of data, handling one‑third of offline jobs while alleviating bandwidth and stability issues; future work includes further unit‑based automation and community‑detection‑driven migration planning.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.