How Bilibili Leverages Large Language Models to Solve Big Data Platform Failures
This article describes the data architecture behind Bilibili's video platform, its heavy daily workload of offline and real‑time tasks, the problems users report most often (task failures and slowdowns) and their root causes, and how a large language model assistant is being used to automate troubleshooting.
Background Introduction
Bilibili is a video‑sharing platform that generates data at very large scale. Its big data platform supports many business lines, including AI and commerce.
1. Overall Architecture and Scale
The platform combines a “five‑layer integrated” design with “storage‑compute separation”. A distributed file system forms the bottom layer. The middle of the stack includes an intelligent scheduling layer and compute engines such as Spark and Flink, together with clients, real‑time streams (Kafka), an OLAP engine (ClickHouse), custom tools, and CI/CD pipelines.
It processes roughly 270,000 offline tasks daily, about 20,000 ad‑hoc queries, and 7,000 critical real‑time jobs. The support team receives thousands of tickets weekly, with each sub‑team handling about three person‑days of inquiries.
2. User Issues
For offline computation, users mainly ask why tasks fail or become slow.
Why tasks fail
Kernel defects, especially after untested upgrades.
Dependency component bugs; many tasks depend on shared resources that may break after upgrades.
Data quality problems.
Other reasons such as memory issues.
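The failure categories above lend themselves to a first-pass triage before an engineer or an assistant looks deeper. The following is a minimal sketch, not Bilibili's implementation: the function name `triage`, the category labels, and the keyword rules are all hypothetical, and real rules would have to come from the platform's own incident history. Kernel defects are deliberately left to the fallback label because they rarely have a single recognizable log signature.

```python
import re

# Hypothetical keyword rules, chosen for illustration only.
# Each entry maps an assumed category label to a pattern over the task log.
RULES = [
    ("memory", re.compile(r"OutOfMemoryError|Container killed .* exceeding memory", re.I)),
    ("dependency", re.compile(r"NoSuchMethodError|ClassNotFoundException", re.I)),
    ("data_quality", re.compile(r"NumberFormatException|malformed record", re.I)),
]


def triage(log_text: str) -> str:
    """Return the first matching category, or 'needs_review' if none match."""
    for label, pattern in RULES:
        if pattern.search(log_text):
            return label
    # No rule matched: hand the task to a human or a deeper analysis step.
    return "needs_review"


if __name__ == "__main__":
    print(triage("java.lang.OutOfMemoryError: Java heap space"))
    print(triage("Task exited with an unknown error"))
```

Rules are evaluated in order, so a log that matches several patterns is assigned the first category listed; a production system would more likely collect every match and rank them.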
Why tasks become slow
Hardware aging; large storage volumes degrade read/write speed over time.
Resource scheduling pressure, especially with mixed deployment across departments.
Data skew or inherent data problems.
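Of the slowdown causes listed, data skew is the easiest to check numerically. The sketch below assumes that per-partition record counts are already available (for example, exported from a job's metrics) and compares the largest partition with the median. The function names and the threshold of 5.0 are illustrative assumptions, not values from the talk.

```python
from statistics import median


def skew_ratio(partition_sizes):
    """Ratio of the largest partition to the median partition size."""
    if not partition_sizes:
        raise ValueError("partition_sizes must not be empty")
    med = median(partition_sizes)
    largest = max(partition_sizes)
    if med == 0:
        # Median of zero: any non-empty partition counts as extreme skew.
        return float("inf") if largest > 0 else 1.0
    return largest / med


def is_skewed(partition_sizes, threshold=5.0):
    """Flag a job as skewed when the ratio reaches the assumed threshold."""
    return skew_ratio(partition_sizes) >= threshold


if __name__ == "__main__":
    sizes = [100, 100, 100, 1000]
    print(skew_ratio(sizes), is_skewed(sizes))
```

The median is used instead of the mean so that the one oversized partition does not inflate the baseline it is being compared against.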
Because diagnosing these issues by hand is time‑consuming, Bilibili is exploring intelligent methods to assist troubleshooting.
Typical user queries are concise, often just a problem description with a link or screenshot.
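Because such queries carry so little text, an assistant would first need to separate the description from any link before it can fetch context about the task. The following is a minimal sketch of that pre-processing step under stated assumptions: `ParsedQuery`, `parse_query`, and the URL pattern are hypothetical names and choices, and screenshot handling is out of scope here.

```python
import re
from dataclasses import dataclass, field

# Simple URL matcher for illustration; it stops at whitespace and closing brackets.
URL_PATTERN = re.compile(r"https?://[^\s)>\]]+")


@dataclass
class ParsedQuery:
    description: str                           # free-text part of the user's message
    links: list = field(default_factory=list)  # URLs found in the message


def parse_query(text: str) -> ParsedQuery:
    """Split a short support message into its description and its links."""
    links = URL_PATTERN.findall(text)
    # Remove the links, then collapse the whitespace they leave behind.
    description = " ".join(URL_PATTERN.sub("", text).split())
    return ParsedQuery(description=description, links=links)


if __name__ == "__main__":
    query = parse_query("My job failed, see https://example.com/task/123 please")
    print(query.description)
    print(query.links)
```

The extracted links could then be used to look up the task's logs and metrics, which supply the context that the short description lacks.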
