How Bilibili Leverages Large Language Models to Automate Big Data Operations
This article explores Bilibili’s implementation of a large‑language‑model‑driven intelligent assistant that helps troubleshoot massive offline and real‑time data processing tasks, detailing the platform’s five‑layer architecture, common failure causes, and how AI can streamline issue resolution.
Introduction
This article shares Bilibili’s practice of building an intelligent assistant based on large language models.
Background
Bilibili is a video‑sharing platform that generates data at massive scale. Its big‑data platform supports many business lines, including AI and commerce.
The platform combines a “five‑layer integrated” design with a “separate storage and compute” architecture. A distributed file system sits at the bottom, followed by an intelligent scheduling layer, compute engines such as Spark and Flink, and client tools. The platform also includes real‑time streams (Kafka), an OLAP engine (ClickHouse), and custom CI/CD tools.
The daily workload includes 270,000 offline tasks, about 20,000 ad‑hoc queries, and roughly 7,000 critical real‑time jobs. The support team handles thousands of inquiries each week, and each sub‑team spends about three person‑days per week troubleshooting task failures or slowdowns.
User Issues
Users mainly ask two questions about offline jobs: why a task fails and why it becomes slow.
Why tasks fail
Kernel defects, especially after untested upgrades.
Problems in dependent components; bugs or upgrades in shared resources can cause failures.
Data quality issues.
Other reasons such as memory problems.
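To make these categories concrete, the following is a minimal sketch of keyword-based triage over a failure log. It is not Bilibili's implementation. The category names, the `RULES` table, and the `classify_failure` function are hypothetical, and the patterns are only illustrative examples of messages that commonly appear in JVM-based engine logs.

```python
import re

# Hypothetical rule table: (category, pattern). The patterns are illustrative
# assumptions, not an exhaustive or authoritative list.
RULES = [
    ("memory", re.compile(r"OutOfMemoryError|exceeding memory limits", re.I)),
    ("data_quality", re.compile(r"NumberFormatException|malformed record", re.I)),
    ("dependency", re.compile(r"Connection refused|UnknownHostException", re.I)),
]


def classify_failure(log_text: str) -> str:
    """Return the first category whose pattern matches, else 'unknown'."""
    for category, pattern in RULES:
        if pattern.search(log_text):
            return category
    return "unknown"


if __name__ == "__main__":
    print(classify_failure("java.lang.OutOfMemoryError: Java heap space"))
```

Rules like these only cover failures that leave a recognizable message. Kernel defects and subtle dependency problems usually need more context than a single log line.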
Why tasks become slow
Hardware aging, e.g., disk wear affecting read/write speed.
Resource scheduling pressure and cross‑department resource shuffling.
Data skew or inherent data problems.
Because diagnosing these causes is time‑consuming, an intelligent assistant is needed.
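As one example of the kind of check such a diagnosis involves, data skew often shows up as a few tasks in a stage running far longer than the rest. The sketch below compares the slowest task with the median task duration. The function names and the threshold of 5.0 are assumptions chosen for illustration, not values from the original talk.

```python
from statistics import median
from typing import Sequence


def skew_ratio(durations: Sequence[float]) -> float:
    """Ratio of the slowest task duration to the median task duration."""
    if not durations:
        raise ValueError("durations must not be empty")
    med = median(durations)
    if med <= 0:
        # Degenerate case: avoid dividing by zero.
        return float("inf") if max(durations) > 0 else 1.0
    return max(durations) / med


def looks_skewed(durations: Sequence[float], threshold: float = 5.0) -> bool:
    """Flag a stage as skewed when the slowest task dwarfs the median.

    The threshold is a hypothetical default and would need tuning.
    """
    return skew_ratio(durations) >= threshold


if __name__ == "__main__":
    # Task durations in seconds for one stage (made-up sample values).
    print(looks_skewed([10, 12, 11, 9, 120]))
```

A high ratio is only a hint. Slow hardware or scheduling pressure on a single node can produce the same pattern, so the result has to be read alongside the other causes listed above.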
Nature of User Queries
Queries are typically terse, often just a problem description with a link or a screenshot.
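Because queries arrive in this terse form, a first processing step is to separate the free-text description from any attached link. The sketch below shows one way to do that with a regular expression. The `ParsedQuery` type, the `parse_query` function, and the symptom keywords are hypothetical, and screenshots are out of scope here.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Simple URL matcher: an http(s) scheme followed by non-space characters.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


@dataclass
class ParsedQuery:
    text: str                                        # description without links
    links: List[str] = field(default_factory=list)   # URLs found in the query
    symptom: str = "unknown"                         # failure, slowdown, unknown


def parse_query(raw: str) -> ParsedQuery:
    """Split a terse user query into description, links, and a rough symptom."""
    links = URL_RE.findall(raw)
    text = URL_RE.sub("", raw).strip()
    lowered = text.lower()
    # Keyword lists are illustrative assumptions only.
    if "fail" in lowered or "error" in lowered:
        symptom = "failure"
    elif "slow" in lowered or "stuck" in lowered:
        symptom = "slowdown"
    else:
        symptom = "unknown"
    return ParsedQuery(text=text, links=links, symptom=symptom)


if __name__ == "__main__":
    print(parse_query("job is slow https://example.com/task/123"))
```

The two symptom labels mirror the two questions users ask most often: why a task fails and why it becomes slow.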
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
