How Bilibili Uses Large Language Models to Build an Intelligent Assistant

This article explains Bilibili's large‑language‑model‑based intelligent assistant, describing the platform's five‑layer architecture, massive daily task load, common failure and slowdown causes, and the need for AI‑driven troubleshooting to improve reliability and performance.

DataFunTalk

Nov 7, 2025

Introduction

This article shares Bilibili's practice of building an intelligent agent assistant based on large language models.

Background

Bilibili is a video-sharing platform that generates massive amounts of data. Its big-data platform supports many business lines, such as AI and commerce.

1. Overall Architecture and Scale

The platform follows a “five-layer integrated” design with storage separated from compute. A distributed file system sits at the bottom, an intelligent scheduling layer sits in the middle, and compute engines such as Spark and Flink run above it. Clients, real-time streams (Kafka), an OLAP engine (ClickHouse), and custom tools and CI/CD platforms complete the stack.

The daily workload is huge: 270,000 offline tasks, about 20,000 ad-hoc queries, and roughly 7,000 critical real-time jobs. The support team receives thousands of inquiries every week, and each small team spends about three person-days on tickets, so dedicated staff are needed to answer questions about task failures and slowdowns.

2. Users' Problems

For offline computation, users mainly ask why tasks fail and why they become slow.

Why tasks fail

Kernel defects, especially after untested kernel upgrades.

Issues in dependent components; upgrades or bugs in shared resources cause failures.

Data quality problems; corrupted or invalid input data leads to failure.

Other reasons such as memory errors.
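
As a rough illustration, the four failure categories above could be encoded as a first-pass triage rule set. This is only a sketch under stated assumptions, not Bilibili's implementation: the `FailedTask` fields, the keyword hints, and the order of the checks are all hypothetical, and a real platform would draw on its own task metadata and logs.

```python
from dataclasses import dataclass
from enum import Enum


class FailureCause(Enum):
    KERNEL_DEFECT = "kernel defect"
    DEPENDENCY_ISSUE = "dependent component issue"
    DATA_QUALITY = "data quality problem"
    OTHER = "other (e.g. memory error)"


@dataclass
class FailedTask:
    # All fields are hypothetical placeholders for platform metadata.
    error_message: str
    kernel_upgraded_recently: bool = False
    dependency_incident_open: bool = False


# Illustrative keywords only; not taken from any real engine's logs.
DATA_QUALITY_HINTS = ("corrupt", "invalid input", "malformed")


def triage_failure(task: FailedTask) -> FailureCause:
    """Return a first guess at the failure category for one task."""
    message = task.error_message.lower()
    # Check the error text first, since it is the most direct evidence.
    if any(hint in message for hint in DATA_QUALITY_HINTS):
        return FailureCause.DATA_QUALITY
    # Then fall back to (assumed) signals about the task's environment.
    if task.dependency_incident_open:
        return FailureCause.DEPENDENCY_ISSUE
    if task.kernel_upgraded_recently:
        return FailureCause.KERNEL_DEFECT
    return FailureCause.OTHER
```

A rule set like this can only produce a starting hypothesis; the value of an assistant lies in handling the many cases where such simple signals are missing or contradictory.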

Why tasks become slow

Hardware aging; large storage fleets experience degraded read/write speed over time.

Resource scheduling pressure; massive user volume and mixed‑deployment mechanisms cause contention.

Data skew or problematic data distribution.
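
Of the slowdown causes above, data skew is the easiest to check mechanically. The sketch below compares the largest partition with the median partition; it is a generic heuristic, not a method described in the source. The function names and the threshold of 5.0 are assumptions chosen for illustration.

```python
from statistics import median


def skew_ratio(partition_sizes):
    """Ratio of the largest partition to the median partition size."""
    if not partition_sizes:
        raise ValueError("partition_sizes must not be empty")
    typical = median(partition_sizes)
    largest = max(partition_sizes)
    if typical == 0:
        # Avoid division by zero: all-empty partitions are not skewed.
        return float("inf") if largest > 0 else 1.0
    return largest / typical


def is_skewed(partition_sizes, threshold=5.0):
    # The threshold is an arbitrary assumption; tune it per workload.
    return skew_ratio(partition_sizes) >= threshold


balanced = [100, 110, 95, 105]
skewed = [100, 110, 95, 105, 2000]
print(is_skewed(balanced))  # False
print(is_skewed(skewed))    # True
```

The median is used instead of the mean so that one oversized partition does not inflate the baseline it is compared against.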

Because diagnosing these causes is time‑consuming, Bilibili explores intelligent methods to assist in troubleshooting.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
