
How to Crawl Bilibili Video Danmaku Data Using Python

This tutorial explains how to locate Bilibili video danmaku (bullet‑comment) APIs, extract the CID, and use Python libraries such as requests, BeautifulSoup, and pandas to download, clean, and save the comment data to CSV files, with an optional API‑based shortcut.

Python Programming Learning Circle

Sep 15, 2021


The article introduces two methods for obtaining Bilibili danmaku data. The first requests the danmaku endpoint directly (the XML file, or the equivalent list.so API) using the video's CID. The second uses a third-party Python API library. The guide shows how to find the CID in the browser's Network panel, build the request URL, and retrieve the XML file.
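After you read the CID from the Network panel, the request URL follows directly from it. The sketch below builds the XML URL used in this article. It also includes a hypothetical helper, `cid_from_request_url`, that pulls the CID out of a copied request URL. The helper assumes the CID appears either as the `<cid>.xml` file name or as an `oid` query parameter on the `list.so` endpoint. The `list.so` URL shape is an assumption; the article does not spell it out.

```python
import re
from urllib.parse import parse_qs, urlparse

# URL pattern used by the article's first script.
XML_ENDPOINT = "https://comment.bilibili.com/{cid}.xml"


def danmaku_xml_url(cid: int) -> str:
    """Build the XML danmaku URL for a video part's CID."""
    return XML_ENDPOINT.format(cid=cid)


def cid_from_request_url(url: str) -> int:
    """Hypothetical helper: extract the CID from a copied request URL.

    Assumes the URL either ends in '<cid>.xml' or carries the CID in an
    'oid' query parameter (assumed shape of the list.so endpoint).
    """
    parsed = urlparse(url)
    # Case 1: a path such as '/47506569.xml'
    match = re.fullmatch(r"/(\d+)\.xml", parsed.path)
    if match:
        return int(match.group(1))
    # Case 2: a query string such as '?oid=47506569'
    oid = parse_qs(parsed.query).get("oid")
    if oid and oid[0].isdigit():
        return int(oid[0])
    raise ValueError(f"no CID found in {url!r}")


cid = cid_from_request_url("https://api.bilibili.com/x/v1/dm/list.so?oid=47506569")
print(danmaku_xml_url(cid))
```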

After acquiring the XML, the guide presents a complete Python script. The script fetches the file, parses every `<d>` tag (one per comment) with BeautifulSoup, and strips all spaces and newlines from each comment with a regular expression. It then writes the cleaned comments to a CSV file with pandas and prints the number of comments retrieved.

import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Output file for the danmaku (Scissor Seven, episode 1)
file_name = '刺客伍六七第一集.csv'

# Fetch the danmaku XML for this video part's CID
cid = 47506569
url = f"https://comment.bilibili.com/{cid}.xml"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'
}
response = requests.get(url=url, headers=headers, timeout=10)
response.raise_for_status()
response.encoding = 'utf-8'

# Extract every <d> tag (one tag per comment)
soup = BeautifulSoup(response.text, 'lxml')
results = soup.find_all('d')

# Clean the data: strip all spaces and newlines with a regex
data = [re.sub(r'\s+', '', d.text) for d in results]

print("Number of danmaku: {}".format(len(data)))

# Write one comment per row to CSV
df = pd.DataFrame(data)
df.to_csv(file_name, index=False, header=None, encoding="utf_8_sig")
print("File written successfully")
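The script above needs network access and third-party packages. As a sketch only, you can try the same parse-and-clean steps offline with the standard library. Here `xml.etree.ElementTree` stands in for BeautifulSoup. `SAMPLE_XML` is a hand-written, hypothetical sample shaped like a danmaku file, not real data. Real `<d>` elements also carry a `p` attribute of metadata, which this sketch ignores.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample shaped like a danmaku XML file.
SAMPLE_XML = (
    "<i>"
    "<chatserver>chat.bilibili.com</chatserver>"
    "<d>  nice   opening </d>"
    "<d>second\ncomment</d>"
    "</i>"
)


def extract_danmaku(xml_text: str) -> list[str]:
    """Return the text of every <d> element, like soup.find_all('d')."""
    root = ET.fromstring(xml_text)
    return [d.text or "" for d in root.iter("d")]


def clean(comments: list[str]) -> list[str]:
    """Remove all whitespace from each comment, returning a new list."""
    return [re.sub(r"\s+", "", c) for c in comments]


comments = clean(extract_danmaku(SAMPLE_XML))
print(len(comments), comments)
```

The cleaning step builds a new list on purpose. A loop such as `for i in data: i = re.sub(...)` only rebinds the name `i` and leaves the strings in `data` unchanged.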

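For a simpler approach, the article suggests installing the bilibili_api package and using its VideoInfo class to fetch danmaku directly from the video's BV ID. The same cleaning and CSV export steps then apply. The package's interface has changed across releases since the article was published, and later versions are distributed as bilibili-api-python. The calls below may therefore need adapting to the installed version.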

pip install bilibili_api

import re

import pandas as pd
from bilibili_api import video

BVid = "BV1oW41157Na"
file_name = '刺客伍六七第一集.csv'

# Look up the video by its BV ID and download its danmaku
my_video = video.VideoInfo(bvid=BVid)
danmu = my_video.get_danmaku()

# Strip all whitespace from each comment's text
data = [re.sub(r'\s+', '', d.text) for d in danmu]

print("Number of danmaku: {}".format(len(data)))

df = pd.DataFrame(data)
df.to_csv(file_name, index=False, header=None, encoding="utf_8_sig")
print("File written successfully")
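Both scripts write their CSV with `encoding="utf_8_sig"`. This codec prepends a UTF-8 byte-order mark, which is commonly used so that Excel recognizes the file as UTF-8 and displays Chinese comments correctly.

The sketch below performs the same export with the standard `csv` module instead of pandas. `save_danmaku_csv` and `FakeDanmaku` are hypothetical names. The stand-in class assumes only what the article's code already relies on: each danmaku object has a `.text` attribute.

```python
import csv
import re
import tempfile
from dataclasses import dataclass
from pathlib import Path


def save_danmaku_csv(danmaku, path) -> int:
    """Write one cleaned comment per row and return the number of rows.

    Assumes each item in `danmaku` exposes a `.text` attribute, as the
    article's code does with the library's danmaku objects.
    """
    rows = [re.sub(r"\s+", "", d.text) for d in danmaku]
    # newline="" lets the csv module write its own line endings (\r\n).
    # "utf-8-sig" writes a BOM first, like pandas' encoding="utf_8_sig".
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        for text in rows:
            writer.writerow([text])  # single column, no header row
    return len(rows)


@dataclass
class FakeDanmaku:
    """Hypothetical stand-in for a library danmaku object."""
    text: str


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "danmaku.csv"
    n = save_danmaku_csv([FakeDanmaku("好 看"), FakeDanmaku("second\ncomment")], out)
    raw = out.read_bytes()

print(n, raw[:3])  # 2 b'\xef\xbb\xbf'
```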

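The guide also notes that Bilibili caps each video's danmaku pool at a size that depends on the video's length. Once the cap is reached, the oldest comments are discarded so that the newest ones are always displayed. As a result, the danmaku retrieved this way reflect the current pool rather than every comment ever posted.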
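The discard rule described above behaves like a fixed-size, first-in-first-out buffer. The toy model below illustrates it with `collections.deque(maxlen=...)`. `DanmakuPool` is a hypothetical name. The limit of 3 is purely illustrative and is not one of Bilibili's actual pool sizes.

```python
from collections import deque


class DanmakuPool:
    """Toy model of a fixed-size danmaku pool (illustrative limit).

    When the pool is full, adding a comment silently drops the oldest
    one, so the pool always holds the most recent `limit` comments.
    """

    def __init__(self, limit: int) -> None:
        self._items = deque(maxlen=limit)

    def add(self, comment: str) -> None:
        self._items.append(comment)  # evicts the oldest item when full

    def snapshot(self) -> list[str]:
        return list(self._items)


pool = DanmakuPool(limit=3)  # 3 is illustrative, not a real Bilibili limit
for n in range(1, 6):
    pool.add(f"comment {n}")
print(pool.snapshot())  # ['comment 3', 'comment 4', 'comment 5']
```

A `deque` with `maxlen` evicts from the opposite end automatically on each `append`, so no manual trimming is needed.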


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

