How to Scrape Bilibili Danmu and Create Word Clouds with Python
This article walks through using Python to crawl Bilibili danmu (bullet comments), process the text with jieba, and generate a visual word cloud, providing complete code examples and tips for adapting the script to other videos.
Introduction
Hello, I'm Pipi. In a Python community, a user asked how to crawl Bilibili danmu (bullet comments) and visualize them. The original code is shown below.
# import modules
import requests
import re
import jieba
import wordcloud
import imageio

# select mask image (the word cloud is drawn inside its shape)
mask = imageio.imread('xing.jpg')
# target URL (an example; this script reads the actual URL from the prompt below)
url = 'https://api.bilibili.com/x/v1/dm/list.so?oid=177974677'
# request headers
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'}

def get_damu(url):
    # pass headers by keyword: the second positional argument of requests.get is params
    response = requests.get(url, headers=headers)
    response = response.content.decode('utf-8')
    # each danmu is the text of a <d ...>...</d> element in the XML response
    data = re.compile(r'<d.*?>(.*?)</d>')
    danmu = data.findall(response)
    # segment the Chinese text into words, then join them with spaces for wordcloud
    danmu_word = jieba.lcut(" ".join(danmu))
    danmu_str = " ".join(danmu_word)
    w = wordcloud.WordCloud(font_path="msyh.ttc", background_color='white', width=1000, height=500, mask=mask)
    w.generate(danmu_str)
    w.to_file('danmu.png')

if __name__ == '__main__':
    s = input("Enter the danmu URL to crawl: ")
    get_damu(s.strip())

The code works, but the danmu URL has to be changed for every new video, which can be hard for beginners.
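The extraction step in get_damu can be tried offline before touching the network. The following is a minimal sketch, assuming the response has the same `<d ...>text</d>` shape the script's regular expression targets; the sample XML is made up for illustration, and `extract_danmu` is a hypothetical helper name, not part of the original script. It also decodes XML entities such as `&amp;`, which the plain regex leaves untouched.

```python
import html
import re

# Made-up sample in the shape the script's regex expects; a real response is much longer.
SAMPLE_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<i><chatserver>chat.bilibili.com</chatserver>'
    '<d p="1.5,1,25,16777215">first comment</d>'
    '<d p="3.0,1,25,16777215">Tom &amp; Jerry</d></i>'
)

# Same pattern as in get_damu: capture the text between <d ...> and </d>.
DANMU_PATTERN = re.compile(r'<d.*?>(.*?)</d>')


def extract_danmu(xml_text):
    """Return the text of every <d> element, with XML entities decoded."""
    return [html.unescape(text) for text in DANMU_PATTERN.findall(xml_text)]


print(extract_danmu(SAMPLE_XML))
```

If the list comes back empty for a real response, the response is probably not in the shape this pattern assumes, and printing the first few hundred characters of the decoded text is a quick way to check.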
Implementation
A community member provided a modified version that uses a different mask image and font, and hard-codes the target URL instead of asking for it at a prompt.
# import modules
import requests
import re
import jieba
import wordcloud
import imageio

# mask image (the word cloud is drawn inside its shape)
mask = imageio.imread('Python进阶者.jpg')
# request headers
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'}

def get_damu(url):
    # pass headers by keyword: the second positional argument of requests.get is params
    response = requests.get(url, headers=headers)
    response = response.content.decode('utf-8')
    # each danmu is the text of a <d ...>...</d> element in the XML response
    data = re.compile(r'<d.*?>(.*?)</d>')
    danmu = data.findall(response)
    # segment the Chinese text into words, then join them with spaces for wordcloud
    danmu_word = jieba.lcut(" ".join(danmu))
    danmu_str = " ".join(danmu_word)
    w = wordcloud.WordCloud(font_path="simkai.ttf", background_color='white', width=1000, height=500, mask=mask)
    w.generate(danmu_str)
    w.to_file('danmu.png')

if __name__ == '__main__':
    url = 'https://api.bilibili.com/x/v1/dm/list.so?oid=177974677'
    get_damu(url)

Running this script generates the expected word-cloud image and saves it as danmu.png.
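Before rendering, it can help to see which tokens will dominate the image. The sketch below is an optional check, not part of the original script: it counts tokens with the standard library and assumes they have already been segmented (for example by `jieba.lcut`). The helper name `top_tokens`, the sample tokens, and the minimum length of 2 are illustrative choices.

```python
from collections import Counter


def top_tokens(tokens, n=10, min_length=2):
    """Count tokens, skipping blanks and tokens shorter than min_length."""
    kept = [t.strip() for t in tokens if len(t.strip()) >= min_length]
    return Counter(kept).most_common(n)


# Made-up tokens standing in for the output of jieba.lcut.
sample_tokens = ["哈哈", " ", "哈哈", "，", "前方", "高能", "前方", "哈哈"]
print(top_tokens(sample_tokens, n=2))
```

Dropping one-character tokens removes most punctuation and particles, at the cost of also removing meaningful single-character words, so the threshold is worth adjusting per video.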
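To use a different video, replace the URL with that video's danmu endpoint. It can be found among the XHR requests of the video page: open the page with the browser's developer tools, filter the network panel to XHR, and copy the request to list.so. In the example URL, the oid query parameter is the part that changes from video to video.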
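Swapping videos can be reduced to changing one number. This is a minimal sketch, assuming the endpoint keeps the form used in this article and that only the `oid` query parameter varies; `build_danmu_url` and `extract_oid` are hypothetical helper names introduced here for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint taken from the example URL in this article.
DANMU_ENDPOINT = 'https://api.bilibili.com/x/v1/dm/list.so'


def build_danmu_url(oid):
    """Build the danmu URL for a given oid value."""
    return DANMU_ENDPOINT + '?' + urlencode({'oid': oid})


def extract_oid(url):
    """Pull the oid value out of a URL copied from the browser's XHR list."""
    values = parse_qs(urlsplit(url).query).get('oid')
    if not values:
        raise ValueError(f'no oid parameter in {url!r}')
    return values[0]


print(build_danmu_url(177974677))
print(extract_oid('https://api.bilibili.com/x/v1/dm/list.so?oid=177974677'))
```

With helpers like these, the script could ask for either a full URL or a bare oid, which may be easier for beginners than editing the URL by hand.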
Conclusion
This article demonstrates how to fetch Bilibili danmu with Python, process the text using jieba, and visualise it as a word cloud. The provided scripts can be adapted to other videos by changing the URL.
