How to Scrape Bilibili Danmu and Create Word Clouds with Python

This article walks through using Python to crawl Bilibili danmu (bullet comments), process the text with jieba, and generate a visual word cloud, providing complete code examples and tips for adapting the script to other videos.

Python Crawling & Data Mining

Introduction

Hello, I'm Pipi. A member of a Python community asked how to crawl Bilibili danmu (bullet comments) and visualize them as a word cloud. The original code is shown below.

# import modules
import requests
import re
import jieba
import wordcloud
import imageio
# select mask image
mask = imageio.imread('xing.jpg')
# target URL
url = 'https://api.bilibili.com/x/v1/dm/list.so?oid=177974677'
# request headers
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'}

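def get_damu(url):
    # pass headers by keyword: the second positional argument of requests.get is params
    response = requests.get(url, headers=headers)
    response = response.content.decode('utf-8')
    # each danmu is the text inside a <d ...>...</d> element of the returned XML
    data = re.compile('<d.*?>(.*?)</d>')
    danmu = data.findall(response)
    # segment the joined danmu text into words with jieba
    danmu_word = jieba.lcut(" ".join(danmu))
    danmu_str = " ".join(danmu_word)
    # font_path must point to a font that contains Chinese glyphs
    w = wordcloud.WordCloud(font_path="msyh.ttc", background_color='white', width=1000, height=500, mask=mask)
    w.generate(danmu_str)
    w.to_file('danmu.png')

if __name__ == '__main__':
    s = input("Enter the danmu URL to scrape: ")
    get_damu(s.strip())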

The code works, but it asks for the danmu endpoint URL on every run, and finding that URL for another video can be hard for beginners.
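The extraction step in the script can be tried offline. The sketch below is only an illustration: SAMPLE_XML is a hypothetical, hand-written snippet shaped like the endpoint's XML response (it is not real data), and extract_with_regex and extract_with_parser are names made up for this example. It compares the regular expression used above with the standard library's XML parser.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the danmu XML; the attribute values are made up.
SAMPLE_XML = (
    '<i>'
    '<d p="1.0,1,25,16777215">hello</d>'
    '<d p="2.5,1,25,16777215">前方高能</d>'
    '<d p="3.0,1,25,16777215">a &amp; b</d>'
    '</i>'
)


def extract_with_regex(xml_text):
    """Same pattern as the script: capture the text between <d ...> and </d>."""
    return re.findall(r'<d.*?>(.*?)</d>', xml_text)


def extract_with_parser(xml_text):
    """Alternative: let an XML parser find every <d> element and read its text."""
    root = ET.fromstring(xml_text)
    return [d.text or "" for d in root.iter("d")]


regex_result = extract_with_regex(SAMPLE_XML)
parser_result = extract_with_parser(SAMPLE_XML)
print(regex_result)
print(parser_result)
```

Both approaches return the same comments for plain text. They differ on escaped characters: the regular expression keeps the raw entity (a &amp; b), while the parser decodes it (a & b), which may matter if such fragments should not end up in the word cloud.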

Implementation

A community member provided a modified version that uses a different mask image and font, and hard‑codes the target URL.

# import modules
import requests
import re
import jieba
import wordcloud
import imageio
mask = imageio.imread('Python进阶者.jpg')
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'}

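def get_damu(url):
    # pass headers by keyword: the second positional argument of requests.get is params
    response = requests.get(url, headers=headers)
    response = response.content.decode('utf-8')
    # each danmu is the text inside a <d ...>...</d> element of the returned XML
    data = re.compile('<d.*?>(.*?)</d>')
    danmu = data.findall(response)
    # segment the joined danmu text into words with jieba
    danmu_word = jieba.lcut(" ".join(danmu))
    danmu_str = " ".join(danmu_word)
    # font_path must point to a font that contains Chinese glyphs
    w = wordcloud.WordCloud(font_path="simkai.ttf", background_color='white', width=1000, height=500, mask=mask)
    w.generate(danmu_str)
    w.to_file('danmu.png')

if __name__ == '__main__':
    # hard-coded danmu endpoint; change the oid value to target another video
    url = 'https://api.bilibili.com/x/v1/dm/list.so?oid=177974677'
    get_damu(url)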

Running this script generates the expected word‑cloud image.
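If the image is dominated by filler words or single characters, one option is to filter the tokens before they reach the word cloud. The following is a minimal sketch of that idea using only the standard library. The STOP_WORDS set, the count_tokens helper, and the sample token list are all hypothetical; in the real script the tokens would come from jieba.lcut.

```python
from collections import Counter

# Illustrative stop words only; a real list would be much longer.
STOP_WORDS = {"的", "了", "这个"}


def count_tokens(tokens, stop_words=STOP_WORDS, min_length=2):
    """Count tokens, skipping whitespace, very short tokens, and stop words."""
    cleaned = (token.strip() for token in tokens)
    kept = [t for t in cleaned if len(t) >= min_length and t not in stop_words]
    return Counter(kept)


# Hypothetical token list standing in for the output of jieba.lcut.
tokens = ["哈哈", "的", "哈哈", " ", "这个", "前方", "高能", "了", "高能", "哈哈"]
frequencies = count_tokens(tokens)
print(frequencies.most_common(2))
```

The resulting counts are a plain mapping from word to frequency, so they could be handed to the wordcloud library's WordCloud.generate_from_frequencies method instead of calling generate on a joined string.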

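To use a different video, replace the URL with that video's danmu endpoint. You can find it by opening the browser's developer tools on the video page and looking through the XHR requests for the one that goes to list.so; its oid parameter is what changes from video to video.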
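Since only the oid value differs between videos in the URL shown above, a small helper can build or inspect the endpoint instead of editing the string by hand. This is a sketch under that assumption; build_danmu_url and extract_oid are hypothetical helper names, not part of the original scripts.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Endpoint taken from the article's example URL.
DANMU_ENDPOINT = "https://api.bilibili.com/x/v1/dm/list.so"


def build_danmu_url(oid):
    """Build the danmu URL for a given oid value."""
    return f"{DANMU_ENDPOINT}?{urlencode({'oid': oid})}"


def extract_oid(url):
    """Return the oid query parameter of a danmu URL, or None if it is absent."""
    values = parse_qs(urlparse(url).query).get("oid")
    return values[0] if values else None


url = build_danmu_url(177974677)
print(url)
print(extract_oid(url))
```

With a helper like this, the script could accept just the oid copied from the browser's network panel and pass build_danmu_url(oid) to get_damu.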

Conclusion

This article demonstrates how to fetch Bilibili danmu with Python, process the text using jieba, and visualise it as a word cloud. The provided scripts can be adapted to other videos by changing the URL.

Written by

Python Crawling & Data Mining