What Bilibili Viewers Really Talk About: Python Scraping & Danmu Trend Analysis

This article uses Python to crawl more than 200,000 Bilibili danmu (bullet comments), analyzes the popular memes in different content categories, and walks step by step through extracting danmu via network requests, with code examples.

Python Crawling & Data Mining

Aug 17, 2020

Bilibili is a major Chinese video platform, and this article explores which memes are popular in its comment (danmu) sections across various content categories by crawling more than 200,000 danmu entries using Python.

Danmu Analysis

Outdoor – Hua Nong Brothers

We selected recent high‑view videos from the outdoor creator Hua Nong Brothers, collecting 36,000 danmu over 24 days. A word‑cloud reveals frequent terms such as “兄弟”, “村霸”, “危”, “亿点点”, and “死因”, reflecting the creator’s recurring jokes.

Knowledge – Luo Xiang

For the popular legal educator Luo Xiang, we gathered 60,000 danmu from a high‑view video that amassed over 7 million plays. The dominant words include “哈哈哈”, “开门见三”, “法外狂徒张三”, as well as emotional tags like “好惨啊” and “泪目”.

Life – Handicraft Geng

From the lifestyle creator Handicraft Geng we extracted 21,000 danmu for a video that received 4.6 million views. Frequently repeated phrases include “刑部尚书”, “害怕”, “申请专利”, “量产”, and “老婆/嫂子”, showing the community’s humor around the video’s theme.

Food – Guo Jie‑Rui

Analyzing a food‑related video by Guo Jie‑Rui (15,000 danmu) reveals the most common comment “这个不辣”, along with other keywords such as “血亏”, “汉堡”, “黄金”, and location‑related tags like “美丽的风景线” and “no justice no peace”.

Kichiku – Miscellaneous

In the “鬼畜” (kichiku, i.e. remix and mashup) zone, the top 5 videos of July include two based on “让子弹飞” and two on Luo Xiang. From a video titled “张三史上最惨4分钟” we collected 40,000 danmu, with recurring phrases such as “好活当赏”, “下次一定”, “法外狂徒”, “肾宝”, “欢迎回来”, and “每日亿遍”.

Technical Analysis

This section explains how to scrape all danmu for a specific Bilibili video using Python. Traditional methods that request the XML danmu file no longer work, so we capture the network request for the historical danmu AJAX endpoint.

By opening the video, pressing F12, navigating to the Network tab, and selecting “弹幕列表 → 查看历史弹幕”, we can locate a request containing the parameters oid (video ID) and date. These are used to construct URLs of the form:

import pandas as pd


def get_url(oid, start, end):
    """Generate history-danmu URLs for a video, one per day from start to end.

    oid: video identifier taken from the captured request
    start, end: date strings (YYYY-MM-DD), both inclusive
    """
    url_list = []
    # pd.date_range yields every calendar day in the closed interval
    for date in pd.date_range(start, end).strftime('%Y-%m-%d'):
        url = f"https://api.bilibili.com/x/v2/dm/history?type=1&oid={oid}&date={date}"
        url_list.append(url)
    return url_list
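
As a quick check of what get_url produces, the sketch below rebuilds the same date expansion with only the standard library, so it runs without pandas. The function name `history_urls` and the oid value `12345678` are made up for illustration; a real oid has to come from the captured request.

```python
from datetime import date, timedelta

# Endpoint copied from get_url above
BASE = "https://api.bilibili.com/x/v2/dm/history?type=1&oid={oid}&date={day}"


def history_urls(oid, start, end):
    """Return one history-danmu URL per calendar day in [start, end]."""
    first = date.fromisoformat(start)
    last = date.fromisoformat(end)
    n_days = (last - first).days + 1  # both endpoints are included
    return [
        BASE.format(oid=oid, day=(first + timedelta(days=i)).isoformat())
        for i in range(n_days)
    ]


# Hypothetical oid, used only to show the shape of the URLs
urls = history_urls("12345678", "2020-07-30", "2020-08-02")
for u in urls:
    print(u)
```

The range crosses a month boundary on purpose: each day gets its own URL, which is why a long collection window translates into many requests.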

We then request each URL, parse the returned XML with BeautifulSoup, extract the <d> elements, and write the text to a local .txt file. The implementation uses requests, custom headers (including your own cookie), pandas for date handling, bs4 for parsing, and tqdm for a progress bar. Note that the response encoding must be forced to UTF‑8 to avoid garbled characters.

import time

import requests
from bs4 import BeautifulSoup
from tqdm import tqdm


def get_danmu(url_list, name):
    """Download danmu from a list of URLs and save them to <name>.txt, one per line."""
    headers = {
        # Replace with the cookie of your own logged-in session
        "cookie": "YOUR_COOKIE",
        "origin": "https://www.bilibili.com",
        "referer": "https://www.bilibili.com/video/BV1gW411b735",
        "sec-fetch-dest": "empty",
        "sec-fetch-mode": "cors",
        "sec-fetch-site": "same-site",
        "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36"
    }
    with open(f"{name}.txt", "w", encoding="utf-8") as file:
        for url in tqdm(url_list):
            res = requests.get(url, headers=headers)
            # Force UTF-8, otherwise the Chinese text comes back garbled
            res.encoding = 'utf-8'
            soup = BeautifulSoup(res.text, "html.parser")
            # Each danmu is the text of a <d> element
            for d in soup.find_all("d"):
                file.write(d.text + "\n")
            # Pause between requests to keep the load on the API low
            time.sleep(2)
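
The parsing step can be tried offline before spending any requests. The sketch below is not the article's code: it swaps BeautifulSoup for the standard library's `xml.etree.ElementTree` so that it runs without third-party packages, and it parses a hand-written sample. The `SAMPLE` string, its `<i>` root element, and the helper name `extract_danmu` are assumptions made for illustration; only the `<d>` elements come from the description above.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for one response body; real responses carry more fields
SAMPLE = "<i><d>下次一定</d><d>好活当赏</d><d>下次一定</d></i>"


def extract_danmu(xml_text):
    """Return the text of every <d> element, in document order."""
    root = ET.fromstring(xml_text)
    # iter("d") walks the whole tree, much like soup.find_all("d")
    return [d.text or "" for d in root.iter("d")]


danmu = extract_danmu(SAMPLE)
print(danmu)
```

Duplicates are kept deliberately, since repeated danmu are exactly what the frequency analysis counts.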

After obtaining the raw danmu, we can generate word‑cloud visualizations (methods omitted here) to reveal the most frequent memes in each Bilibili community.
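
The article does not show how its word clouds were built, so the following is only a minimal sketch of the counting that any such chart rests on, assuming the saved file holds one danmu per line. It counts identical danmu with `collections.Counter`; the helper name `top_danmu` and the sample list are made up for illustration.

```python
from collections import Counter


def top_danmu(lines, n=3):
    """Count identical danmu (after trimming whitespace) and return the n most common."""
    cleaned = [line.strip() for line in lines if line.strip()]
    return Counter(cleaned).most_common(n)


# Made-up sample standing in for the lines of the saved .txt file
sample = ["这个不辣", "血亏", "这个不辣", " 汉堡 ", "这个不辣", "血亏", ""]
top = top_danmu(sample)
print(top)
```

Counting whole lines works for short, frequently repeated danmu. Longer comments would likely need to be split into words first, for example with a Chinese tokenizer such as jieba, before the frequencies are handed to a word-cloud library; that step is an assumption here, not something the article documents.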
