
What TV Fans Say: Analyzing 97,331 Danmu Comments with Python

Using Python and pandas, this article collects and analyzes 97,331 danmu comments from the first episode of Mango TV’s “Brother” show, presenting data previews, word clouds, top‑liked remarks, super‑active users, and favorite performers, while also sharing the data‑scraping script.

Python Crawling & Data Mining

1. Data Preview

The dataset contains 97,331 danmu entries from the three parts of the first episode. The following code shows how the Excel file is read and its fields inspected.

import pandas as pd
df = pd.read_excel('披荆斩棘的哥哥.xlsx')
df.info()

Fields: ids (string) – danmu ID, uid (Int64) – user ID, content (string) – comment text, time (Int64) – timestamp in milliseconds, v2_up_count (Int64) – like count, 时间 (Int64) – time in minutes, 上中下 (string) – segment (upper, middle, lower).

2. Overall Word Cloud

A word cloud generated with a custom tool shows that viewers mainly expressed laughter and applause.
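The word-cloud tool itself is not shown in the article. A word cloud is, at its core, a frequency table, so a minimal sketch of the counting step might look like the following. It assumes that counting whole comments is good enough because danmu are short; word-level counts for Chinese text would need a segmenter such as jieba first. The `top_comments` helper and the sample list are hypothetical.

```python
from collections import Counter


def top_comments(comments, n=3):
    """Return the n most frequent comment strings after trimming whitespace."""
    # Skip non-string values (e.g. missing cells) and empty strings
    cleaned = [c.strip() for c in comments if isinstance(c, str) and c.strip()]
    return Counter(cleaned).most_common(n)


# Hypothetical sample standing in for df['content'].tolist()
sample = ["哈哈哈", "好听", "哈哈哈", " 鼓掌 ", "鼓掌", "哈哈哈", None]
top = top_comments(sample, n=2)
print(top)
```

The resulting (text, count) pairs are the kind of input a word-cloud renderer typically takes as weights.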

3. Top‑Liked Danmu

The ten most-liked comments all come from the middle part, clustering around Zhao Wenzhuo’s performance of “流星雨” (“Meteor Shower”).

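# pandas >= 1.4: Styler.hide() replaces hide_index()/hide_columns(), which were removed in 2.0
df.sort_values(by='v2_up_count', ascending=False).head(10).style.hide(axis='index').hide(['ids','uid','time'], axis='columns')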
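To check a claim such as "the top comments all come from one part", the segment column of the top rows can be tallied. The sketch below uses a small hypothetical frame with the same column names as the dataset; the values are invented for illustration only.

```python
import pandas as pd

# Hypothetical rows; the real frame comes from the Excel file
demo = pd.DataFrame({
    "content": ["a", "b", "c", "d", "e"],
    "v2_up_count": [5, 120, 40, 300, 7],
    "上中下": ["上", "中", "下", "中", "上"],
})

# Pick the rows with the highest like counts
top3 = demo.nlargest(3, "v2_up_count")

# Count how many of those rows fall in each segment
segment_counts = top3["上中下"].value_counts()
print(segment_counts)
```

On the real data, replacing `demo` with `df` and `3` with `10` would show whether a single segment accounts for the whole top ten.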

4. Most Active Danmu Users

One user posted 176 comments over 4.5 hours, averaging 0.65 comments per minute, making them the “danmu maniac”.

df.groupby('uid')['ids'].count().sort_values(ascending=False).to_frame('弹幕数').reset_index().head()
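The per-minute rate quoted for the most active user follows directly from the two figures given in the text. A quick sketch of the arithmetic, using only those two numbers:

```python
comments_posted = 176   # from the article
hours_watched = 4.5     # from the article

minutes = hours_watched * 60                      # 270 minutes
rate = comments_posted / minutes                  # comments per minute
seconds_per_comment = minutes * 60 / comments_posted

print(f"{rate:.2f} comments per minute")
print(f"about one comment every {seconds_per_comment:.0f} seconds")
```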

5. Most Popular Performers

Keyword analysis reveals that the “Da Wan Qu” (大湾区) group (Chen Xiaochun, Xie Tianhua, Lin Xiaofeng, Zhang Zhilin, Liang Hanwen) and individual stars Zhao Wenzhuo, Li Chengxuan, Ouyang Jing, and Zhang Yunlong received the most attention.

df[df['content'].astype('str').str.contains('大湾区|小春|春哥|谢天华|林晓峰|张智霖|梁汉文')]
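The filter above returns the matching rows for one group. To compare performers, the same `str.contains` test can be run once per keyword pattern and the matches counted. This is a sketch under assumptions: only the 大湾区 pattern comes from the article, while the other patterns (the performers' Chinese names) and the sample comments are illustrative guesses at what the keyword list might contain.

```python
import pandas as pd

# Only the first pattern is from the article; the others are assumed
patterns = {
    "Da Wan Qu": "大湾区|小春|春哥|谢天华|林晓峰|张智霖|梁汉文",
    "Zhao Wenzhuo": "赵文卓",
    "Li Chengxuan": "李承铉",
}

# Hypothetical comments standing in for df['content']
demo = pd.DataFrame({"content": ["大湾区哥哥好帅", "赵文卓流星雨", "春哥!", None, "哈哈哈"]})

# Replace missing values so every cell is a plain string
text = demo["content"].fillna("").astype(str)

# Count the comments matching each performer's pattern
mentions = pd.Series(
    {name: int(text.str.contains(pat).sum()) for name, pat in patterns.items()}
).sort_values(ascending=False)
print(mentions)
```

A comment that names two performers is counted once for each, so the totals can exceed the number of comments.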

6. Audience Evaluation of the Show

Many comments praise the Mango TV production.

df[df['content'].astype('str').str.contains('芒果')]

7. Danmu Data Collection Script

The following Python script fetches the paginated JSON danmu files for one segment and builds a pandas DataFrame. The URL shown covers a single part; the other two parts would need their own URLs.

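import requests
import pandas as pd

# Send a browser-like User-Agent with each request
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36"}

datas = []
# Danmu are served as numbered JSON pages (0.json, 1.json, ...); try at most 100
for i in range(100):
    url = f'https://bullet-ali.hitv.com/bullet/2021/08/17/192249/13137070/{i}.json'
    r = requests.get(url, headers=headers, timeout=10)
    if r.status_code == 200:
        data = r.json()['data']['items']
        datas.extend(data)
    else:
        # Stop at the first non-200 response, treated as the end of the pages
        break

df = pd.DataFrame(datas)
# Keep the fields used in the analysis; missing values become 0
df = df[['ids','uid','content','time','v2_up_count']].fillna(0)
# 'time' is in milliseconds, so integer division by 60000 gives minutes
df['时间'] = df['time'] // 60000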
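The dataset described earlier also carries a 上中下 segment column, which the script above does not create. One way it could be produced is to run the paging loop once per segment and tag each item with its label. The sketch below assumes exactly that; `collect_segment`, `collect_all`, the `get_page` callback, and the `example.invalid` URLs are all hypothetical, and the network call is replaced by a dictionary lookup so the example runs offline.

```python
def collect_segment(base_url, get_page, max_pages=100):
    """Collect danmu items from numbered JSON pages until one is missing."""
    items = []
    for i in range(max_pages):
        payload = get_page(f"{base_url}/{i}.json")
        if payload is None:  # page not found: stop paging
            break
        # Treat a missing or null 'items' list as an empty page (assumption)
        items.extend((payload.get("data") or {}).get("items") or [])
    return items


def collect_all(segments, get_page):
    """Tag every item with its segment label and return one flat list."""
    rows = []
    for label, base_url in segments.items():
        for item in collect_segment(base_url, get_page):
            rows.append({**item, "上中下": label})
    return rows


# Offline stand-in for the network: URL -> parsed JSON (hypothetical data)
fake_pages = {
    "https://example.invalid/upper/0.json": {"data": {"items": [{"ids": "1", "time": 61000}]}},
    "https://example.invalid/upper/1.json": {"data": {"items": None}},
    "https://example.invalid/middle/0.json": {"data": {"items": [{"ids": "2", "time": 125000}]}},
}
segments = {
    "上": "https://example.invalid/upper",
    "中": "https://example.invalid/middle",
}

rows = collect_all(segments, fake_pages.get)
print(rows)
```

With requests, `get_page` could be a small wrapper that returns `r.json()` when `r.status_code == 200` and `None` otherwise; the resulting list can be passed to `pd.DataFrame` as in the script above.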