What TV Fans Say: Analyzing 97,331 Danmu Comments with Python
Using Python and pandas, this article analyzes 97,331 danmu (bullet comments) from the first episode of Mango TV's “披荆斩棘的哥哥” (Call Me by Fire). It covers a data preview, a word cloud, the most-liked comments, the most active users, and the most-mentioned performers, and it closes with the script used to collect the data.
1. Data Preview
The dataset contains 97,331 danmu from the first episode, which was released in three parts (上, 中, 下). The code below reads the Excel file and inspects its fields.
import pandas as pd
df = pd.read_excel('披荆斩棘的哥哥.xlsx')
df.info()
Fields:
ids (string): danmu ID
uid (Int64): user ID
content (string): comment text
time (Int64): timestamp in milliseconds
v2_up_count (Int64): like count
时间 (Int64): time in minutes
上中下 (string): part of the episode (上 upper, 中 middle, 下 lower)
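The `时间` column is a minute index derived from the millisecond `time` field. The collection script at the end of the article computes it as `df.time // 60000`. The sketch below shows one thing that column makes possible: it counts danmu per minute within each part and picks the busiest minute. It runs on a few invented rows rather than the real spreadsheet, and it assumes the `上中下` labels are the single characters 上/中/下.

```python
import pandas as pd

# A few invented rows standing in for the real spreadsheet.
df = pd.DataFrame({
    "time": [5_000, 61_000, 65_000, 119_999, 30_000, 150_000],  # milliseconds
    "上中下": ["上", "上", "上", "上", "中", "中"],  # part label (assumed values)
})

# Milliseconds -> whole minutes, the same conversion the scraping script uses.
df["时间"] = df["time"] // 60_000

# Danmu count for every (part, minute) pair.
per_minute = df.groupby(["上中下", "时间"]).size()

# For each part, the (part, minute) label with the most danmu; ties keep the first.
busiest = per_minute.groupby(level="上中下").idxmax()
print(busiest)
```

On the real data, the same two lines would show where the comment density peaks in each part.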
2. Overall Word Cloud
A word cloud generated with a custom tool shows that viewers mainly expressed laughter and applause.
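The author's word-cloud tool is not shown. A word cloud is essentially a picture of token frequencies, so the counting step can be sketched with the standard library. This is only an illustration under stated assumptions. Each whole comment is treated as one "token" after long runs of a repeated character are collapsed, so "哈哈哈哈哈" and "哈哈哈" both count as "哈哈". A real pipeline would more likely segment Chinese text with a tokenizer such as jieba. The resulting word-to-count mapping is the shape that, for example, the `wordcloud` package's `WordCloud.generate_from_frequencies` accepts.

```python
import re
from collections import Counter

def normalize(comment) -> str:
    """Collapse runs of 3+ identical characters to two, e.g. '哈哈哈哈' -> '哈哈'."""
    return re.sub(r"(.)\1{2,}", r"\1\1", str(comment).strip())

def comment_frequencies(comments) -> Counter:
    """Count normalized comments, skipping ones that are empty after stripping."""
    return Counter(n for n in (normalize(c) for c in comments) if n)

# Invented input that shows the normalization at work.
sample = ["哈哈哈哈哈", "哈哈哈", "666666", "鼓掌鼓掌", "66", "  "]
freqs = comment_frequencies(sample)
print(freqs.most_common(3))
```

Collapsing repeats keeps laughter and "666"-style cheers from splitting into dozens of near-identical entries. That splitting would otherwise dilute exactly the signals the word cloud highlights.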
3. Top‑Liked Danmu
All ten of the most-liked comments come from the middle part, many of them posted during Zhao Wenzhuo's performance of “流星雨” (Meteor Shower).
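A claim such as "all ten come from the middle part" can be checked by tallying the top rows by segment instead of reading them off the table. The sketch below uses invented rows and `n = 3` in place of 10. It uses `DataFrame.nlargest`, which selects the same top like counts as `sort_values(...).head(n)`, though the order of tied rows may differ.

```python
import pandas as pd

# Invented rows with only the columns this check needs.
df = pd.DataFrame({
    "content": ["a", "b", "c", "d", "e"],
    "v2_up_count": [120, 5, 300, 80, 2],
    "上中下": ["中", "上", "中", "中", "下"],
})

n = 3  # the article looks at the top 10
top = df.nlargest(n, "v2_up_count")

# How many of the top-n rows fall in each part, and whether all are from 中.
segment_counts = top["上中下"].value_counts()
all_middle = bool((top["上中下"] == "中").all())
print(segment_counts)
```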
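# Styler.hide() replaces hide_index()/hide_columns(), which were removed in pandas 2.0
df.sort_values(by='v2_up_count', ascending=False).head(10).style.hide(axis='index').hide(['ids', 'uid', 'time'], axis='columns')
4. Most Active Danmu Users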
One user posted 176 comments over 4.5 hours, averaging 0.65 comments per minute, making them the “danmu maniac”.
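The 0.65 figure comes from dividing by the total running time in minutes: 176 / (4.5 × 60) = 176 / 270 ≈ 0.65. The sketch below reproduces that arithmetic. It also shows a `value_counts` shortcut that gives essentially the same per-user ranking as the groupby-count in this section. The helper names are illustrative, and the small `toy` frame is invented.

```python
import pandas as pd

def top_posters(df: pd.DataFrame, k: int = 5) -> pd.Series:
    """Danmu count per uid, largest first (close to groupby('uid')['ids'].count())."""
    return df["uid"].value_counts().head(k)

def per_minute_rate(n_comments: int, hours: float) -> float:
    """Average comments per minute over a span given in hours."""
    return n_comments / (hours * 60)

# The article's figures: 176 comments over 4.5 hours (270 minutes).
rate = per_minute_rate(176, 4.5)
print(round(rate, 2))  # 0.65

# Invented uids to show the ranking helper.
toy = pd.DataFrame({"uid": [7, 7, 7, 8, 8, 9]})
leaders = top_posters(toy, k=2)
```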
# 弹幕数 = number of danmu posted by each user
df.groupby('uid')['ids'].count().sort_values(ascending=False).to_frame('弹幕数').reset_index().head()
5. Most Popular Performers
Keyword searches show that the “Da Wan Qu” (大湾区, Greater Bay Area) group (Chen Xiaochun, Xie Tianhua, Lin Xiaofeng, Zhang Zhiling, Liang Hanwen) and the individual performers Zhao Wenzhuo, Li Chengxuan, Ouyang Jing, and Zhang Yunlong are mentioned most often.
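The `str.contains` filter in this section returns the matching rows for a single pattern. Ranking performers needs one count per performer. The sketch below runs the filter once per alias pattern and counts the hits. The 大湾区 pattern is taken from the article's own regex. The single-name patterns for the other four performers are assumptions made for illustration. A comment that names two performers counts toward both.

```python
import pandas as pd

# Alias regex per performer or group. The 大湾区 pattern is the article's;
# the plain-name patterns for the other four are illustrative assumptions.
ALIASES = {
    "大湾区": "大湾区|小春|春哥|谢天华|林晓峰|张智霖|梁汉文",
    "赵文卓": "赵文卓",
    "李承铉": "李承铉",
    "欧阳靖": "欧阳靖",
    "张云龙": "张云龙",
}

def mention_counts(content: pd.Series, aliases: dict) -> pd.Series:
    """Number of comments matching each pattern, sorted from most to least."""
    text = content.astype(str)  # fillna(0) in the scraper can leave numeric 0s
    counts = {name: int(text.str.contains(pat, regex=True).sum())
              for name, pat in aliases.items()}
    return pd.Series(counts).sort_values(ascending=False)

# Invented comments, including a 0 like those produced by fillna(0).
comments = pd.Series(["春哥好帅", "张智霖和梁汉文", "赵文卓流星雨", "哈哈", 0])
result = mention_counts(comments, ALIASES)
print(result)
```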
# Rows that mention the 大湾区 group or any of its members
df[df['content'].astype('str').str.contains('大湾区|小春|春哥|谢天华|林晓峰|张智霖|梁汉文')]
6. Audience Evaluation of the Show
Many comments that mention 芒果 (Mango) praise Mango TV's production of the show.
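To attach a number to "many", the same keyword filter can be reduced to a share of all comments. Because the keyword is a plain word rather than a pattern, passing `regex=False` to `str.contains` matches it literally. That matters for keywords containing characters such as `.` or `|`. The helper name below is hypothetical, and the inputs are invented.

```python
import pandas as pd

def mention_share(content: pd.Series, keyword: str) -> float:
    """Fraction of comments containing `keyword` as a literal substring."""
    hits = content.astype(str).str.contains(keyword, regex=False)
    return float(hits.mean())

# Invented comments; the 0 mimics a value left by fillna(0).
comments = pd.Series(["芒果良心", "好看", "芒果台yyds", 0])
share = mention_share(comments, "芒果")
print(share)
```

A share like this measures mentions only. Whether those mentions are praise still requires reading them or scoring their sentiment.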
df[df['content'].astype('str').str.contains('芒果')]
7. Danmu Data Collection Script
The following Python script downloads the numbered JSON danmu files for the video ID in the URL, stopping at the first file that is missing, and builds a pandas DataFrame from them.
import requests
import pandas as pd

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36"}

datas = []
# Danmu are served as numbered JSON files (0.json, 1.json, ...); stop at the first missing one
for i in range(100):
    url = f'https://bullet-ali.hitv.com/bullet/2021/08/17/192249/13137070/{i}.json'
    r = requests.get(url, headers=headers, timeout=10)  # timeout so a stalled request cannot hang the loop
    if r.status_code == 200:
        data = r.json()['data']['items']
        datas.extend(data)
    else:
        break

df = pd.DataFrame(datas)
# Keep the fields used in the analysis; missing values become 0
df = df[['ids', 'uid', 'content', 'time', 'v2_up_count']].fillna(0)
# Millisecond timestamp -> minute index
df['时间'] = df.time // 60000
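The script covers only the video ID in its URL, yet the spreadsheet also has a `上中下` column. Presumably the author ran the script once per part, labeled each result, and concatenated them. The other URLs are not given, so the sketch below shows only that assembly step, on invented items. The `上/中/下` labels and the helper name `items_to_frame` are assumptions. Unlike the original `.fillna(0)` on every column, this version fills missing text with an empty string so that `content` stays a text column, and fills only the numeric fields with 0.

```python
import pandas as pd

KEEP = ["ids", "uid", "content", "time", "v2_up_count"]

def items_to_frame(items: list, segment: str) -> pd.DataFrame:
    """Turn raw danmu items from one part into the article's table layout."""
    df = pd.DataFrame(items).reindex(columns=KEEP)  # missing fields become NaN columns
    df["content"] = df["content"].fillna("")
    df[["time", "v2_up_count"]] = df[["time", "v2_up_count"]].fillna(0).astype("int64")
    df["时间"] = df["time"] // 60_000  # minute index, as in the script
    df["上中下"] = segment              # hypothetical part label
    return df

# Invented items standing in for three separate download runs.
segments = {
    "上": [{"ids": "a1", "uid": 1, "content": "哈哈", "time": 61_000, "v2_up_count": 3}],
    "中": [{"ids": "b1", "uid": 2, "content": "好听", "time": 5_000}],
    "下": [{"ids": "c1", "uid": 1, "content": None, "time": 120_500, "v2_up_count": 0}],
}
full = pd.concat([items_to_frame(v, k) for k, v in segments.items()], ignore_index=True)
print(full)
```

Saving `full` with `to_excel` would produce a file with the same columns as the one read in section 1.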
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Python Crawling & Data Mining
