How Python Data Mining Uncovers Why '30 Only' Became a Summer Hit
This article uses Python to scrape and analyze Douban ratings, user comments, and Tencent video danmu for the TV drama “30 Only”, revealing the show’s explosive popularity, the most discussed characters, and audience sentiment through statistical charts and word‑cloud visualizations.
Introduction
The Chinese drama “30 Only” dominated social media and search trends during the summer, prompting a data‑driven investigation into why it resonated so strongly with viewers.
Data Sources
Two main sources were used:
Douban – rating scores, short‑review counts and comment data.
Tencent Video – over 271,000 danmu (real‑time comments) collected from the 15 episodes.
Data Analysis
Douban Rating
On Weibo, the series accumulated more than 42.2 billion reads and 148.8 k discussion posts. On Douban, it holds an average score of 8.0, which is high for a domestic production.
Comment Word Cloud
Word‑cloud analysis of Douban short reviews highlighted the keywords “female”, “plot” and “like”, along with frequent mentions of actors Jiang Shuying, Tong Yao and Mao Xiaotong.
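The keyword ranking behind a word cloud comes down to counting tokens. The sketch below is a minimal illustration of that step and is not the article's original code. The stop-word list, the `keyword_counts` helper and the sample reviews are all made up for the example. For Chinese reviews, `jieba.lcut` could be passed as the `tokenize` argument in place of the whitespace split used here.

```python
from collections import Counter

# Hypothetical stop-word list; a real run would use a much fuller one
STOP_WORDS = {"the", "a", "is", "and", "of", "to", "i"}


def keyword_counts(reviews, tokenize=str.split, top_n=10):
    """Count the most frequent tokens across reviews, skipping stop words."""
    counter = Counter()
    for text in reviews:
        for token in tokenize(text.lower()):
            token = token.strip(".,!?\"'")  # drop surrounding punctuation
            if token and token not in STOP_WORDS:
                counter[token] += 1
    return counter.most_common(top_n)


# Made-up sample reviews, for illustration only
sample_reviews = [
    "I like the plot",
    "The female leads carry the plot",
    "Like the female characters",
]
top_words = keyword_counts(sample_reviews, top_n=3)
print(top_words)
```

The result is a list of (word, count) pairs, which is the shape a word-cloud chart typically takes as input.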
Danmu Analysis
From Tencent Video we extracted 271,049 danmu (average 18,069 per episode, roughly 401 per minute). The following steps were performed in Python:
Data acquisition and loading with pandas.
Pre‑processing to extract character tags and classify users.
Visualization of results.
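As a quick sanity check on the danmu figures quoted in this section, the averages can be recomputed from the total. The sketch below uses only the numbers given in the article. The implied episode length is an inference from those numbers, not a figure the article states.

```python
TOTAL_DANMU = 271_049   # total danmu reported in the article
EPISODES = 15           # episodes covered by the collection
PER_MINUTE = 401        # per-minute rate reported in the article

# Whole danmu per episode (integer division drops the fractional part)
per_episode = TOTAL_DANMU // EPISODES

# Episode length, in minutes, that the two reported averages imply together
implied_minutes = per_episode / PER_MINUTE

print(per_episode)
print(round(implied_minutes, 1))
```

The two averages agree with each other if an episode runs for about 45 minutes.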
Key findings:
Most mentioned characters: Wang Manni, Gu Jia, Zhong Xiaoqin.
VIP users were identified by the presence of these character tags.
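A ranking like this can be produced by extracting the character tag from each danmu and counting the tags. The sketch below runs on a few made-up rows and uses a simplified, anchored variant of the article's pattern, so it is an illustration rather than the original pipeline. It assumes the column is called `content`, as in the article's code.

```python
import pandas as pd

# Made-up danmu rows for illustration; real data would come from the scraped CSV files
df = pd.DataFrame({"content": [
    "顾佳:太飒了",
    "王漫妮:加油",
    "顾佳:心疼",
    "哈哈哈哈",
]})

# Simplified pattern: a character name at the start of the danmu, followed by a colon
pattern = r"^(王漫妮|顾佳|钟晓芹)\s*:"

# expand=False returns a Series for a single capture group; non-matches become NaN
df["danmu_role"] = df["content"].str.extract(pattern, expand=False)

# value_counts ignores missing values, so untagged danmu drop out of the ranking
role_counts = df["danmu_role"].value_counts()
print(role_counts)
```

Danmu without a tag end up as missing values in `danmu_role`, so they have to be handled explicitly when users are later classified.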
Visualization
Three main charts were generated using pyecharts:
Pie chart showing the distribution of user levels (VIP, ordinary, unknown).
Bar chart ranking the popularity of danmu characters.
Word‑clouds for each major character (Wang Manni, Gu Jia, Zhong Xiaoqin, Chen Yu, Xu Huanshan).
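The pie chart of user levels can be sketched in two steps: turn the level column into (label, count) pairs, then hand the pairs to pyecharts. The helper names `level_pairs` and `build_level_pie`, the chart title and the sample levels are assumptions made for this example, and the article's actual chart options are not shown in the source.

```python
import pandas as pd


def level_pairs(levels):
    """Turn a Series of user levels into [(label, count), ...] with plain ints."""
    counts = levels.value_counts()
    # Convert NumPy integers to built-in ints so the pairs serialize cleanly
    return [(str(label), int(n)) for label, n in counts.items()]


def build_level_pie(pairs, title="User level distribution"):
    """Build a pyecharts Pie from (label, count) pairs."""
    # Imported here so the data preparation above runs without pyecharts installed
    from pyecharts.charts import Pie
    from pyecharts import options as opts
    return (
        Pie()
        .add("", pairs)
        .set_global_opts(title_opts=opts.TitleOpts(title=title))
    )


# Made-up levels for illustration
levels = pd.Series(["VIP用户", "普通用户", "普通用户", "未知用户"])
pairs = level_pairs(levels)
print(pairs)

# With pyecharts available, this would write the chart to an HTML file:
# build_level_pie(pairs).render("level_pie.html")
```

The same pairs can feed the bar chart of character popularity, since both charts are built from label and count data.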
# Import libraries
import os
import jieba
import numpy as np
import pandas as pd
from pyecharts.charts import Bar, Pie, WordCloud
from pyecharts import options as opts

# Read every CSV file in the data directory
data_list = os.listdir('../data/')
frames = []
for i in data_list:
    if i.split('.')[-1] == 'csv':
        df_one = pd.read_csv(f'../data/{i}', engine='python', encoding='utf-8', index_col=0)
        frames.append(df_one)

# Combine the files; DataFrame.append was removed in pandas 2.0, so use pd.concat
df_all = pd.concat(frames, ignore_index=False)

# Extract character tags: a character name followed by a colon
pattern = r'(王漫妮\s*|钟晓芹\s*|顾佳\s*|陈屿\s*|许幻山\s*|飒飒\s*|浪浪\s*):.*'
df_all['danmu_role'] = df_all['content'].str.extract(pattern)[0].str.strip()


def transform_name(x):
    """Map an extracted character tag to a user level."""
    if x in ['王漫妮', '顾佳', '钟晓芹', '陈屿', '许幻山', '飒飒', '浪浪']:
        return 'VIP用户'
    elif pd.isna(x):
        # No tag was extracted; comparing with the string 'NaN' never matches a missing value
        return '未知用户'
    else:
        return '普通用户'


df_all['danmu_level'] = df_all['danmu_role'].apply(transform_name)
Conclusion
The analysis shows that “30 Only” struck a chord by portraying three distinct 30‑year‑old women and their dilemmas, which resonated with a large female audience. Data‑driven insights such as character popularity, sentiment keywords, and user‑level distribution help explain the show’s viral success and illustrate how Python can turn entertainment data into actionable knowledge.
Python Crawling & Data Mining
