Uncovering 200k iQiyi Danmu: Python Scraping & Insightful Analysis of “The Bad Kids”
This article shows how to scrape more than 200,000 iQiyi danmu comments for the drama “The Bad Kids” with Python, analyze user activity, episode popularity, top comments, and actor mentions, and visualize the results with word clouds and charts.
Scraping Danmu
The drama “The Bad Kids” (《隐秘的角落》) became a hot topic, and its bullet‑screen comments (danmu) were collected from iQiyi. iQiyi stores danmu as compressed .z files. The process involves obtaining the TV ID list, downloading the corresponding .z files, decompressing them, and storing the data.
import zlib

import pandas as pd
import requests
from bs4 import BeautifulSoup

def get_data(tv_name, tv_id):
    # URL pattern: two digit pairs from the end of tv_id, the tv_id itself, and a file index
    url = 'https://cmts.iqiyi.com/bullet/{}/{}/{}_300_{}.z'
    datas = pd.DataFrame(columns=['uid', 'contentsId', 'contents', 'likeCount'])
    # Request files 1 to 19 and stop at the first one that is not available
    for i in range(1, 20):
        myUrl = url.format(tv_id[-4:-2], tv_id[-2:], tv_id, i)
        print(myUrl)
        res = requests.get(myUrl)
        if res.status_code == 200:
            # The .z file is zlib-compressed XML
            btArr = bytearray(res.content)
            xml = zlib.decompress(btArr).decode('utf-8')
            bs = BeautifulSoup(xml, "xml")
            data = pd.DataFrame(columns=['uid', 'contentsId', 'contents', 'likeCount'])
            data['uid'] = [tag.text for tag in bs.find_all('uid')]
            data['contentsId'] = [tag.text for tag in bs.find_all('contentId')]
            data['contents'] = [tag.text for tag in bs.find_all('content')]
            data['likeCount'] = [tag.text for tag in bs.find_all('likeCount')]
        else:
            break
        datas = pd.concat([datas, data], ignore_index=True)
    # Label every row with the episode it came from
    datas['tv_name'] = str(tv_name)
    return datas

The script collected a total of 201,865 danmu entries.
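The decompress-and-parse step inside get_data can be tried offline before hitting the network. The sketch below is only an illustration: it swaps BeautifulSoup for the standard library's xml.etree.ElementTree so it runs without extra dependencies, the names SAMPLE_XML and parse_payload are hypothetical, and the `<bulletInfo>` wrapper and all values are made up (only the tag names uid, contentId, content, and likeCount come from the scraper above). It also converts likeCount to an integer, since text read from XML would otherwise sort as a string.

```python
import zlib
import xml.etree.ElementTree as ET

# Synthetic payload: tag names match those read in get_data,
# but the nesting and the values are assumptions for illustration.
SAMPLE_XML = """<danmu>
  <bulletInfo><contentId>1001</contentId><content>first</content><likeCount>9</likeCount><uid>42</uid></bulletInfo>
  <bulletInfo><contentId>1002</contentId><content>爬山</content><likeCount>10</likeCount><uid>7</uid></bulletInfo>
</danmu>"""

def parse_payload(payload):
    """Decompress a zlib payload and return one dict per comment."""
    root = ET.fromstring(zlib.decompress(payload))
    rows = []
    for info in root.iter("bulletInfo"):
        rows.append({
            "uid": info.findtext("uid"),
            "contentsId": info.findtext("contentId"),
            "contents": info.findtext("content"),
            # XML text is a string; convert so that 10 ranks above 9.
            "likeCount": int(info.findtext("likeCount")),
        })
    return rows

# Compress the sample the same way a .z file would be compressed, then parse it.
payload = zlib.compress(SAMPLE_XML.encode("utf-8"))
rows = parse_payload(payload)
by_likes = sorted(rows, key=lambda r: r["likeCount"], reverse=True)
print(by_likes[0]["contents"])
```

With a pandas DataFrame, the same conversion can be done with pd.to_numeric on the likeCount column before sorting or ranking by likes.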
Danmu “Launcher” – Most Active Users
By grouping on user ID and counting comment IDs, the cumulative number of danmu per user can be obtained.
# Cumulative number of danmu sent by each user
danmu_counts = df.groupby('uid')['contentsId'].count().sort_values(ascending=False).reset_index()
danmu_counts.columns = ['User ID', 'Total danmu sent']
danmu_counts.head()

The top user posted 2,561 comments across the 12‑episode series.

# The top user's comments, sorted by like count
df_top1 = df[df['uid'] == 1810351987].sort_values(by="likeCount", ascending=False).reset_index()
df_top1.head(10)

Episode‑Level Danmu Volume
Aggregating danmu counts per episode shows which episodes generated the most audience interaction.
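A minimal sketch of this aggregation, assuming the combined data keeps the tv_name and contentsId columns produced by get_data. The toy frame df_demo and the output column name danmu_count are made up for illustration and are not the article's data.

```python
import pandas as pd

# Toy stand-in for the scraped data; episode labels and IDs are invented.
df_demo = pd.DataFrame({
    "tv_name": ["1", "1", "2", "2", "2", "3"],
    "contentsId": ["a", "b", "c", "d", "e", "f"],
})

# Count comment IDs per episode, most commented episode first.
episode_counts = (
    df_demo.groupby("tv_name")["contentsId"]
    .count()
    .sort_values(ascending=False)
    .reset_index(name="danmu_count")
)
print(episode_counts)
```

The resulting two-column frame can be passed directly to a bar chart to compare episodes.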
Most Liked Danmu per Episode
For each episode, the comment with the highest like count was extracted.
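# Within each episode, rank comments by like count and keep the top-ranked one
top_mask = df.groupby(['tv_name'])['likeCount'].rank(method="first", ascending=False) == 1
df_like = df[top_mask].reset_index()[['tv_name', 'contents', 'likeCount']]
df_like.columns = ['Episode', 'Danmu', 'Likes']
df_like

Actor Mention Analysis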
Mentions of the main characters were counted by checking whether a comment contained the character's name, a nickname, or the actor's name.
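The matching idea can be tried on a few made-up comments first. This sketch assumes each character maps to a regular expression whose alternatives are separated by `|`. The sample comments are invented, and na=False is an added safeguard (not part of the original analysis) so that missing comments count as non-matches.

```python
import pandas as pd

# Invented sample comments, including one missing value.
comments = pd.Series(["东升带你去爬山", "朝阳太聪明了", "秦昊演得真好", None])

# Character name -> regex of names and nicknames that count as a mention.
aliases = {"张东升": "东升|秦昊|张老师", "朱朝阳": "朝阳"}

# str.contains treats the pattern as a regex by default;
# summing the boolean result counts the matching comments.
mention_counts = {
    name: int(comments.str.contains(pattern, na=False).sum())
    for name, pattern in aliases.items()
}
print(mention_counts)
```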
# Character name -> regular expression of names and nicknames that count as a mention
a = {'张东升': '东升|秦昊|张老师', '朱朝阳': '朝阳', '严良': '严良', '普普': '普普',
     '朱永平': '朱永平', '周春红': '春红|大娘子', '王瑶': '王瑶', '徐静': '徐静|黄米依',
     '陈冠声': '王景春|老陈|陈冠声', '叶军': '叶军|皮卡皮卡', '马主任': '主任|老马',
     '朱晶晶': '晶晶', '叶驰敏': '叶驰敏'}
for key, value in a.items():
    df[key] = df['contents'].str.contains(value)
staff_count = pd.Series({key: df.loc[df[key], 'contentsId'].count() for key in a.keys()}).sort_values()

Word Cloud
A word cloud of the 200k+ comments was generated with the stylecloud library, which builds on the classic wordcloud package and adds icon-shaped masks.
import stylecloud
from IPython.display import Image

# text1 is not defined in this excerpt; it is presumably the list of words segmented from the comments
stylecloud.gen_stylecloud(text=' '.join(text1), collocations=False,
                          font_path=r'C:\Windows\Fonts\msyh.ttc',
                          icon_name='fas fa-play-circle', size=400,
                          output_name='隐秘的角落-词云.png')
Image(filename='隐秘的角落-词云.png')

The resulting cloud highlights frequent terms such as character names, the popular “爬山” (mountain climbing) meme, and keywords related to children’s thoughts and behaviors, reflecting the drama’s thematic focus.
Conclusion
The analysis shows that certain episodes spark more discussion, a few super‑active users dominate the comment stream, and specific memes become viral. It also demonstrates how Python can be used to scrape, process, and visualize large‑scale video comment data.
Data and visualization source code can be downloaded from: https://alltodata.cowtransfer.com/s/5b483c08987243