
Uncovering 200k iQiyi Danmu: Python Scraping & Insightful Analysis of “The Bad Kids”

The article demonstrates how to scrape over 200,000 iQiyi danmu comments for the drama “The Bad Kids” using Python, then analyzes user activity, episode popularity, top comments, actor mentions, and visualizes the results with word clouds and charts.

Python Crawling & Data Mining

Jun 29, 2020


Scraping Danmu

The drama “The Bad Kids” (《隐秘的角落》) became a hot topic, and its bullet‑screen comments (danmu) were collected from iQiyi. iQiyi serves each episode's danmu as a series of zlib-compressed XML files with a .z extension. Scraping them takes three steps: obtain the TV ID of every episode, download the .z files for each ID, then decompress and parse them into a table.

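import zlib

import pandas as pd
import requests
from bs4 import BeautifulSoup  # the 'xml' parser used below also needs lxml installed

COLUMNS = ['uid', 'contentsId', 'contents', 'likeCount']

def get_data(tv_name, tv_id):
    """Download and parse every danmu segment of one episode."""
    tv_id = str(tv_id)  # the URL is built from slices of the ID
    url = 'https://cmts.iqiyi.com/bullet/{}/{}/{}_300_{}.z'
    frames = []
    # Segments are numbered from 1; stop at the first one that does not exist
    for seg in range(1, 20):
        my_url = url.format(tv_id[-4:-2], tv_id[-2:], tv_id, seg)
        print(my_url)
        res = requests.get(my_url, timeout=10)
        if res.status_code != 200:
            break
        # The .z payload is zlib-compressed XML
        xml = zlib.decompress(res.content).decode('utf-8')
        bs = BeautifulSoup(xml, 'xml')
        frames.append(pd.DataFrame({
            'uid': [t.text for t in bs.find_all('uid')],
            'contentsId': [t.text for t in bs.find_all('contentId')],
            'contents': [t.text for t in bs.find_all('content')],
            'likeCount': [t.text for t in bs.find_all('likeCount')],
        }))
    datas = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame(columns=COLUMNS)
    # likeCount arrives as text; make it numeric so sorting and ranking work
    datas['likeCount'] = pd.to_numeric(datas['likeCount'], errors='coerce').fillna(0).astype(int)
    datas['tv_name'] = str(tv_name)
    return datas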
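The decompress-and-parse step can be checked offline, without requesting iQiyi's servers. The sketch below uses only the standard library (zlib plus xml.etree.ElementTree in place of BeautifulSoup) and a small synthetic payload. The tag names mirror the find_all calls in get_data; the surrounding layout of the sample (danmu, bulletInfo, userInfo) is made up for the test and is not a documented iQiyi schema. The helper names segment_url and parse_segment are hypothetical.

```python
import zlib
import xml.etree.ElementTree as ET

# Output column -> XML tag, mirroring the find_all calls in get_data
FIELDS = {'uid': 'uid', 'contentsId': 'contentId',
          'contents': 'content', 'likeCount': 'likeCount'}

def segment_url(tv_id: str, seg: int) -> str:
    """Build a segment URL with the same pattern get_data uses."""
    return 'https://cmts.iqiyi.com/bullet/{}/{}/{}_300_{}.z'.format(
        tv_id[-4:-2], tv_id[-2:], tv_id, seg)

def parse_segment(raw: bytes) -> list:
    """Decompress one .z payload and return one dict per comment."""
    root = ET.fromstring(zlib.decompress(raw).decode('utf-8'))
    # Collect each field's text in document order, like find_all does
    cols = {col: [el.text or '' for el in root.iter(tag)]
            for col, tag in FIELDS.items()}
    if len({len(v) for v in cols.values()}) > 1:
        raise ValueError('tag counts differ; the XML layout may have changed')
    rows = [dict(zip(cols, vals)) for vals in zip(*cols.values())]
    for row in rows:
        row['likeCount'] = int(row['likeCount'] or 0)
    return rows

# Synthetic payload (invented for this example, not real iQiyi data)
sample = ('<danmu><bulletInfo><contentId>1</contentId><content>爬山</content>'
          '<likeCount>5</likeCount><userInfo><uid>42</uid></userInfo>'
          '</bulletInfo></danmu>').encode('utf-8')
rows = parse_segment(zlib.compress(sample))
print(segment_url('1234567890', 1))
print(rows)
```

The length check is a deliberate addition: collecting each tag separately, as the scraper does, silently misaligns columns if one tag is ever missing from an entry, so failing loudly is safer.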

Running get_data over all 12 episodes collected 201,865 danmu entries in total.

Danmu “Launcher” – Most Active Users

Grouping by user ID and counting comment IDs gives the total number of danmu each user sent.

# Users ranked by the total number of danmu they sent
danmu_counts = df.groupby('uid')['contentsId'].count().sort_values(ascending=False).reset_index()
danmu_counts.columns = ['user_id', 'total_danmu']
danmu_counts.head()
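pandas' value_counts produces the same ranking in one call, and with normalize=True it also gives each user's share of all danmu, which is one way to put a number on how much the most active users contribute. A minimal sketch on made-up toy data; the helper top_user_share and the choice of n are hypothetical. Note that value_counts counts rows, while the groupby version counts non-null contentsId values; the two agree when no comment ID is missing.

```python
import pandas as pd

# Toy stand-in for the scraped danmu table (hypothetical values)
toy = pd.DataFrame({
    'uid': ['a', 'b', 'a', 'c', 'a', 'b'],
    'contentsId': ['1', '2', '3', '4', '5', '6'],
})

# value_counts sorts descending, like the groupby/count/sort_values chain
danmu_counts = (toy['uid'].value_counts()
                .rename_axis('user_id')
                .reset_index(name='total_danmu'))

def top_user_share(frame: pd.DataFrame, n: int) -> float:
    """Fraction of all danmu sent by the n most active users."""
    return float(frame['uid'].value_counts(normalize=True).head(n).sum())

print(danmu_counts)
print(top_user_share(toy, 1))
```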

The most active user (uid 1810351987) posted 2,561 danmu across the 12‑episode series.

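# All danmu from the most active user, most-liked first; astype(str) makes the
# match work whether uid was parsed as text or as a number
df_top1 = df[df['uid'].astype(str) == '1810351987'].sort_values(by='likeCount', ascending=False).reset_index(drop=True)
df_top1.head(10)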

Episode‑Level Danmu Volume

Aggregating danmu counts per episode shows which episodes generated the most audience interaction.
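The per-episode aggregation is a single groupby on the episode column. A minimal sketch, assuming the combined table carries the tv_name column that get_data adds and the contentsId column; the episode labels and rows below are invented toy data.

```python
import pandas as pd

# Toy stand-in for the combined table; tv_name is set per episode by get_data
toy = pd.DataFrame({
    'tv_name': ['ep01', 'ep01', 'ep02', 'ep03', 'ep03', 'ep03'],
    'contentsId': ['1', '2', '3', '4', '5', '6'],
})

# Number of danmu per episode, busiest episode first
episode_counts = (toy.groupby('tv_name')['contentsId'].count()
                  .sort_values(ascending=False))
print(episode_counts)
# With matplotlib installed, episode_counts.plot(kind='bar') draws a bar chart
```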

Most Liked Danmu per Episode

For each episode, the comment with the highest like count was extracted.

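df['likeCount'] = pd.to_numeric(df['likeCount'])  # as text, '9' would outrank '120'
# Keep the top-ranked row per episode; method='first' breaks ties by row order
df_like = df[df.groupby('tv_name')['likeCount'].rank(method='first', ascending=False) == 1]
df_like = df_like.reset_index(drop=True)[['tv_name', 'contents', 'likeCount']]
df_like.columns = ['episode', 'danmu', 'likes']
df_like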
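An equivalent and arguably more direct way to pick one top comment per episode is GroupBy.idxmax, which returns the row label of the first maximum in each group, so ties resolve by row order just as rank(method='first') does. A sketch on invented toy rows; it also shows why likeCount has to be converted from text first.

```python
import pandas as pd

toy = pd.DataFrame({
    'tv_name': ['ep01', 'ep01', 'ep02', 'ep02'],
    'contents': ['x', 'y', 'z', 'w'],
    'likeCount': ['9', '120', '30', '7'],  # text, as it comes out of the XML
})

# Convert first: compared as strings, '9' would beat '120'
toy['likeCount'] = pd.to_numeric(toy['likeCount'])

# idxmax gives the row label of each episode's most-liked comment
best_rows = toy.groupby('tv_name')['likeCount'].idxmax()
top = toy.loc[best_rows, ['tv_name', 'contents', 'likeCount']].reset_index(drop=True)
print(top)
```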

Actor Mention Analysis

Mentions of the main characters were counted by checking whether a comment contained any of that character's names or nicknames, or the name of the actor who plays them.

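# Character -> regex of names, nicknames and actor names that count as a mention
a = {'张东升':'东升|秦昊|张老师', '朱朝阳':'朝阳', '严良':'严良', '普普':'普普', '朱永平':'朱永平', '周春红':'春红|大娘子', '王瑶':'王瑶', '徐静':'徐静|黄米依', '陈冠声':'王景春|老陈|陈冠声', '叶军':'叶军|皮卡皮卡', '马主任':'主任|老马', '朱晶晶':'晶晶', '叶驰敏':'叶驰敏'}
for key, value in a.items():
    # na=False treats missing comment text as "no mention" instead of NaN
    df[key] = df['contents'].str.contains(value, na=False)
# Number of comments mentioning each character (one comment can count for several)
staff_count = pd.Series({key: df[key].sum() for key in a.keys()}).sort_values()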
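A small run of the same alias matching shows two properties of the method: a comment naming two characters counts toward both, and missing text is treated as no mention. The aliases are a subset of the table above; the comments are invented. Because this is plain substring matching, it can over-count: 朝阳 is also an ordinary word ("morning sun"), and 主任 matches any director, not only 马主任.

```python
import pandas as pd

# Subset of the article's alias table
aliases = {'张东升': '东升|秦昊|张老师', '朱朝阳': '朝阳'}

# Invented comments, including a missing value
toy = pd.DataFrame({'contents': ['张老师又来了', '朝阳和东升', '一起去爬山', None]})

# One boolean mask per character; the patterns are regex alternations
mentions = {name: toy['contents'].str.contains(pattern, na=False)
            for name, pattern in aliases.items()}
counts = pd.Series({name: int(mask.sum()) for name, mask in mentions.items()})
print(counts)
```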

Word Cloud

A word cloud of the 200k+ comments was generated with the stylecloud library, which builds on the classic wordcloud package and adds icon-shaped masks and gradient colors.

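import jieba
import stylecloud
from IPython.display import Image

# Assumption: text1 holds words segmented from the comments with jieba, since word-cloud
# tokenizers expect space-separated words and Chinese text has none; 1-char words dropped
text1 = [w for w in jieba.lcut('\n'.join(df['contents'].dropna())) if len(w) > 1]
stylecloud.gen_stylecloud(text=' '.join(text1), collocations=False,
    font_path=r'C:\Windows\Fonts\msyh.ttc',  # Microsoft YaHei, a font with CJK glyphs
    icon_name='fas fa-play-circle', size=400,
    output_name='隐秘的角落-词云.png')
Image(filename='隐秘的角落-词云.png')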

The resulting cloud highlights frequent terms such as character names, the popular “爬山” (climbing) meme, and keywords related to children’s thoughts and behaviors, reflecting the drama’s thematic focus.
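The cloud scales each word by its frequency, so the ranking it encodes can also be printed directly, which is easier to check than comparing font sizes. A sketch with the standard library's Counter, assuming the comments have already been split into words by a Chinese segmenter such as jieba; the token list, the STOPWORDS set and the helper top_terms are made up for illustration.

```python
from collections import Counter

# Hypothetical stopword list; a real analysis would load a fuller one
STOPWORDS = {'我们', '这个', '就是'}

def top_terms(tokens, n=10, min_len=2):
    """Count tokens, dropping stopwords and tokens shorter than min_len."""
    kept = (t for t in tokens if t not in STOPWORDS and len(t) >= min_len)
    return Counter(kept).most_common(n)

# Tokens as a segmenter might return them (invented sample)
tokens = ['爬山', '这个', '张东升', '爬山', '朝阳', '的', '张东升',
          '爬山', '这个', '这个', '这个']
print(top_terms(tokens, n=2))
```

Without the stopword filter, 这个 ("this") would top the list, which is why word clouds of chat-style text usually need one.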

Conclusion

The analysis shows that certain episodes spark more discussion, a few super‑active users dominate the comment stream, and specific memes become viral. It also demonstrates how Python can be used to scrape, process, and visualize large‑scale video comment data.

Data and visualization source code can be downloaded from: https://alltodata.cowtransfer.com/s/5b483c08987243



Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
