
What Weibo Comments Reveal About Wang Leehom’s Divorce: A Python Data Dive

This article walks through scraping the comments on Wang Leehom’s divorce‑related Weibo post with Python, cleaning the noisy dataset, visualizing hourly comment trends, comparing them with the comments on his ex‑wife’s post, and generating word clouds and emoji frequency charts. It also provides full code and data for reproducible analysis.

Python Crawling & Data Mining

Dec 20, 2021

Overview

Using Python, the author scraped the comments on Wang Leehom’s Weibo post announcing his divorce, then cleaned and deduplicated the data, aggregated it by hour, and visualized the results.

Data Cleaning

The downloaded CSV contains usernames, user IDs, timestamps, comment IDs, and comment text. The text was cleaned to remove emojis, topics, and repost markers, and duplicate rows were dropped based on the unique comment ID (the idstr column).

# Keep one row per comment ID, then discard the first column
df_1 = df_1.drop_duplicates(subset=['idstr']).iloc[:, 1:]
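
The text-cleaning step is described above but its code is not shown. The following is a minimal sketch of what it might look like, not the author's implementation. The three patterns are assumptions about the raw text: emojis as bracketed names such as [泪], topics wrapped in #...#, and repost chains appended as //@user:... The helper name clean_comment is hypothetical.

```python
import re

# Assumed formats of the noise in a raw Weibo comment (not confirmed by the source)
EMOJI_CODE = re.compile(r"\[[^\[\]]+\]")     # bracketed emoji names, e.g. [泪]
TOPIC = re.compile(r"#[^#]+#")               # topics wrapped in #...#
REPOST = re.compile(r"//@.*", re.DOTALL)     # repost chain: //@user:... to the end

def clean_comment(text: str) -> str:
    """Strip repost chains, topics, and emoji codes from one comment."""
    text = REPOST.sub("", text)
    text = TOPIC.sub("", text)
    text = EMOJI_CODE.sub("", text)
    return text.strip()

print(clean_comment("#王力宏# 太难过了[泪][泪] //@someone:转发微博"))
```

With pandas, a function like this could be applied to the comment column, for example with df_1['text'].astype(str).apply(clean_comment), assuming the raw text lives in a column named text.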

Hourly Aggregation

The cleaned timestamps were converted to datetime objects and grouped by hour to count new comments per hour.

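import pandas as pd
# Parse timestamps, then count comments per one-hour bin (newer pandas releases prefer the alias 'h' over 'H')
df_1['created_date'] = pd.to_datetime(df_1['created_date'])
df_1_date = df_1.groupby(pd.Grouper(key='created_date', freq='H')).size().reset_index(name='count')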

Visualization

Matplotlib was used to draw an area chart of comment volume over time.

import matplotlib.pyplot as plt

# Area chart of the hourly counts in df_1_date from the aggregation step
fig = plt.figure(figsize=(10,5), dpi=100)
plt.fill_between(df_1_date['created_date'].values, y1=df_1_date['count'].values, y2=0,
                 label='Wang Leehom new comments/hour', alpha=0.75,
                 facecolor="#43a9cb", linewidth=1, edgecolor='k')
plt.xlabel("Date")
plt.ylabel("New comments per hour")
plt.legend(loc='upper right')
fig.savefig('王力宏.png')  # save before show(), which can release the figure
plt.show()

Comparison with Ex‑wife’s Comments

The same cleaning and aggregation steps were applied to Li Jinglei’s comments, and the two time series were plotted together.

# Li Jinglei's comments are aggregated the same way as Wang Leehom's.
# df_date is assumed to hold both hourly series merged on created_date, so that
# pandas' default merge suffixes give count_x (Wang Leehom) and count_y (Li Jinglei).
fig = plt.figure(figsize=(10,8), dpi=100)
plt.fill_between(df_date['created_date'].values, y1=df_date['count_x'].values, y2=0,
                 label='Wang Leehom new comments/hour', alpha=0.75,
                 facecolor="#43a9cb", linewidth=1, edgecolor='k')
plt.fill_between(df_date['created_date'].values, y1=df_date['count_y'].values, y2=0,
                 label='Li Jinglei new comments/hour', alpha=0.75,
                 facecolor="#b7ba6b", linewidth=1, edgecolor='k')
plt.xlabel("Date")
plt.ylabel("New comments per hour")
plt.legend(loc='upper left')
plt.show()
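
The plotting code reads count_x and count_y from a single df_date frame, but the step that builds that frame is not shown. One plausible way to get there, sketched here under assumptions, is an outer merge of the two hourly tables on created_date. The frame name df_2_date and all the counts below are illustrative, not taken from the source data.

```python
import pandas as pd

# Toy hourly counts standing in for the two aggregated frames.
# df_2_date (Li Jinglei) and every number here are illustrative assumptions.
df_1_date = pd.DataFrame({
    "created_date": pd.to_datetime(["2021-12-17 22:00", "2021-12-17 23:00"]),
    "count": [120, 80],
})
df_2_date = pd.DataFrame({
    "created_date": pd.to_datetime(["2021-12-17 23:00", "2021-12-18 00:00"]),
    "count": [300, 450],
})

# Both frames have a "count" column, so merge appends its default suffixes:
# count_x for the left frame and count_y for the right frame.
df_date = (
    pd.merge(df_1_date, df_2_date, on="created_date", how="outer")
    .sort_values("created_date")
    .reset_index(drop=True)
)

# Hours present in only one frame come back as NaN; treat them as zero comments.
count_cols = ["count_x", "count_y"]
df_date[count_cols] = df_date[count_cols].fillna(0)
print(df_date)
```

An outer merge keeps hours that appear in only one series, which matters here because the two posts were published at different times.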

Word Cloud Generation

Comments were split into two periods (before and after the ex‑wife’s post on 2021‑12‑17 23:08:00). jieba segmented the Chinese text, with a stop‑word list to filter out uninformative tokens and a few custom words registered so that they are kept whole. Word clouds were created with a transparent background for later overlay.

import jieba

def get_cut_words(content_series):
    # Load the base stop-word list, one word per line
    with open("stop_words.txt", 'r', encoding='utf-8') as f:
        stop_words = {line.strip() for line in f}
    # Register custom words so jieba keeps them as single tokens
    for word in ['分分合合', '拉黑']:
        jieba.add_word(word)
    # Extra stop words specific to this dataset
    stop_words.update(['快转', '转发', '微博'])
    # Join all comments into one string and segment it in accurate mode
    word_num = jieba.lcut(content_series.str.cat(sep='。'), cut_all=False)
    # Keep tokens of two or more characters that are not stop words
    return [w for w in word_num if w not in stop_words and len(w) >= 2]

# Comments posted before Li Jinglei's post at 2021-12-17 23:08:00
text1 = get_cut_words(df_1[df_1['created_date'] < "2021-12-17 23:08:00"]["text1"])
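
The step from token lists to word clouds is not shown in the source. A minimal sketch of the intermediate step, turning each period's tokens into a word-frequency mapping, could look like the following. The helper top_words and the two sample token lists are hypothetical stand-ins for the get_cut_words() results.

```python
from collections import Counter

def top_words(tokens, n=50):
    """Return the n most frequent tokens as a {word: count} dict."""
    return dict(Counter(tokens).most_common(n))

# Hypothetical token lists standing in for the two get_cut_words() results
tokens_before = ["祝福", "加油", "祝福", "支持", "祝福", "加油"]
tokens_after = ["失望", "脱粉", "失望", "失望", "脱粉"]

freq_before = top_words(tokens_before, n=2)
freq_after = top_words(tokens_after, n=2)
print(freq_before)
print(freq_after)
```

Either mapping could then be handed to a word cloud library. With the wordcloud package, for example, WordCloud.generate_from_frequencies accepts such a dict, and mode="RGBA" with background_color=None produces a transparent background. The source does not show which calls the author used, and a font_path pointing to a font with Chinese glyphs would also be needed.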

Emoji Frequency Analysis

Emoji names were extracted from 100,000 comments and counted, and the top eight were plotted on a polar chart.

import collections

emoji_list = ["[小红花]","[微笑]","[可爱]","[太开心]","[鼓掌]","[嘻嘻]","[哈哈]","[笑cry]","[挤眼]","[馋嘴]","[黑线]","[汗]","[挖鼻]","[哼]","[怒]","[委屈]","[可怜]","[失望]","[悲伤]","[泪]","[允悲]"]

def emoji_lis(string):
    # Return each listed emoji that appears in the comment (at most once per comment)
    return [i for i in emoji_list if i in string]

# Collect the emojis found in every comment
emoji_s = []
for text in df_1['text']:
    emoji_s.extend(emoji_lis(str(text)))

# Tally how many comments contain each emoji
c = collections.Counter(emoji_s)
print(c)
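
The code that picks the top eight is not shown. As a sketch of that step, the variant below uses a regular expression instead of a fixed list, so any bracketed name is picked up, and then takes the most frequent entries with Counter.most_common. It is an alternative technique, not the author's method: unlike the list-based function, it counts every occurrence, not one per comment. The pattern and the sample comments are assumptions.

```python
import re
from collections import Counter

# Assumption: emojis appear as short bracketed names without spaces, e.g. [泪]
EMOJI_RE = re.compile(r"\[[^\[\]\s]{1,8}\]")

def count_emoji(comments):
    """Count every bracketed emoji occurrence across an iterable of comments."""
    counter = Counter()
    for text in comments:
        counter.update(EMOJI_RE.findall(str(text)))
    return counter

# Illustrative comments, not taken from the dataset
sample = ["太难过了[泪][泪]", "[允悲]怎么会这样", "吃瓜[吃瓜][泪]"]
emoji_counts = count_emoji(sample)
top = emoji_counts.most_common(8)  # at most eight (name, count) pairs
print(top)
```

The resulting (name, count) pairs are the kind of input a polar or bar chart would need.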

Findings

The hourly comment volume spiked on the day of the announcement and again after the ex‑wife’s post, while the ex‑wife’s post attracted far more sustained discussion. The most frequent emojis were sadness‑related (悲伤, 泪, 允悲) and a few expressive ones (微笑, doge, 吐, 单身狗, 吃瓜).
