What Weibo Comments Reveal About Wang Leehom’s Divorce: A Python Data Dive
This article walks through using Python to scrape the Weibo comments on Wang Leehom's divorce announcement, clean the noisy dataset, chart hourly comment trends, compare them with the comments on his ex-wife's post, and build word clouds and emoji frequency charts. Full code and data are provided for reproducible analysis.
Overview
Using Python, the author scraped comments from Wang Leehom’s Weibo post announcing his divorce, then performed data cleaning, deduplication, hourly aggregation, and visualized the results.
Data Cleaning
After downloading the CSV containing usernames, user IDs, timestamps, comment IDs and content, the data was cleaned to remove emojis, topics, and repost markers. Duplicates were dropped based on the unique comment ID.
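The article does not show the text-cleaning code itself. As a rough sketch of what removing emojis, topics, and repost markers could look like, the snippet below uses three regular expressions. The patterns and the `clean_comment` helper are assumptions for illustration (Weibo emoji codes written as `[name]`, topics as `#topic#`, repost chains starting with `//@`), not the author's actual rules.

```python
import re

# Assumed patterns, not taken from the article:
EMOJI_RE = re.compile(r"\[[^\[\]\s]{1,8}\]")    # emoji codes such as [泪] or [doge]
TOPIC_RE = re.compile(r"#[^#]+#")               # topics wrapped in a pair of '#'
REPOST_RE = re.compile(r"//@.*$", re.DOTALL)    # repost chain: '//@user:...' to the end


def clean_comment(text):
    """Strip repost chains, topics and emoji codes from one comment."""
    text = str(text)
    text = REPOST_RE.sub("", text)
    text = TOPIC_RE.sub("", text)
    text = EMOJI_RE.sub("", text)
    return text.strip()


# Hypothetical sample comments
raw = ["#王力宏# 加油[泪][泪]//@someone:转发微博", "hello [doge]"]
cleaned = [clean_comment(t) for t in raw]
```

On a DataFrame, the same helper could be applied with something like `df_1['text'].map(clean_comment)` and stored in a separate column, so the raw text stays available for the emoji analysis later on.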
# Keep one row per comment ID, then drop the first column
df_1 = df_1.drop_duplicates(['idstr']).iloc[:,1:]
Hourly Aggregation
The cleaned timestamps were converted to datetime objects and grouped by hour to count new comments per hour.
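To make the grouping concrete, here is a toy version of the same idea on three made-up timestamps. It passes `pd.offsets.Hour()` rather than a string alias, because recent pandas releases deprecate the uppercase `'H'` alias. The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical timestamps: two comments in the 23:00 hour, one in the 01:00 hour
toy = pd.DataFrame({
    "created_date": ["2021-12-17 23:05:00", "2021-12-17 23:40:00", "2021-12-18 01:10:00"],
})
toy["created_date"] = pd.to_datetime(toy["created_date"])

# One row per hourly bin; size() counts the comments that fall into each bin
toy_hourly = (
    toy.groupby(pd.Grouper(key="created_date", freq=pd.offsets.Hour()))
       .size()
       .reset_index(name="count")
)
print(toy_hourly)
```

Hours with no comments between the first and last timestamp show up as rows with a count of zero, which keeps the time axis continuous when the series is plotted.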
df_1['created_date'] = pd.to_datetime(df_1['created_date'])
df_1_date = df_1.groupby([pd.Grouper(key='created_date',freq='H')]).size().reset_index(name='count')
Visualization
Matplotlib was used to draw an area chart of comment volume over time.
import matplotlib.pyplot as plt

# df_date holds the hourly counts; count_x is the Wang Leehom series
columns = df_date.columns
fig = plt.figure(figsize=(10,5), dpi=100)
plt.fill_between(df_date['created_date'].values, y1=df_date['count_x'].values, y2=0,
                 label='Wang Leehom new comments/hour', alpha=0.75,
                 facecolor="#43a9cb", linewidth=1, edgecolor='k')
plt.xlabel("Date")
plt.ylabel("Comments per hour")
plt.legend(loc='upper right')
plt.show()
fig.savefig('王力宏.png')
Comparison with Ex‑wife’s Comments
The same cleaning and aggregation steps were applied to Li Jinglei’s comments, and the two time series were plotted together.
# similar aggregation code for Li Jinglei’s data
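The plotting code reads `count_x` and `count_y` from a single `df_date` frame, but the article does not show how that frame is built. Those column names match the default suffixes of a pandas merge, so one plausible reconstruction (an assumption, not the author's confirmed code) is to aggregate each comment table by hour and merge the two results on `created_date`. The `hourly_counts` helper and the toy data are hypothetical.

```python
import pandas as pd


def hourly_counts(df, time_col="created_date"):
    """Return one row per hour with the number of comments in that hour."""
    df = df.copy()
    df[time_col] = pd.to_datetime(df[time_col])
    return (
        df.groupby(pd.Grouper(key=time_col, freq=pd.offsets.Hour()))
          .size()
          .reset_index(name="count")
    )


# Toy stand-ins for the two comment tables (hypothetical data)
df_1 = pd.DataFrame({"created_date": ["2021-12-17 22:10", "2021-12-17 22:50"]})
df_2 = pd.DataFrame({"created_date": ["2021-12-17 23:15"]})

# Both frames have a 'count' column, so merge renames them to count_x / count_y
df_date = (
    pd.merge(hourly_counts(df_1), hourly_counts(df_2), on="created_date", how="outer")
      .fillna(0)                      # hours present in only one series count as 0
      .sort_values("created_date")
      .reset_index(drop=True)
)
```

An outer merge keeps hours that appear in only one of the two series, which matters here because the two posts were published at different times.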
columns = df_date.columns
fig = plt.figure(figsize=(10,8), dpi=100)
# First series: Wang Leehom
plt.fill_between(df_date['created_date'].values, y1=df_date['count_x'].values, y2=0,
                 label='Wang Leehom new comments/hour', alpha=0.75,
                 facecolor="#43a9cb", linewidth=1, edgecolor='k')
# Second series: Li Jinglei, drawn on the same axes
plt.fill_between(df_date['created_date'].values, y1=df_date['count_y'].values, y2=0,
                 label='Li Jinglei new comments/hour', alpha=0.75,
                 facecolor="#b7ba6b", linewidth=1, edgecolor='k')
plt.xlabel("Date")
plt.ylabel("Comments per hour")
plt.legend(loc='upper left')
plt.show()
Word Cloud Generation
Comments were split into two periods (before and after the ex‑wife’s post on 2021‑12‑17 23:08:00). Jieba performed Chinese word segmentation with custom stop‑words and added keywords. Word clouds were created with a transparent background for later overlay.
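The listing in this section covers only the segmentation step. For the rendering step, a minimal sketch using the third-party `wordcloud` package might look like the following. In that package, setting `mode="RGBA"` together with `background_color=None` produces a transparent canvas. The helper names, image size, and the font path argument are assumptions; a font containing Chinese glyphs has to be supplied for the words to render.

```python
from collections import Counter


def top_frequencies(tokens, n=100):
    """Count tokens and keep the n most common as a {word: count} dict."""
    return dict(Counter(tokens).most_common(n))


def render_cloud(frequencies, font_path, out_path):
    """Render a word cloud PNG with a transparent background."""
    from wordcloud import WordCloud  # third-party: pip install wordcloud

    cloud = WordCloud(
        font_path=font_path,       # must point to a font with Chinese glyphs
        width=800,
        height=600,
        mode="RGBA",
        background_color=None,     # RGBA + None gives a transparent canvas
    )
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(out_path)


# Hypothetical token list standing in for the output of the segmentation step
freqs = top_frequencies(["离婚", "孩子", "离婚", "道歉"], n=2)
```

A call such as `render_cloud(freqs, "path/to/a/chinese/font.ttf", "cloud_before.png")` would then write the image; the font path here is a placeholder.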
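import jieba

def get_cut_words(content_series):
    # Load the stop-word list, one word per line
    with open("stop_words.txt", 'r', encoding='utf-8') as f:
        stop_words = [line.strip() for line in f]
    # Register custom words so jieba keeps them as single tokens
    my_words = ['分分合合', '拉黑']  # "on-again, off-again", "to block (a user)"
    for i in my_words:
        jieba.add_word(i)
    # Extra stop words specific to Weibo reposts
    my_stop_words = ['快转', '转发', '微博']  # "quick repost", "repost", "Weibo"
    stop_words.extend(my_stop_words)
    # Join all comments into one string and segment it in accurate mode
    word_num = jieba.lcut(content_series.str.cat(sep='。'), cut_all=False)
    # Keep tokens of at least two characters that are not stop words
    word_num_selected = [i for i in word_num if i not in stop_words and len(i) >= 2]
    return word_num_selected

# Comments posted before Li Jinglei's post
text1 = get_cut_words(df_1[df_1['created_date'] < "2021-12-17 23:08:00"]["text1"])
Emoji Frequency Analysis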
From 100,000 comments, emoji names were extracted, counted, and the top eight were plotted on a polar chart.
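The listing in this section stops at counting, and the polar chart itself is not shown. As a hedged sketch, a `Counter` of emoji names could be turned into a polar bar chart with Matplotlib roughly as follows. The helper names, figure size, output file name, and toy counts are all assumptions.

```python
import collections


def top_emojis(counter, n=8):
    """Split the n most common (name, count) pairs into two parallel lists."""
    pairs = counter.most_common(n)
    labels = [name for name, _ in pairs]
    values = [count for _, count in pairs]
    return labels, values


def plot_polar(labels, values, out_path="emoji_polar.png"):
    """Draw one bar per emoji around a circle and save the figure."""
    import numpy as np
    import matplotlib.pyplot as plt

    if not labels:
        return  # nothing to draw
    # Spread the bars evenly around the full circle
    angles = np.linspace(0, 2 * np.pi, len(labels), endpoint=False)
    fig = plt.figure(figsize=(6, 6), dpi=100)
    ax = fig.add_subplot(111, projection="polar")
    ax.bar(angles, values, width=2 * np.pi / len(labels) * 0.9, alpha=0.75)
    ax.set_xticks(angles)
    ax.set_xticklabels(labels)  # Chinese labels need a CJK-capable font
    fig.savefig(out_path)
    plt.close(fig)


# Toy counts for illustration only
toy_counter = collections.Counter({"[泪]": 5, "[悲伤]": 3, "[微笑]": 1})
labels, values = top_emojis(toy_counter, n=2)
```

Calling `plot_polar(*top_emojis(c))` on the real counter would plot the top eight, since `n` defaults to 8.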
emoji_list = ["[小红花]","[微笑]","[可爱]","[太开心]","[鼓掌]","[嘻嘻]","[哈哈]","[笑cry]","[挤眼]","[馋嘴]","[黑线]","[汗]","[挖鼻]","[哼]","[怒]","[委屈]","[可怜]","[失望]","[悲伤]","[泪]","[允悲]"]
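import collections

def emoji_lis(string):
    # Return each emoji from emoji_list that appears in the comment (at most once each)
    entities = []
    for i in emoji_list:
        if i in string:
            entities.append(i)
    return entities

# Collect the emojis found in every comment, then count them
emoji_s = []
for index, row in df_1.iterrows():
    text = str(row['text'])
    emoji_s.extend(emoji_lis(text))
c = collections.Counter(emoji_s)
print(c)
Findings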
Hourly comment volume spiked on the day of the announcement and again after the ex-wife's post, and her post drew far more sustained discussion than his. The most frequent emojis were sadness-related (悲伤 "sad", 泪 "tears", 允悲 "laughing-crying facepalm"), followed by a few expressive ones (微笑 "smile", doge, 吐 "vomit", 单身狗 "single dog", 吃瓜 "eating melon", i.e. onlooker).
Python Crawling & Data Mining
