Python Text Mining & Sentiment Analysis of Douban Reviews for “Letter to Grandma”

This article demonstrates how to use Python to crawl 7,275 Douban short reviews of the film “Letter to Grandma”, clean the data, generate a word cloud and a frequency bar chart, and run a sentiment analysis that classifies over 91% of the comments as positive.

Python Crawling & Data Mining

As a Python enthusiast, the author crawled 7,275 short comments for the movie “Letter to Grandma” from Douban and performed text mining and sentiment analysis.

First, the required libraries are installed:

pip install requests beautifulsoup4 pandas jieba wordcloud matplotlib snownlp lxml

The spider (not fully shown) is executed with the user’s own cookie to fetch the comments and save them as 豆瓣_给阿嬷的情书_短评.csv.
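
Since the spider itself is not shown, the following is only a rough sketch of what its parse-and-save part might look like, not the author's code. It assumes each short review sits in a `<span class="short">` element, takes the HTTP logic as a caller-supplied `fetch_page` function (hypothetical), and writes a single `comment` column so the file matches the cleaning step below.

```python
import csv
from html.parser import HTMLParser


class ShortCommentParser(HTMLParser):
    """Collect the text of every <span class="short"> element (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.comments = []
        self._in_short = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "short" in classes:
            self._in_short = True
            self._buf = []

    def handle_data(self, data):
        if self._in_short:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._in_short:
            self._in_short = False
            self.comments.append("".join(self._buf).strip())


def parse_comments(html):
    """Return the list of short-review texts found in one page of HTML."""
    parser = ShortCommentParser()
    parser.feed(html)
    return parser.comments


def crawl(fetch_page, pages, page_size=20):
    """fetch_page(start) -> HTML string; the caller supplies the request and cookie."""
    rows = []
    for page in range(pages):
        found = parse_comments(fetch_page(page * page_size))
        if not found:  # an empty page: stop instead of requesting further pages
            break
        rows.extend(found)
    return rows


def save_csv(comments, path):
    """Write one comment per row under a 'comment' header, with a UTF-8 BOM."""
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["comment"])
        writer.writerows([c] for c in comments)
```

In practice `fetch_page` would wrap a `requests.get` call that sends the user's own cookie in the request headers; the page size of 20 is an assumption.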

Data cleaning is then performed:

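import pandas as pd
import re

# Load the crawled reviews; utf-8-sig strips the BOM if one is present
df = pd.read_csv("豆瓣_给阿嬷的情书_短评.csv", encoding="utf-8-sig")
# Drop missing comments and those of three characters or fewer
df = df.dropna(subset=["comment"]).copy()
df = df[df["comment"].str.len() > 3]

def clean_comment(text):
    # Remove characters outside the Basic Multilingual Plane (mostly emoji)
    text = re.sub(r'[\U00010000-\U0010ffff]', '', text)
    # Keep only Chinese characters, ASCII letters and digits
    text = re.sub(r'[^\u4e00-\u9fa5a-zA-Z0-9]', '', text)
    return text.strip()

df["clean_comment"] = df["comment"].apply(clean_comment)
# Remove comments that are identical after cleaning
df = df.drop_duplicates(subset=["clean_comment"])
print(f"✅ Cleaning complete, valid comments remaining: {len(df)}")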
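
To see what the two regular expressions in clean_comment keep and drop, the function can be run on a made-up comment. The sample string below is invented for illustration and is not from the dataset.

```python
import re


def clean_comment(text):
    # Remove characters outside the Basic Multilingual Plane (mostly emoji)
    text = re.sub(r'[\U00010000-\U0010ffff]', '', text)
    # Keep only Chinese characters, ASCII letters and digits
    text = re.sub(r'[^\u4e00-\u9fa5a-zA-Z0-9]', '', text)
    return text.strip()


sample = "太好看了!!😭 10/10 ~"
print(clean_comment(sample))  # prints: 太好看了1010
```

Note that punctuation and spaces are removed as well, so "10/10" collapses to "1010".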

For word‑frequency statistics and visualization, the cleaned comments are concatenated and segmented with jieba, stop words and single‑character tokens are dropped, and the 20 most frequent words are counted. A word cloud and a bar chart are then generated:

from collections import Counter

import jieba
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Let matplotlib render Chinese labels (assumes the SimHei font is installed)
plt.rcParams["font.sans-serif"] = ["SimHei"]

all_text = "".join(df["clean_comment"].tolist())
stop_words = {"的","了","我","是","很","都","就","也","还","在","和","不","有","着","看","感觉","觉得","真的","这部","电影"}
words = jieba.lcut(all_text)
# Drop stop words and single-character tokens
valid_words = [w for w in words if w not in stop_words and len(w) > 1]
word_count = Counter(valid_words)
top20 = word_count.most_common(20)
print("\n🔥 Top 20 high-frequency words:\n", top20)

# word cloud (font_path must point to a font with CJK glyphs; simhei.ttf ships with Windows)
wc = WordCloud(background_color="white", font_path="simhei.ttf", width=1200, height=700, max_words=300, colormap="Oranges")
wc.generate(" ".join(valid_words))
wc.to_file("豆瓣_阿嬷情书_词云图.png")

# bar chart of the top-20 words
names = [x[0] for x in top20]
nums = [x[1] for x in top20]
plt.figure(figsize=(14,6))
plt.bar(names, nums, color="#ff9966")
plt.title("《给阿嬷的情书》豆瓣短评高频词汇TOP20", fontsize=16)
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig("豆瓣_高频词汇柱状图.png")
plt.show()

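The top‑20 words include “南枝” (Nanzhi), “女性” (women), “潮汕” (Chaoshan), “这个” (this) and “故事” (story), suggesting that the film’s core characters, female‑growth theme, and regional background attract the most attention. The presence of the filler word “这个” also shows that the stop‑word list is not exhaustive.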
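
Because a filler word such as “这个” made it into the top 20, one option is to extend the stop‑word set before counting. The sketch below works on an already segmented token list, so it runs without jieba; `EXTRA_STOP_WORDS` and the demo tokens are hypothetical and not part of the original analysis.

```python
from collections import Counter

# Stop words used in the article
BASE_STOP_WORDS = {"的", "了", "我", "是", "很", "都", "就", "也", "还", "在", "和",
                   "不", "有", "着", "看", "感觉", "觉得", "真的", "这部", "电影"}
# Hypothetical additions for common filler words
EXTRA_STOP_WORDS = {"这个", "一个", "没有", "就是"}


def top_words(tokens, stop_words, n=20):
    """Count tokens longer than one character that are not stop words."""
    kept = [t for t in tokens if len(t) > 1 and t not in stop_words]
    return Counter(kept).most_common(n)


# Made-up tokens standing in for the output of jieba.lcut
demo_tokens = ["这个", "故事", "潮汕", "故事", "的", "女性", "这个", "这个"]

print(top_words(demo_tokens, BASE_STOP_WORDS, n=1))
print(top_words(demo_tokens, BASE_STOP_WORDS | EXTRA_STOP_WORDS, n=3))
```

With only the base set, “这个” tops the demo ranking; with the extended set it disappears and “故事” moves to first place.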

Sentiment analysis is carried out with SnowNLP:

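import matplotlib.pyplot as plt
from snownlp import SnowNLP

def get_sentiment(text):
    # SnowNLP returns a score in [0, 1]; higher means more positive
    score = SnowNLP(text).sentiments
    if score > 0.5:
        return "温暖正向", score   # warm-positive
    elif score == 0.5:
        # An exact 0.5 is rare for a float score, so this bucket is usually empty
        return "中性平淡", score   # neutral
    else:
        return "遗憾伤感", score   # regretful or sad

# Split the (label, score) tuples into two new columns
df[["sent_type", "sent_score"]] = pd.DataFrame(df["clean_comment"].apply(get_sentiment).tolist(), index=df.index)
sent_stat = df["sent_type"].value_counts()
print("\n🔥 Sentiment distribution:\n", sent_stat)

plt.figure(figsize=(8,8))
plt.pie(sent_stat.values, labels=sent_stat.index, autopct="%1.2f%%", colors=["#ffb380","#ffe6cc","#ff8080"])
plt.title("《给阿嬷的情书》豆瓣短评情感分布", fontsize=16)
plt.savefig("豆瓣_情感分布饼图.png")
plt.show()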

The sentiment distribution shows 91.49% of comments classified as “温暖正向” (warm‑positive) and the remaining 8.51% falling into the other categories, which points to an overwhelmingly positive audience reception as scored by SnowNLP.
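
The neutral class above is only assigned when the score equals exactly 0.5, which almost never happens with floating‑point scores. As a possible alternative (not the method used for the figures reported here), scores could be bucketed with a neutral band; the band limits 0.4 and 0.6 and the demo scores are arbitrary assumptions, and using a band would change the reported percentages.

```python
from collections import Counter


def label_with_band(score, low=0.4, high=0.6):
    """Map a score in [0, 1] to a label; scores inside [low, high] are neutral."""
    if score > high:
        return "温暖正向"   # warm-positive
    if score < low:
        return "遗憾伤感"   # regretful or sad
    return "中性平淡"       # neutral


def distribution(scores, low=0.4, high=0.6):
    """Return {label: percentage}, rounded to two decimals."""
    counts = Counter(label_with_band(s, low, high) for s in scores)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


demo_scores = [0.97, 0.88, 0.55, 0.12, 0.61]  # made-up scores, not SnowNLP output
print(distribution(demo_scores))
```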

Finally, the cleaned data with sentiment labels are saved to 豆瓣_阿嬷情书_最终分析数据.csv, and three visualizations—word cloud, frequency bar chart, and sentiment pie chart—are generated.
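
The export step is not shown in the article. A minimal sketch, assuming the DataFrame carries the four columns built above, could look like this; the demo rows, scores and the ASCII demo file name are made up, and in the article's pipeline the target would be 豆瓣_阿嬷情书_最终分析数据.csv.

```python
import os
import tempfile

import pandas as pd


def save_results(df, path):
    """Write the labelled comments without the index, with a UTF-8 BOM."""
    cols = ["comment", "clean_comment", "sent_type", "sent_score"]
    df[cols].to_csv(path, index=False, encoding="utf-8-sig")


# Two made-up rows standing in for the real analysis result
sample = pd.DataFrame({
    "comment": ["很温暖的故事!", "结局有点遗憾。"],
    "clean_comment": ["很温暖的故事", "结局有点遗憾"],
    "sent_type": ["温暖正向", "遗憾伤感"],
    "sent_score": [0.93, 0.21],
})

out_path = os.path.join(tempfile.mkdtemp(), "final_analysis_demo.csv")
save_results(sample, out_path)
reloaded = pd.read_csv(out_path, encoding="utf-8-sig")
print(reloaded.shape)
```

The `utf-8-sig` encoding mirrors the one used when the crawled file is read, and the BOM helps spreadsheet programs such as Excel detect UTF‑8 when opening the file.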

Figure: Word Cloud
Figure: Top20 Bar Chart
Figure: Sentiment Pie Chart
Written by

Python Crawling & Data Mining
