Turning 1,000 Douban Movie Reviews into a Chinese Word Cloud with MongoDB & Jieba
This article demonstrates how to extract 1,000 short movie reviews stored in MongoDB, apply Chinese word segmentation using Jieba, select the top 50 terms, generate a visual word cloud, and perform additional analyses such as top‑liked comments and 15‑day comment volume trends.
Reading short reviews from MongoDB and segmenting the Chinese text
The author retrieved about 1,000 short comments from a MongoDB collection of Douban movie reviews. The text was then tokenized with the Jieba library, using its default dictionary with no custom dictionary loaded. This produced a list of words ready for analysis.
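Here is a minimal sketch of this step. It assumes pymongo and jieba are installed, and that each review document stores its text in a field named comment. The field names author, date, vote and comment mirror the output shown later, but the project's real schema may differ. The database and collection names in the usage comment are hypothetical. Dropping single-character tokens is an extra filtering choice of this sketch, not something the original describes.

```python
from typing import Callable, Iterable, List, Optional


def load_comments(collection, limit: int = 1000) -> List[str]:
    """Read up to `limit` review texts from a pymongo Collection.

    Assumes each document keeps its text in a field called "comment".
    """
    cursor = collection.find({}, {"comment": 1, "_id": 0}).limit(limit)
    # Skip documents whose comment field is missing or empty.
    return [doc["comment"] for doc in cursor if doc.get("comment")]


def segment(
    texts: Iterable[str],
    cut: Optional[Callable[[str], List[str]]] = None,
    min_len: int = 2,
) -> List[str]:
    """Tokenize Chinese text into a flat list of words.

    By default this uses jieba.lcut with jieba's built-in dictionary
    (no custom dictionary is loaded). Tokens shorter than `min_len`
    characters are discarded, which removes single characters and most
    punctuation; this filter is an assumption of this sketch.
    """
    if cut is None:
        import jieba  # third-party: pip install jieba

        cut = jieba.lcut
    words: List[str] = []
    for text in texts:
        for token in cut(text):
            token = token.strip()
            if len(token) >= min_len:
                words.append(token)
    return words


# Usage (hypothetical connection string, database and collection names):
# from pymongo import MongoClient
# coll = MongoClient("mongodb://localhost:27017")["douban"]["comments"]
# words = segment(load_comments(coll, limit=1000))
```

The tokenizer is a parameter, so the filtering logic can be exercised without jieba installed, for example by passing str.split.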
Generating a word cloud from the top 50 terms
After tokenization, the 50 most frequent words are selected and rendered with the WordCloud library. The result is a colorful word cloud that highlights the terms discussed most often in the reviews.
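A sketch of this step follows, assuming the wordcloud package is installed. Counting and picking the 50 most frequent terms uses collections.Counter, and rendering uses WordCloud.generate_from_frequencies. The font path is an assumption: WordCloud's bundled default font has no Chinese glyphs, so a CJK-capable font file must be supplied. The image size, background color and output filename are arbitrary choices.

```python
from collections import Counter
from typing import Dict, Iterable


def top_terms(words: Iterable[str], n: int = 50) -> Dict[str, int]:
    """Return the n most frequent words with their counts, most frequent first."""
    return dict(Counter(words).most_common(n))


def render_wordcloud(freqs: Dict[str, int], font_path: str,
                     out_path: str = "wordcloud.png") -> None:
    """Draw a word cloud where each word's size reflects its frequency."""
    from wordcloud import WordCloud  # third-party: pip install wordcloud

    wc = WordCloud(
        font_path=font_path,  # must point to a font that covers Chinese
        width=800,
        height=600,
        background_color="white",
    )
    wc.generate_from_frequencies(freqs)
    wc.to_file(out_path)


# Usage (the font path is a placeholder; use any CJK font on your system):
# freqs = top_terms(words, n=50)
# render_wordcloud(freqs, font_path="/path/to/a/chinese-font.ttf")
```

Passing precomputed frequencies keeps the "top 50" cut explicit, instead of relying on WordCloud's own tokenizer, which does not segment Chinese.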
Other analysis tasks
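Top 10 most-liked comments
author = 忻钰坤, date = 2018-07-04, vote = 28129, comment = "Can you guarantee you'll never get sick your whole life?" …
author = 沐子荒, date = 2018-07-03, vote = 27237, comment = All of Wang Chuanjun's persistence that outsiders never understood …
author = 凌睿, date = 2018-06-30, vote = 18304, comment = Don't call this "the Chinese version of Dallas Buyers Club" …
author = 徐若风, date = 2018-06-06, vote = 16426, comment = In Douban terms, this is the homegrown "high-rated Korean film" we have finally gotten after all this time …
author = 桃桃淘电影, date = 2018-06-19, vote = 13337, comment = The biggest illness is really poverty …
author = 远世祖, date = 2018-06-30, vote = 9102, comment = Wen Muye has such a sharp eye; he has a firm grip on where the audience laughs, cries, and hurts …
author = 影志, date = 2018-06-19, vote = 7076, comment = "Things will only get better from here, right? I hope that day comes soon" …
author = Noodles, date = 2018-07-03, vote = 6926, comment = Life advice: don't buy snacks; you won't be able to eat them.
author = 哪吒男, date = 2018-06-25, vote = 6211, comment = I loved Wang Chuanjun's performance most; he got almost every tear-jerking moment!!
author = 开开kergelen, date = 2018-07-04, vote = 5549, comment = As a child I walked past a pharmacy whose door couplet read: "I only wish the world were free of illness; why worry that the medicine on the shelves gathers dust?"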
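One way to produce a top-10 list like the one above is sketched here. It assumes the same field names that appear in the output (author, date, vote, comment). MongoDB can sort by vote in descending order and cap the result at 10, or the same selection can be made in memory with heapq.nlargest. These functions are illustrative and are not taken from the project's code.

```python
import heapq
from typing import Dict, Iterable, List


def top_liked(docs: Iterable[Dict], n: int = 10) -> List[Dict]:
    """Return the n documents with the highest 'vote' count (in memory)."""
    return heapq.nlargest(n, docs, key=lambda d: d.get("vote", 0))


def top_liked_from_mongo(collection, n: int = 10) -> List[Dict]:
    """Push the same query down to MongoDB: sort by vote descending, then limit."""
    projection = {"_id": 0, "author": 1, "date": 1, "vote": 1, "comment": 1}
    return list(collection.find({}, projection).sort("vote", -1).limit(n))


def format_row(doc: Dict) -> str:
    """Render one document in the 'author = ..., date = ...' style used above."""
    return "author = {author}, date = {date}, vote = {vote}, comment = {comment}".format(**doc)


# Usage (collection obtained as in the earlier pymongo example):
# for doc in top_liked_from_mongo(coll):
#     print(format_row(doc))
```

Sorting on the server avoids pulling every document into Python. With an index on vote, MongoDB can also serve the sort without scanning the whole collection.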
15‑day comment volume distribution and trend
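The sketch below computes the daily counts behind such a chart. It assumes each comment's date is stored as a YYYY-MM-DD string, as in the top-10 output, and that the 15-day window ends on the most recent comment date. Days with no comments are filled with zero so the trend has no gaps. The plotting helper uses matplotlib, which is an arbitrary choice for presentation; the original chart's styling is not known.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Iterable, List, Tuple


def daily_counts(dates: Iterable[str], days: int = 15) -> List[Tuple[str, int]]:
    """Count comments per day over the `days`-day window ending at the latest date.

    Dates are assumed to be ISO 'YYYY-MM-DD' strings; older dates are ignored.
    """
    parsed = [date.fromisoformat(d) for d in dates]
    if not parsed:
        return []
    end = max(parsed)
    start = end - timedelta(days=days - 1)
    counts = Counter(d for d in parsed if d >= start)
    # Emit every day in the window, including days with zero comments.
    return [
        ((start + timedelta(days=i)).isoformat(), counts.get(start + timedelta(days=i), 0))
        for i in range(days)
    ]


def plot_trend(series: List[Tuple[str, int]], out_path: str = "trend.png") -> None:
    """Bar chart of daily volume with a line on top to show the trend."""
    import matplotlib

    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    labels = [d for d, _ in series]
    values = [c for _, c in series]
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.bar(labels, values)
    ax.plot(labels, values, marker="o", color="tab:red")
    ax.set_xlabel("date")
    ax.set_ylabel("comments")
    ax.tick_params(axis="x", labelrotation=45)
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)
```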
Project source code – feel free to star or fork
https://gitee.com/zlikun/python-crawler-douban-movie
MaGe Linux Operations
