Scrape Weibo Holiday Leave Reasons with Python and Visualize Trends
This tutorial shows how to use Python to scrape Weibo comments for the most common reasons people give for taking leave around National Day: fetch the comment data via Ajax requests, store it in CSV and text files, and visualize it with a ranking of the most-liked users and a word cloud that reveals the popular themes.
1. Data Collection
We opened the mobile version of Weibo (m.weibo.cn) and used the browser's developer tools (F12) to observe the network requests. The comments are returned via Ajax by a request whose path begins with hotflow (https://m.weibo.cn/comments/hotflow). By analyzing the request parameters (id, mid, max_id, max_id_type) we can construct the URLs needed to fetch the comment data.
The id and mid values are fixed for a given post, while max_id changes for each page. The max_id for the next page is included in the current page's response, which is what makes pagination possible.
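The pagination rule described above can be isolated in a small helper. This is only a minimal sketch: the name `next_page_params` and the two constants are illustrative and not part of the original script, and it assumes the `data` object of each response carries `max_id` and `max_id_type` as described.

```python
# Fixed identifiers of the post, as seen in the request observed in developer tools.
POST_ID = "4679186482727431"
POST_MID = "4686092090212455"


def next_page_params(prev_data=None):
    """Build the query parameters for the next hotflow request.

    prev_data is the `data` object of the previous response, or None for
    the first page. Hypothetical helper, not from the original script.
    """
    params = {"id": POST_ID, "mid": POST_MID, "max_id_type": 0}
    if prev_data is not None:
        # Later pages reuse the cursor returned by the previous page.
        params["max_id"] = prev_data["max_id"]
        params["max_id_type"] = prev_data["max_id_type"]
    return params
```

The returned dictionary can be passed as `params=` to `requests.get`, so the first call needs no cursor and every later call is driven by the previous response.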
Core Data Collection Code
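import csv
import re
import time

import requests

# Assumption: the original snippet does not show `headers`; a browser-like
# User-Agent is used here as a placeholder.
headers = {'User-Agent': 'Mozilla/5.0'}

max_id, max_id_type = None, 0
for page in range(1, 10000):
    params = {
        'id': '4679186482727431',
        'mid': '4686092090212455',
        'max_id_type': max_id_type,
    }
    if page > 1:
        # Later pages pass the max_id returned by the previous response.
        params['max_id'] = max_id
    response = requests.get('https://m.weibo.cn/comments/hotflow', headers=headers, params=params)
    data = response.json()['data']
    max_id = data['max_id']
    max_id_type = data['max_id_type']
    for i in data['data']:
        comment_time = i['created_at']
        ri = comment_time.split()[2]  # day of the month
        shi = comment_time.split()[3].split(':')[0]  # hour of the day
        likes = i['like_count']
        content = re.sub(r'<[^>]*>', '', i['text'])  # strip HTML tags
        user_id = i['user']['id']
        username = i['user']['screen_name']
        with open('请假.csv', 'a', newline='', encoding='utf-8') as f:
            writer = csv.writer(f)
            writer.writerow([username, likes, ri, shi])
        with open(r'请假.txt', 'a', encoding='utf-8') as f:
            f.write(f'{content}\n')
    if not max_id:
        # Assumption: a max_id of 0 means there are no further pages.
        break
    time.sleep(1)  # assumption: a short pause between requests
2. Visualization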
We collected 6,216 records and visualized them. First, we identified the top‑5 users whose comments received the most likes.
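The ranking step might look like the following sketch. It assumes the CSV rows have the column order written by the scraper (username, likes, day, hour) with no header row, and that likes are summed per user; the original does not say whether the ranking is per comment or per user. The function names are illustrative.

```python
import csv
from collections import Counter


def load_rows(path="请假.csv", encoding="utf-8"):
    """Read the rows written by the scraper: username, likes, day, hour."""
    # Assumption: the file is UTF-8; pass whichever encoding the scraper used.
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.reader(f))


def top_liked_users(rows, n=5):
    """Return the n (username, total_likes) pairs with the most likes."""
    totals = Counter()
    for username, likes, *_ in rows:
        totals[username] += int(likes)
    return totals.most_common(n)
```

Calling `top_liked_users(load_rows())` yields a list of pairs that can be handed to any charting library to draw a bar chart.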
Next, we generated a word cloud of the comment content. The most frequent words are “marriage”, “sister”, and “wedding”, indicating that many users cite attending a sister’s wedding as a leave reason.
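A word cloud is driven by word frequencies, so the counting step can be sketched separately from the rendering. The original does not name the libraries it used; the sketch below assumes the third-party packages jieba (Chinese word segmentation) and wordcloud are installed, and imports them lazily so the counting function works on its own. The function names, the `min_len` filter, and the `font_path` argument are illustrative.

```python
from collections import Counter


def tokenize(text):
    """Split Chinese text into words (assumes jieba is installed)."""
    import jieba  # third-party, imported lazily
    return jieba.lcut(text)


def word_frequencies(tokens, stopwords=(), min_len=2):
    """Count tokens, skipping stopwords and very short tokens."""
    skip = set(stopwords)
    return Counter(
        t for t in (tok.strip() for tok in tokens)
        if len(t) >= min_len and t not in skip
    )


def render_word_cloud(freqs, font_path, out_path="wordcloud.png"):
    """Draw the cloud from a word-to-count mapping (assumes wordcloud is installed)."""
    from wordcloud import WordCloud  # third-party, imported lazily
    # A font with CJK glyphs is needed for Chinese words to render.
    cloud = WordCloud(font_path=font_path, width=800, height=600,
                      background_color="white")
    cloud.generate_from_frequencies(freqs)
    cloud.to_file(out_path)
```

With the comment text from 请假.txt read into a string, `word_frequencies(tokenize(text))` gives the counts, and `most_common(3)` on the result lists the most frequent words.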
3. Summary
Try the script yourself, share your own leave reasons, and remember to stay safe during the holiday. This article is intended for learning purposes only.
Python Crawling & Data Mining
