Scrape Weibo Holiday Leave Reasons with Python and Visualize Trends

This tutorial shows how to use Python to scrape Weibo comments about the most common reasons people give for requesting leave around the National Day holiday. It fetches the comment data via Ajax requests, stores it in CSV and text files, and visualizes it with a chart of the top‑liked users and a word cloud that reveals the most popular themes.

Python Crawling & Data Mining

1. Data Collection

We used the mobile version of Weibo and opened the browser's developer tools (F12) to observe the network requests. The comment data is returned via Ajax by a request whose name begins with hotflow. By analyzing the request parameters (id, mid, max_id_type), we can construct URLs that fetch the comment data.
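As a minimal sketch of how such a URL can be assembled from the observed parameters, the helper below uses the standard library's `urlencode`. The function name `build_comment_url` and the parameter ordering are illustrative assumptions, not part of the original script; the endpoint and the id and mid values are the ones used in this article.

```python
from urllib.parse import urlencode

BASE_URL = 'https://m.weibo.cn/comments/hotflow'


def build_comment_url(post_id, mid, max_id_type=0, max_id=None):
    """Return the hotflow URL for one page of comments (hypothetical helper)."""
    query = {'id': post_id, 'mid': mid}
    if max_id is not None:
        # max_id is only included when requesting a page after the first one
        query['max_id'] = max_id
    query['max_id_type'] = max_id_type
    return f'{BASE_URL}?{urlencode(query)}'


first_page_url = build_comment_url('4679186482727431', '4686092090212455')
print(first_page_url)
```

In practice, passing a `params` mapping to `requests.get` performs the same encoding, so a helper like this is mainly useful for checking the URL against what the developer tools show.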

The id and mid values in the URL stay fixed, while max_id changes for each page. The max_id for the next page is returned in the current page's response, which is what makes pagination possible.
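The pagination rule can be sketched as a small function that derives the next page's parameters from the current response. The name `next_page_params` and the sample payload are hypothetical, and treating a missing or zero max_id as "no more pages" is an assumption rather than documented behavior.

```python
def next_page_params(params, payload):
    """Return the parameters for the next page, or None when there is none."""
    data = payload.get('data') or {}
    max_id = data.get('max_id')
    if not max_id:
        # Assumption: a missing or zero max_id means the last page was reached
        return None
    updated = dict(params)  # copy so the caller's dict is left unchanged
    updated['max_id'] = max_id
    updated['max_id_type'] = data.get('max_id_type', 0)
    return updated


# Hypothetical response fragment with the same shape the script reads
sample_payload = {'data': {'max_id': 138765432100, 'max_id_type': 0, 'data': []}}
first_params = {'id': '4679186482727431', 'mid': '4686092090212455', 'max_id_type': 0}
second_params = next_page_params(first_params, sample_payload)
print(second_params)
```

A scraping loop would call this after each request and stop as soon as it returns `None`.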

Core Data Collection Code

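import csv
import re
import time

import requests

# Assumed placeholder headers; copy the User-Agent (and a Cookie, if required) from your own browser session.
headers = {
    'User-Agent': 'Mozilla/5.0',
}

# First-page parameters; id and mid stay fixed for every page.
params = {
    'id': '4679186482727431',
    'mid': '4686092090212455',
    'max_id_type': '0',
}

for page in range(1, 10000):
    response = requests.get('https://m.weibo.cn/comments/hotflow', headers=headers, params=params)
    data = response.json().get('data')
    if not data:
        break  # no comment data returned, so stop
    for i in data['data']:
        comment_time = i['created_at']
        ri = comment_time.split()[2]  # day of the month
        shi = comment_time.split()[3].split(':')[0]  # hour
        likes = i['like_count']
        content = re.sub(r'<[^>]*>', '', i['text'])  # strip HTML tags
        username = i['user']['screen_name']
        with open('请假.csv', 'a', newline='', encoding='utf-8') as f:
            writer = csv.writer(f)
            writer.writerow([username, likes, ri, shi])
        with open(r'请假.txt', 'a', encoding='utf-8') as f:
            f.write(f'{content}\n')
    # The response carries the max_id for the next page; 0 is assumed to mean the last page.
    if not data['max_id']:
        break
    params['max_id'] = data['max_id']
    params['max_id_type'] = data['max_id_type']
    time.sleep(1)  # pause between requests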
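The per-comment field extraction can also be isolated into a function, which makes it easy to check without sending any requests. This is a sketch only: `parse_comment` and the sample record are hypothetical, and the `created_at` layout shown is inferred from the split indices used in the script.

```python
import re

TAG_PATTERN = re.compile(r'<[^>]*>')


def parse_comment(comment):
    """Extract the fields the script stores from one comment record."""
    # Assumed layout: 'Tue Sep 28 10:15:30 +0800 2021'
    parts = comment['created_at'].split()
    return {
        'username': comment['user']['screen_name'],
        'likes': comment['like_count'],
        'day': parts[2],                 # day of the month
        'hour': parts[3].split(':')[0],  # hour of the day
        'content': TAG_PATTERN.sub('', comment['text']),  # HTML tags removed
    }


# Hypothetical record shaped like the fields the script reads
sample_comment = {
    'created_at': 'Tue Sep 28 10:15:30 +0800 2021',
    'like_count': 12,
    'text': 'My sister is getting married<span class="url-icon"><img alt="[doge]" src="x.png"/></span>',
    'user': {'id': 1, 'screen_name': 'example_user'},
}
row = parse_comment(sample_comment)
print(row)
```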

2. Visualization

We collected 6,216 records and visualized them. First, we identified the top‑5 users whose comments received the most likes.
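The ranking step might look like the sketch below, which reads rows in the column order the script writes (username, likes, day, hour). Summing likes per user is an assumption, since the article does not say whether likes were totaled per user or ranked per comment; `top_liked_users` and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def top_liked_users(csv_file, n=5):
    """Return the n users with the highest total like count."""
    totals = defaultdict(int)
    for row in csv.reader(csv_file):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        totals[row[0]] += int(row[1])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical rows in the same layout as the scraped CSV
sample_csv = io.StringIO(
    'alice,120,28,10\n'
    'bob,45,28,11\n'
    'alice,30,29,9\n'
    'carol,200,29,14\n'
    'dave,5,30,8\n'
    'erin,60,30,20\n'
    'frank,1,30,21\n'
)
top_five = top_liked_users(sample_csv)
print(top_five)
```

With the real data, the function would receive the scraped CSV file opened with the same encoding used when writing it, and the resulting pairs could be handed to any plotting library for a bar chart.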

Next, we generated a word cloud of the comment content. The most frequent words are “marriage”, “sister”, and “wedding”, indicating that many users cite attending a sister’s wedding as a leave reason.
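A word cloud is driven by word frequencies, so the counting step can be sketched with the standard library alone. The tokens are assumed to come from a Chinese word segmenter (the article does not name the tool it used); `word_frequencies`, the stopword set, and the sample tokens are illustrative.

```python
from collections import Counter


def word_frequencies(tokens, stopwords=frozenset(), min_length=2):
    """Count tokens, skipping stopwords and very short tokens."""
    counts = Counter()
    for token in tokens:
        token = token.strip()
        if len(token) >= min_length and token not in stopwords:
            counts[token] += 1
    return counts


# Hypothetical segmented comment text:
# 姐姐 = sister, 结婚 = marriage, 婚礼 = wedding, 请假 = leave request
sample_tokens = ['姐姐', '结婚', '的', '婚礼', '姐姐', '结婚', '，', '请假', '结婚']
freqs = word_frequencies(sample_tokens, stopwords={'请假'})
most_common = freqs.most_common(2)
print(most_common)
```

A mapping of this shape is what word-cloud renderers typically take as input; single-character tokens and punctuation are dropped here so that particles do not dominate the image.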

3. Summary

Try the script yourself, share your own leave reasons, and remember to stay safe during the holiday. This article is intended for learning purposes only.
