How to Scrape Weibo Comments with Python: A Step‑by‑Step Guide
This article explains how to locate Weibo comment APIs, work around rate limits by using the mobile site, extract required parameters, and implement a Python script that handles cookies, pagination, emoji cleaning, deduplication, and scheduled execution to collect comments efficiently.
Part 1 – Theory
To collect comments from a popular Weibo user, the simplest idea is to find the official comment API, change its parameters, and save the returned data.
However, that API is rate‑limited, and repeated requests are quickly blocked.
By opening the mobile version of Weibo, logging in, and using the browser’s network analysis tool while scrolling through comments, you can discover the request URL and its parameters.
The request carries four key parameters. The first two hold the post’s ID (which identifies the post, much like an identity card), and max_id drives pagination: its value changes with every request, and the value to use next is returned in the current response.
Part 2 – Practical Implementation
In Python, the script first distinguishes between the initial request, which carries no max_id, and every subsequent request, which must send the max_id returned by the previous response.
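A minimal sketch of that distinction follows. The endpoint URL and the parameter names `id`, `mid`, and `max_id_type` are assumptions (the article names only `max_id`), so check them against what your own browser's network tool shows before relying on them.

```python
from urllib.parse import urlencode

# Assumed endpoint: replace with the URL seen in your browser's network tool.
COMMENTS_URL = "https://m.weibo.cn/comments/hotflow"


def build_params(post_id, max_id=None):
    """Return the query parameters for one page of comments."""
    # Assumed names: the article only says the first two parameters carry the post ID.
    params = {"id": post_id, "mid": post_id, "max_id_type": 0}
    if max_id:
        # Later requests send back the max_id from the previous response.
        params["max_id"] = max_id
    return params


def build_url(post_id, max_id=None):
    """Return the full request URL for one page of comments."""
    return COMMENTS_URL + "?" + urlencode(build_params(post_id, max_id))
```

Calling `build_url(post_id)` gives the first-page URL, and `build_url(post_id, max_id)` gives each following page.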
The request must include the logged‑in user’s Weibo cookies, which stay valid for a long time and can be copied from the request headers shown in the browser’s network tool.
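One way to attach the cookie is sketched below with the standard library's `urllib.request`; the article does not say which HTTP client it uses, so this choice is an assumption, as are the helper names and the `User-Agent` value.

```python
import json
import urllib.request


def make_request(url, cookie):
    """Build a GET request that carries the logged-in session cookie."""
    headers = {
        "Cookie": cookie,             # the Cookie header value copied from the browser
        "User-Agent": "Mozilla/5.0",  # assumption: a browser-like user agent
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers)


def fetch_json(url, cookie, timeout=10):
    """Send the request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(make_request(url, cookie), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping the cookie string out of the source file, for example in an environment variable, avoids leaking a live session when the script is shared.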
Responses are converted to JSON, and fields such as comment text, commenter nickname, and timestamp are extracted.
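The extraction step might look like the sketch below. The response layout (`data.data` holding the comment list, `user.screen_name` for the nickname, `created_at` for the timestamp, and `data.max_id` for the next page) is an assumption about the JSON shape, not something the article spells out, so compare it with a real response first.

```python
def extract_comments(payload):
    """Return (comments, next_max_id) from one decoded JSON response.

    The key names used here are assumed, not confirmed by the article.
    """
    data = payload.get("data") or {}
    comments = []
    for item in data.get("data", []):
        comments.append({
            "text": item.get("text", ""),
            "nickname": (item.get("user") or {}).get("screen_name", ""),
            "created_at": item.get("created_at", ""),
        })
    # next_max_id is None when the response carries no further page marker.
    return comments, data.get("max_id")
```

Using `.get` with defaults keeps the parser from raising when a blocked or empty response omits the expected keys.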
Emojis in comments are removed with a regular‑expression filter.
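A regular-expression filter of this kind can be sketched as follows. The article does not give its exact pattern, so the Unicode ranges below are an assumption that covers the most common emoji blocks rather than every possible symbol.

```python
import re

# Assumed ranges: common emoji blocks, not an exhaustive list.
EMOJI_RE = re.compile(
    "["
    "\U0001F1E6-\U0001F1FF"  # regional indicator symbols (flags)
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F\u200D"           # variation selector and zero-width joiner
    "]+"
)


def clean_text(text):
    """Remove emoji characters and trim surrounding whitespace."""
    return EMOJI_RE.sub("", text).strip()
```

Chinese text and ordinary punctuation fall outside these ranges, so they pass through unchanged.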
The cleaned comments are written to a text file using the built‑in open function.
Because the API returns at most 16 pages of 20 comments each, a single run can collect at most 320 comments; a for loop iterates over the pages to gather as many as the API allows.
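The paging loop can be sketched as below. `fetch_page` is a hypothetical callable that takes the current max_id (None for the first page) and returns the page's comments together with the next max_id; the pause between pages is an added assumption meant to keep the request rate low.

```python
import time

MAX_PAGES = 16  # page limit stated in the article


def collect_all(fetch_page, max_pages=MAX_PAGES, delay=1.0):
    """Walk the pages in order, threading max_id from one response to the next."""
    all_comments = []
    max_id = None  # the first request carries no max_id
    for page in range(max_pages):
        comments, max_id = fetch_page(max_id)
        all_comments.extend(comments)
        if not max_id or page == max_pages - 1:
            break  # no further page marker, or the page limit is reached
        time.sleep(delay)  # assumed pause between requests
    return all_comments
```

Passing the fetcher in as an argument lets the loop be exercised with a fake function before any real request is sent.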
The crawl is wrapped in a function named job, and the schedule library runs it on a timer, for example every 10 or 30 minutes, so that each run picks up newly posted comments.
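A sketch of the timer is shown below, using the third-party `schedule` package. The body of `job` is a placeholder, and `run_log` and `run_scheduler` are hypothetical names introduced only for this illustration.

```python
import time
from datetime import datetime

run_log = []  # hypothetical: records when each pass started


def job():
    """Placeholder for one crawl pass; put the fetch-and-save logic here."""
    run_log.append(datetime.now())
    print(f"[{run_log[-1]:%H:%M:%S}] fetching new comments")


def run_scheduler(interval_minutes=10):
    """Run job immediately, then repeat it at a fixed interval. Never returns."""
    import schedule  # third-party: pip install schedule

    schedule.every(interval_minutes).minutes.do(job)
    job()
    while True:
        schedule.run_pending()
        time.sleep(1)
```

Calling `run_scheduler(30)` switches to the 30-minute interval; the loop blocks, so it belongs at the very end of the script.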
Before saving, the script deduplicates comments: if a comment already exists in the file it is skipped; otherwise it is appended.
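The check-then-append step might look like this sketch, which stores one comment per line. The function name and the one-line-per-comment layout are assumptions; the article only says that existing comments are skipped and new ones are appended.

```python
import os


def append_new_comments(path, comments):
    """Append comments not already in the file; return how many were added."""
    seen = set()
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            seen = {line.rstrip("\n") for line in f}
    added = 0
    with open(path, "a", encoding="utf-8") as f:
        for text in comments:
            # Flatten line breaks so each comment occupies exactly one line.
            line = text.replace("\n", " ").strip()
            if line and line not in seen:
                f.write(line + "\n")
                seen.add(line)
                added += 1
    return added
```

Matching on text alone means two users who post identical comments are stored once; writing the nickname into each line as well would keep them apart.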
Part 3 – Summary
The method does not retrieve every possible comment due to Weibo’s restrictions, but within those limits it provides an effective way to collect comment data.
Python Crawling & Data Mining