How to Scrape Weibo Comments with Python: From API Limits to Automated Collection

This guide walks through the challenges of accessing Weibo comment data, demonstrates how to locate and bypass API restrictions using the mobile site, and provides step‑by‑step Python code for fetching, cleaning, de‑duplicating, and scheduling comment extraction into text files.

Python Crawling & Data Mining

Mar 20, 2020

Part 1 – Theory

Imagine you need to crawl comments from a popular Weibo user. The simplest idea is to find the Weibo comment API and change parameters to retrieve the latest data.

First, look for the comment API in the official Weibo API documentation.

Unfortunately, the official API is rate‑limited, and a crawler that calls it repeatedly is quickly blocked.

Therefore we switch to the mobile version of Weibo, log in, open the target post, and use the browser’s network analysis tool to locate the comment request URL.

In the “Params” tab we can see four parameters. The first two both carry the post’s ID (like an identity number), while max_id handles pagination: each response returns the max_id value to send with the next request.
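The pagination rule described above can be sketched roughly as follows. This is a minimal sketch, not Weibo's documented interface: the endpoint path in `BASE_URL`, the parameter names `id`, `mid` and `max_id_type`, and the response shape `{"data": {"max_id": ...}}` are assumptions that should be checked against what the network tab actually shows. Only `max_id` and its role come from the description above. `build_url` and `iter_pages` are hypothetical helper names, and the HTTP call is passed in as `fetch` so the loop can be tried without touching the network.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm the real URL in the browser's network tab.
BASE_URL = "https://m.weibo.cn/comments/hotflow"


def build_params(post_id, max_id=None):
    """Query parameters for one page of comments (names are assumptions)."""
    # "max_id_type" is an assumed fourth parameter, fixed at 0 in this sketch.
    params = {"id": post_id, "mid": post_id, "max_id_type": 0}
    if max_id:  # the first request carries no max_id
        params["max_id"] = max_id
    return params


def build_url(post_id, max_id=None):
    """Full request URL for one page."""
    return f"{BASE_URL}?{urlencode(build_params(post_id, max_id))}"


def iter_pages(post_id, fetch, max_pages=16):
    """Yield decoded responses page by page, following max_id.

    `fetch` is any callable that takes a URL and returns the decoded JSON
    as a dict. `max_pages` defaults to the 16-page cap this article reports.
    """
    max_id = None
    for _ in range(max_pages):
        page = fetch(build_url(post_id, max_id))
        yield page
        # Assumed response shape: {"data": {"max_id": <next cursor>, ...}}
        max_id = (page.get("data") or {}).get("max_id")
        if not max_id:  # 0 or missing: treat as "no further page"
            break
```

Because `fetch` is injected, a stub that returns canned dictionaries is enough to check the cursor handling before pointing the loop at the real site.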

Part 2 – Practical Implementation

Based on the above, we write Python code to fetch the comments.

1. Build the request URL. The first request does not need max_id; subsequent requests use the max_id returned from the previous response.

2. Include the logged‑in Weibo cookie in the request headers. The cookie stays valid for a long time, so the same value can be reused across many requests.

3. Parse the JSON response and extract each comment’s text, user nickname, and timestamp.

4. Remove emoji and other non‑text symbols from the comment using a regular expression.

5. Save the cleaned comments to a text file with a simple open call.

6. The API returns at most 16 pages of 20 comments each, so a single run can collect roughly 320 comments at most. We loop through the pages until this limit is reached.

7. Define a function job and use the schedule library to run the scraper every 10 or 30 minutes.

8. Perform deduplication: if a comment already exists in the file, skip it; otherwise append it.
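Steps 3, 4, 5 and 8 can be sketched together as below. This is a sketch under assumptions rather than the original code: the field names `data`, `text`, `user.screen_name` and `created_at` are guesses at the response layout, the regular expressions are one possible cleaning rule, and `extract_comments`, `clean_text` and `append_new` are hypothetical helper names.

```python
import re
from pathlib import Path

# HTML tags, e.g. the markup the site may wrap around emoji images.
TAG_RE = re.compile(r"<[^>]+>")
# Keep CJK characters, ASCII letters and digits, whitespace and common
# punctuation; everything else (emoji, symbols) is dropped.
NON_TEXT_RE = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9\s,.!?:;，。！？：；、]")


def clean_text(raw):
    """Strip tags and non-text symbols, then collapse whitespace."""
    text = TAG_RE.sub("", raw)
    text = NON_TEXT_RE.sub("", text)
    return " ".join(text.split())


def extract_comments(page):
    """Return (nickname, timestamp, cleaned text) tuples from one response.

    Assumed layout: {"data": {"data": [{"text": ..., "user": {...}}, ...]}}
    """
    items = (page.get("data") or {}).get("data") or []
    rows = []
    for item in items:
        text = clean_text(item.get("text", ""))
        if not text:  # nothing left after cleaning, e.g. emoji-only comment
            continue
        nickname = (item.get("user") or {}).get("screen_name", "")
        rows.append((nickname, item.get("created_at", ""), text))
    return rows


def append_new(rows, path):
    """Append rows that are not yet in the file; return how many were added."""
    path = Path(path)
    seen = set()
    if path.exists():
        seen = set(path.read_text(encoding="utf-8").splitlines())
    added = 0
    with path.open("a", encoding="utf-8") as fh:
        for nickname, timestamp, text in rows:
            line = f"{nickname}\t{timestamp}\t{text}"
            if line in seen:  # already stored by an earlier run
                continue
            fh.write(line + "\n")
            seen.add(line)
            added += 1
    return added
```

Deduplication here compares whole lines, so the same text posted by a different user or at a different time is still kept. Comparing on the comment text alone would be a stricter alternative.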

The whole process is now complete.
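For the cookie header in step 2 and the scheduling in step 7, a minimal sketch could look like the following. The calls `schedule.every(...).minutes.do(...)` and `schedule.run_pending()` belong to the third-party `schedule` package. Everything else is an assumption for illustration: `build_headers`, `fetch_json`, `make_job` and `run_every` are hypothetical names, the `User-Agent` value is a placeholder, and `collect` stands for whatever function performs one full scraping run and returns the number of new comments.

```python
import json
import time
import urllib.request


def build_headers(cookie):
    """Headers for a logged-in session; copy the cookie from your browser."""
    return {
        "Cookie": cookie,
        "User-Agent": "Mozilla/5.0",  # placeholder value
        "Accept": "application/json",
    }


def fetch_json(url, cookie, timeout=10):
    """GET a URL with the cookie attached and decode the JSON body."""
    request = urllib.request.Request(url, headers=build_headers(cookie))
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def make_job(collect, log=print):
    """Wrap one scraping run so a failed run does not stop the schedule."""
    def job():
        try:
            added = collect()
            log(f"run finished, {added} new comments")
        except Exception as exc:  # keep the scheduler alive on any error
            log(f"run failed: {exc}")
    return job


def run_every(job, minutes=10):
    """Run `job` now, then every `minutes` minutes until interrupted."""
    import schedule  # third-party package: pip install schedule

    schedule.every(minutes).minutes.do(job)
    job()
    while True:
        schedule.run_pending()
        time.sleep(1)
```

A call such as `run_every(make_job(collect), minutes=30)` would then repeat the run every half hour. Wrapping the run in `make_job` is a design choice so that a single network error is logged instead of ending the loop.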

Part 3 – Summary

Although this method cannot retrieve every comment due to Weibo’s restrictions, it provides an effective way to collect a substantial amount of data under the given limits.
