How to Build a Python Weibo Red Envelope Scraper – Step‑by‑Step Guide
This article walks through creating a Python 2.7 script that logs into Weibo, fetches red‑envelope lists, evaluates their value with a custom algorithm, and automatically claims them, covering required libraries, cookie handling, HTTP GET/POST functions, RSA encryption, and result logging.
Background
During the Chinese New Year, the author, a Python beginner, decided to write a script to crawl Weibo red envelopes using Python 2.7.
0x01 Outline
The author sketches the workflow and imports the necessary libraries:
import re
import sys
import urllib
import urllib2
import cookielib
import rsa  # external library, install via pip
Additional variables are declared:
reload(sys)  # Python 2 only: restores setdefaultencoding, which is removed from sys at startup
sys.setdefaultencoding('utf-8')
luckyList = []  # list of red envelopes
lowest = 10  # minimum cash value to consider
0x02 Weibo Login
Login requires cookie handling with cookielib.CookieJar() and an opener that processes cookies:
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)
Two helper functions are defined for HTTP requests:
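def getData(url):
    """GET a URL through the cookie-aware opener; returns None on failure."""
    try:
        req = urllib2.Request(url)
        result = opener.open(req)
        # Re-encode the UTF-8 response as GBK, dropping characters GBK cannot represent
        text = result.read().decode('utf-8').encode('gbk', 'ignore')
        return text
    except Exception as e:
        print u'请求异常,url:' + url  # "Request failed, url:"
        print e
        return None
def postData(url, data, header):
    """POST a dict of form fields with the given headers; returns None on failure."""
    try:
        data = urllib.urlencode(data)
        req = urllib2.Request(url, data, header)
        result = opener.open(req)
        return result.read()
    except Exception as e:
        print u'请求异常,url:' + url  # "Request failed, url:"
        print e
        return None
The login function (code omitted for brevity) performs RSA encryption using a timestamp and a public key, sends the login request, and stores the resulting cookies.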
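Because the article omits the login code, the following is only a sketch of how the fields for such a login might be prepared. The layout used here (a URL-quoted, base64-encoded username, and a password joined with a timestamp and a nonce before encryption) follows commonly described Weibo SSO clients and is an assumption, as are the function name build_login_fields, the nonce parameter, and the sample values. The imports are written so the sketch runs on Python 3 as well as 2.7.

```python
import base64

try:
    from urllib import quote          # Python 2.7, as used in the article
except ImportError:
    from urllib.parse import quote    # Python 3


def build_login_fields(username, password, servertime, nonce):
    """Return (su, plaintext) for a hypothetical Weibo SSO login request."""
    # Assumption: the username is URL-quoted, then base64-encoded.
    su = base64.b64encode(quote(username).encode('utf-8')).decode('ascii')
    # Assumption: timestamp, nonce and password are joined before RSA encryption.
    plaintext = '%s\t%s\n%s' % (servertime, nonce, password)
    return su, plaintext


# Sample values are made up for illustration.
su, plaintext = build_login_fields('user@example.com', 'secret', 1420000000, 'ABC123')
```

The plaintext would then be handed to the rsa library, for instance rsa.encrypt(plaintext.encode('utf-8'), rsa.PublicKey(n, e)), where n and e come from the public key the server supplies. Those values are not given in the article, so they are left as placeholders here.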
0x03 Claim Red Envelope
After a successful login, the script sends a request to http://huodong.weibo.com/aj_hongbao/getlucky with the parameters ouid (the red‑envelope ID) and share. A response of {"code":303403} signals a permission error; the script avoids it by sending the same headers as the original request, Referer in particular.
The response JSON is parsed: a code of 100000 indicates success, and 90114 means the daily limit has been reached.
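A minimal sketch of how the claim request could be assembled, assuming it is sent as a POST through a helper like postData(url, data, header). The function name build_claim_request, the default share=0, and the sample envelope ID and Referer URL are hypothetical; only the endpoint and the two parameter names come from the text. Nothing is sent over the network.

```python
try:
    from urllib import urlencode          # Python 2.7, as used in the article
except ImportError:
    from urllib.parse import urlencode    # Python 3

CLAIM_URL = 'http://huodong.weibo.com/aj_hongbao/getlucky'


def build_claim_request(ouid, referer, share=0):
    """Return (url, data, headers) ready to hand to a POST helper."""
    data = {'ouid': ouid, 'share': share}
    # Without a Referer the server is said to answer with code 303403.
    headers = {'Referer': referer}
    return CLAIM_URL, data, headers


# Hypothetical envelope ID and page URL, for illustration only.
url, data, headers = build_claim_request(
    '1234567', 'http://huodong.weibo.com/hongbao/1234567')
body = urlencode(sorted(data.items()))  # form body the helper would send
```

Returning the fields as a dict rather than an encoded string matches a helper that calls urlencode itself, as postData does.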
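The status codes mentioned above can be mapped to readable outcomes with a small helper. This is a sketch: the function name interpret_claim_response and the returned labels are illustrative, and it assumes the response body is a JSON object with a top-level code field.

```python
import json

SUCCESS = 100000        # envelope claimed
DAILY_LIMIT = 90114     # daily claim limit reached
NO_PERMISSION = 303403  # permission error (e.g. missing Referer)


def interpret_claim_response(text):
    """Map the JSON body of a claim response to a short status label."""
    try:
        code = int(json.loads(text).get('code'))
    except (ValueError, TypeError, AttributeError):
        return 'unparseable'
    if code == SUCCESS:
        return 'success'
    if code == DAILY_LIMIT:
        return 'daily_limit'
    if code == NO_PERMISSION:
        return 'no_permission'
    return 'other'
```

A caller might stop the whole run on 'daily_limit' and simply move on to the next envelope for any other non-success label.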
0x04 Crawl Red‑Envelope List
The script fetches the red‑envelope ranking page, extracts each item from the info_wrap div using regular expressions, and builds a list containing the envelope URL, cash value, gift value, and number of recipients.
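The extraction step might look roughly like the sketch below. The real markup of the ranking page is not shown in the article, so SAMPLE_HTML, the class names cash, gift and num, and the function name parse_envelopes are all invented for illustration; only the info_wrap container and the four extracted fields come from the text.

```python
import re

# Hypothetical markup; the real page structure will differ.
SAMPLE_HTML = '''
<div class="info_wrap"><a href="http://huodong.weibo.com/hongbao/111">A</a>
<span class="cash">1200</span><span class="gift">300</span><span class="num">50</span></div>
<div class="info_wrap"><a href="http://huodong.weibo.com/hongbao/222">B</a>
<span class="cash">80</span><span class="gift">0</span><span class="num">10</span></div>
'''

ITEM_RE = re.compile(
    r'<div class="info_wrap">.*?href="(?P<url>[^"]+)".*?'
    r'class="cash">(?P<cash>[\d.]+)<.*?'
    r'class="gift">(?P<gift>[\d.]+)<.*?'
    r'class="num">(?P<num>\d+)<',
    re.S)  # re.S lets "." span the line breaks inside an item


def parse_envelopes(html):
    """Return one dict per info_wrap item found in the page."""
    items = []
    for match in ITEM_RE.finditer(html):
        items.append({
            'url': match.group('url'),
            'cash': float(match.group('cash')),
            'gift': float(match.group('gift')),
            'recipients': int(match.group('num')),
        })
    return items


envelopes = parse_envelopes(SAMPLE_HTML)
```

Regular expressions are brittle against markup changes; they are used here only because the article describes that approach.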
A simple weighting algorithm is applied:
weight = cash / (recipients + gift_value)
The list is sorted by this weight in descending order.
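The weighting and sort can be sketched as follows. Two details are assumptions rather than part of the article: the guard that returns 0.0 when the denominator is zero, and the explicit float conversion, which matters on Python 2.7 because / truncates when both operands are integers. The sample envelopes are made up.

```python
def weight(cash, recipients, gift_value):
    """cash / (recipients + gift_value), as described in the text."""
    denominator = recipients + gift_value
    if denominator == 0:
        return 0.0  # assumption: rank such entries last instead of crashing
    return float(cash) / denominator  # float() avoids Python 2 integer division


# Made-up sample data.
envelopes = [
    {'url': 'a', 'cash': 1200, 'gift': 300, 'recipients': 50},
    {'url': 'b', 'cash': 80, 'gift': 0, 'recipients': 10},
    {'url': 'c', 'cash': 500, 'gift': 0, 'recipients': 0},
]

ranked = sorted(
    envelopes,
    key=lambda e: weight(e['cash'], e['recipients'], e['gift']),
    reverse=True)  # highest weight first
```

With this sample, envelope b (weight 8.0) ranks ahead of a (about 3.43), and c falls to the end because of the zero-denominator guard.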
0x05 Determine Usability
For each envelope, the script checks whether a "抢红包" ("grab the red envelope") button exists and whether the highest recorded cash amount meets the cash threshold (lowest). An envelope that passes both checks is claimed.
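The two checks could be combined in a helper like the one below. This is a sketch under assumptions: the function name is_claimable is hypothetical, the button is detected by a plain substring search on the decoded page text, and "acceptable" is read as "at least the minimum cash value".

```python
# -*- coding: utf-8 -*-


def is_claimable(page_html, highest_cash, lowest=10):
    """True if the page shows the claim button and the cash amount is high enough."""
    has_button = u'抢红包' in page_html  # button label, "grab the red envelope"
    return has_button and highest_cash >= lowest


# Made-up page fragments for illustration.
open_page = u'<a class="btn">抢红包</a>'
closed_page = u'<a class="btn">已抢完</a>'
```

The page text must be decoded to unicode before the check; on Python 2.7 a byte string in another encoding would not match the unicode literal.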
0x06 Final Steps
The main start function ties everything together: it logs in, optionally loads a cached luckyList.txt, fetches the latest list, sorts it, and iterates over the envelopes to claim them. Results are logged to text files via a custom log function.
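The article does not show the log function, so the following is one possible shape for it: append a timestamped line to a text file. The signature log(message, path), the timestamp format, and the file name success.txt are assumptions.

```python
import io
import os
import tempfile
import time


def log(message, path):
    """Append one timestamped line to the given text file."""
    line = u'%s %s\n' % (time.strftime('%Y-%m-%d %H:%M:%S'), message)
    with io.open(path, 'a', encoding='utf-8') as handle:
        handle.write(line)


# Write to a temporary directory so the example leaves no files behind in the project.
log_path = os.path.join(tempfile.mkdtemp(), 'success.txt')
log(u'claimed http://huodong.weibo.com/hongbao/111', log_path)
```

Using io.open with an explicit encoding keeps the behavior the same on Python 2.7 and Python 3 when messages contain Chinese text.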
The script ends with a simple command‑line interface that prompts for Weibo username, password, a cash threshold, and whether to use the cached list.
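A sketch of how the four prompts might be collected. The function name read_settings, the prompt wording, and the y/n convention are hypothetical; the default threshold of 10 mirrors the lowest variable declared earlier. The prompt function is passed in as an argument, so raw_input (Python 2.7) or input (Python 3) can be supplied, and the example below feeds it canned answers instead.

```python
def read_settings(ask):
    """Collect login and run settings; `ask` is any prompt function."""
    username = ask('Weibo username: ').strip()
    password = ask('Password: ')
    raw = ask('Minimum cash value [10]: ').strip()
    try:
        lowest = float(raw) if raw else 10.0
    except ValueError:
        lowest = 10.0  # fall back to the default on unparseable input
    use_cache = ask('Use cached list? (y/n): ').strip().lower() == 'y'
    return {
        'username': username,
        'password': password,
        'lowest': lowest,
        'use_cache': use_cache,
    }


# Canned answers stand in for keyboard input.
answers = iter(['alice', 'secret', '', 'y'])
settings = read_settings(lambda prompt: next(answers))
```

In an interactive run, the standard library's getpass.getpass could replace the plain prompt for the password so that it is not echoed to the terminal.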
0x07 Summary
The author notes that the crawler works but has many improvement opportunities, such as batch login, better value calculation, and code optimization.
21CTO
