
Scrape WeChat Moments with Python Scrapy: A Step‑by‑Step Guide

This tutorial shows how to export WeChat Moments using a third‑party service, then build a Python Scrapy spider to crawl the exported pages, parse the JSON data, and save the moments to a file, with detailed commands and code examples.

MaGe Linux Operations

Aug 8, 2018


Today we share a practical method for extracting WeChat Moments data: first use a third‑party tool to export the moments as a web page, then treat the page like any ordinary website and crawl it with a Python Scrapy spider.

1. Obtain the Moments data source

Follow the WeChat official account "出书啦" (chushu.la) and tap 创作书籍 (Create Book) → 微信书 (WeChat Book). Scan the QR code it generates to add the account as a WeChat friend, then wait for the book to be created. Make sure your Moments privacy setting is "全部开放" (open to all).

After the book is ready, click the external link, scan the QR code to log in, and you will see the web version of the WeChat book.

2. Create the Scrapy project

Make sure Scrapy is installed, then run the following command in a terminal:

scrapy startproject weixin_moment

Enter the project directory and generate a spider:

cd weixin_moment

scrapy genspider moment chushu.la

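The result is the standard Scrapy project layout: scrapy.cfg at the top level, plus a weixin_moment package containing items.py, middlewares.py, pipelines.py, settings.py and a spiders folder, where genspider has created moment.py.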

3. Analyze the web page data

Open the WeChat book's home page in Chrome, press F12, switch to the Network tab and enable Preserve log. The first data request is a GET that returns JSON (status 200), and the Moments are under its paras/data node.
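Before writing the spider, it can help to save a response body from DevTools and check its structure in Python. The sketch below assumes the JSON is shaped like {"paras": {"data": [...]}}, which is our reading of the paras/data node; the record fields in the sample ("date", "text") are invented for illustration.

```python
import json


def extract_moment_records(body: bytes) -> list:
    """Return the list stored under paras -> data in a JSON response body.

    Assumption: the payload is shaped like {"paras": {"data": [...]}};
    adjust the keys to whatever DevTools actually shows.
    """
    # The body arrives as bytes; decode it before handing it to json.loads.
    payload = json.loads(body.decode("utf-8"))
    paras = payload.get("paras") or {}
    return paras.get("data") or []


# Hypothetical sample body; real records will carry other fields.
sample = '{"paras": {"data": [{"date": "2018-07-01", "text": "你好"}]}}'.encode("utf-8")
print(extract_moment_records(sample))
```

Missing keys fall back to an empty list, so a month with no Moments does not raise.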

Clicking the month navigation buttons loads one month at a time through a POST request whose payload changes with the month, so the content is loaded dynamically rather than embedded in the page.
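Since every month needs its own payload, the payloads can be generated in a loop. This sketch assumes the payload carries year, month and index fields and that index is simply the month's position in the list; verify both assumptions against the requests in DevTools. Every value goes through str(), because the server rejects non-string values.

```python
def build_month_payloads(months):
    """Build one POST payload per (year, month) pair.

    Hypothetical field layout: {"year", "month", "index"}, where index is
    assumed to be the month's position in the list. All values are strings.
    """
    payloads = []
    for index, (year, month) in enumerate(months):
        payloads.append({
            "year": str(year),
            "month": str(month),
            "index": str(index),
        })
    return payloads


for payload in build_month_payloads([(2018, 6), (2018, 7)]):
    print(payload)
```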

4. Implement the spider code

Update items.py to define date and moment fields.

In moment.py, import the item class, implement start_requests, and write parse to build the POST requests and extract the JSON data.

Decode the response body from bytes to a string, build the required POST parameters (year, month and index, all as strings), and add the headers the site checks, such as Referer, to avoid anti‑hotlinking errors.

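The response body is bytes, so decode it to str before calling json.loads. This is required on Python 3.5 and earlier; from Python 3.6 json.loads also accepts bytes, and Scrapy's response.text returns the decoded string.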

All POST parameters must be strings; otherwise the server returns 400.

Include the Referer header; without it the anti‑hotlinking check redirects the request and it fails.

Define parse_moment to load the JSON, extract the moments, and yield items.

Enable the item pipeline in settings.py to process the scraped data.
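A minimal pipeline that writes each item as one line of UTF-8 JSON could look like the following. The class name MomentPipeline and the output file name are hypothetical. Scrapy calls open_spider, process_item and close_spider on every class listed in the ITEM_PIPELINES setting, and the number attached to each entry (0 to 1000) sets the order in which pipelines run.

```python
import json


class MomentPipeline:
    """Hypothetical pipeline: append each item to a JSON Lines file."""

    def __init__(self, path="moments.jl"):
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Open once per crawl; UTF-8 keeps Chinese text readable.
        self.file = open(self.path, "w", encoding="utf-8")

    def process_item(self, item, spider):
        # dict() works for both plain dicts and scrapy.Item objects.
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item

    def close_spider(self, spider):
        self.file.close()


# In settings.py, enable it (module path assumes pipelines.py in the project):
# ITEM_PIPELINES = {
#     "weixin_moment.pipelines.MomentPipeline": 300,
# }
```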

Run the spider:

scrapy crawl moment -o moment.json

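The resulting moment.json contains the extracted Moments. If the Chinese text shows up as \uXXXX escape sequences (Scrapy's JSON exporter escapes non‑ASCII characters by default), re‑run the spider with UTF‑8 output encoding: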

scrapy crawl moment -o moment.json -s FEED_EXPORT_ENCODING=utf-8
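The escapes follow Python's JSON defaults: with ensure_ascii left at True, every non‑ASCII character is written as a \uXXXX escape. Scrapy's JSON exporters turn ensure_ascii off once an output encoding is set, which is what FEED_EXPORT_ENCODING does; putting FEED_EXPORT_ENCODING = "utf-8" in settings.py makes the fix permanent. The standard library shows the difference:

```python
import json

record = {"moment": "你好"}
escaped = json.dumps(record)                       # default: ensure_ascii=True
readable = json.dumps(record, ensure_ascii=False)  # keeps the characters as-is
print(escaped)   # {"moment": "\u4f60\u597d"}
print(readable)  # {"moment": "你好"}
```

Both strings decode to the same data; only the on‑disk representation differs.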


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.



