
How Spider_XHS Turns Xiaohongshu Data Collection into a 10× Efficiency Boost

Spider_XHS is an open‑source Xiaohongshu crawler that automates the extraction of notes, users, comments, and messages. It downloads watermark‑free media, exports structured Excel/JSON data, integrates with the creator platform, and includes proxy and anti‑ban features, helping marketers and researchers reduce weeks of manual work to hours.

Old Meng AI Explorer

Dec 10, 2025


Many Xiaohongshu operators, market researchers, and content creators struggle with time‑consuming manual data collection: copying titles, likes, comments, downloading watermarked media, and recording user profiles one by one. These tasks can take hours or days for just a few dozen items.

Key Capabilities of Spider_XHS

All‑in‑One Data Harvesting : Retrieves notes, users, comments, and messages, including titles, descriptions, high‑resolution images/videos, likes, saves, tags, IP locations, and timestamps.

Watermark‑Free Media Download : Saves original‑quality pictures and videos without the need for additional de‑watermark tools.

Structured Export : Supports Excel, JSON, and media folder outputs, making downstream analysis with pivot tables straightforward.

Anti‑Ban Mechanisms : Uses the latest Xiaohongshu API with automatic retries, exception handling, and proxy support; logging in by QR code or SMS code mirrors a normal manual login.

Creator Platform Integration : Allows one‑click upload of image sets or videos and management of published works, covering the full workflow from research to publishing.

Open‑Source and Customizable : MIT‑licensed code can be modified to filter content, change sorting, or limit crawl depth, with no usage limits or subscription fees.
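The automatic retries listed under Anti‑Ban Mechanisms can be pictured with a small sketch. This is not Spider_XHS's own code: the function name fetch_with_retry, its parameters, and the flaky_request stand-in are all hypothetical, and the exponential backoff with jitter is simply a common way to space out retries.

```python
import random
import time


def fetch_with_retry(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or max_attempts is reached.

    `fetch` is any zero-argument callable that raises on failure.
    The wait doubles after each failed attempt, plus a little random jitter.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.5)
            sleep(delay)


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_request():
        # Hypothetical stand-in for a network call: fails twice, then succeeds.
        calls["count"] += 1
        if calls["count"] < 3:
            raise ConnectionError("temporary failure")
        return {"status": "ok"}

    print(fetch_with_retry(flaky_request, base_delay=0.1))
```

Passing the sleep function as a parameter keeps the sketch easy to test, since a test can record the delays instead of actually waiting.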

Practical Scenarios

1. Competitor Analysis

A beauty brand needed to study 100 competitor notes about “autumn lipstick”. Using Spider_XHS, the team configured a keyword search, set sorting to “hot”, and collected the data in one hour. They analyzed the exported fields (note ID, title, tags, likes, IP, publish time) in Excel. The analysis showed that notes tagged #autumnvibe and #yellowskin received 30% more likes and that users from Guangdong and Zhejiang were the most active. The resulting content strategy doubled likes on new posts.
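The same kind of tag comparison can also be done in a few lines of Python instead of a pivot table. The sketch below assumes each exported note has been loaded as a dict with "tags" and "likes" keys; those key names and the sample rows are made up for illustration and may not match the tool's real export columns.

```python
from collections import defaultdict


def average_likes_by_tag(notes):
    """Return {tag: mean likes} over all notes carrying that tag."""
    totals = defaultdict(lambda: [0, 0])  # tag -> [sum of likes, note count]
    for note in notes:
        for tag in note["tags"]:
            totals[tag][0] += note["likes"]
            totals[tag][1] += 1
    return {tag: likes / count for tag, (likes, count) in totals.items()}


# Made-up sample rows; real exports will have different fields and values.
sample_notes = [
    {"title": "Note A", "tags": ["autumnvibe", "lipstick"], "likes": 130},
    {"title": "Note B", "tags": ["lipstick"], "likes": 100},
    {"title": "Note C", "tags": ["autumnvibe"], "likes": 110},
]

# Print tags from highest to lowest average likes.
ranked = sorted(average_likes_by_tag(sample_notes).items(), key=lambda kv: -kv[1])
for tag, avg in ranked:
    print(f"{tag}: {avg:.1f}")
```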

2. Content Research

A fashion creator wanted watermark‑free videos of “K‑style autumn outfits”. The tool downloaded 50 videos in ten minutes, automatically organizing them by note title. The creator extracted outfit details and created mash‑ups, achieving a 30‑fold speed increase and a 40% rise in engagement compared with manual screenshot methods.
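Organizing downloads by note title means turning free-form titles into valid folder names. How Spider_XHS does this internally is not described here, so the following is only a sketch of one reasonable approach; safe_folder_name and media_path are hypothetical helper names.

```python
import re
from pathlib import Path


def safe_folder_name(title, max_length=60):
    """Turn a note title into a name that is safe to use as a folder."""
    # Replace characters that common file systems reject in file names.
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]', "_", title).strip(" .")
    return cleaned[:max_length] or "untitled"


def media_path(root, title, filename):
    """Build root/<safe title>/<filename> without touching the disk."""
    return Path(root) / safe_folder_name(title) / filename


print(media_path("media", "K-style: autumn/outfits?", "video_01.mp4"))
```

Truncating long titles and falling back to "untitled" avoids empty or overlong folder names, at the cost of possible collisions between notes with similar titles.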

3. User Persona Profiling

For a mother‑and‑baby account, the team scraped commenters of “0‑3 year baby food” notes, collected 50 user profiles, and exported them. Analysis showed 80% followed pediatrician influencers and favored tags #noadditives and #babynutrition, prompting a shift to more “no‑additive recipe” content and a 35% boost in follower conversion.
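A persona summary like this one boils down to counting tags across profiles and measuring what share of users meet a condition. The sketch below assumes each profile is a dict with a "tags" list and a "follows_pediatrician" flag; both field names and all sample values are invented for illustration.

```python
from collections import Counter


def top_interest_tags(profiles, n=3):
    """Count how many profiles mention each tag and return the n most common."""
    counts = Counter()
    for profile in profiles:
        counts.update(set(profile["tags"]))  # set(): count each user once per tag
    return counts.most_common(n)


def share_of_profiles(profiles, predicate):
    """Fraction of profiles for which predicate(profile) is true."""
    if not profiles:
        return 0.0
    return sum(1 for p in profiles if predicate(p)) / len(profiles)


# Made-up sample profiles; not real user data.
sample_profiles = [
    {"tags": ["noadditives", "babynutrition"], "follows_pediatrician": True},
    {"tags": ["noadditives"], "follows_pediatrician": True},
    {"tags": ["babynutrition", "noadditives"], "follows_pediatrician": True},
    {"tags": ["toys"], "follows_pediatrician": False},
    {"tags": ["noadditives"], "follows_pediatrician": True},
]

print(top_interest_tags(sample_profiles, n=2))
print(share_of_profiles(sample_profiles, lambda p: p["follows_pediatrician"]))
```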

Quick‑Start Guide (3 Steps)

Step 1: Prepare Environment & Install Dependencies

Ensure Python 3.7+ and Node.js 18+ are installed.
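If you are unsure which versions are installed, a short script can check both requirements. This helper is not part of the project; parse_major and node_major are hypothetical names, and the script only assumes that a `node` executable on the PATH answers `node --version`.

```python
import re
import shutil
import subprocess
import sys


def parse_major(version_text):
    """Extract the major version from text such as 'v18.19.0'."""
    match = re.search(r"(\d+)\.", version_text)
    return int(match.group(1)) if match else None


def node_major():
    """Return the installed Node.js major version, or None if node is missing."""
    node = shutil.which("node")
    if node is None:
        return None
    result = subprocess.run(
        [node, "--version"], stdout=subprocess.PIPE, universal_newlines=True
    )
    return parse_major(result.stdout)


if __name__ == "__main__":
    python_ok = sys.version_info >= (3, 7)
    major = node_major()
    node_ok = major is not None and major >= 18
    print(f"Python 3.7+: {'ok' if python_ok else 'too old'}")
    print(f"Node.js 18+: {'ok' if node_ok else 'missing or too old'}")
```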

Clone the repository and install requirements:

# Clone the project
git clone https://github.com/cv-cat/Spider_XHS.git
cd Spider_XHS
# Install Python and Node dependencies
pip install -r requirements.txt
npm install

Step 2: Configure Login Cookie

Open Xiaohongshu web version and log in.

Press F12, go to the Network tab, refresh, locate a “fetch” request, view its headers, and copy the “Cookie” value.

Paste the cookie into the .env file at the project root, replacing the example.
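To sanity-check what was pasted, the copied header can be split into individual cookies. The sketch below is an illustration only: the key name COOKIES is an assumption (check the project's example .env for the real key), the cookie names are placeholders, and both helper functions are hypothetical rather than part of Spider_XHS.

```python
def read_env_value(env_text, key):
    """Return the value for `key` from .env-style text, or None if absent."""
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            return value.strip().strip("'\"")  # drop surrounding quotes
    return None


def parse_cookie_header(cookie_header):
    """Split 'a=1; b=2' into {'a': '1', 'b': '2'}."""
    cookies = {}
    for pair in cookie_header.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


# Placeholder content; the key name and cookie names are assumptions.
example_env = "# example .env\nCOOKIES='name_one=abc; name_two=xyz=='\n"

header = read_env_value(example_env, "COOKIES")
print(sorted(parse_cookie_header(header)))
```

Splitting each pair on the first "=" only keeps values that themselves contain "=" intact.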

Step 3: Run the Crawler and Customize

Execute the entry script:

python main.py

Modify apis/xhs_pc_apis.py to change search keywords, sorting, or quantity; replace user IDs in the scripts to target specific accounts.

After completion, data is saved in the static folder (Excel/JSON at root, media in subfolders).
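Once a run has finished, the JSON output can be pulled into Python for further analysis. The sketch below assumes the JSON files sit directly in the output folder and that each file holds either one object or a list of objects; the actual file names and record layout are not documented here, so treat load_json_records as a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path


def load_json_records(folder):
    """Load every *.json file directly inside `folder` into one list of dicts."""
    records = []
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        # Accept either a single object or a list of objects per file.
        records.extend(data if isinstance(data, list) else [data])
    return records


if __name__ == "__main__":
    # Demo with a throwaway folder and made-up records instead of real output.
    with tempfile.TemporaryDirectory() as tmp:
        sample = [{"note_id": "demo-1", "likes": 10}]
        Path(tmp, "notes.json").write_text(json.dumps(sample), encoding="utf-8")
        print(load_json_records(tmp))
```

For real data, point the function at the output folder instead, for example load_json_records("static").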

Final Remarks

Spider_XHS is not intended for illicit scraping; it assists compliant users in accelerating market research, content creation, and user profiling within Xiaohongshu’s policy limits. The project continues to evolve, with recent updates adding video‑type distinction and plans to expose additional metrics such as repost counts and comment interaction rates.

Project URL: https://github.com/cv-cat/Spider_XHS

