How Spider_XHS Turns Xiaohongshu Data Collection into a 10× Efficiency Boost
Spider_XHS is an open‑source Xiaohongshu crawler that automates note, user, comment, and message extraction, offers watermark‑free media downloads, exports structured Excel/JSON data, integrates with the creator platform, and includes proxy and anti‑ban features, enabling marketers and researchers to cut weeks of manual work into hours.
Many Xiaohongshu operators, market researchers, and content creators struggle with time‑consuming manual data collection: copying titles, likes, comments, downloading watermarked media, and recording user profiles one by one. These tasks can take hours or days for just a few dozen items.
Key Capabilities of Spider_XHS
All‑in‑One Data Harvesting: Retrieves notes, users, comments, and messages, including titles, descriptions, high‑resolution images and videos, likes, saves, tags, IP locations, and timestamps.
Watermark‑Free Media Download: Saves original‑quality images and videos without needing a separate watermark‑removal tool.
Structured Export: Supports Excel, JSON, and media‑folder output, making downstream analysis (for example, with pivot tables) straightforward.
Anti‑Ban Mechanisms: Uses the latest Xiaohongshu API, automatic retries, exception handling, and proxy support. Logging in with a QR code or SMS code mirrors a normal manual login, which is safer.
Creator Platform Integration: Supports one‑click upload of image sets or videos and management of published works, covering the full workflow from research to publishing.
Open‑Source and Customizable: The MIT‑licensed code can be modified to filter content, change sorting, or limit crawl depth, with no usage limits or subscription fees.
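The anti‑ban bullet names retries and proxy support but does not show how they fit together. The sketch below illustrates that general pattern and is not code from Spider_XHS. `call_with_retries`, the exponential backoff schedule, and the proxy URLs are all hypothetical, and `flaky_request` stands in for a real HTTP call.

```python
import itertools
import time
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(
    request: Callable[[Optional[str]], T],
    proxies: Iterable[Optional[str]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(proxy) until it succeeds, rotating proxies between attempts.

    request is a hypothetical callable that performs one HTTP request through
    the given proxy URL (or directly when the proxy is None).
    """
    # Fall back to a direct connection when no proxies are configured.
    proxy_cycle = itertools.cycle(list(proxies) or [None])
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        proxy = next(proxy_cycle)
        try:
            return request(proxy)
        except Exception as exc:  # a real crawler would catch narrower errors
            last_error = exc
            if attempt < max_attempts - 1:
                # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
                sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error


# Usage with a fake request that fails twice and then succeeds.
calls = []
delays = []


def flaky_request(proxy):
    calls.append(proxy)
    if len(calls) < 3:
        raise ConnectionError("temporary failure")
    return f"ok via {proxy}"


result = call_with_retries(
    flaky_request, ["http://proxy-a:8080", "http://proxy-b:8080"], sleep=delays.append
)
print(result)
```

Passing `sleep` in as a parameter lets the retry logic be tested without real waiting. In normal use, the default `time.sleep` applies.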
Practical Scenarios
1. Competitor Analysis
A beauty brand needed to study 100 competitor notes about “autumn lipstick”. Using Spider_XHS, the team configured a keyword search, set sorting to “hot”, and collected the data within an hour. Analyzing the exported fields (note ID, title, tags, likes, IP location, publish time) in Excel showed two things. Notes tagged #autumnvibe and #yellowskin received 30% more likes, and users from Guangdong and Zhejiang were the most active. The resulting content strategy doubled likes on new posts.
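The tag finding above comes down to comparing each tag's average likes with the overall average. Here is a minimal Python sketch of that comparison. It assumes each exported note is a dict with hypothetical `tags`, `likes`, and `ip` keys. The toy numbers are invented for illustration and are not the brand's data.

```python
from collections import Counter, defaultdict
from statistics import mean

# Toy rows for illustration only. Field names are assumptions modeled on
# the exported columns (tags, likes, IP location), not Spider_XHS's schema.
notes = [
    {"note_id": "n1", "tags": ["autumnvibe", "lipstick"], "likes": 1100, "ip": "Guangdong"},
    {"note_id": "n2", "tags": ["lipstick"], "likes": 900, "ip": "Zhejiang"},
    {"note_id": "n3", "tags": ["yellowskin", "lipstick"], "likes": 1200, "ip": "Guangdong"},
    {"note_id": "n4", "tags": ["matte"], "likes": 800, "ip": "Beijing"},
]


def tag_lift(rows):
    """Map each tag to (mean likes of notes carrying it) / (mean likes of all notes)."""
    overall = mean(r["likes"] for r in rows)
    likes_by_tag = defaultdict(list)
    for r in rows:
        for tag in set(r["tags"]):  # set() avoids double-counting a repeated tag
            likes_by_tag[tag].append(r["likes"])
    return {tag: mean(values) / overall for tag, values in likes_by_tag.items()}


lift = tag_lift(notes)
for tag, ratio in sorted(lift.items(), key=lambda kv: kv[1], reverse=True):
    print(f"#{tag}: {ratio:.2f}x the average")

# Which regions (by IP location) posted most often.
print(Counter(r["ip"] for r in notes).most_common(2))
```

A ratio above 1.0 means notes with that tag beat the overall average. For example, 1.30 corresponds to "30% more likes".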
2. Content Research
A fashion creator wanted watermark‑free videos of “K‑style autumn outfits”. The tool downloaded 50 videos in ten minutes, automatically organizing them by note title. The creator extracted outfit details and created mash‑ups, achieving a 30‑fold speed increase and a 40% rise in engagement compared with manual screenshot methods.
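Organizing downloads by note title means turning arbitrary titles into valid folder names. The sketch below shows one way to do that. `note_folder`, the 50‑character cap, and appending the note ID are assumptions for illustration, not necessarily how Spider_XHS names its folders.

```python
import re
from pathlib import Path

# Characters that are illegal in file names on Windows (and "/" everywhere).
_ILLEGAL = re.compile(r'[\\/:*?"<>|\r\n\t]+')


def note_folder(root, title, note_id, max_len=50):
    """Return a filesystem-safe folder path for one note's downloaded media."""
    cleaned = _ILLEGAL.sub("_", title).strip(" ._")
    # Trim again after truncation so the name never ends in a space or dot.
    cleaned = cleaned[:max_len].rstrip(" .") or "untitled"
    # Appending the note ID keeps two notes with identical titles apart.
    return Path(root) / f"{cleaned}_{note_id}"


folder = note_folder("static", "K-style autumn outfits: 5 looks?", "abc123")
# Call folder.mkdir(parents=True, exist_ok=True) before saving media into it.
print(folder)
```

Chinese titles pass through unchanged. Only characters that the file system rejects are replaced.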
3. User Persona Profiling
For a mother‑and‑baby account, the team scraped the commenters on notes about baby food for ages 0‑3, collected 50 user profiles, and exported them. Analysis showed that 80% of these users followed pediatrician influencers and favored the tags #noadditives and #babynutrition. The team responded by shifting toward more “no‑additive recipe” content, which produced a 35% boost in follower conversion.
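Two steps in this workflow are easy to get subtly wrong: deduplicating users who comment on several notes, and computing what share of profiles carry a given tag. Here is a sketch of both. It assumes hypothetical `user_id` and `tags` fields on the exported records and uses toy input.

```python
from collections import Counter


def unique_commenters(comments, limit=50):
    """Return up to `limit` distinct commenter IDs, in first-seen order."""
    seen = {}  # dicts keep insertion order, so this doubles as an ordered set
    for comment in comments:
        seen.setdefault(comment["user_id"], None)
        if len(seen) >= limit:
            break
    return list(seen)


def tag_share(profiles, tag):
    """Fraction of profiles whose interest tags include `tag`."""
    if not profiles:
        return 0.0
    return sum(tag in p["tags"] for p in profiles) / len(profiles)


def top_tags(profiles, n=5):
    """Most common interest tags across all profiles."""
    return Counter(t for p in profiles for t in set(p["tags"])).most_common(n)


# Toy input; the field names are assumptions, not Spider_XHS's schema.
comments = [{"user_id": "u1"}, {"user_id": "u2"}, {"user_id": "u1"}, {"user_id": "u3"}]
profiles = [
    {"user_id": "u1", "tags": ["noadditives", "babynutrition"]},
    {"user_id": "u2", "tags": ["noadditives"]},
    {"user_id": "u3", "tags": ["noadditives", "babynutrition"]},
    {"user_id": "u4", "tags": ["toys"]},
]
print(unique_commenters(comments))
print(tag_share(profiles, "noadditives"), top_tags(profiles, 2))
```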
Quick‑Start Guide (3 Steps)
Step 1: Prepare Environment & Install Dependencies
Ensure Python 3.7+ and Node.js 18+ are installed.
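You can confirm both version requirements from Python before installing. This helper script is not part of the repository. `check_environment` and `parse_major` are hypothetical names, and the script simply runs `node --version` if Node.js is on the PATH.

```python
import shutil
import subprocess
import sys


def parse_major(version):
    """Extract the major number from strings such as 'v18.17.1' or '3.11.4'."""
    return int(version.strip().lstrip("v").split(".")[0])


def check_environment(min_python=(3, 7), min_node=18):
    """Return a list of problems; an empty list means both runtimes are new enough."""
    problems = []
    if sys.version_info[:2] < tuple(min_python):
        found = "%d.%d" % sys.version_info[:2]
        problems.append("Python %d.%d+ required, found %s" % (min_python[0], min_python[1], found))
    node = shutil.which("node")
    if node is None:
        problems.append("Node.js not found on PATH")
    else:
        out = subprocess.run([node, "--version"], stdout=subprocess.PIPE,
                             universal_newlines=True, check=False).stdout
        try:
            major = parse_major(out)
        except ValueError:
            major = -1  # unexpected output; treat as too old
        if major < min_node:
            problems.append("Node.js %d+ required, found %r" % (min_node, out.strip()))
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "Python and Node.js versions look OK")
```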
Clone the repository and install requirements:
```bash
# Clone the project
git clone https://github.com/cv-cat/Spider_XHS.git
cd Spider_XHS
# Install Python and Node dependencies
pip install -r requirements.txt
npm install
```
Step 2: Configure Login Cookie
Open the Xiaohongshu web version and log in.
Press F12 to open the developer tools, switch to the Network tab, and refresh the page. Select any Fetch/XHR request, view its request headers, and copy the value of the “Cookie” header.
Paste the cookie into the .env file at the project root, replacing the example value.
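The copied value is a single `name=value; name=value` string. The sketch below reads such a string back out of `.env` text and splits it into individual cookies, using only the standard library. `COOKIES` is an assumed variable name, so use whichever key the project's example `.env` defines. `parse_env` is a deliberately minimal stand‑in for a full .env loader.

```python
def parse_env(text):
    """Minimal .env parser: KEY=VALUE lines, '#' comments, optional surrounding quotes."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def cookie_to_dict(cookie_header):
    """Split a browser Cookie header ('a=1; b=2') into {'a': '1', 'b': '2'}."""
    cookies = {}
    for part in cookie_header.split(";"):
        # partition() splits on the first '=' only, so values may contain '='.
        name, sep, value = part.strip().partition("=")
        if sep:  # skip empty fragments such as a trailing ';'
            cookies[name] = value
    return cookies


# COOKIES is an assumed key name; the values are placeholders.
sample = "# pasted from DevTools\nCOOKIES='session=abc==; uid=42'\n"
cookies = cookie_to_dict(parse_env(sample)["COOKIES"])
print(cookies)
```

In practice, you would pass `Path(".env").read_text(encoding="utf-8")` (from `pathlib`) to `parse_env` instead of the inline sample.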
Step 3: Run the Crawler and Customize
Run the entry script with python main.py.
To change the search keywords, sorting, or number of results, modify apis/xhs_pc_apis.py. To target specific accounts, replace the user IDs in the scripts.
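Keyword, sorting, and quantity settings typically drive a paginated loop that stops once enough results arrive. The sketch below illustrates that logic under stated assumptions. `SearchConfig`, the sort labels, and the `fetch_page(keyword, sort, page)` callback are hypothetical and do not mirror the signatures in apis/xhs_pc_apis.py.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical sort labels; check apis/xhs_pc_apis.py for the real options.
SORT_OPTIONS = ("general", "hot", "latest")

# fetch_page(keyword, sort, page) -> (items, has_more): a stand-in for one API call.
FetchPage = Callable[[str, str, int], Tuple[List[Dict], bool]]


@dataclass
class SearchConfig:
    keyword: str
    sort: str = "general"
    limit: int = 20

    def __post_init__(self):
        # Fail fast on typos instead of silently crawling with a bad setting.
        if self.sort not in SORT_OPTIONS:
            raise ValueError(f"sort must be one of {SORT_OPTIONS}, got {self.sort!r}")
        if self.limit < 1:
            raise ValueError("limit must be at least 1")


def collect(config: SearchConfig, fetch_page: FetchPage) -> List[Dict]:
    """Request pages until `limit` items are gathered or the results run out."""
    results: List[Dict] = []
    page = 1
    while len(results) < config.limit:
        items, has_more = fetch_page(config.keyword, config.sort, page)
        results.extend(items)
        if not has_more or not items:
            break
        page += 1
    return results[: config.limit]


def fake_fetch_page(keyword, sort, page):
    """Offline stand-in: three pages of ten results each."""
    items = [{"id": f"{page}-{i}", "keyword": keyword} for i in range(10)]
    return items, page < 3


collected = collect(SearchConfig("autumn lipstick", sort="hot", limit=25), fake_fetch_page)
print(len(collected), collected[-1]["id"])
```

Capping the number of pages this way also bounds crawl depth, which keeps request volume modest.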
After completion, data is saved in the static folder (Excel/JSON at root, media in subfolders).
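If you prefer to work from the JSON output, a few lines of standard‑library Python flatten it into CSV for pivot tables or other tools. The field names here are assumptions; adjust `FIELDS` to the keys that actually appear in your export.

```python
import csv
import io
import json

# Assumed field names; match them to the keys in your actual JSON export.
FIELDS = ["note_id", "title", "liked_count", "ip_location", "upload_time"]


def json_notes_to_csv(json_text, fields=FIELDS):
    """Flatten a JSON array of note objects into CSV text with the chosen columns."""
    notes = json.loads(json_text)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    for note in notes:
        # Missing keys become empty cells; keys outside `fields` are dropped.
        writer.writerow({f: note.get(f, "") for f in fields})
    return buf.getvalue()


sample = '[{"note_id": "n1", "title": "Autumn lipstick picks", "liked_count": 120, "extra": true}]'
print(json_notes_to_csv(sample))
```

To save the result, write it with `open("notes.csv", "w", encoding="utf-8-sig", newline="")`. The `utf-8-sig` byte‑order mark helps Excel recognize UTF‑8, so Chinese titles display correctly.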
Final Remarks
Spider_XHS is not intended for illicit scraping; it assists compliant users in accelerating market research, content creation, and user profiling within Xiaohongshu’s policy limits. The project continues to evolve, with recent updates adding video‑type distinction and plans to expose additional metrics such as repost counts and comment interaction rates.
Project URL: https://github.com/cv-cat/Spider_XHS
Old Meng AI Explorer
Tracking global AI developments 24/7, focusing on large model iterations, commercial applications, and tech ethics. We break down hardcore technology into plain language, providing fresh news, in-depth analysis, and practical insights for professionals and enthusiasts.
