
How to Scrape JD Product Reviews and Create Word Clouds with Python

This tutorial walks through locating the comment API behind a JD product page, fetching comment data with requests and the right headers, handling pagination, saving the results, segmenting the text with jieba, and visualizing frequent terms as a word cloud.

MaGe Linux Operations

Jul 11, 2019


1. Requirement Background

In a typical project, the product manager starts by explaining the requirement. Here, the requirement is to find out how users feel about inflatable dolls by scraping product reviews from JD.com.

We also include a visual of the product to illustrate the target.

2. Feature Description

Use web scraping combined with data analysis to fetch and display real user comments about the product.

3. Technical Solution

We outline three main steps:

Analyze JD comment request URLs.

Use the requests library to fetch comments.

Generate a word cloud for visual analysis.

4. Implementation Details

4.1 Find the comment API URL

Open the product page, open developer tools (F12), click the comment button, and inspect the network requests to locate the URL that returns comment data.

The final comment API URL looks like:

https://sclub.jd.com/comment/productPageComments.action?callback=fetchJSON_comment98vv4654&productId=1263013576&score=0&sortType=5&page=0&pageSize=10&isShadowSku=0&fold=1
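Before writing the scraper, it can help to split this URL into its endpoint and query parameters. The snippet below is a small sketch using only the standard library; the helper name split_comment_url is hypothetical and not part of the original article.

```python
from urllib.parse import urlsplit, parse_qsl

# The comment API URL found in the browser's network panel.
COMMENT_URL = (
    "https://sclub.jd.com/comment/productPageComments.action"
    "?callback=fetchJSON_comment98vv4654&productId=1263013576"
    "&score=0&sortType=5&page=0&pageSize=10&isShadowSku=0&fold=1"
)


def split_comment_url(url):
    """Return (endpoint, params) for a comment API URL."""
    parts = urlsplit(url)
    endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"
    params = dict(parse_qsl(parts.query))  # values stay as strings
    return endpoint, params


endpoint, params = split_comment_url(COMMENT_URL)
print(endpoint)
for name, value in params.items():
    print(f"{name} = {value}")
```

Of these parameters, productId identifies the product, while page and pageSize are the ones used later for pagination.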

4.2 Fetch comments with proper headers

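A bare request to this URL comes back empty. Adding the Referer and User-Agent headers, so that the request looks like it comes from a browser on the product page, resolves the empty response issue.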
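A request with both headers might look like the sketch below. It is not the article's original code: the function names, the User-Agent string, and the Referer pattern (the product page at item.jd.com) are assumptions chosen for illustration, and the API may have changed since the article was written.

```python
API_URL = "https://sclub.jd.com/comment/productPageComments.action"


def build_headers(product_id):
    """Headers that make the request resemble a browser on the product page."""
    return {
        # Assumed product page URL pattern; adjust if the real page differs.
        "Referer": f"https://item.jd.com/{product_id}.html",
        # Any realistic browser User-Agent string should do; this one is illustrative.
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"
        ),
    }


def build_params(product_id, page=0, page_size=10):
    """Query parameters copied from the URL seen in the network panel."""
    return {
        "callback": "fetchJSON_comment98vv4654",
        "productId": product_id,
        "score": 0,
        "sortType": 5,
        "page": page,
        "pageSize": page_size,
        "isShadowSku": 0,
        "fold": 1,
    }


def fetch_raw(product_id, page=0, timeout=10):
    """Fetch one page of comments and return the raw JSONP text."""
    import requests  # third-party; imported here so the helpers above work without it

    response = requests.get(
        API_URL,
        params=build_params(product_id, page),
        headers=build_headers(product_id),
        timeout=timeout,
    )
    response.raise_for_status()
    return response.text


# Usage (performs a real network request):
# raw = fetch_raw("1263013576", page=0)
```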

4.3 Extract JSON data

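The response is JSONP, not plain JSON. Strip the wrapper, that is, the callback name from the URL followed by an opening parenthesis (fetchJSON_comment98vv4654( for the URL above) and the trailing ), to obtain pure JSON.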

Inside the JSON, the comments key holds a list of comment objects; each object's content field contains the user review text.
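The unwrapping and extraction described above could be sketched as follows. The regular expression accepts any callback name and an optional trailing semicolon, which is an assumption about the response format; the sample string is made up for illustration and is not real JD data.

```python
import json
import re

# Matches callbackName( ... ) with an optional trailing semicolon.
JSONP_PATTERN = re.compile(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", re.DOTALL)


def jsonp_to_dict(text):
    """Strip the JSONP wrapper and parse the JSON inside it."""
    match = JSONP_PATTERN.match(text)
    if match is None:
        raise ValueError("response is not in callback(...) form")
    return json.loads(match.group(1))


def extract_contents(payload):
    """Return the review text of every comment object that has one."""
    return [
        comment["content"]
        for comment in payload.get("comments", [])
        if comment.get("content")
    ]


# Made-up sample in the same shape as the API response.
sample = (
    'fetchJSON_comment98vv4654({"comments": '
    '[{"content": "good quality"}, {"content": "fast delivery"}]});'
)
contents = extract_contents(jsonp_to_dict(sample))
print(contents)
```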

4.4 Save extracted comments

Comments are written to a plain text file for later processing.
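One way to do this is to append one comment per line, as in the sketch below. The file name comments.txt and the choice to collapse line breaks inside a comment are assumptions, not details from the original article.

```python
def save_comments(contents, path="comments.txt"):
    """Append comments to a UTF-8 text file, one per line.

    Returns the number of comments written.
    """
    # Collapse internal whitespace so each comment occupies exactly one line.
    cleaned = [" ".join(text.split()) for text in contents]
    cleaned = [text for text in cleaned if text]
    with open(path, "a", encoding="utf-8") as handle:
        for text in cleaned:
            handle.write(text + "\n")
    return len(cleaned)
```

Opening the file in append mode means repeated calls, one per page, keep adding to the same file.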

4.5 Batch crawling with pagination

The API supports page and pageSize parameters; incrementing page enables fetching multiple pages.

The spider function takes a page argument and is called in a loop to retrieve up to 100 pages, with a random sleep between requests to reduce the risk of IP blocking.
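The loop could be sketched as below. The page fetcher is passed in as a function so the loop can be tried without network access; crawl_pages, the delay range of 1 to 3 seconds, and stopping early on an empty page are all assumptions added for illustration.

```python
import random
import time


def crawl_pages(fetch_page, max_pages=100, min_delay=1.0, max_delay=3.0,
                sleep=time.sleep):
    """Call fetch_page(page) for page = 0, 1, ... and collect the comments.

    fetch_page must return a list of comment strings for the given page.
    """
    all_comments = []
    for page in range(max_pages):
        if page > 0:
            # Random pause between requests to look less like a bot.
            sleep(random.uniform(min_delay, max_delay))
        comments = fetch_page(page)
        if not comments:
            break  # assumption: an empty page means there is nothing more to fetch
        all_comments.extend(comments)
    return all_comments


# Usage with a stand-in fetcher instead of a real request:
fake_pages = {0: ["comment a", "comment b"], 1: ["comment c"]}
result = crawl_pages(lambda page: fake_pages.get(page, []), sleep=lambda s: None)
print(result)
```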

4.6 Data cleaning

Install the jieba library and use it to segment the Chinese comment text into words, optionally removing stop words.

pip3 install jieba
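Segmentation with stop word removal might look like the sketch below. jieba.lcut returns the segmented words as a list; the segment helper, its cut parameter, and the rule of dropping single-character tokens are assumptions added here, not part of the original article.

```python
def segment(text, stopwords=frozenset(), cut=None):
    """Split text into words, dropping stop words and one-character tokens."""
    if cut is None:
        import jieba  # third-party: pip3 install jieba

        cut = jieba.lcut
    words = []
    for token in cut(text):
        token = token.strip()
        # Assumption: single characters (mostly punctuation and particles) are noise.
        if len(token) < 2 or token in stopwords:
            continue
        words.append(token)
    return words


# Usage (requires jieba):
# words = segment(open("comments.txt", encoding="utf-8").read())
```

The cut parameter exists only so that another tokenizer can be swapped in, for example when trying the function without jieba installed.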

4.7 Generate word cloud

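Use numpy, matplotlib, wordcloud, and Pillow to create the word cloud. Be sure to set a Chinese font path: the default font bundled with wordcloud has no Chinese glyphs, so Chinese words would otherwise render as empty boxes.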
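A minimal sketch of this step is shown below. It counts word frequencies with the standard library and renders them with wordcloud, skipping the optional image mask and on-screen display for which numpy, Pillow, and matplotlib would typically be used. The font path simhei.ttf, the image size, and the function names are assumptions; point font_path at any Chinese font available on your system.

```python
from collections import Counter

FONT_PATH = "simhei.ttf"  # hypothetical path to a font with Chinese glyphs


def word_frequencies(words, top_n=200):
    """Count words and keep the top_n most frequent ones."""
    return dict(Counter(words).most_common(top_n))


def render_word_cloud(frequencies, out_path="wordcloud.png", font_path=FONT_PATH):
    """Draw the frequencies as a word cloud and save it as an image file."""
    from wordcloud import WordCloud  # third-party: pip3 install wordcloud

    cloud = WordCloud(
        font_path=font_path,
        background_color="white",
        width=800,
        height=600,
    )
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(out_path)
    return out_path


# Usage (requires wordcloud and a valid font file):
# render_word_cloud(word_frequencies(words))
```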

The final word cloud image is shown below.

5. Summary

This guide covered locating the comment API, handling anti‑scraping headers, extracting and saving JSON data, implementing pagination for batch crawling, cleaning text with jieba, and visualizing frequent terms via a word cloud, providing a complete end‑to‑end data analysis pipeline.
