
How to Scrape JD.com Phone Reviews with Python and Visualize the Data

This tutorial walks you through using Python to collect JD.com product comments, parse the JSON responses, and store the data in Excel. It then performs a simple visual analysis: a bar chart of the purchased phone colors and a word cloud of the review text.

Python Crawling & Data Mining

Sep 3, 2021


Preface

Hello, I am a third‑year mathematics undergraduate and a Python web‑scraping enthusiast. In this article I show how to collect JD.com product comments and run some basic visual analysis on them.

1. Target Data

With the rise of mobile payments, e‑commerce sites now accumulate huge numbers of user reviews. Taking JD.com as an example, we will scrape the comments for one product and run a simple analysis on them.

2. Page Analysis

The product detail page URL is https://item.jd.com/10022971060622.html. Its comments are loaded from the following API:

https://club.jd.com/comment/productPageComments.action?callback=fetchJSON_comment98&productId=10022971060622&score=0&sortType=5&page=0&pageSize=10&isShadowSku=0&fold=1

Key parameters:

productId – unique identifier of the product

page – comment pagination index
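The same URL can be assembled from a dictionary of query parameters instead of a format string, which makes productId and page easy to vary. Below is a minimal sketch using the standard library's urllib.parse.urlencode; the helper name build_comment_url is hypothetical, and the fixed parameter values are simply copied from the captured URL above.

```python
from urllib.parse import urlencode

BASE_URL = 'https://club.jd.com/comment/productPageComments.action'

def build_comment_url(product_id, page, page_size=10):
    """Build the comment API URL (hypothetical helper; fixed values copied from the captured request)."""
    params = {
        'callback': 'fetchJSON_comment98',
        'productId': product_id,   # unique identifier of the product
        'score': 0,
        'sortType': 5,
        'page': page,              # pagination index, starting at 0
        'pageSize': page_size,
        'isShadowSku': 0,
        'fold': 1,
    }
    # urlencode keeps the dict's insertion order, so the query string matches the original
    return BASE_URL + '?' + urlencode(params)

print(build_comment_url('10022971060622', 0))
```

Calling build_comment_url('10022971060622', 0) reproduces the API URL shown above character for character.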

3. Parsing the Data

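The API responds with JSONP: the JSON object is wrapped in a fetchJSON_comment98(...) call, matching the callback parameter in the URL. To parse it, request the URL, strip the callback wrapper, and load the remaining string as JSON.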
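Simply deleting the wrapper text with str.replace would also delete any ");" that happens to appear inside a review. A slightly more robust sketch keeps only the text between the first "(" and the last ")". The function name unwrap_jsonp is hypothetical, and the sample payload is made up for illustration.

```python
import json

def unwrap_jsonp(text):
    """Parse the JSON payload inside a JSONP wrapper such as callback({...});"""
    start = text.find('(')   # end of the callback name
    end = text.rfind(')')    # closing parenthesis of the call
    if start == -1 or end <= start:
        raise ValueError('response does not look like JSONP')
    return json.loads(text[start + 1:end])

# Made-up sample: the ");" inside the review text survives intact
sample = 'fetchJSON_comment98({"maxPage": 2, "comments": [{"content": "Great phone :);"}]});'
data = unwrap_jsonp(sample)
print(data['maxPage'], data['comments'][0]['content'])
```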

4. Program

1. Import libraries

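import requests
import json
import time
import openpyxl  # for Excel operations

# Request headers used by get_comments; a browser-like User-Agent (assumed value, copy your browser's)
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0 Safari/537.36'}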

2. Get comments

def get_comments(productId, page):
    url = ('https://club.jd.com/comment/productPageComments.action?callback=fetchJSON_comment98'
           '&productId={0}&score=0&sortType=5&page={1}&pageSize=10&isShadowSku=0&fold=1')
    url = url.format(productId, page)
    resp = requests.get(url, headers=headers, timeout=10)
    resp.raise_for_status()  # fail loudly on HTTP errors
    # Strip the JSONP wrapper fetchJSON_comment98( ... ); by keeping only the
    # text between the first '(' and the last ')'
    text = resp.text
    s = text[text.find('(') + 1:text.rfind(')')]
    return json.loads(s)

3. Get maximum page number

def get_max_page(productId):
    data = get_comments(productId, 0)
    return data['maxPage']
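Requests to the API can fail intermittently, and a single failure inside the page loop aborts the whole run. One way to guard against this is to wrap each call in a small retry helper with exponential backoff and random jitter. This is a generic sketch, not part of the original program: with_retries and its parameters are hypothetical names, and the delays are arbitrary choices.

```python
import random
import time

def with_retries(func, *args, attempts=3, base_delay=1.0):
    """Call func(*args); on any exception, retry with exponential backoff plus jitter."""
    for attempt in range(attempts):
        try:
            return func(*args)
        except Exception:
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            # Wait base_delay * 2**attempt seconds plus up to base_delay of random jitter
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

Inside get_info this would read comments = with_retries(get_comments, productId, page). Catching every exception keeps the sketch short; narrowing it to requests.RequestException and ValueError (which json.JSONDecodeError subclasses) would avoid retrying genuine programming errors.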

4. Extract data

def get_info(productId):
    lst = []
    for page in range(0, get_max_page(productId)):
        comments = get_comments(productId, page)
        for item in comments['comments']:
            content = item['content']
            color   = item['productColor']
            size    = item['productSize']
            lst.append([content, color, size])
        time.sleep(3)  # avoid being blocked
    save(lst)
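Indexing with item['productColor'] raises KeyError if a review happens to lack that field. A defensive variant of the inner loop uses dict.get with an empty-string default. The helper name extract_rows is hypothetical and the sample page is made up; only the field names (comments, content, productColor, productSize) come from the program above.

```python
def extract_rows(response):
    """Turn one page of the comment API response into [content, color, size] rows."""
    rows = []
    for item in response.get('comments', []):
        rows.append([
            (item.get('content') or '').strip(),  # review text
            item.get('productColor', ''),          # purchased color variant
            item.get('productSize', ''),           # purchased size/configuration variant
        ])
    return rows

# Made-up sample page: the second review has no productSize field
page = {'comments': [
    {'content': ' Fast delivery ', 'productColor': 'White', 'productSize': '8GB+128GB'},
    {'content': 'Good battery', 'productColor': 'Black'},
]}
print(extract_rows(page))
```

With this helper, the body of the page loop reduces to lst.extend(extract_rows(comments)).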

5. Save to Excel

def save(lst):
    wb = openpyxl.Workbook()
    sheet = wb.active
    for row in lst:
        sheet.append(row)
    wb.save('sales_data.xlsx')

6. Run the program

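if __name__ == '__main__':
    # Product ID as it appears in the product page URL (item.jd.com/<productId>.html);
    # replace it with the ID of the product you want to scrape
    productId = '10029693009906'
    get_info(productId)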

5. Simple Data Analysis

1. Basic configuration

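import pandas as pd
import matplotlib.pyplot as plt

# Use a font that can render Chinese labels, and keep minus signs displaying correctly
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

# sales_data.xlsx has no header row; its columns are comment text, color, and size
data = pd.read_excel('sales_data.xlsx', header=None,
                     names=['comments', 'color', 'size'])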

2. Color distribution bar chart

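# Number of reviews per phone color: white, black, green, blue, red, purple
x = ['白色', '黑色', '绿色', '蓝色', '红色', '紫色']
y = [314, 295, 181, 173, 27, 10]
plt.bar(x, y)
plt.title('各种颜色手机数量对比')  # "Number of phones by color"
plt.xlabel('颜色')  # "Color"
plt.ylabel('数量')  # "Count"
plt.show()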

The chart shows that white and black phones dominate: together they account for 609 of the 1,000 reviews counted (314 + 295), just over 60% of the reviewed purchases.
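The counts in y are typed in by hand. They can instead be derived from the color column loaded earlier; with pandas this is data['color'].value_counts(). The standard-library sketch below does the same tally with collections.Counter and also computes each color's percentage, which is how the 60% figure can be checked. The helper name color_shares is hypothetical, and the input list is rebuilt from the chart's own numbers; with real data it would be data['color'].tolist().

```python
from collections import Counter

def color_shares(colors):
    """Return (color, count, percent) tuples, most common color first."""
    counts = Counter(c for c in colors if c)  # skip empty color values
    total = sum(counts.values())
    return [(color, n, round(100 * n / total, 1)) for color, n in counts.most_common()]

# Per-review colors reconstructed from the bar chart's counts
colors = (['白色'] * 314 + ['黑色'] * 295 + ['绿色'] * 181 +
          ['蓝色'] * 173 + ['红色'] * 27 + ['紫色'] * 10)
for color, n, pct in color_shares(colors):
    print(color, n, f'{pct}%')
```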

3. Word‑cloud of comments

First extract the comment text to a plain-text file:

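import openpyxl

# Recent xlrd releases only read legacy .xls files, so openpyxl reads the .xlsx here
book = openpyxl.load_workbook('sales_data.xlsx', read_only=True)
sheet = book.active
with open('data.txt', 'w', encoding='utf-8') as f:
    # The sheet has no header row, so every row is one review;
    # column A holds the comment text
    for row in sheet.iter_rows(values_only=True):
        if row[0]:
            f.write(str(row[0]) + '\n')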

Generate the word cloud:

import jieba
from PIL import Image
import numpy as np
from wordcloud import WordCloud

# Read the comments back with the same encoding used to write them
with open('data.txt', encoding='utf-8') as f:
    text = f.read()
text = text.replace('\n', '').replace('\u3000', '')  # drop newlines and full-width spaces
words = ' '.join(jieba.lcut(text))  # segment the Chinese text into space-separated words
mask = np.array(Image.open('xin.png'))  # shape image; pure-white areas are left empty
wc = WordCloud(font_path='C:/Windows/Fonts/simfang.ttf',  # a font with Chinese glyphs
               mask=mask, background_color='white',
               max_font_size=150, max_words=2000,
               stopwords={'的'}).generate(words)
wc.to_file('wordcloud.png')
wc.to_image().show()

The resulting word cloud visualizes the most frequent terms in the JD comments.
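A word cloud sizes each word by how often it occurs in the segmented text. To inspect the underlying numbers, the tokens produced by jieba.lcut can be tallied directly, and WordCloud.generate_from_frequencies also accepts such a word-to-count dictionary. The sketch below is hypothetical (top_words is not part of any library) and uses a hand-made token list in place of real jieba output so that it runs without extra dependencies; it also drops single-character tokens such as 的 and punctuation.

```python
from collections import Counter

def top_words(tokens, stopwords=frozenset(), n=5):
    """Count tokens, ignoring blanks, single characters, and stopwords."""
    words = [t.strip() for t in tokens]
    counts = Counter(w for w in words if len(w) > 1 and w not in stopwords)
    return counts.most_common(n)

# Hand-made stand-in for jieba.lcut output
# (手机 = phone, 好用 = easy to use, 物流 = delivery, 很快 = very fast)
tokens = ['手机', '很', '好用', '，', '物流', '很快', '。', '手机', '好用', '的', '手机']
print(top_words(tokens, stopwords={'物流'}, n=3))
```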


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
