How to Scrape JD.com Phone Reviews with Python and Visualize the Data
This tutorial walks you through using Python to collect JD.com product comments, parse the JSON responses, store the data in Excel, and then run some simple visual analysis: a bar chart of the color distribution and a word cloud of the review content.
Preface
Hello, I am a third-year mathematics undergraduate and a Python web-scraping enthusiast. In this article I show how to collect JD.com product comments and run some basic visual analysis on them.
1. Target Data
With the rise of mobile payments, e-commerce sites accumulate huge numbers of user reviews. Using JD.com as an example, we will scrape the comments for one product and run a simple analysis on them.
2. Page Analysis
The product detail page URL is https://item.jd.com/10022971060622.html. The comments shown on that page are loaded from the following API:
https://club.jd.com/comment/productPageComments.action?callback=fetchJSON_comment98&productId=10022971060622&score=0&sortType=5&page=0&pageSize=10&isShadowSku=0&fold=1
Key parameters:
productId – unique identifier of the product
page – zero-based index of the comment page (the example URL requests page=0)
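Instead of formatting the query string by hand, the parameters can be kept in a dictionary and encoded with the standard library's `urllib.parse.urlencode`. This is a minimal sketch: the helper `build_comment_url` and its `page_size` argument are hypothetical, and the parameter values are simply copied from the example URL above.

```python
from urllib.parse import urlencode

# Base endpoint of the JD comment API (taken from the example URL above)
BASE_URL = 'https://club.jd.com/comment/productPageComments.action'


def build_comment_url(product_id, page, page_size=10):
    """Hypothetical helper: build the comment API URL for one page."""
    params = {
        'callback': 'fetchJSON_comment98',
        'productId': product_id,   # which product's comments to fetch
        'score': 0,
        'sortType': 5,
        'page': page,              # zero-based page index
        'pageSize': page_size,     # comments per page
        'isShadowSku': 0,
        'fold': 1,
    }
    # urlencode keeps the dict's insertion order and converts ints to str
    return BASE_URL + '?' + urlencode(params)


print(build_comment_url('10022971060622', 0))  # prints the example API URL shown above
```

With `requests`, the same dictionary could instead be passed as `requests.get(BASE_URL, params=...)`, which builds the query string for you.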
3. Parsing the Data
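Request the comment API URL. The response is JSONP: the JSON payload is wrapped in a callback call of the form fetchJSON_comment98( ... );, so strip that wrapper and parse the remaining string as JSON.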
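A minimal sketch of this parsing step, assuming the response always has the shape `callbackName({...});`. The regular expression and the `parse_jsonp` name are illustrative rather than part of the original program, and the sample response is invented. Capturing everything between the callback's opening parenthesis and the last closing parenthesis avoids a pitfall of a plain `replace(');', '')`, which would also delete any `);` that happens to appear inside a review.

```python
import json
import re

# Matches "callbackName( ... );" and captures the JSON payload in between.
# DOTALL lets the payload span multiple lines; the greedy (.*) stops at the last ')'.
JSONP_RE = re.compile(r'^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$', re.DOTALL)


def parse_jsonp(text):
    """Strip a JSONP callback wrapper and decode the JSON inside it."""
    match = JSONP_RE.match(text)
    if match is None:
        # No wrapper found: assume the body is already plain JSON
        return json.loads(text)
    return json.loads(match.group(1))


# Hypothetical sample response; note the ");" inside the review text
sample = 'fetchJSON_comment98({"maxPage": 2, "comments": [{"content": "ok);"}]});'
data = parse_jsonp(sample)
print(data['maxPage'])  # 2
```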
4. Program
1. Import libraries
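import requests
import json
import time
import openpyxl  # for Excel operations

# Request headers used by get_comments; a browser-like User-Agent (assumed value, adjust as needed)
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
2. Get comments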
def get_comments(productId, page):
    url = ('https://club.jd.com/comment/productPageComments.action?callback=fetchJSON_comment98'
           '&productId={0}&score=0&sortType=5&page={1}&pageSize=10&isShadowSku=0&fold=1')
    url = url.format(productId, page)
    resp = requests.get(url, headers=headers, timeout=10)
    resp.raise_for_status()
    # Strip the JSONP wrapper: keep only the text between the first '(' and the last ')'
    text = resp.text.strip()
    s = text[text.index('(') + 1:text.rindex(')')]
    res = json.loads(s)
    return res
3. Get maximum page number
def get_max_page(productId):
    data = get_comments(productId, 0)
    # 'maxPage' is the number of comment pages reported by the API
    return data['maxPage']
4. Extract data
def get_info(productId):
    lst = []
    for page in range(0, get_max_page(productId)):
        comments = get_comments(productId, page)
        for item in comments['comments']:
            content = item['content']     # review text
            color = item['productColor']  # color variant purchased
            size = item['productSize']    # size/configuration variant purchased
            lst.append([content, color, size])
        time.sleep(3)  # pause between pages to avoid being blocked
    save(lst)
5. Save to Excel
def save(lst):
    wb = openpyxl.Workbook()
    sheet = wb.active
    # One row per review: [content, color, size]; no header row is written
    for row in lst:
        sheet.append(row)
    wb.save('sales_data.xlsx')
6. Run the program
if __name__ == '__main__':
    # productId of the item to scrape (different from the example URL in section 2)
    productId = '10029693009906'
    get_info(productId)
5. Simple Data Analysis
1. Basic configuration
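import pandas as pd
import matplotlib.pyplot as plt

# Use a font with Chinese glyphs (SimHei) so the Chinese labels render,
# and keep minus signs displaying correctly with that font
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

# The sheet has no header row, so name the three columns on load
data = pd.read_excel('sales_data.xlsx', header=None,
                     names=['comments', 'color', 'size'])
2. Color distribution bar chart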
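# Colors: white, black, green, blue, red, purple
x = ['白色', '黑色', '绿色', '蓝色', '红色', '紫色']
y = [314, 295, 181, 173, 27, 10]
plt.bar(x, y)
plt.title('各种颜色手机数量对比')  # "Number of phones by color"
plt.xlabel('颜色')  # "Color"
plt.ylabel('数量')  # "Count"
plt.show()
The chart shows that white and black phones dominate: together they account for 609 of the 1,000 phones counted, or about 61%.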
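The counts in `y` above are typed in by hand. A sketch that derives them from the scraped color column instead, using `collections.Counter`; the helper name `color_counts` is hypothetical and the sample colors are made up. The color strings stay in Chinese because they come straight from the `productColor` field.

```python
from collections import Counter


def color_counts(colors):
    """Count reviews per product color, most common first.

    `colors` can be any iterable of color strings, for example the
    data['color'] column loaded with pandas. Empty cells (NaN/None) are skipped.
    """
    counts = Counter(c for c in colors if isinstance(c, str) and c)
    pairs = counts.most_common()
    labels = [label for label, _ in pairs]
    values = [value for _, value in pairs]
    return labels, values


# Hypothetical sample of scraped colors
x, y = color_counts(['白色', '黑色', '白色', '绿色', '黑色', '白色'])
print(x, y)  # ['白色', '黑色', '绿色'] [3, 2, 1]
```

The returned `x, y` can be passed to `plt.bar(x, y)` exactly as in the chart code, so the bars stay in sync with whatever was scraped.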
3. Word cloud of comments
First extract the comment text to a plain file:
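import openpyxl

# xlrd 2.x no longer reads .xlsx files, so read the workbook with openpyxl instead
wb = openpyxl.load_workbook('sales_data.xlsx', read_only=True)
sheet = wb.active
# No header row was written, so every row is a review; keep only the comment text (column A).
# Mode 'w' overwrites the file, so rerunning does not duplicate lines.
with open('data.txt', 'w', encoding='utf-8') as f:
    for row in sheet.iter_rows(values_only=True):
        if row and row[0]:
            f.write(str(row[0]) + '\n')
wb.close()
Generate the word cloud: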
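import jieba
from PIL import Image
import numpy as np
from wordcloud import WordCloud

# data.txt was written as UTF-8, so read it back as UTF-8
with open('data.txt', encoding='utf-8') as f:
    text = f.read()
# Drop line breaks and full-width spaces before segmentation
text = text.replace('\n', '').replace('\u3000', '')
# WordCloud cannot segment Chinese by itself, so join jieba's words with spaces
words = ' '.join(jieba.lcut(text))
mask = np.array(Image.open('xin.png'))  # mask image that defines the cloud's shape
wc = WordCloud(font_path='C:/Windows/Fonts/simfang.ttf',  # font with Chinese glyphs (Windows path)
               mask=mask, background_color='white',
               max_font_size=150, max_words=2000,
               stopwords={'的'}).generate(words)  # '的' is a very common particle
wc.to_file('wordcloud.png')
wc.to_image().show()
The resulting word cloud highlights the most frequent terms in the JD comments.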
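Before drawing the cloud, it can help to inspect the raw word frequencies and tune the stopword list. A minimal sketch with `collections.Counter`; `top_words` is a hypothetical helper, dropping single-character tokens is an assumed filtering choice rather than something the original code does, and the token list is invented to stand in for `jieba.lcut` output.

```python
from collections import Counter


def top_words(tokens, stopwords, n=10):
    """Return the n most frequent tokens, skipping stopwords and 1-char tokens.

    `tokens` is assumed to be a list of words, e.g. the output of jieba.lcut(text).
    """
    counts = Counter(
        t for t in tokens
        if t.strip() and len(t) > 1 and t not in stopwords
    )
    return counts.most_common(n)


# Hypothetical tokens, as a Chinese word segmenter might return them
tokens = ['手机', '很', '好', '，', '手机', '屏幕', '的', '清晰', '屏幕', '手机']
print(top_words(tokens, stopwords={'的'}, n=2))  # [('手机', 3), ('屏幕', 2)]
```

The full frequency mapping could also be handed to `WordCloud.generate_from_frequencies`, which skips WordCloud's own tokenization entirely.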
Python Crawling & Data Mining
Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!
