How to Scrape JD.com Product Data with Python Regex: A Step‑by‑Step Guide

This tutorial shows how to build a JD.com search URL, encode keywords, fetch the page with Python's urllib, and extract product details using regular expressions, providing code snippets, regex explanations, and sample output for beginners.

Python Crawling & Data Mining

JD.com is China’s largest self‑operated e‑commerce platform. Entering a keyword such as “狗粮” (dog food) into the search box produces a URL like

https://search.jd.com/Search?keyword=%E7%8B%97%E7%B2%AE&enc=utf-8

The keyword parameter is URL‑encoded, so any search term can be inserted after encoding it the same way.

Constructing the Search URL

Use Python’s urllib.parse.quote to encode the keyword and concatenate it with the base URL and the enc=utf-8 flag.

import urllib.parse
keyword = '狗粮'
search_url = 'https://search.jd.com/Search?keyword=' + urllib.parse.quote(keyword) + '&enc=utf-8'
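As an alternative sketch, urllib.parse.urlencode builds the whole query string from a dict and handles the percent‑encoding in one call:

```python
import urllib.parse

# urlencode percent-encodes each value and joins the pairs with '&'
params = {'keyword': '狗粮', 'enc': 'utf-8'}
search_url = 'https://search.jd.com/Search?' + urllib.parse.urlencode(params)
print(search_url)
# https://search.jd.com/Search?keyword=%E7%8B%97%E7%B2%AE&enc=utf-8
```

This keeps the encoding logic out of the string concatenation and scales better if you later add more parameters (such as a page number).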

Fetching the Page Source

Retrieve the HTML with urllib.request.urlopen and decode it as UTF‑8.

import urllib.request
# JD.com may reject requests carrying the default urllib User-Agent,
# so send a browser-like header (an assumption; adjust as needed)
req = urllib.request.Request(search_url, headers={'User-Agent': 'Mozilla/5.0'})
response = urllib.request.urlopen(req)
html = response.read().decode('utf-8')

Extracting Information with Regular Expressions

Character classes like [\w\W]+? or [\s\S]+? act as a full wildcard, matching any character including line breaks, which the dot . does not do unless the re.S (DOTALL) flag is set. The example regexes below capture product titles and prices.
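A small demonstration of the difference, using a throwaway multi‑line string:

```python
import re

text = 'start\nmiddle\nend'

# '.' does not match '\n' by default, so this pattern cannot span lines
print(re.search(r'start.+end', text))                    # None

# [\s\S] matches any character, including '\n'
print(re.search(r'start[\s\S]+end', text).group())       # start\nmiddle\nend

# re.S (DOTALL) makes '.' behave the same way
print(re.search(r'start.+end', text, re.S).group())      # start\nmiddle\nend
```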

import re
# Example pattern that matches a product block (simplified)
pattern = r'"skuName":"(.*?)".*?"price":"(.*?)"'
matches = re.findall(pattern, html, re.S)
for title, price in matches:
    print(f'Title: {title}, Price: {price}')
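To check the pattern without hitting the live site, you can run it against a small synthetic snippet (the "skuName"/"price" field names come from the simplified pattern above and are an assumption, not JD's actual markup):

```python
import re

# Synthetic stand-in for a fragment of the fetched page
sample = '{"skuName":"Dog Food A","price":"89.00"},{"skuName":"Dog Food B","price":"120.00"}'
pattern = r'"skuName":"(.*?)".*?"price":"(.*?)"'
matches = re.findall(pattern, sample, re.S)
print(matches)
# [('Dog Food A', '89.00'), ('Dog Food B', '120.00')]
```

The non‑greedy (.*?) groups are what keep each match from swallowing the rest of the string.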

Sample Output

The script prints the extracted fields, and a screenshot of the console output is shown below.

[Output screenshot]

Next Steps

Only four fields were captured from a single page here; modify the regexes and add pagination logic to collect more data. A follow‑up article will demonstrate using BeautifulSoup for more robust parsing.
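A minimal sketch of the pagination idea, assuming the results can be paged with a plain page query parameter (JD's real scheme may differ, e.g. it has used odd‑numbered page values and AJAX loading, so treat this as an assumption to verify):

```python
import urllib.parse

def build_search_url(keyword, page):
    # Reuse the URL-building step from above, adding an assumed 'page' parameter
    return ('https://search.jd.com/Search?keyword='
            + urllib.parse.quote(keyword)
            + '&enc=utf-8&page=' + str(page))

for p in range(1, 4):
    print(build_search_url('狗粮', p))
```

Each generated URL would then be fetched and fed through the same re.findall step, accumulating the matches across pages.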

Finally, a note on regular expressions: beginners need not memorize every pattern; understanding when and how to use common constructs like [\s\S] is sufficient for effective web scraping.

Tags: python, regex, web-scraping, JD.com, urllib
Written by

Python Crawling & Data Mining

Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!
