Extract JD.com Product Data with Python BeautifulSoup: A Step‑by‑Step Guide

This tutorial shows how to build a JD.com search URL, fetch the page with urllib, parse the HTML using BeautifulSoup, and reliably extract each product's name, link, image and price while handling missing image URLs and avoiding regex complexity.

Python Crawling & Data Mining

Overview

The article demonstrates a practical method for scraping product information from JD.com using Python's standard urllib library to request pages and the BeautifulSoup library to parse the resulting HTML tree.

Step 1: Build the search URL

Enter a keyword (e.g., "狗粮", meaning dog food) and encode it with urllib.parse.quote. Appending the encoded keyword to the JD search endpoint produces a URL such as:

https://search.jd.com/Search?keyword=%E7%8B%97%E7%B2%AE&enc=utf-8
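As a minimal sketch of this step, the keyword can be percent-encoded and dropped into the search URL as follows. The function name build_search_url and the constant SEARCH_ENDPOINT are illustrative names, not taken from the original script.

```python
from urllib.parse import quote

# Search endpoint as it appears in the example URL above.
SEARCH_ENDPOINT = "https://search.jd.com/Search"


def build_search_url(keyword: str) -> str:
    """Return a JD.com search URL for the given keyword (hypothetical helper)."""
    # quote() percent-encodes the UTF-8 bytes of any non-ASCII characters.
    return f"{SEARCH_ENDPOINT}?keyword={quote(keyword)}&enc=utf-8"


url = build_search_url("狗粮")
print(url)
```

For the keyword "狗粮" this yields the same URL shown above.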

Step 2: Fetch the page

Use urllib.request.urlopen (or requests.get if preferred) to send an HTTP GET request to the constructed URL and obtain the raw HTML response.
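One way the request could look with the standard library is sketched below. The helper names, the User-Agent value, and the timeout are assumptions added for illustration. The article does not say which headers, if any, the original script sends, nor whether JD.com returns the full result markup to a plain request.

```python
from urllib.request import Request, urlopen

# Assumed header value: some sites reject urllib's default User-Agent.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0"}


def build_request(url: str) -> Request:
    """Wrap the URL in a Request object; no body is attached, so it is a GET."""
    return Request(url, headers=DEFAULT_HEADERS)


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Send the request and return the response body decoded to text."""
    with urlopen(build_request(url), timeout=timeout) as response:
        # Use the charset declared by the server, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch_html(url) with the URL from Step 1 would return the page source as a string, ready to hand to the parser.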

Step 3: Parse with BeautifulSoup

Pass the HTML content to BeautifulSoup(html, "html.parser"). BeautifulSoup builds a navigable tag tree, allowing easy traversal of nested elements.
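To illustrate the parsing step without a network call, the sketch below feeds a small hand-written HTML string to BeautifulSoup. The markup in sample_html is a made-up stand-in for the real response, and only loosely imitates the structure described in this article.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML returned by the fetch step.
sample_html = """
<html>
  <head><title>Sample search page</title></head>
  <body>
    <ul>
      <li class="gl-item" data-sku="1001">
        <div class="p-name"><a href="//item.example/1001.html">Sample dog food</a></div>
      </li>
    </ul>
  </body>
</html>
"""

# "html.parser" is the parser bundled with Python; no extra install is needed for it.
soup = BeautifulSoup(sample_html, "html.parser")

# Tags can be reached by attribute access or with find().
page_title = soup.title.string
first_sku = soup.find("li")["data-sku"]
print(page_title, first_sku)
```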

Step 4: Locate product items

Inspecting the page source shows that each product sits inside a <li data-sku="..." class="gl-item"> element. A selector such as soup.select('li.gl-item') collects all product nodes.
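The selector can be tried on a small fragment first. In this sketch the sample markup and SKU values are invented. The third list item uses a different, made-up class to show that the selector matches only elements carrying the gl-item class.

```python
from bs4 import BeautifulSoup

# Invented fragment: two product items and one unrelated list item.
sample_html = """
<ul>
  <li class="gl-item" data-sku="1001"></li>
  <li class="gl-item" data-sku="1002"></li>
  <li class="other-item"></li>
</ul>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# select() takes a CSS selector and returns a list of matching tags.
items = soup.select("li.gl-item")

# .get() reads an attribute and returns None if it is absent.
skus = [item.get("data-sku") for item in items]
print(skus)
```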

Step 5: Extract details

Product name: item.select_one('.p-name a').get_text(strip=True)

Product link: item.select_one('.p-name a')['href']

Image URL: item.select_one('img').get('src'), or get('data-lazy-img') for lazy‑loaded images

Price: item.select_one('.p-price i').get_text()

Because some images have an empty or missing src, the guide recommends img.get('src'), which returns None for a missing attribute instead of raising an exception the way img['src'] does. Alternatively, wrap the extraction in a try/except block and skip missing values.
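The extraction rules above could be combined into one function along the following lines. This is a sketch under assumptions: the function name extract_product, the dictionary keys, and the sample markup are invented for illustration. Falling back from src to data-lazy-img is one possible reading of the guide's advice, not necessarily what the original script does.

```python
from bs4 import BeautifulSoup


def extract_product(item) -> dict:
    """Pull name, link, image and price from one product node (hypothetical helper)."""
    link = item.select_one(".p-name a")
    img = item.select_one("img")
    price = item.select_one(".p-price i")

    image_url = None
    if img is not None:
        # .get() returns None for a missing attribute; an empty src is falsy,
        # so either case falls through to the lazy-load attribute.
        image_url = img.get("src") or img.get("data-lazy-img")

    return {
        "name": link.get_text(strip=True) if link is not None else None,
        "link": link.get("href") if link is not None else None,
        "image": image_url,
        "price": price.get_text(strip=True) if price is not None else None,
    }


# Invented markup: the second item has an empty src, the third has no image.
sample_html = """
<ul>
  <li class="gl-item"><img src="//img.example/1.jpg">
    <div class="p-name"><a href="//item.example/1.html">Food A</a></div>
    <div class="p-price"><i>59.90</i></div></li>
  <li class="gl-item"><img src="" data-lazy-img="//img.example/2.jpg">
    <div class="p-name"><a href="//item.example/2.html">Food B</a></div>
    <div class="p-price"><i>120.00</i></div></li>
  <li class="gl-item">
    <div class="p-name"><a href="//item.example/3.html">Food C</a></div></li>
</ul>
"""

soup = BeautifulSoup(sample_html, "html.parser")
products = [extract_product(item) for item in soup.select("li.gl-item")]
for product in products:
    print(product)
```

Checking each node for None before reading from it plays the same role as the try/except approach mentioned above, and it keeps a product in the results even when one field is missing.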

Result

The script prints a list of dictionaries containing the extracted fields, and the final screenshot (shown in the article) displays the successfully scraped dog‑food products with their names, links, images, and prices.

Final output screenshot
Written by

Python Crawling & Data Mining
