
Extract JD.com Product Data with Python BeautifulSoup: A Step‑by‑Step Guide

This tutorial shows how to build a JD.com search URL, fetch the page with urllib, parse the HTML using BeautifulSoup, and reliably extract each product's name, link, image and price while handling missing image URLs and avoiding regex complexity.

Python Crawling & Data Mining

Feb 1, 2026

Overview

The article demonstrates a practical method for scraping product information from JD.com using Python's standard urllib library to request pages and the BeautifulSoup library to parse the resulting HTML tree.

Step 1: Build the search URL

Enter a keyword (e.g., "狗粮" for dog food) and encode it with urllib.parse.quote. The encoded keyword is appended to the JD search endpoint, producing a URL such as

https://search.jd.com/Search?keyword=%E7%8B%97%E7%B2%AE&enc=utf-8
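The URL construction above can be sketched as follows. This is a minimal illustration, not the article's original script: the function name build_search_url and the constant SEARCH_ENDPOINT are hypothetical, and the endpoint and the enc=utf-8 parameter are copied from the example URL.

```python
from urllib.parse import quote

# Endpoint taken from the example URL in the text.
SEARCH_ENDPOINT = "https://search.jd.com/Search"


def build_search_url(keyword: str) -> str:
    """Percent-encode the keyword as UTF-8 and append it to the search endpoint."""
    return f"{SEARCH_ENDPOINT}?keyword={quote(keyword)}&enc=utf-8"


print(build_search_url("狗粮"))
# prints https://search.jd.com/Search?keyword=%E7%8B%97%E7%B2%AE&enc=utf-8
```

quote encodes to UTF-8 by default, which is why the two characters become six percent-escaped bytes.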

Step 2: Fetch the page

Use urllib.request.urlopen (or requests.get if preferred) to send an HTTP GET request to the constructed URL and obtain the raw HTML response.
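A fetch step along these lines might look like the sketch below. The helper names build_request and fetch_html are hypothetical, and the User-Agent header and the 10-second timeout are assumptions: the source does not say which headers or timeout its script uses.

```python
from urllib.request import Request, urlopen

# Assumption: a browser-like User-Agent; the source does not specify any headers.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0"}


def build_request(url: str) -> Request:
    """Create a GET request object carrying the default headers."""
    return Request(url, headers=DEFAULT_HEADERS)


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Send the request and return the response body decoded to text."""
    with urlopen(build_request(url), timeout=timeout) as response:
        # Use the charset declared by the server, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch_html with the search URL from Step 1 would return the page source as a string, ready to hand to the parser. Whether the site serves complete product markup to a plain urllib client is not something the source confirms.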

Step 3: Parse with BeautifulSoup

Pass the HTML content to BeautifulSoup(html, "html.parser"). BeautifulSoup builds a navigable tag tree, allowing easy traversal of nested elements.

Step 4: Locate product items

Inspect the page source to find that each product resides inside a <li data-sku="..." class="gl-item"> element. Use a selector such as soup.select('li.gl-item') to collect all product nodes.
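To see Steps 3 and 4 together without touching the network, the sketch below parses a small hand-written fragment. SAMPLE_HTML is hypothetical markup that only imitates the li.gl-item structure described above; the live page may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating the structure described in the text.
SAMPLE_HTML = """
<ul>
  <li data-sku="1001" class="gl-item">
    <div class="p-name"><a href="//item.jd.com/1001.html">Dog food A</a></div>
  </li>
  <li data-sku="1002" class="gl-item">
    <div class="p-name"><a href="//item.jd.com/1002.html">Dog food B</a></div>
  </li>
  <li class="ad">Not a product</li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The CSS selector keeps only <li> elements that carry the gl-item class.
items = soup.select("li.gl-item")
skus = [item.get("data-sku") for item in items]

print(len(items), skus)
# prints 2 ['1001', '1002']
```

The third li is skipped because it lacks the gl-item class, so selecting by class filters out non-product list entries.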

Step 5: Extract details

Product name: item.select_one('.p-name a').get_text(strip=True)

Product link: item.select_one('.p-name a')['href']

Image URL: item.select_one('img').get('src'), or .get('data-lazy-img') for lazy‑loaded images

Price: item.select_one('.p-price i').get_text()

Some images have a missing or empty src. The guide recommends reading the attribute with img.get('src'), which returns None when the attribute is absent instead of raising KeyError as img['src'] would, or wrapping the extraction in a try/except block and skipping missing values. An empty src still needs an explicit check, and select_one itself returns None when no element matches, so guard the node before reading from it.
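One way to combine these selectors with the missing-value handling is sketched below. It is not the article's script: extract_product, absolute, BASE_URL, and SAMPLE_HTML are hypothetical names, the markup is hand-written to exercise the edge cases, and resolving links against the search host is an assumption.

```python
from typing import Optional
from urllib.parse import urljoin

from bs4 import BeautifulSoup
from bs4.element import Tag

# Assumption: relative or protocol-relative URLs are resolved against this base.
BASE_URL = "https://search.jd.com/"

# Hypothetical markup: a full item, a lazy-loaded image, and an item with gaps.
SAMPLE_HTML = """
<ul>
  <li data-sku="1001" class="gl-item">
    <div class="p-img"><img src="//img.example.com/1001.jpg"></div>
    <div class="p-price"><i>129.00</i></div>
    <div class="p-name"><a href="//item.jd.com/1001.html"><em>Dog food A</em></a></div>
  </li>
  <li data-sku="1002" class="gl-item">
    <div class="p-img"><img src="" data-lazy-img="//img.example.com/1002.jpg"></div>
    <div class="p-price"><i>89.90</i></div>
    <div class="p-name"><a href="//item.jd.com/1002.html">Dog food B</a></div>
  </li>
  <li data-sku="1003" class="gl-item">
    <div class="p-name"><a href="//item.jd.com/1003.html">Dog food C</a></div>
  </li>
</ul>
"""


def absolute(url: Optional[str]) -> Optional[str]:
    """Resolve a possibly relative URL against BASE_URL; pass None or '' through as None."""
    return urljoin(BASE_URL, url) if url else None


def extract_product(item: Tag) -> dict:
    """Pull name, link, image and price from one product node, using None for gaps."""
    name_link = item.select_one(".p-name a")
    img = item.select_one("img")
    price = item.select_one(".p-price i")

    image_src = None
    if img is not None:
        # An empty or absent src falls back to the lazy-load attribute.
        image_src = img.get("src") or img.get("data-lazy-img")

    return {
        "name": name_link.get_text(strip=True) if name_link is not None else None,
        "link": absolute(name_link.get("href")) if name_link is not None else None,
        "image": absolute(image_src),
        "price": price.get_text(strip=True) if price is not None else None,
    }


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
products = [extract_product(item) for item in soup.select("li.gl-item")]
for product in products:
    print(product)
```

Checking each node against None before reading from it avoids a broad try/except, so one incomplete item yields None fields rather than aborting the loop.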

Result

The script prints a list of dictionaries containing the extracted fields. The final screenshot in the original article shows the scraped dog‑food products with their names, links, images, and prices.
