How to Precisely Scrape JD.com Product Data Using BeautifulSoup CSS Selectors

Learn step by step how to use Python’s urllib and BeautifulSoup CSS selectors to fetch JD.com product details such as name, link, image, and price: construct the search URL, fetch and parse the HTML source, and extract the desired fields.

Python Crawling & Data Mining

Overview

In this tutorial we demonstrate how to precisely scrape product information from JD.com using Python’s urllib for URL handling and BeautifulSoup’s CSS selector support.

Preparing the search URL

Searching JD.com for a keyword (e.g., “dog food”, 狗粮) produces a URL such as

https://search.jd.com/Search?keyword=%E7%8B%97%E7%B2%AE&enc=utf-8

The keyword parameter must be URL‑encoded; Python’s urllib.parse.quote can perform this step.
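As a minimal sketch of this step, the snippet below rebuilds the example URL from the raw keyword. The helper name build_search_url is hypothetical and only serves the illustration; the URL pattern is the one shown above.

```python
from urllib.parse import quote


def build_search_url(keyword):
    """Return a JD.com search URL for the keyword (hypothetical helper)."""
    # quote() percent-encodes the UTF-8 bytes of every non-ASCII character.
    encoded = quote(keyword)
    return f"https://search.jd.com/Search?keyword={encoded}&enc=utf-8"


# "狗粮" is the Chinese for "dog food", the keyword used in the article.
print(build_search_url("狗粮"))
# → https://search.jd.com/Search?keyword=%E7%8B%97%E7%B2%AE&enc=utf-8
```

Note that quote() encodes a space as %20, so a multi-word keyword such as "dog food" becomes dog%20food.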

Fetching the page

Send an HTTP GET request to the encoded URL and obtain the HTML response.
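One way to send the request with urllib is sketched below. The browser-like User-Agent header, the 10-second timeout, and the helper names are assumptions added for illustration; JD.com may also require cookies or a login, which this sketch does not handle.

```python
from urllib.request import Request, urlopen


def build_request(url):
    """Wrap the URL in a GET request with a browser-like User-Agent (assumption)."""
    return Request(url, headers={"User-Agent": "Mozilla/5.0"})


def fetch_html(url, timeout=10):
    """Download the page and return its HTML as text."""
    with urlopen(build_request(url), timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch_html() with the encoded search URL returns the page source as a string, ready to be handed to BeautifulSoup.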

Extracting data with CSS selectors

Inspect the page source and locate the elements that contain the product name, link, image, and price. BeautifulSoup’s select() method takes a CSS selector and returns the matching elements as a list.

Typical selectors are:

tag.select("div.p-name a"): product name and link

tag.select("img.p-img"): product image

tag.select("span.p-price"): product price

Iterate over the resulting list to collect the desired fields.
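The selection and iteration can be tried offline on a small HTML fragment. The markup below is a simplified stand-in written for this illustration, not JD.com’s real page source; it only mirrors the class names used by the selectors above, so check them against the live page before relying on them.

```python
from bs4 import BeautifulSoup

# Simplified stand-in markup (assumption): two products with the fields we want.
SAMPLE_HTML = """
<ul>
  <li>
    <img class="p-img" src="//img.example.com/a.jpg">
    <span class="p-price">59.90</span>
    <div class="p-name"><a href="//item.example.com/1.html">Dog food A 2kg</a></div>
  </li>
  <li>
    <img class="p-img" src="//img.example.com/b.jpg">
    <span class="p-price">128.00</span>
    <div class="p-name"><a href="//item.example.com/2.html">Dog food B 5kg</a></div>
  </li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Each select() call returns a list of matching Tag objects.
names = soup.select("div.p-name a")
images = soup.select("img.p-img")
prices = soup.select("span.p-price")

# zip() pairs the lists by position, which assumes every product has all fields.
products = []
for name, image, price in zip(names, images, prices):
    products.append({
        "name": name.get_text(strip=True),
        "link": name.get("href"),
        "image": image.get("src"),
        "price": price.get_text(strip=True),
    })

for product in products:
    print(product)
```

Pairing three separate lists by position is the simplest approach, but the lists drift out of alignment if any product lacks one of the elements.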

Complete example

The complete workflow chains the steps above: encode the keyword, request the page, parse the response with BeautifulSoup, apply the CSS selectors, and print the extracted information.
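The original listing is not reproduced here, so the following is a sketch of how the workflow might fit together rather than the author’s code. The function names, the User-Agent header, and the li.gl-item container selector are assumptions; the field selectors are the ones listed earlier and may not match JD.com’s current markup.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def build_search_url(keyword):
    # URL-encode the keyword and insert it into the search URL.
    return f"https://search.jd.com/Search?keyword={quote(keyword)}&enc=utf-8"


def fetch_html(url, timeout=10):
    # A browser-like User-Agent is an assumption; the site may demand more.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def parse_products(html):
    soup = BeautifulSoup(html, "html.parser")
    products = []
    # "li.gl-item" is a hypothetical per-product container selector.
    for item in soup.select("li.gl-item"):
        link = item.select_one("div.p-name a")
        image = item.select_one("img.p-img")
        price = item.select_one("span.p-price")
        if link is None:
            continue  # skip entries without a product link
        products.append({
            "name": link.get_text(strip=True),
            "link": link.get("href"),
            "image": image.get("src") if image else None,
            "price": price.get_text(strip=True) if price else None,
        })
    return products


def scrape(keyword):
    # Encode, fetch, parse: the three steps of the tutorial in one call.
    return parse_products(fetch_html(build_search_url(keyword)))
```

Calling scrape("狗粮") and printing each returned dictionary performs the whole run. Selecting within each product container, instead of pairing three page-wide lists, keeps the fields of one product together even when an image or price is missing.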

Result

The final output lists the JD.com products matching the keyword, with each product’s name, URL, image, and price.

Key takeaways

BeautifulSoup supports most CSS selectors, making it a powerful tool for web‑scraping tasks.

The .select() method returns a Python list of matching elements.

When selecting by class, use a leading dot (e.g., .class-name); for IDs, use a hash (e.g., #id-name).
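These points can be checked on a tiny fragment. The id product-list and the class p-price in the markup below are illustrative names chosen for this sketch.

```python
from bs4 import BeautifulSoup

html = (
    '<div id="product-list">'
    '<span class="p-price">59.90</span>'
    '<span class="p-price">128.00</span>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")

by_class = soup.select(".p-price")     # leading dot: match by class
by_id = soup.select("#product-list")   # hash: match by id
no_match = soup.select(".missing")     # no match gives an empty list, not None

print(isinstance(by_class, list), len(by_class))
print(by_id[0].name, len(no_match))
```

Because the result behaves like a regular list, it can be indexed, sliced, and looped over directly, and an empty result is safe to iterate.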
