How to Precisely Scrape JD.com Product Data Using BeautifulSoup CSS Selectors
Learn step‑by‑step how to use Python’s urllib and BeautifulSoup CSS selectors to fetch JD.com product details—such as name, link, image, and price—by constructing the search URL, parsing the HTML source, and extracting the desired information with concise code examples.
Overview
In this tutorial we demonstrate how to precisely scrape product information from JD.com using Python’s urllib for URL handling and BeautifulSoup’s CSS selector support.
Preparing the search URL
Search JD.com for a keyword (e.g., 狗粮, “dog food”), and the site generates a URL such as
https://search.jd.com/Search?keyword=%E7%8B%97%E7%B2%AE&enc=utf-8. The keyword parameter must be URL‑encoded because it contains non-ASCII characters; Python’s urllib.parse.quote performs this step.
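As a minimal sketch of the encoding step, the snippet below builds the search URL from a keyword. The function name build_search_url and the constant BASE_URL are illustrative choices, not names from the original article; the URL layout is copied from the example above.

```python
from urllib.parse import quote

# Base address copied from the example URL above.
BASE_URL = "https://search.jd.com/Search"


def build_search_url(keyword: str) -> str:
    """Return the search URL for a keyword (hypothetical helper)."""
    # quote() percent-encodes non-ASCII characters as UTF-8 by default.
    return f"{BASE_URL}?keyword={quote(keyword)}&enc=utf-8"


url = build_search_url("狗粮")
print(url)
```

Calling quote on 狗粮 yields %E7%8B%97%E7%B2%AE, which matches the keyword value in the URL shown earlier.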
Fetching the page
Send an HTTP GET request to the encoded URL and obtain the HTML response.
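One possible way to send the request with urllib is sketched here. The User-Agent value, the timeout, and the helper names build_request and fetch_html are assumptions for illustration; the original does not say which headers, if any, the site expects.

```python
import urllib.request

# Assumed browser-like header value; the article does not specify one.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64)"


def build_request(url: str) -> urllib.request.Request:
    """Create a GET request object carrying the User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Send the request and return the response body decoded to text."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        # Use the charset declared by the server, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

A call such as fetch_html("https://search.jd.com/Search?keyword=%E7%8B%97%E7%B2%AE&enc=utf-8") would then return the page source as a string, provided the request succeeds.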
Extracting data with CSS selectors
Inspect the page source and locate the elements that contain the product name, link, image, and price. BeautifulSoup’s select() method takes a CSS selector and returns the matching elements as a list.
Typical selectors are:
tag.select("div.p-name a") – product name and link
tag.select("img.p-img") – product image
tag.select("span.p-price") – product price
Iterate over the resulting list to collect the desired fields.
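The selection and iteration steps might look like the sketch below. It runs against a small hand-written HTML fragment (SAMPLE_HTML, invented for illustration) rather than a live page, and it assumes every product has exactly one name, image, and price element, so the three lists can be paired by position. The variable soup plays the role of tag in the selectors above.

```python
from bs4 import BeautifulSoup

# Invented fragment that mimics the class names used by the selectors above.
SAMPLE_HTML = """
<ul>
  <li>
    <img class="p-img" src="//img.example.com/a.jpg">
    <span class="p-price">99.00</span>
    <div class="p-name"><a href="//item.example.com/1.html">Sample dog food A</a></div>
  </li>
  <li>
    <img class="p-img" src="//img.example.com/b.jpg">
    <span class="p-price">45.50</span>
    <div class="p-name"><a href="//item.example.com/2.html">Sample dog food B</a></div>
  </li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Each select() call returns a list of matching Tag objects.
names = soup.select("div.p-name a")
images = soup.select("img.p-img")
prices = soup.select("span.p-price")

# Pair the three lists by position; this relies on the one-per-product assumption.
for name, image, price in zip(names, images, prices):
    print(name.get_text(strip=True), name["href"], image["src"], price.get_text(strip=True))
```

If a product were missing one of the three elements, positional pairing would drift, so selecting a per-product container first may be more robust on real pages.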
Complete example
The full workflow is: encode the keyword, request the page, parse the HTML with BeautifulSoup, apply the CSS selectors, and print the extracted information.
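The original code listing is not available in this text, so the following is a reconstruction sketch of the workflow rather than the author's exact code. The function names, the User-Agent header, and the positional pairing of results are assumptions; the selectors and URL layout come from the earlier sections.

```python
import urllib.request
from urllib.parse import quote

from bs4 import BeautifulSoup

# Assumed browser-like header; the article does not specify one.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def build_search_url(keyword):
    """Percent-encode the keyword and build the search URL."""
    return f"https://search.jd.com/Search?keyword={quote(keyword)}&enc=utf-8"


def fetch_html(url, timeout=10.0):
    """Send a GET request and return the response body as text."""
    request = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_products(html):
    """Apply the three selectors and pair the results by position."""
    soup = BeautifulSoup(html, "html.parser")
    names = soup.select("div.p-name a")
    images = soup.select("img.p-img")
    prices = soup.select("span.p-price")
    products = []
    for name, image, price in zip(names, images, prices):
        products.append({
            "name": name.get_text(strip=True),
            "link": name.get("href"),
            "image": image.get("src"),
            "price": price.get_text(strip=True),
        })
    return products


def scrape(keyword):
    """Run the whole workflow for one keyword and print each product."""
    products = extract_products(fetch_html(build_search_url(keyword)))
    for product in products:
        print(product)
    return products
```

Calling scrape("狗粮") performs the network request. Keeping extract_products separate from fetch_html makes it possible to check the parsing logic against a saved copy of the page.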
Result
The final output lists the JD.com products matching the keyword, with each product's name, URL, image, and price.
Key takeaways
BeautifulSoup supports most CSS selectors, which makes it well suited to web‑scraping tasks.
The .select() method returns a Python list of matching elements.
When selecting by class, use a leading dot (e.g., .class-name); for IDs, use a hash (e.g., #id-name).
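The takeaways above can be checked with a short sketch. The HTML string and the names note, main, and missing are made up for illustration.

```python
from bs4 import BeautifulSoup

# Invented markup with one ID and a repeated class.
html = '<div id="main"><p class="note">first</p><p class="note">second</p></div>'
soup = BeautifulSoup(html, "html.parser")

by_class = soup.select(".note")     # leading dot selects by class
by_id = soup.select("#main")        # hash selects by ID
no_match = soup.select(".missing")  # no match gives an empty list

print(isinstance(by_class, list), len(by_class))
print(by_id[0].name)
print(len(no_match))
```

Because select() always returns a list, code that expects a single element needs to index into the result or check that the list is non-empty first.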
