Extract JD.com Product Data with Python BeautifulSoup: A Step‑by‑Step Guide
This tutorial shows how to build a JD.com search URL, fetch the page with urllib, parse the HTML using BeautifulSoup, and reliably extract each product's name, link, image and price while handling missing image URLs and avoiding regex complexity.
Overview
The article demonstrates a practical method for scraping product information from JD.com using Python's standard urllib library to request pages and the BeautifulSoup library to parse the resulting HTML tree.
Step 1: Build the search URL
Choose a keyword (e.g., "狗粮", meaning dog food) and percent-encode it with urllib.parse.quote. Appending the encoded keyword to the JD search endpoint produces a URL such as:
https://search.jd.com/Search?keyword=%E7%8B%97%E7%B2%AE&enc=utf-8
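The URL construction above can be sketched in a few lines. The helper name build_search_url and the constant SEARCH_ENDPOINT are illustrative, not taken from the original script; the endpoint and the enc=utf-8 parameter come from the example URL.

```python
from urllib.parse import quote

# Search endpoint as it appears in the example URL above.
SEARCH_ENDPOINT = "https://search.jd.com/Search"


def build_search_url(keyword: str) -> str:
    """Return the search URL for a keyword (hypothetical helper).

    quote() percent-encodes the keyword as UTF-8 by default.
    """
    return f"{SEARCH_ENDPOINT}?keyword={quote(keyword)}&enc=utf-8"


print(build_search_url("狗粮"))
# https://search.jd.com/Search?keyword=%E7%8B%97%E7%B2%AE&enc=utf-8
```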
Step 2: Fetch the page
Use urllib.request.urlopen (or the third-party requests.get, if preferred) to send an HTTP GET request to the constructed URL, then read and decode the response body to obtain the raw HTML.
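A minimal sketch of the fetch step using only the standard library follows. The function name fetch_html, the timeout, and the User-Agent header are assumptions added for illustration; the original only says to call urlopen on the URL.

```python
from urllib.request import Request, urlopen

# Assumption: a browser-like User-Agent; this value is only a placeholder.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0"}


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Send a GET request and return the decoded response body (hypothetical helper)."""
    request = Request(url, headers=DEFAULT_HEADERS)
    with urlopen(request, timeout=timeout) as response:
        # Use the charset from the Content-Type header, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch_html with the search URL from Step 1 would return the page source as a string, ready to hand to the parser.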
Step 3: Parse with BeautifulSoup
Pass the HTML content to BeautifulSoup(html, "html.parser"). BeautifulSoup builds a navigable tag tree, allowing easy traversal of nested elements.
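To illustrate the tag tree, here is a small sketch that parses a hand-written fragment instead of a live page. The markup, SKU, and link in SAMPLE_HTML are made up for the example and only imitate the structure described in this guide.

```python
from bs4 import BeautifulSoup

# Made-up fragment that imitates one product entry; not real JD.com output.
SAMPLE_HTML = """
<ul>
  <li class="gl-item" data-sku="100001">
    <div class="p-name">
      <a href="//example.com/item/100001"><em>Sample dog food</em></a>
    </div>
  </li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Tags can be traversed like a tree: find the <li>, then the <a> nested inside it.
first_item = soup.find("li")
sku = first_item["data-sku"]
name = first_item.find("a").get_text(strip=True)

print(sku, name)
```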
Step 4: Locate product items
Inspecting the page source shows that each product sits inside a <li data-sku="..." class="gl-item"> element. A CSS selector such as soup.select('li.gl-item') collects all product nodes.
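The selector can be tried out on a small made-up fragment. The helper find_product_nodes and the sample markup below are hypothetical; only the li.gl-item selector and the data-sku attribute come from the text above.

```python
from bs4 import BeautifulSoup

# Made-up markup: two product entries and one unrelated list item.
SAMPLE_HTML = """
<ul>
  <li class="gl-item" data-sku="100001">first product</li>
  <li class="banner">not a product</li>
  <li class="gl-item" data-sku="100002">second product</li>
</ul>
"""


def find_product_nodes(html: str):
    """Return every <li class="gl-item"> element in the document (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.select("li.gl-item")


skus = [node.get("data-sku") for node in find_product_nodes(SAMPLE_HTML)]
print(skus)
```

The unrelated list item is skipped because it lacks the gl-item class.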
Step 5: Extract details
Product name: item.select_one('.p-name a').get_text(strip=True)
Product link: item.select_one('.p-name a')['href']
Image URL: item.select_one('img').get('src') (or get('data-lazy-img') for lazy‑loaded images)
Price: item.select_one('.p-price i').get_text()
Some images have a missing or empty src. The guide therefore recommends reading the attribute with img.get('src'), which returns None for a missing attribute where img['src'] would raise a KeyError. Alternatively, wrap the extraction in a try/except block and skip missing values.
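Putting the four selectors together, one possible shape for the extraction is sketched below. The function extract_product, the dictionary keys, and the sample markup are assumptions for illustration; the selectors and the data-lazy-img fallback are the ones listed above. Each lookup is guarded so that a missing element yields None rather than an exception.

```python
from bs4 import BeautifulSoup

# Made-up markup: the second item has an empty src and relies on data-lazy-img.
SAMPLE_HTML = """
<ul>
  <li class="gl-item" data-sku="100001">
    <img src="//example.com/img/a.jpg">
    <div class="p-price"><strong><em>￥</em><i>59.90</i></strong></div>
    <div class="p-name"><a href="//example.com/item/100001"><em>Dog food A</em></a></div>
  </li>
  <li class="gl-item" data-sku="100002">
    <img src="" data-lazy-img="//example.com/img/b.jpg">
    <div class="p-price"><strong><em>￥</em><i>128.00</i></strong></div>
    <div class="p-name"><a href="//example.com/item/100002"><em>Dog food B</em></a></div>
  </li>
</ul>
"""


def extract_product(item) -> dict:
    """Collect name, link, image and price from one product node (hypothetical helper)."""
    name_link = item.select_one(".p-name a")
    image = item.select_one("img")
    price = item.select_one(".p-price i")

    image_url = None
    if image is not None:
        # .get() returns None for a missing attribute; an empty src is falsy,
        # so `or` falls through to the lazy-load attribute.
        image_url = image.get("src") or image.get("data-lazy-img")

    return {
        "name": name_link.get_text(strip=True) if name_link else None,
        "link": name_link.get("href") if name_link else None,
        "image": image_url,
        "price": price.get_text(strip=True) if price else None,
    }


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
products = [extract_product(item) for item in soup.select("li.gl-item")]
for product in products:
    print(product)
```

Returning None for absent fields keeps every dictionary the same shape, so later code can filter out incomplete entries instead of handling exceptions.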
Result
The script prints a list of dictionaries containing the extracted fields, and the final screenshot (shown in the article) displays the successfully scraped dog‑food products with their names, links, images, and prices.
Python Crawling & Data Mining
