How to Accurately Scrape JD.com Product Data with BeautifulSoup
This tutorial shows how to use Python's urllib and BeautifulSoup libraries to encode search keywords, request JD.com pages, parse the HTML tree, and reliably extract product names, links, images, and prices, offering a simpler alternative to complex regular‑expression scrapers.
The previous tutorial crawled JD.com product information with Python regular expressions, but the resulting code was long and cumbersome. This one uses BeautifulSoup to match the same JD product data precisely with far less code.
HTML files consist of tags organized in a tree structure; BeautifulSoup parses, traverses, and maintains this tag tree.
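As a minimal sketch of what "parsing and traversing the tag tree" means in practice, the snippet below parses a small HTML fragment and walks it downward and upward. The fragment is made up for illustration and is not JD.com markup; the built-in "html.parser" backend is chosen here only to avoid extra dependencies.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment, invented for this illustration.
html = """
<html><body>
  <ul id="items">
    <li class="item"><a href="/a">First</a></li>
    <li class="item"><a href="/b">Second</a></li>
  </ul>
</body></html>
"""

# Build the tag tree with Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

# Move down the tree: locate the <ul>, then every <a> beneath it.
ul = soup.find("ul", id="items")
anchors = ul.find_all("a")
links = [a["href"] for a in anchors]
texts = [a.get_text() for a in anchors]

# Move up the tree: the parent of the first <a> is its enclosing <li>.
parent_name = anchors[0].parent.name

print(links, texts, parent_name)
```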
First, visit JD.com and search for a product keyword (e.g., "dog food", 狗粮). The request URL becomes https://search.jd.com/Search?keyword=%E7%8B%97%E7%B2%AE&enc=utf-8, where the keyword parameter carries the percent-encoded search term. The workflow is therefore: encode the keyword, send the request, receive the response, and extract the data with BeautifulSoup selectors.
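The request step might be sketched as follows. The helper names build_request and fetch_html, the User-Agent value, and the timeout are assumptions made for this example, not taken from the original code. JD.com may also redirect or reject automated requests, so treat fetch_html as illustrative only; just the request construction is exercised here.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

# URL pattern taken from the search URL shown above.
SEARCH_URL = "https://search.jd.com/Search?keyword={}&enc=utf-8"


def build_request(keyword):
    """Percent-encode the keyword and wrap the search URL in a Request."""
    url = SEARCH_URL.format(quote(keyword))
    # The User-Agent value is an assumption; some sites reject the urllib default.
    return Request(url, headers={"User-Agent": "Mozilla/5.0"})


def fetch_html(keyword, timeout=10):
    """Send the request and return the decoded response body (needs network access)."""
    with urlopen(build_request(keyword), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


req = build_request("狗粮")  # "dog food"
print(req.full_url)
```

The string returned by fetch_html would then be handed to BeautifulSoup for parsing.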
The relevant product information resides within <li data-sku="*****" class="gl-item"> tags, so we peel the HTML layers like an onion to retrieve the desired fields.
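To illustrate the outermost layer of that onion, here is a small sketch that selects every li element with class gl-item and reads its data-sku attribute. The sample markup is a simplified stand-in written for this example (the SKU values and the inner p-name div are assumptions), not a copy of a real JD.com page.

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical stand-in for a JD.com result list.
html = """
<ul>
  <li data-sku="1001" class="gl-item"><div class="p-name">Item A</div></li>
  <li data-sku="1002" class="gl-item"><div class="p-name">Item B</div></li>
  <li class="other">Not a product</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# class_ (with a trailing underscore) filters on the CSS class attribute.
items = soup.find_all("li", class_="gl-item")

# .get() reads an attribute and returns None if it is missing.
skus = [li.get("data-sku") for li in items]

print(skus)
```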
URL encoding (percent-encoding) converts characters that are not safe in a URL into %xx sequences, one per byte, typically based on the character's UTF-8 encoding. Python's urllib.parse.quote function performs this encoding.
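A short sketch of how quote behaves may help here. The keyword 狗粮 ("dog food") is the one encoded in the search URL above; the other inputs are arbitrary examples chosen for illustration.

```python
from urllib.parse import quote, unquote

keyword = "狗粮"  # "dog food"

# quote() encodes to UTF-8 by default, then writes each byte as %XX.
encoded = quote(keyword)

# ASCII letters pass through unchanged; a space becomes %20.
spaced = quote("dog food")

# "/" is treated as safe by default; pass safe="" to encode it as well.
slash_default = quote("a/b")
slash_strict = quote("a/b", safe="")

# unquote() reverses the encoding.
decoded = unquote(encoded)

print(encoded, spaced, slash_default, slash_strict, decoded)
```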
Using BeautifulSoup, we then extract four fields from each product element: the name, link, image, and price. (The original post shows the extraction code as an image.)
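Since the original code is only available as an image, the following is a hedged reconstruction of what such an extraction could look like, not the author's actual code. The inner class names (p-name, p-img, p-price), the sample markup, and the helper names text_or_none and parse_item are all assumptions for this sketch; check them against the live page before relying on them.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the inner class names are assumed, not confirmed by the source.
html = """
<ul>
  <li data-sku="1001" class="gl-item">
    <div class="p-img"><a href="//item.jd.com/1001.html"><img src="//img.example.com/1001.jpg"></a></div>
    <div class="p-price"><strong><i>59.90</i></strong></div>
    <div class="p-name"><a href="//item.jd.com/1001.html"><em>Dog food 10kg</em></a></div>
  </li>
  <li data-sku="1002" class="gl-item">
    <div class="p-img"><a href="//item.jd.com/1002.html"><img alt="no source yet"></a></div>
    <div class="p-price"><strong><i>25.00</i></strong></div>
    <div class="p-name"><a href="//item.jd.com/1002.html"><em>Dog food 2kg</em></a></div>
  </li>
</ul>
"""


def text_or_none(tag):
    """Return the stripped text of a tag, or None if the tag was not found."""
    return tag.get_text(strip=True) if tag else None


def parse_item(li):
    """Peel one <li class="gl-item"> layer by layer into a dict of four fields."""
    name_div = li.find("div", class_="p-name")
    img_div = li.find("div", class_="p-img")
    price_div = li.find("div", class_="p-price")

    link_tag = name_div.find("a") if name_div else None
    img_tag = img_div.find("img") if img_div else None

    return {
        "name": text_or_none(name_div.find("em") if name_div else None),
        "link": link_tag.get("href") if link_tag else None,
        "image": img_tag.get("src") if img_tag else None,
        "price": text_or_none(price_div.find("i") if price_div else None),
    }


soup = BeautifulSoup(html, "html.parser")
products = [parse_item(li) for li in soup.find_all("li", class_="gl-item")]

for product in products:
    print(product)
```

Every lookup is guarded so that a missing element yields None for that field instead of stopping the whole crawl.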
Note that some product images have no src attribute. Calling img.get('src') avoids an error in that case because it returns None when the attribute is absent, whereas img['src'] raises a KeyError. Alternatively, wrap the lookup in try/except. This is a useful BeautifulSoup tip.
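The difference between the two access styles can be seen in a small sketch. The two img tags below are invented for illustration.

```python
from bs4 import BeautifulSoup

# Two hypothetical <img> tags: one with a src attribute, one without.
soup = BeautifulSoup(
    '<img src="//img.example.com/a.jpg"><img alt="no source">',
    "html.parser",
)
with_src, without_src = soup.find_all("img")

# .get() returns the attribute value, or None when the attribute is absent.
present = with_src.get("src")
missing = without_src.get("src")

# A fallback value can be supplied as the second argument.
fallback = without_src.get("src", "")

# Subscript access raises KeyError for a missing attribute, so guard it.
try:
    guarded = without_src["src"]
except KeyError:
    guarded = None

print(present, missing, repr(fallback), guarded)
```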
The final output shows the extracted product details, confirming that BeautifulSoup provides a simpler and more reliable approach than regular expressions for JD.com data scraping.
Python Crawling & Data Mining
