
How to Accurately Scrape JD.com Product Data with BeautifulSoup

This tutorial shows how to use Python's urllib and BeautifulSoup libraries to encode search keywords, request JD.com pages, parse the HTML tree, and reliably extract product names, links, images, and prices, offering a simpler alternative to complex regular‑expression scrapers.

Python Crawling & Data Mining

Jan 18, 2018

Yesterday's article crawled JD.com product information with Python regular expressions, but the resulting code was long and cumbersome. This article uses BeautifulSoup instead to match JD product data precisely.

HTML files consist of tags organized in a tree structure; BeautifulSoup parses, traverses, and maintains this tag tree.

First, visit JD.com and search for a product keyword (e.g., "dog food", 狗粮). The request URL becomes:

https://search.jd.com/Search?keyword=%E7%8B%97%E7%B2%AE&enc=utf-8

Here the keyword parameter carries the percent-encoded search term. The workflow is therefore: encode the keyword, send the request, receive the response, and use BeautifulSoup selectors to extract the data.
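The request step can be sketched with urllib.request. This is a minimal sketch rather than the author's code: the helper names build_request and fetch_html, the User-Agent value, and the timeout are all assumptions, and JD.com may respond differently depending on headers or login state.

```python
from urllib.request import Request, urlopen

# Assumption: a browser-like User-Agent. The article does not say which headers it sent.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def build_request(url: str) -> Request:
    """Wrap the search URL in a Request object carrying the assumed headers."""
    return Request(url, headers=HEADERS)


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Send the request and return the response body decoded as UTF-8."""
    with urlopen(build_request(url), timeout=timeout) as response:
        # errors="replace" keeps the sketch from failing on stray bytes
        return response.read().decode("utf-8", errors="replace")
```

Calling fetch_html with the search URL above would return the page source as a string, ready to hand to BeautifulSoup.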

The relevant product information resides within <li data-sku="*****" class="gl-item"> tags, so we peel the HTML layers like an onion to retrieve the desired fields.
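As a rough illustration of the outermost layer, the sketch below selects every li tag with class gl-item and reads its data-sku attribute. The HTML is a made-up sample, not real JD.com output, and html.parser is chosen here only because it ships with Python.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup that imitates the structure described above.
SAMPLE_HTML = """
<ul>
  <li data-sku="1001" class="gl-item"><div class="p-name">Item A</div></li>
  <li data-sku="1002" class="gl-item"><div class="p-name">Item B</div></li>
  <li class="other">Not a product</li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# class_ (with trailing underscore) filters on the CSS class of the tag
items = soup.find_all("li", class_="gl-item")

# .get() reads an attribute and returns None if it is missing
skus = [li.get("data-sku") for li in items]
print(skus)
```

Each element of items is itself a tag tree, so the inner fields can be searched relative to it rather than across the whole page.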

URL encoding (percent-encoding) converts characters that are not allowed in a URL into %xx form, one escape per byte, typically from their UTF-8 encoding. Python's urllib.parse.quote function performs this encoding.
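For instance, the keyword 狗粮 ("dog food") can be encoded and placed into the search URL as follows. The helper name build_search_url is hypothetical; the URL pattern is the one quoted earlier in the article.

```python
from urllib.parse import quote


def build_search_url(keyword: str) -> str:
    """Percent-encode the keyword and insert it into the JD search URL."""
    encoded = quote(keyword)  # quote() encodes str input as UTF-8 by default
    return f"https://search.jd.com/Search?keyword={encoded}&enc=utf-8"


url = build_search_url("狗粮")
print(url)
# https://search.jd.com/Search?keyword=%E7%8B%97%E7%B2%AE&enc=utf-8
```

Each of the two Chinese characters occupies three bytes in UTF-8, which is why the keyword expands to six %xx escapes.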

Using BeautifulSoup, we extract the product name, link, image, and price from each gl-item tag. (The original post presents the extraction code as an image.)
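Since the original code is only available as an image, the following is a sketch of what such an extraction might look like, not a reconstruction of it. The inner class names p-name, p-price, and p-img, the sample HTML, and the helper parse_item are assumptions made for illustration; the live page markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for a single product entry.
SAMPLE_ITEM = """
<li data-sku="1001" class="gl-item">
  <div class="p-img"><a href="//item.jd.com/1001.html"><img src="//img.example.com/1001.jpg"></a></div>
  <div class="p-price"><strong><i>99.00</i></strong></div>
  <div class="p-name"><a href="//item.jd.com/1001.html"><em>Sample dog food 10kg</em></a></div>
</li>
"""


def parse_item(li):
    """Pull name, link, image, and price out of one product tag."""
    name_link = li.select_one("div.p-name a")
    price = li.select_one("div.p-price i")
    img = li.select_one("div.p-img img")
    return {
        # Each lookup falls back to None when the tag is missing
        "name": name_link.get_text(strip=True) if name_link else None,
        "link": name_link.get("href") if name_link else None,
        "image": img.get("src") if img else None,
        "price": price.get_text(strip=True) if price else None,
    }


soup = BeautifulSoup(SAMPLE_ITEM, "html.parser")
products = [parse_item(li) for li in soup.select("li.gl-item")]
print(products)
```

Searching relative to each li keeps the fields of one product together, which is harder to guarantee when a regular expression is run over the whole page.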

Note that some products have no image URL. Indexing the tag with img['src'] raises a KeyError when the attribute is absent, whereas img.get('src') simply returns None. Alternatively, wrap the lookup in try/except. This is a useful BeautifulSoup tip.
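The difference between the two lookups can be seen on a small made-up example, where the first img tag has no src attribute:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one image without src, one with it.
soup = BeautifulSoup(
    '<img class="lazy"><img src="//img.example.com/a.jpg">', "html.parser"
)
first, second = soup.find_all("img")

print(first.get("src"))   # None, because the attribute is absent
print(second.get("src"))  # //img.example.com/a.jpg

# Dictionary-style access raises KeyError for a missing attribute,
# so it needs a try/except guard.
try:
    src = first["src"]
except KeyError:
    src = None

# .get() also accepts a default value, like dict.get()
fallback = first.get("src", "")
```

Using .get() keeps the loop over products running when one entry lacks an image, at the cost of having to handle None (or the chosen default) downstream.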

The final output shows the extracted product details, confirming that BeautifulSoup provides a simpler and more reliable approach than regular expressions for JD.com data scraping.


