How to Scrape Dangdang Bestselling Books with Selenium and Python

This tutorial walks you through installing Selenium and ChromeDriver, configuring the environment, and using Python code to automatically navigate Dangdang's bestseller pages, extract book details with pyquery, and save the results into a CSV file for further analysis.

Python Crawling & Data Mining

Introduction

In the previous article we crawled a news site; this article demonstrates using Selenium to scrape Dangdang's bestseller list, showing how to automate browser actions and extract book information.

Preparation

Install the Selenium and pyquery libraries, and make sure your ChromeDriver version matches your installed Chrome version.

pip install selenium
pip install pyquery

ChromeDriver Installation

Check your Chrome version under Help → About, download the matching ChromeDriver release from the official site, and place the executable in a directory that is on the system PATH (e.g., Python's Scripts folder).
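Before launching Selenium, it can help to confirm that the driver really is reachable through PATH. The following is a minimal sketch using only the standard library; `find_chromedriver` is a hypothetical helper name, and the executable name `chromedriver` is an assumption about how the downloaded file is named on your system.

```python
import shutil

def find_chromedriver(name="chromedriver"):
    """Return the full path of the driver executable if it is on PATH, else None."""
    return shutil.which(name)

# Report where the driver was found, or that it is missing
print(find_chromedriver() or "chromedriver is not on PATH")
```

If the lookup reports that the driver is missing, move the executable into a PATH directory or add its folder to PATH before continuing.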

ChromeDriver download page

Scraping Process

Use Selenium to open each bestseller page, retrieve the page source, and parse it with pyquery to extract rank, title, image URL, price, comments, and other metadata.

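import csv

from pyquery import PyQuery as pq
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# CSV columns in output order
FIELDS = ['rank', 'title', 'image', 'comments', 'recommendation',
          'author', 'date', 'list_price', 'discount', 'ebook_price']

browser = webdriver.Chrome()
wait = WebDriverWait(browser, 10)

def index_page(page, retries=3):
    """Load one bestseller page and parse it, retrying a few times on timeout."""
    print('Crawling page', page)
    try:
        url = 'http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-24hours-0-0-1-' + str(page)
        browser.get(url)
        # Wait until the list items are present before reading the page source
        wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '.bang_list li')))
        get_booklist()
    except TimeoutException:
        # Retry a limited number of times so a dead page cannot recurse forever
        if retries > 0:
            index_page(page, retries - 1)

def get_booklist():
    """Parse the current page source and save every book in the list."""
    html = browser.page_source
    doc = pq(html)
    items = doc('.bang_list li').items()
    for item in items:
        book = {
            'rank': item.find('.list_num').text(),
            'title': item.find('.name').text(),
            'image': item.find('.pic img').attr('src'),
            'comments': item.find('.star a').text(),
            'recommendation': item.find('.tuijian').text(),
            'author': item.find('.publisher_info a').text(),
            'date': item.find('.publisher_info span').text(),
            'list_price': item.find('.price_r').text().replace('¥', ''),
            'discount': item.find('.price_s').text(),
            # Strip the site's "电子书:" (e-book) label and the currency sign
            'ebook_price': item.find('.price_e').text().replace('电子书:', '').replace('¥', ''),
        }
        saving_book(book)

# Write the header row once, starting a fresh file for each run
with open('data.csv', 'w', newline='', encoding='utf-8-sig') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(FIELDS)

def saving_book(book):
    """Append one book to data.csv as a single row."""
    with open('data.csv', 'a', newline='', encoding='utf-8-sig') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow([book.get(field) for field in FIELDS])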
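The script stores prices as text. If you plan to do arithmetic on them later, a small helper along these lines could convert them to numbers. This is only a sketch: `parse_price` is a hypothetical name that is not part of the original script, and it assumes the price text looks like `¥45.00` or `电子书:¥12.99`, as implied by the replace calls above.

```python
def parse_price(text):
    """Convert a scraped price string to a float, or return None if it is not numeric."""
    # Remove the e-book label and the currency sign, then trim whitespace
    cleaned = text.replace('电子书:', '').replace('¥', '').strip()
    try:
        return float(cleaned)
    except ValueError:
        # Empty strings and unexpected formats end up here
        return None
```

Returning None for missing values keeps a book without an e-book edition from being mistaken for one priced at zero.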

Iterating Pages

Loop over the desired page numbers (example shows pages 1‑2) and call index_page for each.

if __name__ == '__main__':
    try:
        for page in range(1, 3):  # pages 1 and 2
            index_page(page)
    finally:
        browser.quit()  # always close the browser, even if a page fails
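Note that `range(1, 3)` stops before 3, so only pages 1 and 2 are visited. The sketch below makes the page arithmetic explicit by building the list of URLs for an inclusive page range. `BASE_URL` and `page_urls` are hypothetical names introduced for illustration; the URL prefix itself is the one used in `index_page`.

```python
BASE_URL = 'http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-24hours-0-0-1-'

def page_urls(first, last):
    """Return the bestseller URLs for pages first..last, both ends included."""
    # range() excludes its stop value, hence last + 1
    return [BASE_URL + str(page) for page in range(first, last + 1)]

for url in page_urls(1, 2):
    print(url)
```

Printing the URLs first is a cheap way to check the range before starting the browser.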

Result

The script writes each book's details to data.csv, which can be opened in spreadsheet software for further analysis.
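As a first step toward that analysis, the file can also be read back in Python. The following is a minimal sketch using the standard csv module; `load_rows` is a hypothetical helper, and the `utf-8-sig` default is an assumption about how the file was encoded, so pass a different encoding if your system wrote the file in another one.

```python
import csv

def load_rows(path, encoding='utf-8-sig'):
    """Read a CSV file with a header row into a list of dictionaries."""
    with open(path, newline='', encoding=encoding) as f:
        # DictReader uses the first row as the keys for every later row
        return list(csv.DictReader(f))
```

For example, `rows = load_rows('data.csv')` returns one dictionary per book, keyed by whatever column names appear in the header row, so `len(rows)` gives the number of books saved.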

CSV result preview