Backend Development 13 min read

How to Scrape E‑Commerce Product Data with Selenium and Store It in MySQL

This guide walks you through analyzing a target e‑commerce site, using Selenium to automate searches, extracting product details across multiple pages, handling pagination, parsing HTML with PyQuery, and persisting the collected data into a MySQL database with robust error handling.

MaGe Linux Operations

Dec 19, 2021

How to Scrape E‑Commerce Product Data with Selenium and Store It in MySQL

1. Analyze the target website

Open the e‑commerce homepage, search for "men's clothing", and inspect the network requests to locate the GET URL that returns the product data in the response.

Compare the URL parameters with the query string and verify that the response contains the encrypted product information, which requires Selenium to render the page.

2. Retrieve a single product page

Use Selenium to launch Chrome, navigate to the site, wait for the search box, input the keyword, and click the search button.

# -*- coding: utf-8 -*-
from selenium import webdriver
browser = webdriver.Chrome()

def get_one_page():
    """Fetch a single page"""
    browser.get("https://www.xxxxx.com")
    input = WebDriverWait(browser,10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR,"#q")))
    input.send_keys("男装")
    button = WebDriverWait(browser,10).until(
        EC.element_to_be_clickable((By.CSS_SELECTOR,"#J_TSearchForm > div.search-button > button")))
    button.click()

After the search, wait for the total page count element and extract its text.

3. Retrieve multiple product pages

Enter the desired page number in the pagination input, click the submit button, and verify that the active page indicator matches the requested page.

# -*- coding: utf-8 -*-
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
browser = webdriver.Chrome()

def get_next_page(page):
    try:
        input = WebDriverWait(browser,10).until(
            EC.presence_of_element_located((By.CSS_SELECTOR,"#mainsrp-pager > div > div > div > div.form > input")))
        input.send_keys(page)
        button = WebDriverWait(browser,10).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR,"#mainsrp-pager > div > div > div > div.form > span.btn.J_Submit")))
        button.click()
        WebDriverWait(browser,10).until(
            EC.text_to_be_present_in_element((By.CSS_SELECTOR,"#mainsrp-pager > div > div > div > ul > li.item.active > span"),str(page)))
    except TimeoutException:
        return get_next_page(page)

4. Extract product information

Wait for the product list to load, retrieve the page source, and parse it with PyQuery. For each product node, collect shop name, location, title, price, sales count, and image URL.

# -*- coding: utf-8 -*-
from pyquery import PyQuery as pq

def get_info():
    """Extract details"""
    WebDriverWait(browser,20).until(
        EC.presence_of_element_located((By.CSS_SELECTOR,"#mainsrp-itemlist .items .item")))
    html = pq(browser.page_source)
    items = html('#mainsrp-itemlist .items .item').items()
    for item in items:
        data = []
        image = item.find('.pic .img').attr('data-src')
        price = item.find('.price').text().strip().replace('
','')
        deal = item.find('.deal-cnt').text()[:-2]
        title = item.find('.title').text().strip()
        shop = item.find('.shop').text().strip()
        location = item.find('.location').text()
        data.append([shop, location, title, price, deal, image])
        print(data)

5. Save data to MySQL

Connect to a local MySQL instance, create a table for the category if it does not exist, and insert each product record. Include error handling to rollback on failure.

# -*- coding: utf-8 -*-
import pymysql

def save_to_mysql(data):
    """Store data in the database"""
    db = pymysql.connect(host="localhost", user="root", password="password", port=3306, db="spiders", charset="utf8")
    cursor = db.cursor()
    cursor.execute("CREATE TABLE IF NOT EXISTS 男装 (shop VARCHAR(20),location VARCHAR(10),title VARCHAR(255),price VARCHAR(20),deal VARCHAR(20),image VARCHAR(255))")
    sql = "INSERT INTO 男装 values(%s,%s,%s,%s,%s,%s)"
    try:
        if cursor.execute(sql, data):
            db.commit()
            print("********已入库**********")
    except:
        print("#########入库失败#########")
        db.rollback()
    db.close()

Full script

The complete program ties together the functions above: it searches for a keyword, determines the total number of result pages, iterates through each page, extracts product details, and saves them to MySQL.

# -*- coding: utf-8 -*-
import re, pymysql
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from pyquery import PyQuery as pq

browser = webdriver.Chrome()

def get_one_page(name):
    try:
        browser.get("https://www.xxxxx.com")
        input = WebDriverWait(browser,10).until(EC.presence_of_element_located((By.CSS_SELECTOR,"#q")))
        input.send_keys(name)
        button = WebDriverWait(browser,10).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"#J_TSearchForm > div.search-button > button")))
        button.click()
        pages = WebDriverWait(browser,10).until(EC.presence_of_element_located((By.CSS_SELECTOR,"#mainsrp-pager > div > div > div > div.total")))
        get_info(name)
        return pages.text
    except TimeoutException:
        return get_one_page(name)

def get_next_page(page, name):
    try:
        input = WebDriverWait(browser,10).until(EC.presence_of_element_located((By.CSS_SELECTOR,"#mainsrp-pager > div > div > div > div.form > input")))
        input.send_keys(page)
        button = WebDriverWait(browser,10).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"#mainsrp-pager > div > div > div > div.form > span.btn.J_Submit")))
        button.click()
        WebDriverWait(browser,10).until(EC.text_to_be_present_in_element((By.CSS_SELECTOR,"#mainsrp-pager > div > div > div > ul > li.item.active > span"),str(page)))
        get_info(name)
    except TimeoutException:
        return get_next_page(page, name)

def get_info(name):
    WebDriverWait(browser,20).until(EC.presence_of_element_located((By.CSS_SELECTOR,"#mainsrp-itemlist .items .item")))
    html = pq(browser.page_source)
    for item in html('#mainsrp-itemlist .items .item').items():
        data = []
        image = item.find('.pic .img').attr('data-src')
        price = item.find('.price').text().strip().replace('
','')
        deal = item.find('.deal-cnt').text()[:-2]
        title = item.find('.title').text().strip()
        shop = item.find('.shop').text().strip()
        location = item.find('.location').text()
        data.append([shop, location, title, price, deal, image])
        for dt in data:
            save_to_mysql(dt, name)

def save_to_mysql(data, name):
    db = pymysql.connect(host="localhost", user="root", password="password", port=3306, db="spiders", charset="utf8")
    cursor = db.cursor()
    cursor.execute("CREATE TABLE IF NOT EXISTS {} (shop VARCHAR(20),location VARCHAR(10),title VARCHAR(255),price VARCHAR(20),deal VARCHAR(20),image VARCHAR(255))".format(name))
    sql = "INSERT INTO {} values(%s,%s,%s,%s,%s,%s)".format(name)
    try:
        if cursor.execute(sql, data):
            db.commit()
            print("********已入库**********")
    except:
        print("#########入库失败#########")
        db.rollback()
    db.close()

def main(name):
    pages = get_one_page(name)
    total = int(re.compile("(\d+)").findall(pages)[0])
    for p in range(1, total+1):
        get_next_page(p, name)

if __name__ == '__main__':
    main("男装")

For more details on Selenium, refer to the official documentation.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Python Automation Selenium

Written by

MaGe Linux Operations

Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.