How to Scrape E‑Commerce Product Data with Selenium and Store It in MySQL
This guide walks you through analyzing a target e‑commerce site, using Selenium to automate searches, extracting product details across multiple pages, handling pagination, parsing HTML with PyQuery, and persisting the collected data into a MySQL database with robust error handling.
1. Analyze the target website
Open the e‑commerce homepage, search for "men's clothing", and inspect the network requests to locate the GET URL that returns the product data in the response.
Compare the URL parameters with the query string and verify that the response contains the encrypted product information, which requires Selenium to render the page.
2. Retrieve a single product page
Use Selenium to launch Chrome, navigate to the site, wait for the search box, input the keyword, and click the search button.
# -*- coding: utf-8 -*-
from selenium import webdriver
browser = webdriver.Chrome()
def get_one_page():
"""Fetch a single page"""
browser.get("https://www.xxxxx.com")
input = WebDriverWait(browser,10).until(
EC.presence_of_element_located((By.CSS_SELECTOR,"#q")))
input.send_keys("男装")
button = WebDriverWait(browser,10).until(
EC.element_to_be_clickable((By.CSS_SELECTOR,"#J_TSearchForm > div.search-button > button")))
button.click()After the search, wait for the total page count element and extract its text.
3. Retrieve multiple product pages
Enter the desired page number in the pagination input, click the submit button, and verify that the active page indicator matches the requested page.
# -*- coding: utf-8 -*-
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
browser = webdriver.Chrome()
def get_next_page(page):
try:
input = WebDriverWait(browser,10).until(
EC.presence_of_element_located((By.CSS_SELECTOR,"#mainsrp-pager > div > div > div > div.form > input")))
input.send_keys(page)
button = WebDriverWait(browser,10).until(
EC.element_to_be_clickable((By.CSS_SELECTOR,"#mainsrp-pager > div > div > div > div.form > span.btn.J_Submit")))
button.click()
WebDriverWait(browser,10).until(
EC.text_to_be_present_in_element((By.CSS_SELECTOR,"#mainsrp-pager > div > div > div > ul > li.item.active > span"),str(page)))
except TimeoutException:
return get_next_page(page)4. Extract product information
Wait for the product list to load, retrieve the page source, and parse it with PyQuery. For each product node, collect shop name, location, title, price, sales count, and image URL.
# -*- coding: utf-8 -*-
from pyquery import PyQuery as pq
def get_info():
"""Extract details"""
WebDriverWait(browser,20).until(
EC.presence_of_element_located((By.CSS_SELECTOR,"#mainsrp-itemlist .items .item")))
html = pq(browser.page_source)
items = html('#mainsrp-itemlist .items .item').items()
for item in items:
data = []
image = item.find('.pic .img').attr('data-src')
price = item.find('.price').text().strip().replace('
','')
deal = item.find('.deal-cnt').text()[:-2]
title = item.find('.title').text().strip()
shop = item.find('.shop').text().strip()
location = item.find('.location').text()
data.append([shop, location, title, price, deal, image])
print(data)5. Save data to MySQL
Connect to a local MySQL instance, create a table for the category if it does not exist, and insert each product record. Include error handling to rollback on failure.
# -*- coding: utf-8 -*-
import pymysql
def save_to_mysql(data):
"""Store data in the database"""
db = pymysql.connect(host="localhost", user="root", password="password", port=3306, db="spiders", charset="utf8")
cursor = db.cursor()
cursor.execute("CREATE TABLE IF NOT EXISTS 男装 (shop VARCHAR(20),location VARCHAR(10),title VARCHAR(255),price VARCHAR(20),deal VARCHAR(20),image VARCHAR(255))")
sql = "INSERT INTO 男装 values(%s,%s,%s,%s,%s,%s)"
try:
if cursor.execute(sql, data):
db.commit()
print("********已入库**********")
except:
print("#########入库失败#########")
db.rollback()
db.close()Full script
The complete program ties together the functions above: it searches for a keyword, determines the total number of result pages, iterates through each page, extracts product details, and saves them to MySQL.
# -*- coding: utf-8 -*-
import re, pymysql
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from pyquery import PyQuery as pq
browser = webdriver.Chrome()
def get_one_page(name):
try:
browser.get("https://www.xxxxx.com")
input = WebDriverWait(browser,10).until(EC.presence_of_element_located((By.CSS_SELECTOR,"#q")))
input.send_keys(name)
button = WebDriverWait(browser,10).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"#J_TSearchForm > div.search-button > button")))
button.click()
pages = WebDriverWait(browser,10).until(EC.presence_of_element_located((By.CSS_SELECTOR,"#mainsrp-pager > div > div > div > div.total")))
get_info(name)
return pages.text
except TimeoutException:
return get_one_page(name)
def get_next_page(page, name):
try:
input = WebDriverWait(browser,10).until(EC.presence_of_element_located((By.CSS_SELECTOR,"#mainsrp-pager > div > div > div > div.form > input")))
input.send_keys(page)
button = WebDriverWait(browser,10).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"#mainsrp-pager > div > div > div > div.form > span.btn.J_Submit")))
button.click()
WebDriverWait(browser,10).until(EC.text_to_be_present_in_element((By.CSS_SELECTOR,"#mainsrp-pager > div > div > div > ul > li.item.active > span"),str(page)))
get_info(name)
except TimeoutException:
return get_next_page(page, name)
def get_info(name):
WebDriverWait(browser,20).until(EC.presence_of_element_located((By.CSS_SELECTOR,"#mainsrp-itemlist .items .item")))
html = pq(browser.page_source)
for item in html('#mainsrp-itemlist .items .item').items():
data = []
image = item.find('.pic .img').attr('data-src')
price = item.find('.price').text().strip().replace('
','')
deal = item.find('.deal-cnt').text()[:-2]
title = item.find('.title').text().strip()
shop = item.find('.shop').text().strip()
location = item.find('.location').text()
data.append([shop, location, title, price, deal, image])
for dt in data:
save_to_mysql(dt, name)
def save_to_mysql(data, name):
db = pymysql.connect(host="localhost", user="root", password="password", port=3306, db="spiders", charset="utf8")
cursor = db.cursor()
cursor.execute("CREATE TABLE IF NOT EXISTS {} (shop VARCHAR(20),location VARCHAR(10),title VARCHAR(255),price VARCHAR(20),deal VARCHAR(20),image VARCHAR(255))".format(name))
sql = "INSERT INTO {} values(%s,%s,%s,%s,%s,%s)".format(name)
try:
if cursor.execute(sql, data):
db.commit()
print("********已入库**********")
except:
print("#########入库失败#########")
db.rollback()
db.close()
def main(name):
pages = get_one_page(name)
total = int(re.compile("(\d+)").findall(pages)[0])
for p in range(1, total+1):
get_next_page(p, name)
if __name__ == '__main__':
main("男装")For more details on Selenium, refer to the official documentation.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
MaGe Linux Operations
Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
