Taobao Snack Data Analysis and Web Scraping with Python
This article presents a Python‑based web‑scraping project that collects the first ten pages of Taobao search results for "小鱼零食" (small‑fish snacks), analyzes sales, price distribution, store locations, and the most‑commented products, and visualizes the findings with bar charts and word clouds.
The author was commissioned to gather market data for a Taobao snack store, requiring extraction of sales, price, and location information from the first ten pages of search results for "小鱼零食". The project also demanded identification of the most commented products and the top ten stores by sales.
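The "top ten stores by sales" requirement amounts to summing sales per store and sorting. The following is a minimal sketch of that step, not the author's code: the column names `store` and `sales`, the helper `top_stores_by_sales`, and the sample rows are all hypothetical, and the real scraped CSV may be laid out differently.

```python
import csv
import io
from collections import defaultdict


def top_stores_by_sales(rows, n=10):
    """Return the n stores with the highest summed sales, largest first."""
    totals = defaultdict(int)
    for row in rows:
        # Assumed columns: 'store' (shop name) and 'sales' (integer count).
        totals[row["store"]] += int(row["sales"])
    # Sort by total descending; ties are broken by store name.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Hypothetical sample data standing in for the scraped CSV file.
sample_csv = io.StringIO(
    "store,title,price,sales,location\n"
    "Store A,Spicy fish,12.9,300,Hunan\n"
    "Store B,Dried fish,25.0,120,Zhejiang\n"
    "Store A,Fish snack pack,19.9,80,Hunan\n"
)
ranking = top_stores_by_sales(csv.DictReader(sample_csv))
print(ranking)
```

Aggregating per store before ranking matters because one store can list several products on the result pages; ranking individual listings would answer a different question.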
After obtaining the data, the author visualized price distribution using a bar chart, showing that most products fall in the 10‑30 CNY range, and mapped store locations, revealing a concentration along coastal and Yangtze River regions. Word‑cloud analysis of user comments highlighted frequent topics such as taste, packaging quality, portion size, and shelf life.
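The price distribution described above can be obtained by grouping prices into fixed-width ranges before charting. This is a small sketch under stated assumptions: the 10 CNY bucket width, the helper names `price_bucket` and `count_price_ranges`, and the sample prices are illustrative and not taken from the article.

```python
from collections import Counter


def price_bucket(price, width=10):
    """Label the fixed-width range a price falls in, e.g. 12.9 -> '10-20'."""
    low = int(price // width) * width
    return f"{low}-{low + width}"


def count_price_ranges(prices, width=10):
    """Count how many prices fall into each bucket."""
    return Counter(price_bucket(p, width) for p in prices)


# Hypothetical prices in CNY; real values would come from the scraped CSV.
sample_prices = [9.9, 12.9, 19.9, 25.0, 29.9, 45.0]
counts = count_price_ranges(sample_prices)
for label, count in sorted(counts.items(), key=lambda kv: int(kv[0].split("-")[0])):
    print(label, count)
```

The resulting label-to-count mapping is the kind of input a bar chart needs: one bar per range, with height equal to the count.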
The implementation relies on the Selenium library for browser automation, Python's csv module for data storage, and the wordcloud package for visualizing comment keywords. Key functions include:
import csv
import os
import time

import wordcloud
from selenium import webdriver
from selenium.webdriver.common.by import By


def tongji():
    # Process CSV data and count price ranges
    ...


def get_the_top_10(url):
    # Retrieve top‑10 items with price, sales, location, and link
    ...


def get_top_10_comments(url):
    # Extract top‑10 product comments and save to file
    ...


def get_top_10_comments_wordcloud():
    # Generate word cloud from saved comments
    ...


def get_10_pages_datas():
    # Crawl ten pages of search results and write to CSV
    ...


if __name__ == '__main__':
    url = 'https://s.taobao.com/search?q=%E5%B0%8F%E9%B1%BC%E9%9B%B6%E9%A3%9F...'
    # get_10_pages_datas()
    # tongji()
    # get_the_top_10(url)
    # get_top_10_comments(url)
    get_top_10_comments_wordcloud()

The code is modular, allowing users to uncomment the desired function calls. After data collection, the author suggests further exploration, such as price‑based market entry strategies, geographic differentiation, or user‑centric marketing approaches.
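The body of `get_top_10_comments_wordcloud()` is elided in the listing, so the following is only a sketch of the step a word cloud depends on: turning saved comments into word frequencies. The helper `word_frequencies`, the stopword set, and the English sample comments are hypothetical. Real Taobao comments are in Chinese and would most likely need a word segmenter first, since the regular expression here treats an unbroken run of characters as a single token.

```python
import re
from collections import Counter


def word_frequencies(comments, stopwords=frozenset()):
    """Count lower-cased word tokens across comments, skipping stopwords."""
    counts = Counter()
    for comment in comments:
        for token in re.findall(r"\w+", comment.lower()):
            if token not in stopwords:
                counts[token] += 1
    return counts


# Hypothetical comments standing in for the saved comment file.
sample_comments = [
    "Great taste, good packaging",
    "Taste is fine but the portion is small",
]
freqs = word_frequencies(sample_comments, stopwords={"is", "the", "but"})
print(freqs.most_common(3))
```

A frequency mapping like `freqs` could then be handed to the wordcloud package, for example through `WordCloud.generate_from_frequencies`. For Chinese text, a font that contains Chinese glyphs would also have to be supplied via the `font_path` argument.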
Overall, the article serves as a practical guide for building a web scraper, performing basic data analysis, and creating visual insights for e‑commerce market research.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.