How to Scrape and Analyze Taobao Snack Sales Data with Python
This article walks through a real-world Python project that uses Selenium to crawl the first ten pages of Taobao snack listings and extract sales, price, and location data. It then visualizes the price distribution and the geographic concentration of sellers, generates a word cloud from user comments, and lists the top-selling stores. Source code is included for replication.
Preface
The author was hired by a client who wanted to open a Taobao store for "Little Fish Snacks" and needed a market analysis of existing products. Compiling the statistics by hand is possible but tedious, so the author was asked to automate the process.
1. Project Requirements
Search "小鱼零食" ("Little Fish Snacks") on Taobao, collect the sales figures and prices of all items on the first 10 pages, then count the items that fall within predefined price ranges.
Identify the geographic distribution of the sellers across the country.
Find the product with the most comments among the results.
Extract the names and links of the 10 top-selling stores.
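The first requirement covers ten result pages. A minimal sketch of building the paged search URLs follows. It assumes that Taobao paginates search results through an `s` offset parameter that advances by 44 items per page; the source does not state this and the site may have changed, so verify it before relying on it. The function name `build_search_urls` and the `page_size` parameter are hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = 'https://s.taobao.com/search'


def build_search_urls(keyword, pages=10, page_size=44):
    """Return one search URL per result page.

    Assumption: the `s` query parameter is an item offset and each
    page holds `page_size` items; check this against the live site.
    """
    urls = []
    for page in range(pages):
        query = urlencode({'q': keyword, 's': page * page_size})
        urls.append(f'{BASE_URL}?{query}')
    return urls


if __name__ == '__main__':
    for u in build_search_urls('小鱼零食', pages=3):
        print(u)
```

Each URL could then be passed to a Selenium driver's `get` method, one page at a time.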
2. Result Preview
After gathering the data, the author analyzed it and produced a bar chart (hovering over a bar shows the exact item count). The chart shows that most products are priced between 10 and 30 CNY, indicating a low-end market positioning.
The geographic analysis reveals that sellers are concentrated along the coast and in the middle and lower reaches of the Yangtze River, with the coastal area being the densest.
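The regional tally behind such a map can be sketched as follows. It assumes the `店铺位置` (store location) column holds strings such as `广东 广州`, with the province first and an optional city after a space. The source does not show the actual values, so treat that format and the helper name `count_by_province` as assumptions.

```python
from collections import Counter


def count_by_province(locations):
    """Count sellers per province from 'province city' strings."""
    counts = Counter()
    for loc in locations:
        parts = loc.split()
        if parts:  # ignore empty or whitespace-only cells
            counts[parts[0]] += 1
    return counts


if __name__ == '__main__':
    # Illustrative sample values, not scraped data.
    sample = ['广东 广州', '浙江 杭州', '广东 深圳', '上海', '']
    for province, n in count_by_province(sample).most_common():
        print(province, n)
```

The per-province counts are the kind of input a map-plotting library would need to shade regions by seller density.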
Comment analysis shows that the most frequent words are related to taste, packaging quality, portion size, and shelf life. These insights can guide product packaging and marketing.
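A word cloud is driven by word frequencies, so the counting step can be sketched on its own. The sketch below uses only the standard library and a regex tokenizer that works for space-delimited text; Chinese comments would need a word segmenter (for example `jieba.lcut`) passed in as `tokenize`. The function names, the stopword set, and the sample comments are illustrative and are not taken from the author's code.

```python
import re
from collections import Counter


def simple_tokenize(text):
    """Lowercase the text and split on runs of non-word characters."""
    return [t for t in re.split(r'\W+', text.lower()) if t]


def word_frequencies(comments, stopwords=frozenset(), tokenize=simple_tokenize):
    """Count how often each non-stopword token appears across comments."""
    counts = Counter()
    for comment in comments:
        counts.update(t for t in tokenize(comment) if t not in stopwords)
    return counts


if __name__ == '__main__':
    # Made-up comments for illustration only.
    comments = ['Great taste, nice packaging',
                'Taste is great',
                'Packaging was damaged']
    freqs = word_frequencies(comments, stopwords={'is', 'was'})
    print(freqs.most_common(3))
```

A mapping like `freqs` is the kind of input that `wordcloud.WordCloud.generate_from_frequencies` accepts.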
The final section lists the 10 top-selling stores with their links.
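Ranking stores by sales requires turning the sales label into a number first. The sketch below assumes labels such as `5000+人付款` or `1.5万+人付款`, where `万` means ten thousand; the source does not show the raw labels, so this format is an assumption. The names `parse_sales` and `top_sellers` and the `(store_name, link, sales_text)` record layout are hypothetical.

```python
import heapq
import re

# A number with an optional decimal part, optionally followed by 万.
_SALES_RE = re.compile(r'(\d+(?:\.\d+)?)(万)?')


def parse_sales(text):
    """Convert a sales label such as '1.5万+人付款' to an integer count.

    Returns 0 when the label contains no number.
    """
    match = _SALES_RE.search(text)
    if not match:
        return 0
    value = float(match.group(1))
    if match.group(2):  # 万 = 10,000
        value *= 10_000
    return int(round(value))


def top_sellers(records, n=10):
    """Return the n records with the highest parsed sales.

    Each record is a (store_name, link, sales_text) tuple.
    """
    return heapq.nlargest(n, records, key=lambda r: parse_sales(r[2]))


if __name__ == '__main__':
    # Made-up records for illustration only.
    records = [('store-a', 'https://example.com/a', '5000+人付款'),
               ('store-b', 'https://example.com/b', '1.5万+人付款'),
               ('store-c', 'https://example.com/c', '300+人付款')]
    for name, link, sales in top_sellers(records, n=2):
        print(name, link, sales)
```

`heapq.nlargest` avoids sorting the whole list when only the first few entries are needed.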
The author reflects that, with the data, one could explore price‑based entry points, geographic differentiation, or user‑centric marketing strategies, though they admit to being a non‑expert in snack products.
3. Scraper Source Code
The Python source code is given below. It uses selenium for web navigation, csv for data storage, and wordcloud for visualizing comment keywords. The code is split into several functions, of which only tongji (the price statistics) is reproduced in full here:
import csv
import os
import time

import wordcloud
from selenium import webdriver
from selenium.webdriver.common.by import By


def tongji():
    """Count the scraped items in each price range and print the totals."""
    prices = []
    # Columns: 价格 (price), 销量 (sales), 店铺位置 (store location).
    with open('前十页销量和金额.csv', 'r', encoding='utf-8', newline='') as f:
        fieldnames = ['价格', '销量', '店铺位置']
        reader = csv.DictReader(f, fieldnames=fieldnames)
        for index, i in enumerate(reader):
            if index != 0:  # skip the header row
                price = float(i['价格'].replace('¥', ''))
                prices.append(price)
    # Price ranges in CNY. Prices of 200 or more match no range and are not counted.
    DATAS = {'<10': 0, '10~30': 0, '30~50': 0,
             '50~70': 0, '70~90': 0, '90~110': 0,
             '110~130': 0, '130~150': 0, '150~170': 0, '170~200': 0, }
    for price in prices:
        if price < 10:
            DATAS['<10'] += 1
        elif 10 <= price < 30:
            DATAS['10~30'] += 1
        elif 30 <= price < 50:
            DATAS['30~50'] += 1
        elif 50 <= price < 70:
            DATAS['50~70'] += 1
        elif 70 <= price < 90:
            DATAS['70~90'] += 1
        elif 90 <= price < 110:
            DATAS['90~110'] += 1
        elif 110 <= price < 130:
            DATAS['110~130'] += 1
        elif 130 <= price < 150:
            DATAS['130~150'] += 1
        elif 150 <= price < 170:
            DATAS['150~170'] += 1
        elif 170 <= price < 200:
            DATAS['170~200'] += 1
    for k, v in DATAS.items():
        print(k, ':', v)


# ... (other functions: get_the_top_10, get_top_10_comments, get_top_10_comments_wordcloud, get_10_pages_datas)


if __name__ == '__main__':
    url = 'https://s.taobao.com/search?q=%E5%B0%8F%E9%B1%BC%E9%9B%B6%E9%A3%9F&...'
    # Uncomment the step to run; each call performs one stage of the analysis.
    # get_10_pages_datas()
    # tongji()
    # get_the_top_10(url)
    # get_top_10_comments(url)
    get_top_10_comments_wordcloud()

Using the above code, one can retrieve the required data and then visualize it with bar charts and geographic maps.
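The price tally in `tongji` can also be written more compactly with `bisect`. This is an alternative sketch, not the author's code: the boundaries mirror the ranges used in `tongji`, while the final `>=200` bucket is an addition so that expensive items are counted rather than dropped. The names `EDGES`, `LABELS`, and `bucket_prices` are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Range boundaries in CNY, mirroring the ranges in tongji().
EDGES = [10, 30, 50, 70, 90, 110, 130, 150, 170, 200]
LABELS = ['<10', '10~30', '30~50', '50~70', '70~90', '90~110',
          '110~130', '130~150', '150~170', '170~200', '>=200']


def bucket_prices(prices):
    """Count prices per range.

    A price equal to a boundary goes to the higher range, matching
    the `low <= price < high` tests in tongji().
    """
    counts = Counter({label: 0 for label in LABELS})
    for price in prices:
        counts[LABELS[bisect_right(EDGES, price)]] += 1
    return counts


if __name__ == '__main__':
    # Made-up prices for illustration only.
    for label, n in bucket_prices([9.9, 10, 29.99, 45, 250]).items():
        print(label, ':', n)
```

Keeping the boundaries in one list means a range can be added or changed without touching the counting logic.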
Through this project, the author demonstrates a practical workflow for market analysis on e‑commerce platforms using Python.
MaGe Linux Operations