Build a Simple Python Taobao Scraper with GUI, Excel Export, and Logging
This tutorial walks through creating a Python program that scrapes product links, names, and images from a Taobao store, logs the process, exports the data to Excel, and provides a user‑friendly Tkinter GUI for searching and controlling the scraper.
Introduction
With the rise of online shopping, many traditional stores are moving to e‑commerce, making data extraction from platforms like Taobao valuable.
Project Goal
The goal is to develop a Python script that automatically searches Taobao, retrieves product URLs, names, and image links, and records each operation in a log file.
Project Preparation
The program is written using Sublime Text 3. Below is the main interface after running the script:
Implementation
1. Analyze the page structure and collect product information into separate lists. Example store page is shown below:
2. Use the browser's developer tools (F12) to locate the product links, focusing on the "recommended items" section where most products appear.
3. The product details are inside dt tags with class photo. The following code extracts the links, names, and image URLs:
try:
    # Assumes the enclosing module has: import requests, urllib3; from bs4 import BeautifulSoup
    # and module-level lists aa (names), bb (image URLs), cc (product links).
    urllib3.disable_warnings()  # suppress the warning raised by verify=False
    # Web request: certificate verification disabled, 4-second timeout
    rep = requests.get(self.e2.get(), verify=False, timeout=4)
    rep.encoding = 'gbk'  # decode the page as GBK
    soup = BeautifulSoup(rep.text, 'html.parser')  # rep.text honors the encoding set above
    result = soup.find_all('dt', class_='photo')  # all <dt> elements with class photo
    for x in result:
        for y in x.find_all('a'):  # every <a> inside the <dt>
            z = y.find('img')  # the product image nested in the link
            if z is None:
                continue  # skip links without an image so the three lists stay aligned
            aa.append(z['alt'])  # product name
            bb.append('https:' + z['data-ks-lazyload'])  # image URL
            cc.append('https:' + y['href'])  # product link
except Exception:
    return  # on any request or parsing error, stop without reporting it
This extracts product names, image URLs, and product links into the lists aa, bb, and cc respectively.
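The dt.photo traversal can also be tried offline on a small sample before pointing it at a real store. The sketch below is not the article's code: it swaps BeautifulSoup for the standard library's html.parser so that it runs without third-party packages, and it uses urljoin instead of prepending 'https:' so that URLs that are already absolute are left alone. The sample markup, the example.com host names, and the PhotoParser class are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical markup that mimics the structure described above.
SAMPLE_HTML = """
<dl class="item">
  <dt class="photo">
    <a href="//item.example.com/item.htm?id=1">
      <img alt="Sample product" data-ks-lazyload="//img.example.com/1.jpg">
    </a>
  </dt>
</dl>
"""


class PhotoParser(HTMLParser):
    """Collects (name, image URL, link) for images inside <dt class="photo"> links."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.in_photo = False  # True while inside <dt class="photo">
        self.href = None       # href of the enclosing <a>, if any
        self.items = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "dt" and "photo" in (attrs.get("class") or "").split():
            self.in_photo = True
        elif self.in_photo and tag == "a":
            self.href = attrs.get("href")
        elif self.in_photo and tag == "img" and self.href:
            src = attrs.get("data-ks-lazyload")
            if src:  # ignore images that carry no lazy-load URL
                self.items.append((
                    attrs.get("alt") or "",
                    urljoin(self.base_url, src),        # resolves //host/path against the base
                    urljoin(self.base_url, self.href),
                ))

    def handle_endtag(self, tag):
        if tag == "dt":
            self.in_photo = False
        elif tag == "a":
            self.href = None


parser = PhotoParser("https://shop.example.com/")
parser.feed(SAMPLE_HTML)
print(parser.items)
```

Keeping each product as one tuple, rather than three parallel lists, makes it harder for names, images, and links to drift out of step when an element is missing.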
GUI Development
A user‑friendly interface is built with Tkinter. The class below creates the window, input fields, buttons, and a text area for output:
import tkinter as tk
from datetime import datetime as dt

class page:
    def __init__(self):
        self.ti = dt.now().strftime("%Y/%m/%d %H:%M:%S")  # start time, reused later as the log timestamp
        self.root = tk.Tk()  # initialize window
        self.root.title('淘宝获取商家宝贝V1.0')  # "Taobao store item fetcher V1.0"
        self.root.geometry('700x700')
        self.root.iconbitmap('q.ico')  # q.ico must exist in the working directory
        self.root.resizable(width=True, height=True)
        self.label1 = tk.Label(self.root, text='店铺首页:', font=('宋体', 10), width=12, height=2)  # "Store home page:"
        self.e2 = tk.Entry(self.root, width=30, font=('Arial', 12))  # store URL input
        self.label2 = tk.Label(self.root, text='淘宝直达:', font=('宋体', 10), width=12, height=2)  # "Go to Taobao:"
        self.e1 = tk.Entry(self.root, width=30, font=('Arial', 12))  # search keyword input
        self.b1 = tk.Button(self.root, text='解析页面', width=8, height=1, command=self.parse)  # "Parse page"
        self.b2 = tk.Button(self.root, text='生成excel', width=8, height=1, command=self.sc)  # "Generate Excel"
        self.b3 = tk.Button(self.root, text='淘宝搜索', width=8, height=1, command=self.search)  # "Taobao search"
        self.b4 = tk.Button(self.root, text='关闭程序', width=8, height=1, command=self.close)  # "Close program"
        self.b5 = tk.Button(self.root, text='保存日志', width=8, height=1, command=self.log)  # "Save log"
        self.te = tk.Text(self.root, height=40)  # output area
        # place widgets
        self.label1.place(x=140, y=30, anchor='nw')
        self.label2.place(x=138, y=70, anchor='nw')
        self.e1.place(x=210, y=74, anchor='nw')
        self.e2.place(x=210, y=34, anchor='nw')
        self.b1.place(x=160, y=110, anchor='nw')
        self.b2.place(x=240, y=110, anchor='nw')
        self.b3.place(x=320, y=110, anchor='nw')
        self.b4.place(x=400, y=110, anchor='nw')
        self.b5.place(x=480, y=110, anchor='nw')
        self.te.place(x=40, y=170, anchor='nw')
        self.e1.delete(0, "end")
        self.e1.insert(0, "请输入要搜索的商品")  # placeholder: "Enter the product to search for"
        self.root.mainloop()  # blocks until the window is closed
The resulting GUI looks like this:
Parsing and Export
The parse method validates the input URL, extracts data, and displays it in the text area. The sc method creates a pandas DataFrame and saves it to 22.xlsx:
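    # Save results to Excel (assumes: import pandas as p, plus an Excel writer such as openpyxl)
    def sc(self):
        self.te.insert("insert", "...开始生成...\n")  # "generation started"
        # Columns: time, product name, product link, product image link
        av = {'时间': self.ti, '商品名称': aa, '商品链接': cc, '商品图片链接': bb}
        df = p.DataFrame(av, columns=['时间', '商品名称', '商品链接', '商品图片链接'], index=range(len(aa)))
        df.to_excel('22.xlsx', sheet_name='taobao')
        self.te.insert("end", "...生成完成...\n")  # "generation complete"
After running, the Excel file contains the scraped data.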
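The parse method is described above but not listed, so its validation rules are unknown. As a rough sketch only, assuming that a usable store URL is any http or https URL with a host name, the check could be written with the standard library as follows. The helper name is_valid_store_url and the example URLs are hypothetical and do not come from the original program.

```python
from urllib.parse import urlparse


def is_valid_store_url(url: str) -> bool:
    """Hypothetical helper: accept only http(s) URLs that include a host name."""
    parts = urlparse(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Illustrative inputs; example.com stands in for a real store address.
for candidate in ("https://shop.example.com/", "shop.example.com", "ftp://shop.example.com/"):
    print(candidate, is_valid_store_url(candidate))
```

Running a check like this before requests.get lets the GUI report a malformed address in the text area instead of failing inside the request.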
Logging
The log method writes each line of the text area to 1.txt, prefixed with the timestamp stored in self.ti when the program started:
    # Save log
    def log(self):
        ss = self.te.get('1.0', 'end').splitlines()  # Text widget indices start at line 1, column 0
        with open('1.txt', 'w', encoding='utf8') as f:
            for line in ss:
                f.write(f"{self.ti} {line}\n")  # timestamp, then the logged line
Search and Close Functions
The search method opens the default browser to Taobao's search page with the entered keyword, and close destroys the Tkinter window:
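    # Search product (assumes: import webbrowser as wb; from urllib.parse import quote)
    def search(self):
        self.te.insert("insert", "...打开浏览器...\n")  # "opening browser"
        # quote() percent-encodes spaces and non-ASCII characters in the keyword
        wb.open('https://s.taobao.com/search?q=' + quote(self.e1.get()))
    # Close program
    def close(self):
        self.te.insert("insert", "...关闭程序...\n")  # "closing program"
        self.root.destroy()
Conclusion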
1. Avoid scraping excessive data to prevent server overload.
2. This project demonstrates a simple yet functional Python web scraper for Taobao, complete with GUI, Excel export, and logging.
3. While straightforward, the task poses challenges for beginners due to dynamic page structures and potential exceptions, making it a valuable practice project.
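The first and third points can be combined into one small pattern: pause between requests, and record failures instead of letting one bad page end the run. The sketch below is an illustration under stated assumptions, not part of the original program. The function fetch_all, its delay parameter, and the stand-in fetch function are hypothetical; in real use fetch could wrap requests.get with a timeout.

```python
import time


def fetch_all(urls, fetch, delay=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing `delay` seconds between calls.

    Returns (results, errors): two dicts keyed by URL.
    """
    results, errors = {}, {}
    for i, url in enumerate(urls):
        if i:  # no pause before the first request
            sleep(delay)
        try:
            results[url] = fetch(url)
        except Exception as exc:  # deliberately broad in this sketch
            errors[url] = exc
    return results, errors


def fake_fetch(url):
    """Stand-in for a real download so the example runs without network access."""
    if "bad" in url:
        raise ValueError("simulated failure")
    return f"<html>{url}</html>"


pages, failed = fetch_all(["page-1", "bad-page", "page-2"], fake_fetch, delay=0)
print(sorted(pages), sorted(failed))
```

Passing the sleep function as a parameter keeps the delay easy to test and makes the pacing explicit at the call site.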
Python Crawling & Data Mining