Build a Simple Python Taobao Scraper with GUI, Excel Export, and Logging

This tutorial walks through creating a Python program that scrapes product links, names, and images from a Taobao store, logs the process, exports the data to Excel, and provides a user‑friendly Tkinter GUI for searching and controlling the scraper.

Python Crawling & Data Mining

Introduction

With the rise of online shopping, many traditional stores are moving to e‑commerce, making data extraction from platforms like Taobao valuable.

Project Goal

The goal is to develop a Python script that automatically searches Taobao, retrieves product URLs, names, and image links, and records each operation in a log file.

Project Preparation

The program is written using Sublime Text 3. Below is the main interface after running the script:

Implementation

1. Analyze the page structure and collect product information into separate lists. Example store page is shown below:

2. Use the browser's developer tools (F12) to locate the product links, focusing on the "recommended items" section where most products appear.

3. The product details are inside dt tags with class photo. The following code extracts the links, names, and image URLs:

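# Body of the parse method. Assumes module-level imports (requests, urllib3,
# "from bs4 import BeautifulSoup") and module-level lists aa, bb, and cc.
try:
    urllib3.disable_warnings()  # silence the warning triggered by verify=False
    # Fetch the store page: certificate verification off, 4-second timeout
    rep = requests.get(self.e2.get(), verify=False, timeout=4)
    rep.encoding = 'gbk'  # decode the response body as GBK
    soup = BeautifulSoup(rep.text, 'html.parser')  # rep.text honors rep.encoding
    result = soup.find_all('dt', class_='photo')  # all <dt class="photo"> elements
    for dt_tag in result:
        for link in dt_tag.find_all('a'):  # every <a> inside the <dt>
            img = link.find('img')  # product image nested in the link
            if img is None:
                continue  # skip links without an image so the lists stay aligned
            aa.append(img['alt'])  # product name
            bb.append('https:' + img['data-ks-lazyload'])  # image URL
            cc.append('https:' + link['href'])  # product link
except Exception:
    return  # give up quietly on network or parsing errors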

This collects the product names in aa, the image URLs in bb, and the product links in cc.
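The snippet prepends 'https:' to every link and image attribute, which only works when each value is protocol-relative (starts with //). A slightly more defensive variant, sketched below, resolves each value against a base URL with the standard library's urljoin, so absolute and site-relative values also come out right. The helper name absolute_url and its default base are illustrative assumptions, not part of the original program.

```python
from urllib.parse import urljoin


def absolute_url(raw, base='https://www.taobao.com/'):
    """Resolve a scraped href/src value against `base` (hypothetical helper)."""
    return urljoin(base, raw.strip())


# Protocol-relative value: the scheme is taken from the base URL
print(absolute_url('//img.example.com/a.jpg'))
# → https://img.example.com/a.jpg

# Site-relative value: scheme and host are taken from the base URL
print(absolute_url('/item.htm?id=1', base='https://shop.example.com/'))
# → https://shop.example.com/item.htm?id=1
```

In the extraction loop, the two 'https:' + ... expressions could then be replaced by calls to this helper, passing the store URL as base.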

GUI Development

A user‑friendly interface is built with Tkinter. The class below creates the window, input fields, buttons, and a text area for output:

class page:
    def __init__(self):
        self.ti = dt.now().strftime("%Y/%m/%d %H:%M:%S")  # start-up timestamp; dt is datetime.datetime
        self.root = tk.Tk()  # initialize window
        self.root.title('淘宝获取商家宝贝V1.0')  # "Taobao Store Item Fetcher V1.0"
        self.root.geometry('700x700')
        self.root.iconbitmap('q.ico')  # q.ico must exist in the working directory
        self.root.resizable(width=True, height=True)
        self.label1 = tk.Label(self.root, text='店铺首页:', font=('宋体', 10), width=12, height=2)  # "Store home page:"
        self.e2 = tk.Entry(self.root, width=30, font=('Arial', 12))  # store URL input
        self.label2 = tk.Label(self.root, text='淘宝直达:', font=('宋体', 10), width=12, height=2)  # "Taobao direct:"
        self.e1 = tk.Entry(self.root, width=30, font=('Arial', 12))  # search keyword input
        self.b1 = tk.Button(self.root, text='解析页面', width=8, height=1, command=self.parse)  # "Parse page"
        self.b2 = tk.Button(self.root, text='生成excel', width=8, height=1, command=self.sc)  # "Generate Excel"
        self.b3 = tk.Button(self.root, text='淘宝搜索', width=8, height=1, command=self.search)  # "Taobao search"
        self.b4 = tk.Button(self.root, text='关闭程序', width=8, height=1, command=self.close)  # "Close program"
        self.b5 = tk.Button(self.root, text='保存日志', width=8, height=1, command=self.log)  # "Save log"
        self.te = tk.Text(self.root, height=40)
        # place widgets
        self.label1.place(x=140, y=30, anchor='nw')
        self.label2.place(x=138, y=70, anchor='nw')
        self.e1.place(x=210, y=74, anchor='nw')
        self.e2.place(x=210, y=34, anchor='nw')
        self.b1.place(x=160, y=110, anchor='nw')
        self.b2.place(x=240, y=110, anchor='nw')
        self.b3.place(x=320, y=110, anchor='nw')
        self.b4.place(x=400, y=110, anchor='nw')
        self.b5.place(x=480, y=110, anchor='nw')
        self.te.place(x=40, y=170, anchor='nw')
        self.e1.delete(0, "end")
        self.e1.insert(0, "请输入要搜索的商品")  # placeholder: "Enter the product to search for"
        self.root.mainloop()

The resulting GUI looks like this:

Parsing and Export

The parse method validates the input URL, extracts data, and displays it in the text area. The sc method creates a pandas DataFrame and saves it to 22.xlsx:

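# Save results to Excel (assumes "import pandas as p" and an Excel writer such as openpyxl)
def sc(self):
    self.te.insert("insert", "...开始生成...\n")  # "...generating..."
    # Columns: time, product name, product link, product image link
    av = {'时间': self.ti, '商品名称': aa, '商品链接': cc, '商品图片链接': bb}
    df = p.DataFrame(av, columns=['时间', '商品名称', '商品链接', '商品图片链接'], index=range(len(aa)))
    df.to_excel('22.xlsx', sheet_name='taobao')
    self.te.insert("end", "...生成完成...\n")  # "...done..."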

After running, the Excel file contains the scraped data.
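The parse method itself is not listed in this article, so its validation step is not shown. A minimal sketch of what such a check might look like follows. The function name is_valid_shop_url and the rule it applies (an http or https scheme plus a host name) are assumptions for illustration, not the original implementation.

```python
from urllib.parse import urlparse


def is_valid_shop_url(url):
    """Return True for an http(s) URL that has a host name (hypothetical check)."""
    parts = urlparse(url.strip())
    return parts.scheme in ('http', 'https') and bool(parts.netloc)


print(is_valid_shop_url('https://shop.example.com/'))  # → True
print(is_valid_shop_url('shop.example.com'))           # → False (no scheme)
```

Inside parse, a check like this could run on self.e2.get() before the request is sent, with a message written to the text area when it fails.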

Logging

The log method writes the text area content with timestamps to 1.txt:

# Save log
def log(self):
    ss = self.te.get('1.0', 'end').split('\n')  # Text indices start at line 1, column 0
    with open('1.txt', 'w', encoding='utf8') as f:
        for line in ss:
            f.write(str(self.ti) + line + '\n')  # prefix each line with the start-up timestamp
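A Tkinter Text widget always ends its content with a newline, so splitting on '\n' produces empty entries and the log file ends up with timestamp-only lines. One way to avoid that, sketched here with a hypothetical helper named format_log_lines, is to build the log lines in a small pure function that skips blank lines and separates the timestamp from the message with a space.

```python
from datetime import datetime


def format_log_lines(text, stamp=None):
    """Prefix every non-blank line of `text` with a timestamp (hypothetical helper)."""
    if stamp is None:
        stamp = datetime.now().strftime('%Y/%m/%d %H:%M:%S')
    return [f'{stamp} {line}' for line in text.splitlines() if line.strip()]


print(format_log_lines('step one\n\nstep two\n', stamp='2024/01/01 12:00:00'))
# → ['2024/01/01 12:00:00 step one', '2024/01/01 12:00:00 step two']
```

The log method could then write '\n'.join(format_log_lines(self.te.get('1.0', 'end'), self.ti)) to the file in a single call.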

Search and Close Functions

The search method opens the default browser to Taobao's search page with the entered keyword, and close destroys the Tkinter window:

# Search product (assumes "import webbrowser as wb")
def search(self):
    self.te.insert("insert", "...打开浏览器...\n")  # "...opening browser..."
    wb.open('https://s.taobao.com/search?q=' + self.e1.get())

# Close program
def close(self):
    self.te.insert("insert", "...关闭程序...\n")  # "...closing program..."
    self.root.destroy()
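The search method concatenates the raw keyword onto the URL, so spaces, '&', and non-ASCII characters go into the query string unescaped. A possible refinement, assuming a hypothetical helper named build_search_url, is to let the standard library's urlencode escape the keyword first.

```python
from urllib.parse import urlencode


def build_search_url(keyword):
    """Build the Taobao search URL with the keyword percent-encoded (hypothetical helper)."""
    return 'https://s.taobao.com/search?' + urlencode({'q': keyword.strip()})


print(build_search_url('usb cable'))
# → https://s.taobao.com/search?q=usb+cable
```

The search method would then call wb.open(build_search_url(self.e1.get())).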

Conclusion

1. Avoid scraping excessive data to prevent server overload.

2. This project demonstrates a simple yet functional Python web scraper for Taobao, complete with GUI, Excel export, and logging.

3. While straightforward, the task poses challenges for beginners due to dynamic page structures and potential exceptions, making it a valuable practice project.
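For the first point, one simple way to keep the request rate low is to enforce a minimum gap between consecutive requests. The Throttle class below is a hypothetical sketch of that idea, not part of the original program, and the interval you pick is a judgment call.

```python
import time


class Throttle:
    """Enforce a minimum gap between consecutive calls to wait() (hypothetical helper)."""

    def __init__(self, min_interval):
        self.min_interval = min_interval  # seconds between requests
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Sleep just long enough to respect min_interval, then record the time."""
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

A scraper that fetches several pages would create one Throttle, for example Throttle(2.0), and call its wait() method before each requests.get call.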

Written by

Python Crawling & Data Mining
