How to Automate Bulk Data Import, Deduplication, and Export with Python and MySQL
This tutorial demonstrates how to use Python to read dozens of txt files, write millions of records into a MySQL database, clean duplicate entries, and export selected rows to new files, offering a fast, scriptable alternative to manual Excel processing.
Use case: large volumes of data need to be merged, deduplicated, and exported. Doing this by hand in Excel is time-consuming, so this article uses Python with MySQL instead.
The article covers three main tasks: writing data to a database, cleaning duplicate records, and exporting data in a specific format.
Write Data to MySQL Database
The source files are shown below.
A small program reads each txt file in a folder and inserts its lines into a MySQL table.
Code
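import os
import pymysql

# Connect to the local MySQL server (example credentials from this tutorial)
conn = pymysql.connect(host='localhost', user='root', password='123456', db='qq', charset='utf8')
cur = conn.cursor()
cur.execute("CREATE TABLE qq ( id int(5) NOT NULL auto_increment, qq varchar(20) NOT NULL, PRIMARY KEY (id));")
conn.commit()

path = os.getcwd()
files = os.listdir(path)
i = 0
sql = "insert into qq(qq) values(%s);"
for file in files:
    # os.listdir returns every entry in the folder, so keep only txt files
    if not file.endswith('.txt'):
        continue
    with open(os.path.join(path, file), 'r', encoding='UTF-8') as f:
        next(f, None)  # skip the first line of each file
        for line in f:
            i += 1
            # strip the trailing newline before storing the value
            cur.execute(sql, (line.strip(),))
    conn.commit()  # commit once per file
    print("Inserted", i, "records")
cur.close()
conn.close()

Running the script produces the following result: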
Key Code Explanation
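pymysql: a pure-Python client library for MySQL, used here to connect and run SQL statements.
os: standard library module, used here to list the files in a directory.
path = os.getcwd() returns the current working directory, and files = os.listdir(path) lists every entry in it. Because the script works on the folder it is run from, it can be packaged as an executable with pyinstaller and the exe placed in the folder that holds the txt files.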
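Because os.listdir returns every entry in the folder, including the script or exe itself, it can help to filter by extension before reading. The following is a minimal sketch under stated assumptions, not the author's method: iter_txt_lines and insert_in_batches are hypothetical helper names, the batch size of 1000 is arbitrary, and the cursor is assumed to be a DB-API cursor such as the one returned by conn.cursor() in pymysql, whose executemany method sends several parameter rows in one call.

```python
import os


def iter_txt_lines(folder):
    """Yield stripped, non-empty lines from every .txt file in folder.

    The first line of each file is skipped, mirroring next(f) in the script.
    """
    for name in sorted(os.listdir(folder)):
        if not name.lower().endswith(".txt"):
            continue  # ignore the script, the exe, and any other files
        with open(os.path.join(folder, name), "r", encoding="UTF-8") as f:
            next(f, None)  # skip the first line; tolerate empty files
            for line in f:
                value = line.strip()
                if value:
                    yield value


def insert_in_batches(cur, values, batch_size=1000):
    """Insert values through cur.executemany in groups of batch_size.

    cur is assumed to be a DB-API cursor (for example from pymysql).
    Returns the number of values sent to the cursor.
    """
    sql = "insert into qq(qq) values(%s);"
    batch, total = [], 0
    for value in values:
        batch.append((value,))
        if len(batch) == batch_size:
            cur.executemany(sql, batch)
            total += len(batch)
            batch = []
    if batch:  # send the final partial batch
        cur.executemany(sql, batch)
        total += len(batch)
    return total
```

With an open connection, a call such as insert_in_batches(cur, iter_txt_lines(os.getcwd())) followed by conn.commit() would replace the nested loops in the script. Batching is a design choice that reduces the number of calls made for millions of rows; the right batch size depends on the data and the server.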
Data Cleaning
Example: removing duplicate values.
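Before running the steps in this section against a large table, the same idea can be tried on a handful of rows. The sketch below is an illustration under stated assumptions: it swaps in Python's built-in sqlite3 module for MySQL so that it runs without a server, the column types are adapted to SQLite, and the sample values are made up. The SELECT DISTINCT statement has the same shape as the one used in this section.

```python
import sqlite3

# sqlite3 (standard library) stands in for MySQL so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Source table containing duplicates (sample values are invented)
cur.execute(
    "CREATE TABLE qq (id INTEGER PRIMARY KEY AUTOINCREMENT, qq VARCHAR(20) NOT NULL)"
)
cur.executemany(
    "INSERT INTO qq (qq) VALUES (?)",
    [("10001",), ("10002",), ("10001",), ("10003",), ("10002",)],
)

# Create the table that will hold the cleaned data
cur.execute(
    "CREATE TABLE qq_dist (id INTEGER PRIMARY KEY AUTOINCREMENT, qq VARCHAR(20) NOT NULL)"
)
# Copy each distinct value exactly once
cur.execute("INSERT INTO qq_dist (qq) SELECT DISTINCT qq FROM qq")
conn.commit()

# Compare row counts before and after deduplication
before = cur.execute("SELECT COUNT(*) FROM qq").fetchone()[0]
after = cur.execute("SELECT COUNT(*) FROM qq_dist").fetchone()[0]
distinct_values = sorted(row[0] for row in cur.execute("SELECT qq FROM qq_dist"))
print("rows before:", before, "rows after:", after)
conn.close()
```

Comparing the two counts is a quick sanity check that the cleaned table holds one row per distinct value.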
Step 1 – Create a new table for cleaned data
CREATE TABLE qq_dist (
    id int(5) NOT NULL auto_increment,
    qq varchar(20) NOT NULL,
    PRIMARY KEY (id)
);
Step 2 – Insert distinct records
INSERT INTO qq_dist (qq) SELECT DISTINCT qq FROM qq;
Export Data in a Specific Format
Example: export rows 101‑200 to a new txt file.
Code
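import pymysql

# Note: this example reads the wxid column of table wd_dist in database wxid
conn = pymysql.connect(host='localhost', user='root', password='123456', db='wxid', charset='utf8')
print("Writing, please wait...")
cur = conn.cursor()
# LIMIT 100,100 skips the first 100 rows and returns the next 100
sql = "SELECT wxid FROM wd_dist LIMIT 100,100;"
cur.execute(sql)
alldata = cur.fetchall()
i = 0
# Open in 'w' mode so rerunning the script does not append the same rows again
with open('data101-200.txt', 'w', encoding='UTF-8') as f:
    for data in alldata:
        i += 1
        # Write one value per line, whether or not it was stored with a newline
        f.write(data[0].rstrip('\r\n') + '\n')
    f.flush()
cur.close()
conn.close()
print("Write complete, total {} records!".format(i))

The MySQL LIMIT m,n clause skips the first m rows and returns the next n, so LIMIT 100,100 returns rows 101 through 200. Without an ORDER BY clause, the order of the returned rows is not guaranteed. The flush() call forces buffered data to be written to the file; closing the file does the same.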
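The offset arithmetic is easy to get wrong by one, so it can be worth computing it from the row range. This is a minimal sketch with hypothetical helper names (limit_for_rows, export_query, export_filename); it only builds the query string and the file name used in the example above and does not touch the database.

```python
def limit_for_rows(first_row, last_row):
    """Return (offset, count) for a 1-based, inclusive row range."""
    if first_row < 1 or last_row < first_row:
        raise ValueError("expected 1 <= first_row <= last_row")
    return first_row - 1, last_row - first_row + 1


def export_query(first_row, last_row):
    """Build the SELECT statement from the example for a given row range."""
    offset, count = limit_for_rows(first_row, last_row)
    return "SELECT wxid FROM wd_dist LIMIT {},{};".format(offset, count)


def export_filename(first_row, last_row):
    """Name the output file after the exported range, as in data101-200.txt."""
    return "data{}-{}.txt".format(first_row, last_row)


print(export_query(101, 200))     # SELECT wxid FROM wd_dist LIMIT 100,100;
print(export_filename(101, 200))  # data101-200.txt
```

Only integers produced by limit_for_rows are formatted into the SQL text here; values that come from user input should be passed as query parameters instead.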
Python Crawling & Data Mining
