How to Automate Bulk Data Import, Deduplication, and Export with Python and MySQL

This tutorial demonstrates how to use Python to read dozens of txt files, write millions of records into a MySQL database, clean duplicate entries, and export selected rows to new files, offering a fast, scriptable alternative to manual Excel processing.

Python Crawling & Data Mining
Python Crawling & Data Mining
Python Crawling & Data Mining
How to Automate Bulk Data Import, Deduplication, and Export with Python and MySQL

Application scenario: handling large volumes of data that need integration, deduplication, and export; using Excel is time‑consuming, so a Python‑MySQL solution is proposed.

The article covers three main tasks: writing data to a database, cleaning duplicate records, and exporting data in a specific format.

Write Data to MySQL Database

The source files are shown below.

Data source files
Data source files

A small program reads each txt file in a folder and inserts its lines into a MySQL table.

Code

import pymysql
import os
conn = pymysql.connect(host='localhost', user='root', password='123456', db='qq', charset='utf8')
cur = conn.cursor()
cur.execute("CREATE TABLE qq ( id int(5) NOT NULL auto_increment, qq varchar(20) NOT NULL, PRIMARY KEY (id));")
conn.commit()
path = os.getcwd()
files = os.listdir(path)
i = 0
for file in files:
    f = open(file,'r',encoding='UTF-8')
    next(f)
    for line in f:
        i += 1
        sql = "insert into qq(qq) values(%s);"
        cur.execute(sql,line)
        print("Inserted", i, "records")
        conn.commit()
    f.close()
cur.close()
conn.close()

Running the script produces the following result:

Execution result
Execution result

Key Code Explanation

pymysql: library for MySQL operations.

os: library for traversing files in a directory.

To package the script as an executable, use pyinstaller and place the exe in the target folder. The commands path = os.getcwd() and files = os.listdir(path) retrieve the current directory and list all files.

Data Cleaning

Example: removing duplicate values.

Step 1 – Create a new table for cleaned data

CREATE TABLE qq_dist (
    id int(5) NOT NULL auto_increment,
    qq varchar(20) NOT NULL,
    PRIMARY KEY (id)
);

Step 2 – Insert distinct records:

INSERT INTO qq_dist (qq) SELECT DISTINCT qq FROM qq;

Export Data in a Specific Format

Example: export rows 101‑200 to a new txt file.

Code

import pymysql
conn = pymysql.connect(host='localhost', user='root', password='123456', db='wxid', charset='utf8')
print("Writing, please wait...")
cur = conn.cursor()
sql = "SELECT wxid FROM wd_dist LIMIT 100,100;"
cur.execute(sql)
conn.commit()
alldata = cur.fetchall()
f = open('data101-200.txt','a')
i = 0
for data in alldata:
    i += 1
    f.write(data[0])
    f.flush()
f.close()
cur.close()
conn.close()
print("Write complete, total {} records!".format(i))

The MySQL LIMIT m,n clause reads n rows starting from offset m+1. The flush() call forces buffered data to be written to the file.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Pythonmysqldata import
Python Crawling & Data Mining
Written by

Python Crawling & Data Mining

Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.