How to Append Data to a Text File Without Introducing Duplicates in Python

This guide explains how to add new rows to a text file—creating the file if it doesn't exist—while ensuring both the incoming data and the combined file contain no duplicate entries, using Python's set operations and appropriate file modes.

Python Crawling & Data Mining

Jun 9, 2022

S Series – Adding Data to a Text File Without Duplicates

Purpose

Add a column of data to a text file (which may not exist yet) and ensure that no value appears in the file more than once.

Method – When the File Does Not Exist

data = ['243','122','782','577','478','334','334','738','122','112','634']

The list contains duplicate strings '122' and '334'. Remove duplicates before saving:

data2 = list(set(data))

A set does not preserve order. If the original order must be kept, sort the result by each value's first position in the original list:

data2.sort(key=data.index)
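
Sorting with data.index rescans the original list for every element, which becomes slow for long lists. As an alternative sketch (not the approach used in this article), dict.fromkeys removes duplicates and keeps first-seen order in a single pass, because dictionaries preserve insertion order from Python 3.7 on:

```python
data = ['243', '122', '782', '577', '478', '334', '334', '738', '122', '112', '634']

# dict keys are unique and keep insertion order (Python 3.7+),
# so later repeats are dropped and first occurrences stay in place.
data2 = list(dict.fromkeys(data))
print(data2)
```

The result is the same list that set() followed by sort(key=data.index) produces.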

Write the deduplicated data to a new file using write mode and UTF‑8 encoding:

with open('test.txt', 'w', encoding='utf-8') as f:
    f.write('\n'.join(data2) + '\n')
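
Note that 'w' mode truncates a file that already exists. If there is any chance test.txt is already present, one option (an assumption that goes beyond the original article) is exclusive-creation mode 'x', which raises FileExistsError instead of overwriting. The helper name write_new_file and the temporary path below are hypothetical and only serve the illustration:

```python
import os
import tempfile

def write_new_file(path, values):
    """Create path with one value per line; return False if it already exists."""
    try:
        # 'x' creates the file and fails if it is already there
        with open(path, 'x', encoding='utf-8') as f:
            f.writelines(value + '\n' for value in values)
        return True
    except FileExistsError:
        return False

demo_path = os.path.join(tempfile.mkdtemp(), 'test.txt')
first = write_new_file(demo_path, ['243', '122'])   # file is created
second = write_new_file(demo_path, ['999'])         # existing file is left untouched
```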

Method – When the File Already Exists

Assume test.txt already contains the previous data. New data to be added:

new = ['243','122','989','989','577','159']

Two problems must be solved:

Remove duplicates within the new data.

Remove values that already exist in the file.

After both steps, only '989' (once) and '159' remain to be appended.
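
The two filtering steps can be sketched in memory before any file is touched. Here existing stands in for the values already in test.txt (assumed to be the deduplicated list written earlier), and a single seen set handles both problems in one pass:

```python
existing = ['243', '122', '782', '577', '478', '334', '738', '112', '634']
new = ['243', '122', '989', '989', '577', '159']

seen = set(existing)   # values that must not be written again
to_append = []
for value in new:
    if value not in seen:
        to_append.append(value)
        seen.add(value)   # also blocks repeats inside `new` itself

print(to_append)  # ['989', '159']
```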

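with open('test.txt', encoding='utf-8') as f:
    data_list = []
    for line in f:  # read line by line
        value = line.strip()
        if value:  # skip blank lines instead of stopping at the first one
            data_list.append(value)

# keep values that are not in the file yet, in their original order
new2 = list(set(new).difference(data_list))
new2.sort(key=new.index)

if new2:  # avoid appending an empty line when nothing is new
    with open('test.txt', 'a', encoding='utf-8') as f:
        f.write('\n'.join(new2) + '\n')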

The first with open block reads the existing file line by line and strips the newline characters, so the file is never loaded as one large string. The second with open block opens the file in append mode and writes only the new, non-duplicate entries.

Alternative Using a Single File Handle (a+ Mode)

with open('test.txt', 'a+', encoding='utf-8') as f:
    f.seek(0)  # move the read position to the beginning
    data_list = []
    for line in f:
        value = line.strip()
        if value:  # skip blank lines instead of stopping at them
            data_list.append(value)
    new2 = list(set(new).difference(data_list))
    new2.sort(key=new.index)
    if new2:  # avoid appending an empty line when nothing is new
        f.write('\n'.join(new2) + '\n')

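This approach opens the file only once. Because a+ mode positions the file at its end, f.seek(0) is needed before reading. The code then reads the existing lines, computes the difference, and writes the unique new values. In append mode, writes go to the end of the file on most systems regardless of the current position. Since a+ also creates the file when it is missing, the same code covers the case where test.txt does not exist yet.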
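
As a sketch of how this could be packaged for reuse, the steps can be wrapped in a function. The name append_unique and the temporary path are hypothetical, and the sketch assumes an existing file ends with a newline, so that appended values start on a fresh line:

```python
import os
import tempfile

def append_unique(path, values):
    """Append values not yet present in path; return the values written."""
    # 'a+' creates the file if it is missing and allows reading as well
    with open(path, 'a+', encoding='utf-8') as f:
        f.seek(0)  # append mode starts at the end, so rewind before reading
        seen = {line.strip() for line in f if line.strip()}
        added = []
        for value in values:
            if value not in seen:
                seen.add(value)   # also filters repeats inside `values`
                added.append(value)
        # nothing is written when `added` is empty
        f.writelines(value + '\n' for value in added)
    return added

demo_path = os.path.join(tempfile.mkdtemp(), 'test.txt')
first = append_unique(demo_path, ['243', '122', '122'])          # file is created
second = append_unique(demo_path, ['243', '989', '989', '159'])  # only new values
third = append_unique(demo_path, ['159'])                        # nothing to add
```

Calling the function again with the same values leaves the file unchanged, which makes repeated runs safe.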

Summary

The article shows step‑by‑step how to add data to an existing text file without creating duplicate entries, handling both the case where the file does not exist and the case where it does, and finally recommending the a+ mode for a concise implementation.
