Fundamentals 9 min read

Python Techniques for Removing Duplicate Rows, Blank Cells, and Formatting Data in Excel

This article demonstrates how to use Python (with pandas and openpyxl) to efficiently clean Excel worksheets by removing duplicate and empty rows, stripping spaces, reformatting dates, and formatting numeric values with proper precision and currency symbols.

Python Programming Learning Circle
Python Programming Learning Circle
Python Programming Learning Circle
Python Techniques for Removing Duplicate Rows, Blank Cells, and Formatting Data in Excel

Many developers prefer Python and pandas over Excel for fast table processing and visualization, and this guide shows practical code snippets for common data‑cleaning tasks.

Removing duplicate rows and empty rows – By converting the data to a dictionary with dict.fromkeys(rows) (which preserves order in Python 3) and then back to a list, duplicate rows are eliminated and empty rows can be filtered out.

In [135]:
for row in rows4:
    print(row)
# ... output showing duplicate rows ...

In [148]:
dict.fromkeys(rows4)
Out[148]: {("name", "address"): None, ("tom li", "beijing"): None, ...}

In [137]:
list(dict.fromkeys(rows4))
Out[137]: [("name", "address"), ("tom li", "beijing"), ...]

The resulting logic is wrapped in a function that reads all rows from a worksheet, removes duplicates and empty rows, clears the sheet, and writes the cleaned data back.

def handle_duplicate(wb, sheetname):
    """Remove duplicate rows and empty rows"""
    print(f'开始处理工作表:{sheetname}'.center(18, '-'))
    sheet = wb[sheetname]
    old_rows = [x for x in sheet.values]
    new_rows = list(x for x in dict.fromkeys(old_rows) if len(x[0]) > 1)
    sheet.delete_rows(1, sheet.max_row)
    for row in new_rows:
        sheet.append(row)

Removing spaces – Using split() and ''.join() removes all whitespace inside a string.

a = "a b c   "
a.strip()          # 'a b c'
a.split()          # ['a', 'b', 'c']
''.join(a.split()) # 'abc'

This logic is encapsulated in handle_blank, which iterates over each cell in every column, replaces the cell value with the space‑free version, and prints before/after values.

def handle_blank(wb, sheetname):
    """Remove spaces column‑wise"""
    sheet = wb[sheetname]
    for col in sheet.iter_cols():
        for cell in col:
            cell.value = ''.join(cell.value.split())

Modifying date and time formats – By splitting a date string (e.g., "1/11") and constructing a datetime object, the date can be reformatted with strftime using patterns like %Y-%m-%d.

import datetime
m, d = cell.value.split('/')
day = datetime.datetime(2019, int(m), int(d))
cell.value = day.strftime("%Y-%m-%d")

The above is placed in handle_time, which processes the first column of a sheet starting from the second row.

def handle_time(wb, sheetname):
    """Reformat date strings to YYYY‑MM‑DD"""
    sheet = wb[sheetname]
    for col in sheet.iter_cols(max_col=1, min_row=2):
        for cell in col:
            m, d = cell.value.split('/')
            day = datetime.datetime(2019, int(m), int(d))
            cell.value = day.strftime("%Y-%m-%d")

Formatting numbers and adding currency symbols – Using Decimal ensures two‑decimal precision and automatic zero‑padding, while concatenating the Chinese yuan symbol produces a formatted monetary string.

from decimal import Decimal
a = 3.1
b = Decimal(a).quantize(Decimal("0.00"))   # 3.10

cell.value = '¥' + str(Decimal(cell.value).quantize(Decimal("0.00")))

This logic is implemented in handle_num, which iterates over the target column(s) and updates each cell accordingly.

def handle_num(wb, sheetname):
    """Format numeric cells with two decimals and a currency prefix"""
    sheet = wb[sheetname]
    for col in sheet.iter_cols(min_col=3, max_col=3, min_row=2):
        for cell in col:
            cell.value = '¥' + str(Decimal(cell.value).quantize(Decimal("0.00")))

Each function is demonstrated with loading a workbook, calling the handler, and saving the result, illustrating a complete workflow for automating Excel data cleaning with Python.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Excelopenpyxl
Python Programming Learning Circle
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.