5 Practical Pandas Techniques to Remove Duplicate Hours from Excel Data

This article demonstrates five pandas-based methods, plus one openpyxl alternative, for keeping a single record per date‑hour from an Excel file: dropping duplicates on extracted day and hour columns, zeroing minutes and seconds, flooring timestamps, converting to hourly periods, and formatting date‑hour strings.

Python Crawling & Data Mining

Jan 22, 2025

1. Introduction

The author shares a Python automation question from a community member and presents five pandas approaches, plus an openpyxl alternative, for keeping one row per unique date‑hour combination in Excel data. All examples read a workbook whose timestamp column is named SampleTime.
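
To try the idea without the original workbook, a small in-memory frame can stand in for it. This is only a sketch: the timestamps and the `Value` column are made up for illustration, and only the `SampleTime` column name comes from the article's code.

```python
import pandas as pd

# Made-up sample standing in for the workbook; 'Value' is a hypothetical column.
sample = pd.DataFrame({
    'SampleTime': pd.to_datetime([
        '2025-01-01 08:05:00',
        '2025-01-01 08:40:00',   # same date and hour as the row above
        '2025-01-01 09:10:00',
        '2025-01-02 08:15:00',   # same hour, different date
        '2025-01-02 08:55:00',   # same date and hour as the row above
    ]),
    'Value': [1, 2, 3, 4, 5],
})

# Build a date-hour key and keep the first row seen for each key.
key = sample['SampleTime'].dt.strftime('%Y-%m-%d %H')
unique_hours = sample[~key.duplicated()]
print(unique_hours)
```

Rows 2 and 5 share a date and hour with an earlier row, so only the rows with `Value` 1, 3 and 4 remain.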

2. Implementation

Method 1: Extract day and hour, then drop duplicates

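import pandas as pd

excel_filename = '数据.xlsx'  # input workbook ("data.xlsx")
df = pd.read_excel(excel_filename)
# Extract the calendar date and the hour into helper columns.
# The full date (not just the day-of-month number) keeps the same day
# number in different months apart.
df['day'] = df['SampleTime'].dt.date
df['hour'] = df['SampleTime'].dt.hour
# Keep the first row of each date‑hour pair
df = df.drop_duplicates(subset=['day', 'hour'])
# Save the result (the helper columns are written out as well)
df.to_excel('数据筛选结果2.xlsx')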
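
One caveat when keying on extracted columns: the day-of-month number alone does not distinguish months or years. The sketch below uses two made-up timestamps to compare a (day-of-month, hour) key with a (calendar date, hour) key. The variable names are illustrative only.

```python
import pandas as pd

times = pd.Series(pd.to_datetime([
    '2025-01-05 10:12:00',
    '2025-02-05 10:47:00',   # different month, same day number and hour
]))

# Key built from the day-of-month number and the hour
by_day_number = pd.DataFrame({'day': times.dt.day, 'hour': times.dt.hour})
# Key built from the full calendar date and the hour
by_calendar_date = pd.DataFrame({'day': times.dt.date, 'hour': times.dt.hour})

print(by_day_number.duplicated().tolist())     # [False, True]
print(by_calendar_date.duplicated().tolist())  # [False, False]
```

With the day number, the February row is treated as a duplicate of the January row and would be dropped. With the calendar date, both rows are kept.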

Method 2: Replace minutes and seconds with zero

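import pandas as pd

excel_filename = '数据.xlsx'
df = pd.read_excel(excel_filename)
# Zero out everything below the hour so rows in the same hour share one key
SampleTime_new = df['SampleTime'].map(lambda x: x.replace(minute=0, second=0, microsecond=0))
# Keep the first row per hour; assign back to df so the filtered data is what gets saved
df = df[~SampleTime_new.duplicated()]
df.to_excel('数据筛选结果2.xlsx')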
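
When zeroing fields with `replace`, any fractional seconds need resetting too, or two timestamps in the same hour still differ. A minimal sketch with made-up timestamps, assuming the data can carry sub-second precision:

```python
import pandas as pd

times = pd.Series(pd.to_datetime([
    '2025-01-01 08:05:00.250',
    '2025-01-01 08:40:00.750',   # same hour, different fractional second
]))

# Only minutes and seconds reset: the fractional part survives
partial = times.map(lambda t: t.replace(minute=0, second=0))
# Microseconds reset as well: both values collapse to 08:00:00
full = times.map(lambda t: t.replace(minute=0, second=0, microsecond=0))

print(partial.duplicated().tolist())   # [False, False]
print(full.duplicated().tolist())      # [False, True]
```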

Method 3: Floor timestamps to the hour

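import pandas as pd

excel_filename = '数据.xlsx'
df = pd.read_excel(excel_filename)
# Round every timestamp down to the start of its hour.
# 'h' is the hourly alias; recent pandas releases deprecate the uppercase 'H'.
SampleTime_new = df['SampleTime'].dt.floor(freq='h')
# Keep the first row for each floored timestamp
df = df[~SampleTime_new.duplicated()]
df.to_excel('数据筛选结果2.xlsx')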
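
Which row survives within each hour is controlled by the `keep` argument of `duplicated`, which defaults to `'first'`. The sketch below uses made-up data and a hypothetical `Value` column to show the first-versus-last choice:

```python
import pandas as pd

df = pd.DataFrame({
    'SampleTime': pd.to_datetime([
        '2025-01-01 08:05:00',
        '2025-01-01 08:40:00',
        '2025-01-01 09:10:00',
    ]),
    'Value': [1, 2, 3],   # hypothetical payload column
})

# Lowercase 'h' is the hourly alias in current pandas releases
hour_key = df['SampleTime'].dt.floor('h')

# keep='first' marks later rows of an hour as duplicates (the default)
first_per_hour = df[~hour_key.duplicated(keep='first')]
# keep='last' marks earlier rows of an hour as duplicates instead
last_per_hour = df[~hour_key.duplicated(keep='last')]

print(first_per_hour['Value'].tolist())   # [1, 3]
print(last_per_hour['Value'].tolist())    # [2, 3]
```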

Method 4: Convert timestamps to hourly periods

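import pandas as pd

excel_filename = '数据.xlsx'
df = pd.read_excel(excel_filename)
# Convert each timestamp to the hourly period that contains it.
# 'h' is the hourly alias; recent pandas releases deprecate the uppercase 'H'.
SampleTime_new = df['SampleTime'].dt.to_period(freq='h')
# Keep the first row for each hourly period
df = df[~SampleTime_new.duplicated()]
df.to_excel('数据筛选结果2.xlsx')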

Method 5: Reformat timestamps and drop duplicates on the new column

import pandas as pd
excel_filename = '数据.xlsx'
df = pd.read_excel(excel_filename)
# Create a new column with formatted date‑hour string
df['new'] = df['SampleTime'].dt.strftime('%Y-%m-%d %H')
# Drop duplicates based on the new column
df = df.drop_duplicates(subset=['new'])
df.to_excel('数据筛选结果2.xlsx')

Method 6: Use openpyxl for custom extraction

from openpyxl import load_workbook, Workbook
from datetime import datetime

# Load workbook and active sheet
workbook = load_workbook('数据.xlsx')
sheet = workbook.active
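time_column = sheet['C']  # column C holds SampleTime in the source workbook
row_lst = []   # row numbers to copy
date_lst = []  # distinct dates, in order of first appearance
hour_lst = []  # hours already seen for the date being processed

# First pass: collect the distinct dates
for cell in time_column:
    if cell.value != "SampleTime" and cell.value is not None:
        if cell.value.date() not in date_lst:
            date_lst.append(cell.value.date())

# Second pass: for each date, record the first row of every hour
for date in date_lst:
    for cell in time_column:
        if cell.value != "SampleTime" and cell.value is not None:
            if cell.value.date() == date and cell.value.hour not in hour_lst:
                hour_lst.append(cell.value.hour)
                row_lst.append(cell.row)
    hour_lst = []  # reset before the next date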

# Create new workbook and copy header
new_workbook = Workbook()
new_sheet = new_workbook.active
header = sheet[1]
header_lst = [cell.value for cell in header]
new_sheet.append(header_lst)

# Copy selected rows
for row in row_lst:
    data_lst = [cell.value for cell in sheet[row]]
    new_sheet.append(data_lst)

new_workbook.save('新表.xlsx')
print("New workbook with the matching rows saved!")
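
The same selection can also be done in a single pass by remembering each (date, hour) pair in a set. The sketch below is an alternative, not the author's code. `first_rows_per_hour` is a hypothetical helper that works on plain row tuples, and the sample rows are made up.

```python
from datetime import datetime

def first_rows_per_hour(rows, time_index):
    """Return the first row seen for each (date, hour) pair, in input order."""
    seen = set()
    kept = []
    for row in rows:
        stamp = row[time_index]
        if not isinstance(stamp, datetime):
            continue  # skips the header row and empty cells
        key = (stamp.date(), stamp.hour)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept

# Made-up rows; the timestamp sits in the third position, like column C
rows = [
    ('ID', 'Name', 'SampleTime'),
    (1, 'a', datetime(2025, 1, 1, 8, 5)),
    (2, 'b', datetime(2025, 1, 1, 8, 40)),   # same date and hour as row 1
    (3, 'c', datetime(2025, 1, 2, 8, 15)),
    (4, 'd', None),                          # empty timestamp cell
]
selected = first_rows_per_hour(rows, time_index=2)
print(selected)
```

With openpyxl, row tuples of this shape can be obtained from `sheet.iter_rows(values_only=True)`. Unlike the two-pass loop, this version keeps rows in sheet order rather than grouping them by date, which only matters if the sheet is not sorted by time.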

3. Summary

The five pandas approaches share one idea: reduce each timestamp to a date‑hour key (by extracting columns, zeroing minutes and seconds, flooring, converting to an hourly period, or formatting a string) and keep the first row for each key. The openpyxl method performs the same extraction manually, row by row, for cases where pandas cannot be used.
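
The shared idea can be checked on made-up timestamps: each approach derives a date-hour key, and every key should flag the same rows as first occurrences. This is a sketch rather than the author's code. The first key pairs the calendar date with the hour, and the lowercase `'h'` alias is assumed to be accepted by the installed pandas version.

```python
import pandas as pd

times = pd.Series(pd.to_datetime([
    '2025-01-01 08:05:30',
    '2025-01-01 08:40:00',
    '2025-01-01 09:10:00',
    '2025-01-02 08:15:00',
    '2025-01-02 08:55:10',
    '2025-01-02 23:59:59',
]))

# One date-hour key per approach
keys = {
    'date_and_hour': pd.Series(list(zip(times.dt.date, times.dt.hour))),
    'replace': times.map(lambda t: t.replace(minute=0, second=0, microsecond=0)),
    'floor': times.dt.floor('h'),
    'to_period': times.dt.to_period('h'),
    'strftime': times.dt.strftime('%Y-%m-%d %H'),
}

# True marks a row that would be kept (first occurrence of its key)
masks = {name: (~key.duplicated()).tolist() for name, key in keys.items()}
for name, mask in masks.items():
    print(name, mask)
```

On this sample every approach prints the same mask, keeping rows 1, 3, 4 and 6.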


