5 Practical Pandas Techniques to Remove Duplicate Hours from Excel Data
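This article demonstrates five pandas-based methods for extracting unique date‑hour records from an Excel file: dropping duplicates on extracted day and hour columns, zeroing minutes and seconds, flooring timestamps to the hour, converting timestamps to hourly periods, and de-duplicating on a formatted date‑hour string. It closes with an openpyxl-only alternative for custom extraction.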
1. Introduction
The author takes a Python automation question from a community member and presents five practical pandas ways to keep one row per unique date‑hour combination in an Excel sheet, followed by a sixth approach that uses only openpyxl.
2. Implementation
Method 1: Extract day and hour, then drop duplicates
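A day-and-hour key only separates rows correctly if "day" means the full calendar date. `.dt.day` returns the day of the month, so data spanning more than one month can contain different dates that share a day number. The sketch below uses made-up timestamps (not taken from the original workbook) to contrast a day-of-month key with a calendar-date key.

```python
import pandas as pd

# Hypothetical sample: two readings at 09:xx on 1 January, one at 09:xx on 1 February
sample = pd.DataFrame({
    "SampleTime": pd.to_datetime([
        "2023-01-01 09:05:00",
        "2023-01-01 09:40:00",
        "2023-02-01 09:10:00",
    ])
})

hour = sample["SampleTime"].dt.hour

# Key on day of month + hour: the February row looks like a duplicate
by_day = sample.assign(day=sample["SampleTime"].dt.day, hour=hour)
by_day = by_day.drop_duplicates(subset=["day", "hour"])

# Key on calendar date + hour: the February row is kept
by_date = sample.assign(date=sample["SampleTime"].dt.date, hour=hour)
by_date = by_date.drop_duplicates(subset=["date", "hour"])

print(len(by_day), len(by_date))  # 1 2
```

If the data never crosses a month boundary, both keys give the same result.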
import pandas as pd
excel_filename = '数据.xlsx'
df = pd.read_excel(excel_filename)
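# Extract the calendar date and the hour of each timestamp.
# (.dt.date is used instead of .dt.day, which is only the day of the month
# and would merge the same day number from different months.)
df['day'] = df['SampleTime'].dt.date
df['hour'] = df['SampleTime'].dt.hour
# Remove duplicate day‑hour pairs, keeping the first row of each
df = df.drop_duplicates(subset=['day', 'hour'])
# Save result
df.to_excel('数据筛选结果2.xlsx')
Method 2: Replace minutes and seconds with zero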
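To see what the replacement does, here is a small sketch on invented timestamps. `Timestamp.replace` leaves any field you do not name untouched, so sub-second parts survive unless `microsecond` and `nanosecond` are zeroed as well. That only matters if the source data carries fractional seconds.

```python
import pandas as pd

# Hypothetical timestamps; the second one carries 250 ms of fractional seconds
times = pd.Series([
    pd.Timestamp(2023, 3, 1, 14, 5, 30),
    pd.Timestamp(2023, 3, 1, 14, 59, 59, 250000),
    pd.Timestamp(2023, 3, 1, 15, 0, 1),
])

# Zero only minute and second: the 250 ms remainder survives
partial = times.map(lambda t: t.replace(minute=0, second=0))

# Zero every sub-hour field: rows in the same hour now share one key
full = times.map(
    lambda t: t.replace(minute=0, second=0, microsecond=0, nanosecond=0)
)

print(partial.duplicated().tolist())  # [False, False, False]
print(full.duplicated().tolist())     # [False, True, False]
```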
import pandas as pd
excel_filename = '数据.xlsx'
df = pd.read_excel(excel_filename)
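# Set minute and second to 0 so every timestamp in the same hour shares one key
SampleTime_new = df['SampleTime'].map(lambda x: x.replace(minute=0, second=0))
# Keep the first row for each hour; assign back to df so the filtered data is what gets saved
df = df[~SampleTime_new.duplicated()]
df.to_excel('数据筛选结果2.xlsx')
Method 3: Floor timestamps to the hour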
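`dt.floor` always moves a timestamp back to the start of its hour, which is what de-duplicating by hour needs. `dt.round`, by contrast, would push readings from the second half of an hour into the next one. A minimal sketch on invented data, using the lowercase `'h'` alias (newer pandas releases deprecate the uppercase `'H'`):

```python
import pandas as pd

# Hypothetical readings: two in the 08:00 hour, one in the 09:00 hour
times = pd.Series(pd.to_datetime([
    "2023-05-10 08:10:00",
    "2023-05-10 08:50:00",
    "2023-05-10 09:20:00",
]))

floored = times.dt.floor("h")   # 08:00, 08:00, 09:00
rounded = times.dt.round("h")   # 08:00, 09:00, 09:00

# Boolean mask that keeps the first row seen in each hour
keep_first = ~floored.duplicated()
print(keep_first.tolist())  # [True, False, True]
```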
import pandas as pd
excel_filename = '数据.xlsx'
df = pd.read_excel(excel_filename)
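# Floor each timestamp to the start of its hour ('h' is the hourly alias; newer pandas deprecates 'H')
SampleTime_new = df['SampleTime'].dt.floor(freq='h')
# Keep only the first row in each hour
df = df[~SampleTime_new.duplicated()]
df.to_excel('数据筛选结果2.xlsx')
Method 4: Convert timestamps to hourly periods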
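`dt.to_period` turns each timestamp into a `Period` that stands for the whole hour rather than an instant within it. The sketch below, on made-up timestamps, shows that rows from the same hour map to equal periods and that `start_time` recovers the hour boundary.

```python
import pandas as pd

# Hypothetical readings: two in the same hour, one in the same hour of the next day
times = pd.Series(pd.to_datetime([
    "2023-05-10 08:10:00",
    "2023-05-10 08:50:00",
    "2023-05-11 08:05:00",
]))

periods = times.dt.to_period("h")

# Equal periods mean "same date and same hour"
print(periods.duplicated().tolist())  # [False, True, False]

# start_time converts each period back to a timestamp at the top of the hour
print(periods.dt.start_time.tolist())
```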
import pandas as pd
excel_filename = '数据.xlsx'
df = pd.read_excel(excel_filename)
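# Convert each timestamp to the hourly period that contains it ('h' is the hourly alias; newer pandas deprecates 'H')
SampleTime_new = df['SampleTime'].dt.to_period(freq='h')
# Keep only the first row in each hourly period
df = df[~SampleTime_new.duplicated()]
df.to_excel('数据筛选结果2.xlsx')
Method 5: Reformat timestamps and drop duplicates on the new column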
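The formatted string works as a key because `'%Y-%m-%d %H'` keeps the date and hour and discards everything finer. A rough sketch on invented data follows; the `Value` column and the final `drop` of the helper column are additions for illustration, not part of the original script.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "SampleTime": pd.to_datetime([
        "2023-07-01 23:15:00",
        "2023-07-01 23:45:00",
        "2023-07-02 00:05:00",
    ]),
    "Value": [1.0, 2.0, 3.0],  # hypothetical measurement column
})

df_demo["new"] = df_demo["SampleTime"].dt.strftime("%Y-%m-%d %H")
print(df_demo["new"].tolist())
# ['2023-07-01 23', '2023-07-01 23', '2023-07-02 00']

# De-duplicate on the helper key, then drop it so it is not written to the output
result = df_demo.drop_duplicates(subset=["new"]).drop(columns="new")
print(result["Value"].tolist())  # [1.0, 3.0]
```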
import pandas as pd
excel_filename = '数据.xlsx'
df = pd.read_excel(excel_filename)
# Create a new column with formatted date‑hour string
df['new'] = df['SampleTime'].dt.strftime('%Y-%m-%d %H')
# Drop duplicates based on the new column
df = df.drop_duplicates(subset=['new'])
df.to_excel('数据筛选结果2.xlsx')
Method 6: Use openpyxl for custom extraction
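The selection logic in this method does not depend on openpyxl itself. As a simplified sketch, the same idea can be written as a single pass that remembers which (date, hour) pairs have already been seen. The `first_row_per_hour` helper and the sample list are hypothetical, and unlike a date-by-date scan it returns rows in sheet order rather than grouped by date.

```python
from datetime import datetime

def first_row_per_hour(values, first_row=1):
    """Return the row numbers of the first datetime seen in each (date, hour)."""
    seen = set()
    rows = []
    for row, value in enumerate(values, start=first_row):
        # Skip anything that is not a datetime (header text, empty cells)
        if not isinstance(value, datetime):
            continue
        key = (value.date(), value.hour)
        if key not in seen:
            seen.add(key)
            rows.append(row)
    return rows

# Hypothetical column contents: header, three readings, one blank cell
column = [
    "SampleTime",
    datetime(2023, 1, 1, 9, 5),
    datetime(2023, 1, 1, 9, 30),
    None,
    datetime(2023, 1, 2, 9, 0),
]
print(first_row_per_hour(column))  # [2, 5]
```

With a worksheet column, the values would come from something like `[cell.value for cell in sheet['C']]`.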
from openpyxl import load_workbook, Workbook
# Load workbook and active sheet
workbook = load_workbook('数据.xlsx')
sheet = workbook.active
# The script reads the SampleTime values from column C
time_column = sheet['C']
row_lst = []
date_lst = []
hour_lst = []
# First pass: collect the distinct dates in order of appearance
for cell in time_column:
    if cell.value != "SampleTime" and cell.value is not None:
        if cell.value.date() not in date_lst:
            date_lst.append(cell.value.date())
# Second pass: for each date, record the first row seen for every hour
for date in date_lst:
    for cell in time_column:
        if cell.value != "SampleTime" and cell.value is not None:
            if cell.value.date() == date and cell.value.hour not in hour_lst:
                hour_lst.append(cell.value.hour)
                row_lst.append(cell.row)
    # Reset the hour list before moving on to the next date
    hour_lst = []
# Create new workbook and copy header
new_workbook = Workbook()
new_sheet = new_workbook.active
header = sheet[1]
header_lst = [cell.value for cell in header]
new_sheet.append(header_lst)
# Copy selected rows
for row in row_lst:
    data_lst = [cell.value for cell in sheet[row]]
    new_sheet.append(data_lst)
new_workbook.save('新表.xlsx')
print("New workbook with the qualifying rows has been saved!")
3. Summary
The five pandas approaches share one idea: reduce each timestamp to an hour-level key (by extracting the date and hour, zeroing minutes and seconds, flooring, converting to a period, or formatting as a string) and keep the first row for each key. The openpyxl method shows a manual row‑by‑row extraction for cases where pandas cannot be used.
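As a quick sanity check of that claim, the sketch below builds the hourly key three ways on invented timestamps and confirms that each one flags the same rows as duplicates.

```python
import pandas as pd

# Hypothetical readings across two days
times = pd.Series(pd.to_datetime([
    "2023-09-01 10:01:00",
    "2023-09-01 10:59:00",
    "2023-09-01 11:00:00",
    "2023-09-02 10:30:00",
    "2023-09-02 10:31:00",
]))

keys = {
    "floor": times.dt.floor("h"),
    "to_period": times.dt.to_period("h"),
    "strftime": times.dt.strftime("%Y-%m-%d %H"),
}

# duplicated() marks every row after the first one in each hour
masks = {name: key.duplicated().tolist() for name, key in keys.items()}
for name, mask in masks.items():
    print(name, mask)  # each prints [False, True, False, False, True]
```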
Python Crawling & Data Mining