How to Remove Duplicate Rows by Name in Pandas: A Quick Guide
This article walks through a pandas workflow that merges two Excel sheets and then removes duplicate rows based on the “姓名” (Name) column, with code for each step.
Introduction
In a Python community, a user asked how to merge two Excel sheets with pandas and then remove duplicate entries based on the “姓名” (Name) column.
Original code:
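import pandas as pd
# Sheet 0 holds the main table; sheet 1 is the lookup table (only its first two columns are read)
data1 = pd.read_excel('测试Vlookup.xlsx', sheet_name=0)
data2 = pd.read_excel('测试Vlookup.xlsx', sheet_name=1, usecols=[0,1])
# Left join on the “姓名” (Name) column, similar to an Excel VLOOKUP
a = pd.merge(data1, data2, how='left', on='姓名')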
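print(a)
A left merge repeats a row of data1 once for every matching row in data2, so any name that occurs more than once in the second sheet appears more than once in the result. The suggested solution was to drop duplicate rows based on the “姓名” column, keeping the first occurrence of each name:
a.drop_duplicates(subset="姓名", keep='first', inplace=True, ignore_index=True)
This resolves the original issue. With inplace=True the DataFrame a is modified directly, and ignore_index=True renumbers the remaining rows from 0. Additional minor questions from the discussion are omitted here.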
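To see the fix end to end without the Excel file, here is a minimal sketch that builds two small in-memory DataFrames standing in for the two sheets. The sample names and the extra columns `dept` and `salary` are hypothetical; only the `姓名` key, the left merge, and the `drop_duplicates` arguments come from the original. It also contrasts the three `keep` options.

```python
import pandas as pd

# Hypothetical stand-ins for sheet 0 and sheet 1 of '测试Vlookup.xlsx'
data1 = pd.DataFrame({"姓名": ["Zhang", "Li", "Wang"], "dept": ["A", "B", "C"]})
data2 = pd.DataFrame({"姓名": ["Zhang", "Zhang", "Li"], "salary": [5000, 5200, 6000]})

# Left join: "Zhang" matches two rows in data2, so it appears twice in the result
merged = pd.merge(data1, data2, how="left", on="姓名")
print(len(merged))  # 4 rows for 3 people

# Same arguments as the fix, but returning a new frame instead of using inplace=True
first = merged.drop_duplicates(subset="姓名", keep="first", ignore_index=True)

# keep="last" keeps the later match; keep=False drops every name that occurs more than once
last = merged.drop_duplicates(subset="姓名", keep="last", ignore_index=True)
unique_only = merged.drop_duplicates(subset="姓名", keep=False, ignore_index=True)

print(first)
print(last)
print(unique_only)
```

Because `inplace=True` makes `drop_duplicates` return `None`, writing `a = a.drop_duplicates(..., inplace=True)` would leave `a` as `None`; use either the in-place form from the thread or the assignment form above, not both.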
Conclusion
This example shows a typical pandas workflow: merge two Excel sheets with pd.merge, then remove duplicate rows on a specific column with DataFrame.drop_duplicates.
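A variant worth considering, which is not the approach from the original thread, is to deduplicate the lookup sheet before merging. This mirrors an exact-match VLOOKUP, which returns the first matching row, and it keeps every row of the first sheet even if a name legitimately repeats there, whereas deduplicating after the merge would collapse such rows too. The sketch below reuses hypothetical in-memory data and the `validate` argument of `pd.merge`, which raises `MergeError` when the right-hand keys are not unique.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical stand-ins for the two sheets
data1 = pd.DataFrame({"姓名": ["Zhang", "Li", "Wang"], "dept": ["A", "B", "C"]})
data2 = pd.DataFrame({"姓名": ["Zhang", "Zhang", "Li"], "salary": [5000, 5200, 6000]})

# Keep the first lookup row per name, like an exact-match VLOOKUP
lookup = data2.drop_duplicates(subset="姓名", keep="first")

# validate="many_to_one" asserts that each name occurs at most once on the right
result = pd.merge(data1, lookup, how="left", on="姓名", validate="many_to_one")
print(result)

# Without the deduplication step the same check fails, flagging the problem early
try:
    pd.merge(data1, data2, how="left", on="姓名", validate="many_to_one")
    raised = False
except MergeError:
    raised = True
print(raised)
```

Which order is right depends on whether duplicate names in the first sheet are real records or errors; the thread's post-merge `drop_duplicates` treats them as errors.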
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Python Crawling & Data Mining
Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!
