Exploring Statistical Functions, Data Deduplication, and Table Transposition with Pandas
This tutorial demonstrates how to use pandas for calculating total and average scores, adding column means, removing duplicate records, and transposing Excel data, providing clear code examples and explanations of key functions such as sum, mean, duplicated, drop_duplicates, and transpose.
The article introduces basic statistical functions in pandas using a student exam score sheet, showing how to compute total and average scores for each student.
import pandas as pd
datas = pd.read_excel('students.xlsx', index_col='ID')
temp = datas[['test1','test2','test3']]
datas['total'] = temp.sum(axis=1)
datas['average'] = temp.mean(axis=1)
datas.to_excel('students.xlsx')
print(datas)
The article then explains the axis parameter: axis=1 aggregates across the columns of each row, giving one value per student, while axis=0 aggregates down the rows of each column, giving one value per test.
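To make the axis distinction concrete, here is a minimal sketch on a small in-memory table. The student labels and scores are hypothetical and are not taken from the article's students.xlsx file.

```python
import pandas as pd

# Hypothetical scores, used only to illustrate the axis parameter.
scores = pd.DataFrame(
    {"test1": [88, 70], "test2": [85, 90], "test3": [91, 80]},
    index=["student_a", "student_b"],
)

# axis=1: combine the columns of each row, one result per student.
row_totals = scores.sum(axis=1)

# axis=0: combine the rows of each column, one result per test.
col_means = scores.mean(axis=0)

print(row_totals)
print(col_means)
```

The first result is indexed by student and the second by test name, which is a quick way to check that the intended axis was used.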
To compute the mean of all columns and append it as a new row, the following code is used:
import pandas as pd
datas = pd.read_excel('students.xlsx')
temp = datas[['test1','test2','test3']]
datas['total'] = temp.sum(axis=1)
datas['average'] = temp.mean(axis=1)
# axis=0 averages each column over all rows
col_mean = datas[['test1','test2','test3','total','average']].mean(axis=0)
# DataFrame.append was removed in pandas 2.0; pd.concat adds the row instead
datas = pd.concat([datas, col_mean.to_frame().T], ignore_index=True)
datas.to_excel('students.xlsx')
print(datas)
The next section covers data deduplication: identifying duplicate rows with duplicated, filtering them, and removing duplicates using drop_duplicates with inplace=True.
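A minimal sketch of these three steps on a small in-memory table may help. The records below are hypothetical, made up only to show what each call returns.

```python
import pandas as pd

# Hypothetical records; the name student_001 appears twice.
records = pd.DataFrame(
    {
        "name": ["student_001", "student_002", "student_001"],
        "test1": [88, 75, 88],
    }
)

# duplicated() returns a boolean Series. By default the first occurrence
# is marked False and every later repeat is marked True.
mask = records.duplicated(subset="name")

# Boolean indexing keeps only the rows flagged as repeats.
repeats = records[mask]

# Without inplace=True, drop_duplicates returns a new DataFrame and
# leaves the original untouched.
unique = records.drop_duplicates(subset="name")

print(repeats)
print(unique)
```

Returning a new DataFrame instead of passing inplace=True keeps the original rows available for comparison, which can be convenient while checking what was removed.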
import pandas as pd
datas = pd.read_excel('students.xlsx')
print('Source data:\n', datas)
# Boolean mask: True for every row whose name already appeared earlier
dupe = datas.duplicated(subset='name')
print('Duplicate data:\n', datas[dupe])
# Keep the first occurrence of each name and drop the rest, modifying datas in place
datas.drop_duplicates(subset='name', inplace=True)
print('Data after deduplication:\n', datas)
***********************************************************************
Source data:
          name  test1  test2  test3  total    average
0  student_001     88     85     91    264  88.000000
... (output truncated for brevity) ...
Data after deduplication:
          name  test1  test2  test3  total    average
0  student_001     88     85     91    264  88.000000
... (output truncated for brevity) ...
Finally, the article shows how to rotate (transpose) a data table, converting rows to columns and vice versa:
import pandas as pd
datas = pd.read_excel('《后浪》弹幕的数据.xlsx')
table = datas.transpose()
table.to_excel('《后浪》弹幕的数据.xlsx')
The author concludes that pandas offers many convenient functions, and mastering them makes working with Excel data much easier.
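To see what the transposition step does to labels, here is a small sketch on an in-memory table. The data is hypothetical and does not come from the article's Excel file.

```python
import pandas as pd

# Hypothetical table: rows are students, columns are tests.
table = pd.DataFrame(
    {"test1": [88, 70], "test2": [85, 90], "test3": [91, 80]},
    index=["student_a", "student_b"],
)

# transpose() swaps the row index with the column labels;
# table.T is the equivalent shorthand.
flipped = table.transpose()

print(flipped)
```

Because the row index becomes the column header, it is worth checking which column serves as the index when reading from Excel. If none is set, the default integer index ends up as the column labels of the transposed table.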
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.