
Exploring Statistical Functions, Data Deduplication, and Table Transposition with Pandas

This tutorial demonstrates how to use pandas for calculating total and average scores, adding column means, removing duplicate records, and transposing Excel data, providing clear code examples and explanations of key functions such as sum, mean, duplicated, drop_duplicates, and transpose.

Python Programming Learning Circle

May 27, 2020


The article introduces basic statistical functions in pandas using a student exam score sheet, showing how to compute total and average scores for each student.

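import pandas as pd

# Read the score sheet, using the ID column as the row index
datas = pd.read_excel('students.xlsx', index_col='ID')
temp = datas[['test1', 'test2', 'test3']]
# axis=1: sum / average the three test scores within each row (one value per student)
datas['total'] = temp.sum(axis=1)
datas['average'] = temp.mean(axis=1)
# Note: this overwrites the original workbook
datas.to_excel('students.xlsx')
print(datas)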
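The script above needs students.xlsx on disk. To try the same per-student arithmetic without a file, here is a minimal sketch on a small in-memory DataFrame; the names and scores (apart from the first row, which matches the sample output later in the article) are made-up values, not data from the article.

```python
import pandas as pd

# Hypothetical sample data mirroring the sheet's columns (ID, name, test1..test3)
datas = pd.DataFrame(
    {
        "ID": [1, 2, 3],
        "name": ["student_001", "student_002", "student_003"],
        "test1": [88, 70, 95],
        "test2": [85, 80, 90],
        "test3": [91, 75, 100],
    }
).set_index("ID")

temp = datas[["test1", "test2", "test3"]]
# axis=1: combine the three test columns within each row (one result per student)
datas["total"] = temp.sum(axis=1)
datas["average"] = temp.mean(axis=1)
print(datas)
```

The first student gets a total of 264 and an average of 88.0, the same figures as in the article's printed output.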

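It explains the importance of the axis parameter: axis=1 aggregates across the columns of each row (one result per row, e.g. per student), while axis=0 aggregates down each column (one result per column, e.g. per test).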
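A quick way to see the difference is to compare what each axis returns. This sketch uses a hypothetical two-student table; only the axis argument changes between the two calls.

```python
import pandas as pd

# Small hypothetical score table: 2 students x 3 tests
scores = pd.DataFrame(
    {"test1": [80, 90], "test2": [70, 60], "test3": [60, 90]},
    index=["student_a", "student_b"],
)

# axis=0: collapse the rows, giving one value per column (per test)
per_test = scores.mean(axis=0)
# axis=1: collapse the columns, giving one value per row (per student)
per_student = scores.mean(axis=1)

print(per_test)     # indexed by test1, test2, test3
print(per_student)  # indexed by student_a, student_b
```

The result of axis=0 is labelled by the column names, and the result of axis=1 by the row labels, which is a handy check when you are unsure which axis you used.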

To compute the mean of each score column (test1 to test3, total and average) and append the results as a new row, the following code is used:

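import pandas as pd

# ID is read here as an ordinary column, not as the index
datas = pd.read_excel('students.xlsx')
temp = datas[['test1', 'test2', 'test3']]
datas['total'] = temp.sum(axis=1)
datas['average'] = temp.mean(axis=1)
# axis=0: one mean per column
col_mean = datas[['test1', 'test2', 'test3', 'total', 'average']].mean(axis=0)
# DataFrame.append was removed in pandas 2.0; concatenate a one-row frame instead
datas = pd.concat([datas, col_mean.to_frame().T], ignore_index=True)
# index=False avoids writing the 0..n row numbers as an extra column
datas.to_excel('students.xlsx', index=False)
print(datas)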
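Because DataFrame.append no longer exists in current pandas, the pd.concat pattern is worth seeing in isolation. A minimal sketch on hypothetical in-memory data; note what happens to the text column in the appended row.

```python
import pandas as pd

# Hypothetical data with a text column, as in the score sheet
datas = pd.DataFrame(
    {"name": ["a", "b"], "test1": [80, 90], "test2": [70, 60]}
)

# Column means, computed only over the numeric columns
col_mean = datas[["test1", "test2"]].mean(axis=0)

# Turn the Series into a one-row DataFrame and stack it below the data.
# Columns missing from that row (here 'name') are filled with NaN.
datas = pd.concat([datas, col_mean.to_frame().T], ignore_index=True)
print(datas)
```

The appended row holds 85.0 and 65.0 and has NaN under name; the numeric columns become float because the means are floats.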

The next section covers data deduplication: identifying duplicate rows with duplicated, filtering them, and removing duplicates using drop_duplicates with inplace=True.

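import pandas as pd

datas = pd.read_excel('students.xlsx')
print('Source data:\n', datas)
# duplicated() flags each row whose name already appeared in an earlier row (keep='first')
dupe = datas.duplicated(subset='name')
dupe = dupe[dupe]
print('Duplicate rows:\n', datas.loc[dupe.index])
# Keep the first occurrence of each name and drop the rest, modifying datas in place
datas.drop_duplicates(subset='name', inplace=True)
print('Data after deduplication:\n', datas)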
***********************************************************************
Source data:
            name  test1  test2  test3  total    average
0   student_001     88     85     91    264  88.000000
... (output truncated for brevity) ...
Data after deduplication:
           name  test1  test2  test3  total    average
0  student_001     88     85     91    264  88.000000
... (output truncated for brevity) ...
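The keep parameter controls which copy survives. A small sketch with hypothetical rows in which one name appears twice, using drop_duplicates without inplace=True so that it returns a new DataFrame and leaves the original untouched:

```python
import pandas as pd

# Hypothetical rows in which 'student_002' appears twice
datas = pd.DataFrame(
    {
        "name": ["student_001", "student_002", "student_002", "student_003"],
        "test1": [88, 70, 70, 95],
    }
)

# True for every row whose name already occurred in an earlier row (keep='first')
flags = datas.duplicated(subset="name")
print(datas[flags])  # the repeated row

# Without inplace=True, drop_duplicates returns a new DataFrame
first_kept = datas.drop_duplicates(subset="name")
last_kept = datas.drop_duplicates(subset="name", keep="last")
print(first_kept.index.tolist())
print(last_kept.index.tolist())
```

With the default keep='first' the later copy (index 2) is dropped; with keep='last' the earlier one (index 1) is dropped instead.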

Finally, the article shows how to rotate (transpose) a data table, converting rows to columns and vice versa:

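import pandas as pd

# Read the bullet-comment (danmaku) data for 《后浪》
datas = pd.read_excel('《后浪》弹幕的数据.xlsx')
# Swap rows and columns; datas.T is an equivalent shorthand
table = datas.transpose()
# Note: this overwrites the source workbook with the transposed table
table.to_excel('《后浪》弹幕的数据.xlsx')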
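The article does not show the spreadsheet's contents, so here is a minimal sketch on a hypothetical 3-row, 2-column table to make the effect of transpose visible: row labels become column labels and column labels become row labels.

```python
import pandas as pd

# Hypothetical table; the real danmaku spreadsheet's columns are not shown in the article
table = pd.DataFrame(
    {"user": ["u1", "u2", "u3"], "comment": ["hello", "wow", "nice"]},
    index=["row0", "row1", "row2"],
)

flipped = table.transpose()  # same as table.T
print(flipped)
```

The 3 x 2 table becomes 2 x 3, and transposing twice gives back the original.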

The author concludes that pandas offers many convenient functions, and mastering them makes working with Excel data effortless.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
