40 Essential Python pandas Scripts for Excel Data Processing

This article collects 40 practical pandas scripts for working with Excel data in Python. They cover reading files and selecting sheets, parsing dates, manipulating columns, filtering, sorting, grouping, merging, pivot tables, visualization, cleaning, and more advanced operations, with a short code example for each task.

Test Development Learning Exchange

Nov 7, 2024

Each of the 40 scripts below handles one common Excel or tabular-data task and comes with example code. The examples read from a placeholder workbook, example.xlsx, and refer to placeholder columns such as A, B, and C.

1. Read an Excel file

import pandas as pd
# Read the Excel file
data = pd.read_excel('example.xlsx')
print(data.head())

2. Read a specific sheet

# Read a specific worksheet
data = pd.read_excel('example.xlsx', sheet_name='Sheet1')
print(data.head())

3. Parse date columns

# Parse the Date column as datetime while reading
data = pd.read_excel('example.xlsx', parse_dates=['Date'])
print(data.head())

4. Assign column names while reading

# Assign column names (pass header=None as well if the file has no header row)
data = pd.read_excel('example.xlsx', names=['A', 'B', 'C'])
print(data.head())

5. Save DataFrame to Excel

# Save to an Excel file without the index column
data.to_excel('output.xlsx', index=False)

6. Filter rows

# Keep the rows where column A is greater than 10
filtered_data = data[data['A'] > 10]
print(filtered_data.head())

7. Sort by a column

# Sort by column A
sorted_data = data.sort_values(by='A')
print(sorted_data.head())
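
Sorting often needs more than one key. The sketch below uses a small made-up DataFrame (the values are illustrative, not from any real workbook) to sort by A ascending and then by B descending within each A value.

```python
import pandas as pd

# Hypothetical sample data for illustration only
data = pd.DataFrame({'A': [2, 1, 2, 1], 'B': [5, 7, 9, 3]})

# Sort by A ascending, then by B descending within each value of A
sorted_data = data.sort_values(by=['A', 'B'], ascending=[True, False])
print(sorted_data)
```

The original index labels are kept after sorting; call reset_index(drop=True) on the result if a fresh 0-based index is wanted.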

8. Group by a column

# Group by column A
grouped_data = data.groupby('A')
print(grouped_data.groups)

9. Compute group means

# Compute the mean of each group (numeric columns only)
grouped_mean = grouped_data.mean(numeric_only=True)
print(grouped_mean.head())
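
Group means only make sense for numeric columns. As a minimal sketch with made-up data (the Note column is a hypothetical text column), passing numeric_only=True makes pandas skip columns that cannot be averaged instead of raising an error.

```python
import pandas as pd

# Hypothetical sample data; 'Note' is a text column that cannot be averaged
data = pd.DataFrame({
    'A': ['x', 'y', 'x', 'y'],
    'B': [10, 20, 30, 40],
    'Note': ['p', 'q', 'r', 's'],
})

# Average only the numeric columns within each group of A
grouped_mean = data.groupby('A').mean(numeric_only=True)
print(grouped_mean)
```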

10. Find and replace values

# Find and replace values
data.replace({'old_value':'new_value'}, inplace=True)
print(data.head())

11. Insert a new column at the front

# Insert a new column in the first position
data.insert(0, 'NewColumn', 'default_value')
print(data.head())

12. Delete the first column

# Delete the first column
data.drop(data.columns[0], axis=1, inplace=True)
print(data.head())

13. Rename a column

# Rename a column
data.rename(columns={'A':'NewColumnName'}, inplace=True)
print(data.head())

14. Concatenate two Excel files

# Concatenate the rows of two Excel files
data1 = pd.read_excel('example1.xlsx')
data2 = pd.read_excel('example2.xlsx')
merged_data = pd.concat([data1, data2], ignore_index=True)
print(merged_data.head())

15. Create a pivot table

# Create a pivot table (values are averaged by default)
pivot_table = data.pivot_table(index='A', columns='B', values='C')
print(pivot_table.head())
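
pivot_table averages the values by default. A sketch with made-up data shows how the aggfunc and fill_value parameters change that; the region and quarter labels are hypothetical.

```python
import pandas as pd

# Hypothetical sample data for illustration only
data = pd.DataFrame({
    'A': ['north', 'north', 'south', 'south', 'north'],
    'B': ['q1', 'q2', 'q1', 'q2', 'q1'],
    'C': [10, 20, 30, 40, 50],
})

# Sum C for each (A, B) pair instead of averaging; empty cells become 0
pivot_table = data.pivot_table(index='A', columns='B', values='C',
                               aggfunc='sum', fill_value=0)
print(pivot_table)
```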

16. Plot the pivot table

import matplotlib.pyplot as plt
# Plot the pivot table as a bar chart
pivot_table = data.pivot_table(index='A', columns='B', values='C')
pivot_table.plot(kind='bar')
plt.show()

17. Clean strings

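# Strip leading and trailing whitespace
data['ColumnName'] = data['ColumnName'].str.strip()
# Remove special characters; regex=True is required for the pattern to be treated as a regular expression
data['ColumnName'] = data['ColumnName'].str.replace(r'[^a-zA-Z0-9]', '', regex=True)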
print(data.head())
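
In pandas 2.0 and later, Series.str.replace treats the pattern as a literal string unless regex=True is passed. The sketch below, using made-up strings, shows the cleaning steps chained together.

```python
import pandas as pd

# Hypothetical messy strings for illustration only
data = pd.DataFrame({'ColumnName': ['  ab-12 ', 'c d!3', 'xyz']})

# Strip outer whitespace, then drop every character that is not a letter or digit
cleaned = (
    data['ColumnName']
    .str.strip()
    .str.replace(r'[^a-zA-Z0-9]', '', regex=True)
)
print(cleaned.tolist())
```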

18. Add an Excel formula column

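# Add an Excel formula column
data = pd.read_excel('example.xlsx')
# Build one formula per row; data rows start at Excel row 2, below the header
data['NewColumn'] = [f'=SUM(A{row}:B{row})' for row in range(2, len(data) + 2)]
# pandas stores the formula as text; Excel evaluates it once the saved file is opened
data.to_excel('output.xlsx', index=False)
print(data.head())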

19. Get unique values of a column

# Get the unique values in column A
unique_values = data["A"].unique()
print(unique_values)

20. Drop duplicate rows

# Drop duplicate rows (the result is stored in df, the name used from here on)
df = data.drop_duplicates()
print(df.head())
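
By default drop_duplicates compares entire rows and keeps the first occurrence. As a sketch with made-up data, the subset and keep parameters narrow the comparison to chosen columns and control which copy survives.

```python
import pandas as pd

# Hypothetical sample data for illustration only
df = pd.DataFrame({'a': [1, 1, 2, 2], 'b': ['x', 'x', 'y', 'z']})

# Drop rows that are identical in every column (row 1 repeats row 0)
exact = df.drop_duplicates()

# Treat rows as duplicates when column a matches, keeping the last occurrence
by_a_last = df.drop_duplicates(subset='a', keep='last')

print(exact)
print(by_a_last)
```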

21. Convert column names to lowercase

# Convert all column names to lowercase
df.columns = [col.lower() for col in df.columns]
print(df.head())

22. Reorder columns

# Reorder the columns
df = df[['b','a','c']]
print(df.head())

23. Add a new column derived from others

# Add a new column computed from existing columns
df['d'] = df['a'] + df['b']
print(df.head())

24. Delete a specific column

# Delete a specific column
df = df.drop('a', axis=1)
print(df.head())

25. Conditional filtering

# Filter with a compound condition; each condition needs its own parentheses
filtered_data = df[(df['b'] > 10) & (df['c'] < 50)]
print(filtered_data.head())

26. Apply a custom function

# Apply a custom function to every value with apply
def custom_function(x):
    return x * 2

df['b'] = df['b'].apply(custom_function)
print(df.head())

27. Map values

# Map values with a dictionary; values missing from the dictionary become NaN
df['b'] = df['b'].map({40:'forty',42:'forty-two',44:'forty-four',46:'forty-six',48:'forty-eight'})
print(df.head())
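
Series.map with a dictionary returns NaN for any value that has no entry in the dictionary. The sketch below, with made-up data and a shortened mapping, shows that behavior and one possible way to fall back to the original value.

```python
import pandas as pd

# Hypothetical sample data; 99 has no entry in the mapping
df = pd.DataFrame({'b': [40, 42, 99]})
mapping = {40: 'forty', 42: 'forty-two'}

# Unmapped values come back as NaN
mapped = df['b'].map(mapping)

# Fall back to the original value (as text) wherever the mapping had no entry
with_fallback = mapped.fillna(df['b'].astype(str))

print(mapped)
print(with_fallback)
```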

28. Fill missing values

# Fill missing values with fillna; assign the result back instead of using inplace on a single column
df['b'] = df['b'].fillna('unknown')
print(df.head())
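
Different columns usually need different fill values. As a sketch with made-up data, fillna also accepts a dictionary that maps each column name to its own replacement.

```python
import pandas as pd

# Hypothetical sample data with one missing value per column
df = pd.DataFrame({'a': [1.0, None], 'b': ['x', None]})

# Fill numeric column a with 0 and text column b with a placeholder label
filled = df.fillna({'a': 0, 'b': 'unknown'})
print(filled)
```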

29. Interpolate missing values

# Fill missing values by linear interpolation (the column must be numeric)
df['b'] = df['b'].interpolate(method='linear')
print(df.head())
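
Linear interpolation fills each gap with evenly spaced values between the known neighbors. A minimal sketch on a made-up numeric column:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric column with gaps
df = pd.DataFrame({'b': [1.0, np.nan, 3.0, np.nan, np.nan, 6.0]})

# Each NaN is replaced by a value on the straight line between its neighbors
interpolated = df['b'].interpolate(method='linear')
print(interpolated.tolist())
```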

30. Merge two DataFrames on a key

# Merge two DataFrames on a shared key (inner join by default)
df1 = pd.DataFrame({'key':[1,2,3],'value1':[10,20,30]})
df2 = pd.DataFrame({'key':[2,3,4],'value2':[40,50,60]})
merged_data = pd.merge(df1, df2, on='key')
print(merged_data)
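
pd.merge performs an inner join by default, so only keys present in both DataFrames survive. The sketch below reuses the same two frames to contrast that with an outer join; indicator=True adds a _merge column recording where each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({'key': [1, 2, 3], 'value1': [10, 20, 30]})
df2 = pd.DataFrame({'key': [2, 3, 4], 'value2': [40, 50, 60]})

# Inner join: only keys found in both frames
inner = pd.merge(df1, df2, on='key')

# Outer join: every key from either frame, with the source marked in _merge
outer = pd.merge(df1, df2, on='key', how='outer', indicator=True)

print(inner)
print(outer)
```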

31. Concatenate two DataFrames

# Stack two DataFrames vertically with concat
df1 = pd.DataFrame({'A':[1,2,3],'B':[10,20,30]})
df2 = pd.DataFrame({'A':[4,5,6],'B':[40,50,60]})
concatenated_data = pd.concat([df1, df2], ignore_index=True)
print(concatenated_data)

32. Melt wide to long format

# Convert wide-format data to long format with melt
df = pd.DataFrame({'A':[1,2,3],'B':[10,20,30],'C':[100,200,300]})
melted_data = pd.melt(df, id_vars=['A'], value_vars=['B','C'])
print(melted_data)
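
By default melt names the two new columns variable and value. As a sketch on the same data, var_name and value_name give them more descriptive names; metric and amount are hypothetical names chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30], 'C': [100, 200, 300]})

# 'metric' and 'amount' are illustrative names for the melted columns
melted = pd.melt(df, id_vars=['A'], value_vars=['B', 'C'],
                 var_name='metric', value_name='amount')
print(melted)
```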

33. Groupby with aggregation

# Group the data and compute summary statistics
grouped_data = df.groupby('A').agg({'B':['mean','sum']})
print('Grouped statistics:')
print(grouped_data.head())

34. Create a cross-tabulation

# Build a cross-tabulation with crosstab
crosstab_data = pd.crosstab(df['A'], df['B'])
print('Cross-tabulation:')
print(crosstab_data)

35. Bin a continuous variable

# Split a continuous variable into discrete intervals with cut
df['A'] = pd.cut(df['A'], bins=[0,5,10,15])
print('Binned data:')
print(df.head())
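
pd.cut produces right-closed intervals such as (0, 5] by default, so a value equal to a bin edge falls into the lower bin. The sketch below uses made-up values and hypothetical labels (low, mid, high) to show this.

```python
import pandas as pd

# Hypothetical sample values, including the bin edges 5 and 10
df = pd.DataFrame({'A': [1, 5, 6, 10, 12]})

# Bins are (0, 5], (5, 10], (10, 15]; the labels are illustrative
df['A_bin'] = pd.cut(df['A'], bins=[0, 5, 10, 15],
                     labels=['low', 'mid', 'high'])
print(df)
```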

36. Generate descriptive statistics

# Get descriptive statistics with describe
description = df.describe()
print('Descriptive statistics:')
print(description)

37. Check for missing and non‑missing values

# Count missing values per column with isnull
missing_values = df.isnull().sum()
print('Missing values per column:')
print(missing_values)
# Count non-missing values per column with notnull
non_missing_values = df.notnull().sum()
print('Non-missing values per column:')
print(non_missing_values)

38. Drop rows or columns with missing data

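# Drop rows that contain missing values
df.dropna(inplace=True)
print('Data after dropping rows with missing values:')
print(df.head())
# Drop columns that contain missing values
df.dropna(axis=1, inplace=True)
print('Data after dropping columns with missing values:')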
print(df.head())

39. Detect duplicate rows

# Flag duplicate rows with duplicated
duplicates = df.duplicated()
print('Duplicate row flags:')
print(duplicates)

40. Perform a complex query

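# Run a compound query with query; the interval bins produced by cut are
# converted to strings first so they can be matched by their label
filtered_data = df.assign(A=df['A'].astype(str)).query('A == "(5, 10]" and B < 80')
print('Query result:')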
print(filtered_data.head())
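
When the column is still numeric, query can compare it directly, and local Python variables can be referenced with the @ prefix. A sketch with made-up data and hypothetical bounds low and high:

```python
import pandas as pd

# Hypothetical sample data for illustration only
df = pd.DataFrame({'A': [1, 6, 8, 12], 'B': [90, 70, 85, 10]})

# Local variables are referenced inside the query string with @
low, high = 5, 10
result = df.query('A > @low and A <= @high and B < 80')
print(result)
```

Only the row with A equal to 6 passes all three conditions: 8 fails the B test, and 1 and 12 fall outside the range.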

The collection demonstrates how pandas can be used for a wide range of data‑processing tasks, from basic I/O to advanced transformation, aggregation, and visualization, making it a valuable reference for data analysts and scientists.
