Comprehensive Guide to Excel Operations with Pandas in Python
This tutorial demonstrates how to use Python's pandas library to create, read, modify, and manage Excel files, covering basic DataFrame creation, sheet handling, column operations, filtering, merging, pivot tables, charting with matplotlib, performance tips, database integration, and advanced visualizations with seaborn and Plotly.
Basic Operations
1. Create a simple DataFrame and save it as an Excel file.
import pandas as pd
# 创建一个简单的DataFrame
data = {
'姓名': ['张三', '李四', '王五'],
'年龄': [28, 34, 29],
'城市': ['北京', '上海', '广州']
}
df = pd.DataFrame(data)
# 将DataFrame保存为Excel文件
df.to_excel('example.xlsx', index=False)
print("已创建并保存Excel文件")2. Read data from the Excel file.
# 读取Excel文件
df = pd.read_excel('example.xlsx')
print("已读取Excel文件")
print(df)3. Get worksheet names.
# 使用pd.ExcelFile获取所有工作表名称
with pd.ExcelFile('example.xlsx') as xls:
sheets = xls.sheet_names
print("工作表名称:", sheets)4. Select a specific worksheet.
# 读取特定工作表的数据
df = pd.read_excel('example.xlsx', sheet_name='Sheet1')
print("已读取指定工作表")
print(df)5. Write multiple worksheets.
# 创建多个DataFrame
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'C': [5, 6], 'D': [7, 8]})
# 将多个DataFrame写入不同工作表
with pd.ExcelWriter('multiple_sheets.xlsx') as writer:
df1.to_excel(writer, sheet_name='Sheet1', index=False)
df2.to_excel(writer, sheet_name='Sheet2', index=False)
print("已创建并保存包含多个工作表的Excel文件")Data Operations
6. Add a new column.
# 读取Excel文件
df = pd.read_excel('example.xlsx')
# 添加新列
df['职业'] = ['工程师', '医生', '教师']
# 保存更新后的DataFrame
df.to_excel('example_updated.xlsx', index=False)
print("已添加新列并保存")7. Delete a column.
# 删除某一列
df.drop(columns=['城市'], inplace=True)
# 保存更新后的DataFrame
df.to_excel('example_deleted_column.xlsx', index=False)
print("已删除列并保存")8. Modify a cell value.
# 修改某个单元格的值
df.at[0, '年龄'] = 30
# 保存更新后的DataFrame
df.to_excel('example_modified_cell.xlsx', index=False)
print("已修改单元格值并保存")9. Filter data by condition.
# 按条件筛选数据
filtered_df = df[df['年龄'] > 30]
# 保存筛选后的数据
filtered_df.to_excel('example_filtered.xlsx', index=False)
print("已按条件筛选数据并保存")10. Merge two Excel files.
# 读取两个Excel文件
df1 = pd.read_excel('file1.xlsx')
df2 = pd.read_excel('file2.xlsx')
# 合并两个DataFrame
combined_df = pd.concat([df1, df2])
# 保存合并后的DataFrame
combined_df.to_excel('combined_file.xlsx', index=False)
print("已合并两个Excel文件并保存")Advanced Operations
11. Create a pivot table.
# 创建数据透视表
pivot_table = pd.pivot_table(df, values='年龄', index=['城市'], aggfunc=np.mean)
# 保存数据透视表
pivot_table.to_excel('pivot_table.xlsx')
print("已创建并保存数据透视表")12. Plot a bar chart.
import matplotlib.pyplot as plt
# 绘制柱状图
df.plot(kind='bar', x='姓名', y='年龄')
plt.title('年龄分布')
plt.xlabel('姓名')
plt.ylabel('年龄')
plt.show()13. Format output (e.g., set font color).
from openpyxl import load_workbook
from openpyxl.styles import Font
# 加载Excel文件
wb = load_workbook('example.xlsx')
ws = wb.active
# 设置字体颜色
for cell in ws['B']:
cell.font = Font(color="FF0000") # 红色
# 保存修改后的文件
wb.save('formatted_example.xlsx')
print("已格式化并保存Excel文件")14. Apply conditional formatting.
from openpyxl.formatting.rule import ColorScaleRule
# 添加条件格式化规则
rule = ColorScaleRule(start_type='num', start_value=25, start_color='FFAA0000',
end_type='num', end_value=40, end_color='FF00AA00')
ws.conditional_formatting.add('B2:B4', rule)
# 保存修改后的文件
wb.save('conditional_formatting_example.xlsx')
print("已应用条件格式化并保存")15. Insert an image.
from openpyxl.drawing.image import Image
# 插入图片
img = Image('path_to_image.png')
ws.add_image(img, 'D2')
# 保存修改后的文件
wb.save('image_inserted_example.xlsx')
print("已插入图片并保存")Data Analysis and Summary
16. Compute descriptive statistics.
# 计算描述性统计信息
stats = df.describe()
# 保存统计信息
stats.to_excel('descriptive_statistics.xlsx')
print("已计算并保存描述性统计信息")17. Group by and sum.
# 分组求和
grouped_sum = df.groupby('城市')['年龄'].sum()
# 保存分组求和结果
grouped_sum.to_excel('grouped_sum.xlsx')
print("已分组求和并保存")18. Compute correlation matrix.
# 计算相关系数矩阵
corr_matrix = df.corr()
# 保存相关系数矩阵
corr_matrix.to_excel('correlation_matrix.xlsx')
print("已计算并保存相关系数矩阵")File and Directory Operations
19. Batch process multiple Excel files.
import os
# 获取目录下的所有Excel文件
excel_files = [f for f in os.listdir() if f.endswith('.xlsx')]
# 批量处理每个文件
for file in excel_files:
df = pd.read_excel(file)
# 对每个DataFrame进行操作
# ...
print(f"已处理文件 {file}")20. Create a backup file.
import shutil
# 创建备份文件
shutil.copy('example.xlsx', 'example_backup.xlsx')
print("已创建备份文件")Performance Optimization
21. Read large files in chunks.
# 分块读取大文件
for chunk in pd.read_excel('large_file.xlsx', chunksize=1000):
# 对每个分块进行操作
# ...
print("已处理一个分块")22. Convert data types to save memory.
# 转换数据类型
df['年龄'] = df['年龄'].astype('int8')
# 保存转换后的DataFrame
df.to_excel('optimized_example.xlsx', index=False)
print("已转换数据类型并保存")Special Uses
23. Handle missing values.
# 处理缺失值
df.fillna(0, inplace=True)
# 保存处理后的DataFrame
df.to_excel('filled_na_example.xlsx', index=False)
print("已填充缺失值并保存")24. Add data validation.
# 添加数据验证规则
from openpyxl.worksheet.datavalidation import DataValidation
dv = DataValidation(type="list", formula1='"北京,上海,广州"', allow_blank=True)
ws.add_data_validation(dv)
dv.add('C2:C4')
# 保存修改后的文件
wb.save('data_validation_example.xlsx')
print("已添加数据验证规则并保存")25. Create a new file from a template.
# 使用模板创建新文件
template_wb = load_workbook('template.xlsx')
new_ws = template_wb.copy_worksheet(template_wb.active)
# 保存新文件
template_wb.save('new_file_from_template.xlsx')
print("已使用模板创建新文件并保存")Connecting to External Systems
26. Read data from an SQL database.
import sqlite3
# 连接到SQLite数据库并读取数据
conn = sqlite3.connect('database.db')
query = "SELECT * FROM table"
df = pd.read_sql(query, conn)
# 保存读取的数据
df.to_excel('data_from_sql.xlsx', index=False)
print("已从SQL数据库读取数据并保存")27. Write data to an SQL database.
# 将数据写入SQL数据库
df.to_sql('table_name', conn, if_exists='replace', index=False)
print("已将数据写入SQL数据库")Advanced Charts and Visualization
28. Draw a heatmap with Seaborn.
import seaborn as sns
# 创建热力图用的数据框
heatmap_data = df.pivot_table(index='城市', columns='姓名', values='年龄', aggfunc='mean')
# 绘制热力图
sns.heatmap(heatmap_data, annot=True, cmap="YlGnBu")
plt.title('年龄按城市和姓名的热力图')
plt.show()29. Create an interactive chart with Plotly.
import plotly.express as px
# 绘制交互式柱状图
fig = px.bar(df, x='姓名', y='年龄', title='年龄分布')
fig.show()30. Export a chart as an image file.
# 导出图表为PNG文件
fig.write_image("chart.png")
print("已导出图表为图像文件")Test Development Learning Exchange
Test Development Learning Exchange
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.