Fundamentals 9 min read

Processing Excel Files in Python with openpyxl and pandas: Reading, Writing, Styling, and Merging

This article explains how to use Python's openpyxl and pandas libraries to read, write, style, and merge Excel files, covering installation, basic operations, advanced cell formatting, data manipulation, and techniques for combining multiple sheets or workbooks into a single DataFrame.

Test Development Learning Exchange

Oct 3, 2024

Processing Excel Files in Python with openpyxl and pandas: Reading, Writing, Styling, and Merging

Introduction Handling Excel files is a common task in Python for data analysis, reporting, and automation. The two most popular libraries are openpyxl for low‑level Excel manipulation and pandas for high‑level data processing.

1. Using openpyxl

Installation pip install openpyxl Reading an Excel file

from openpyxl import load_workbook
# Load an existing workbook
wb = load_workbook('example.xlsx')
# Select the active worksheet
sheet = wb.active  # or sheet = wb['Sheet1']
# Read a single cell
cell_value = sheet['A1'].value
print(f"Cell A1: {cell_value}")
# Iterate over rows and columns
for row in sheet.iter_rows(min_row=1, max_col=3, max_row=2):
    for cell in row:
        print(cell.value)

Writing to an Excel file

from openpyxl import Workbook
wb = Workbook()               # Create a new workbook
sheet = wb.active             # Get the default sheet
sheet['A1'] = "Hello"
sheet['B1'] = "World"
sheet.append([1, 2, 3])      # Add a new row
wb.save('new_example.xlsx') # Save the workbook

Setting cell styles

from openpyxl.styles import Font, Alignment, PatternFill
sheet['A1'].font = Font(bold=True, color="FF0000")
sheet['A1'].alignment = Alignment(horizontal="center", vertical="center")
sheet['A1'].fill = PatternFill(start_color="FFFF00", end_color="FFFF00", fill_type="solid")

2. Using pandas

Installation

pip install pandas
pip install openpyxl  # required for xlsx support

Reading Excel files

import pandas as pd
# Read the whole workbook
df = pd.read_excel('example.xlsx')
# Read a specific sheet
df = pd.read_excel('example.xlsx', sheet_name='Sheet1')
# Read multiple sheets
dfs = pd.read_excel('example.xlsx', sheet_name=['Sheet1', 'Sheet2'])
# Read all sheets into a dict
dfs = pd.read_excel('example.xlsx', sheet_name=None)
print(df.head())

Writing Excel files

import pandas as pd
# Create a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
# Write to a single sheet
df.to_excel('new_example.xlsx', index=False)
# Write to multiple sheets
with pd.ExcelWriter('new_example.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1', index=False)
    df.to_excel(writer, sheet_name='Sheet2', index=False)

Data processing examples

# Filter rows
filtered_df = df[df['Age'] > 28]
# Sort rows
sorted_df = df.sort_values(by='Age', ascending=False)
# Group and aggregate
grouped_df = df.groupby('City').agg({'Age': 'mean'})
# Concatenate DataFrames
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']})
df2 = pd.DataFrame({'A': ['A3', 'A4', 'A5'], 'B': ['B3', 'B4', 'B5']})
merged_df = pd.concat([df1, df2], ignore_index=True)

Excel data file merging: multiple sheets and multiple files

Merge multiple sheets from a single workbook

import pandas as pd
file_path = 'example.xlsx'
xls = pd.ExcelFile(file_path)
sheet_names = xls.sheet_names
combined_df = pd.DataFrame()
for sheet_name in sheet_names:
    df = pd.read_excel(file_path, sheet_name=sheet_name)
    combined_df = pd.concat([combined_df, df], ignore_index=True)
print(combined_df)
combined_df.to_excel('combined_output.xlsx', index=False)

Merge multiple Excel files (each may contain multiple sheets)

import pandas as pd, glob
file_paths = glob.glob('data/*.xlsx')
combined_df = pd.DataFrame()
for file_path in file_paths:
    xls = pd.ExcelFile(file_path)
    sheet_names = xls.sheet_names
    for sheet_name in sheet_names:
        df = pd.read_excel(file_path, sheet_name=sheet_name)
        combined_df = pd.concat([combined_df, df], ignore_index=True)
print(combined_df)
combined_df.to_excel('combined_output.xlsx', index=False)

Handling different column names

import pandas as pd, glob
file_paths = glob.glob('data/*.xlsx')
combined_df = pd.DataFrame()
for file_path in file_paths:
    xls = pd.ExcelFile(file_path)
    sheet_names = xls.sheet_names
    for sheet_name in sheet_names:
        df = pd.read_excel(file_path, sheet_name=sheet_name)
        # Standardize column names (example keeps only Name and Age)
        df.columns = ['Name', 'Age', 'City']
        df = df[['Name', 'Age']]
        combined_df = pd.concat([combined_df, df], ignore_index=True)
print(combined_df)
combined_df.to_excel('combined_output.xlsx', index=False)

Conclusion

openpyxl is ideal when fine‑grained control over Excel files is needed, such as custom cell styles or inserting charts. pandas excels at data analysis and manipulation, offering powerful filtering, sorting, grouping, and merging capabilities for large datasets.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Excel openpyxl data-processing

Written by

Test Development Learning Exchange

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.