How to Merge Multiple Excel Files and Preserve Headers with Python Pandas
This article walks through handling merged headers and formula cells in Excel using Python's pandas and openpyxl, providing step‑by‑step code to merge multiple workbooks, skip unwanted rows, assign custom headers, and format the final sheet with merged cells.
1. Introduction
Hello, I am a Python enthusiast. In a recent group discussion I was asked how to handle two Excel issues: merged header cells that pandas does not recognize, and cells containing formulas that are read as zero.
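To make the first issue concrete, here is a minimal sketch that builds a tiny workbook with a merged header and reads it back with pandas. The file name, the cell contents, and the column names "math" and "art" are hypothetical and only serve the illustration.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook

# Build a tiny workbook whose first row is a merged header (hypothetical data).
demo_path = os.path.join(tempfile.mkdtemp(), "merged_header_demo.xlsx")
wb = Workbook()
ws = wb.active
ws["A1"] = "Scores"
ws.merge_cells("A1:B1")   # merged header spanning columns A and B
ws["A2"] = "math"         # second header row
ws["B2"] = "art"
ws["A3"] = 90             # one data row
ws["B3"] = 85
wb.save(demo_path)

# Default read: only the top-left cell of the merged range holds a value,
# so pandas has no name for column B and labels it "Unnamed: 1".
naive = pd.read_excel(demo_path)
print(list(naive.columns))

# Skip both header rows and assign the column names explicitly instead.
clean = pd.read_excel(demo_path, skiprows=2, header=None, names=["math", "art"])
print(clean)
```

The second read is the same idea the rest of the article relies on: ignore the header rows in the file and supply a unified header by hand.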
The following script reads every .xlsx file in a folder, concatenates the sheets that share a name, and saves the result to a single workbook.
import os
import pandas as pd

folder_path = r'C:/Users/mengxianqiao/merge_excel_files/测试数据'  # replace with actual folder
output_name = '汇总.xlsx'  # merged workbook, written into the same folder
all_data = {}  # sheet name -> list of DataFrames, one per source file

for file_name in os.listdir(folder_path):
    # Skip non-Excel files and the merged output left by a previous run
    if not file_name.endswith('.xlsx') or file_name == output_name:
        continue
    file_path = os.path.join(folder_path, file_name)
    with pd.ExcelFile(file_path) as xls:
        for sheet_name in xls.sheet_names:
            # nrows=1 reads one data row, so header_rows is 1 (0 for a sheet with no data rows)
            header_rows = pd.read_excel(xls, sheet_name=sheet_name, nrows=1).shape[0]
            # Row 0 is used as the header; the row directly below it is skipped
            sheet_data = pd.read_excel(xls, sheet_name=sheet_name, skiprows=range(1, header_rows + 1))
            all_data.setdefault(sheet_name, []).append(sheet_data)

output_path = os.path.join(folder_path, output_name)
with pd.ExcelWriter(output_path, engine='openpyxl') as writer:
    for sheet_name, frames in all_data.items():
        # Stack the same-named sheets from all files into one sheet
        df = pd.concat(frames, ignore_index=True)
        df.to_excel(writer, sheet_name=sheet_name, index=False)
print('Data has been successfully merged and saved to 汇总.xlsx.')
2. Implementation Details
Peers suggested skipping header rows when reading the files and then manually adding a unified header. The code below reads each Excel file, skips the first four rows, assigns a custom header, and drops empty rows.
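import pathlib
import pandas as pd

folder = r"C:\Users\Desktop\民主评议表"
excel_files = pathlib.Path(folder).glob('*.xls')  # matches .xls only, not .xlsx
# Unified header: name (姓名) followed by the four evaluation items
header = ['姓名', '以学铸魂', '以学增智', '以学正风', '以学促干']
data = []
for i in excel_files:
    # Skip the four header rows; column A becomes the index, B:F hold the data
    df = pd.read_excel(i, skiprows=4, header=None, index_col=0, usecols='A:F')
    df.dropna(inplace=True)  # drop rows that contain any empty cell
    df.columns = header
    data.append(df)  # collect the cleaned rows of each file
# Assumed final step (`data` is otherwise unused): combine all files into one table
merged = pd.concat(data)
For the formula issue, open the workbook with openpyxl's load_workbook and pass data_only=True; the cells then return their calculated values instead of the formula strings.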
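A minimal sketch of the data_only behavior, assuming a throwaway workbook written by openpyxl itself (the file name and the helper `read_cell_both_ways` are hypothetical). One caveat it illustrates: openpyxl never evaluates formulas, so data_only=True can only return the value Excel cached when it last saved the file. A workbook that Excel has never calculated and saved has no cached value, and the cell reads as None.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def read_cell_both_ways(path, cell="A3"):
    """Return (formula text, cached value) for one cell of the active sheet."""
    with_formulas = load_workbook(path)                 # default: data_only=False
    with_values = load_workbook(path, data_only=True)   # cached results only
    return with_formulas.active[cell].value, with_values.active[cell].value


# Write a small workbook containing a formula (hypothetical demo data).
demo_path = os.path.join(tempfile.mkdtemp(), "formula_demo.xlsx")
wb = Workbook()
ws = wb.active
ws["A1"] = 2
ws["A2"] = 3
ws["A3"] = "=A1+A2"
wb.save(demo_path)

formula_text, cached_value = read_cell_both_ways(demo_path)
print(formula_text)   # the formula string as stored in the file
print(cached_value)   # None here, because Excel never calculated this file
```

For files saved from Excel the cached results are normally present, so the second value would be the computed number. If it comes back empty, opening and re-saving the file in Excel is one way to populate the cache.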
Another approach merges cells in the output sheet to create a combined header.
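import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(1, 10, size=(20, 10)))  # 20 rows x 10 columns of sample data
with pd.ExcelWriter('写入合并表头.xlsx', engine='openpyxl') as writer:
    book = writer.book
    sheet_name = '写入合并表头'
    # Leave row 1 free for the merged header; the table starts on row 2
    df.to_excel(writer, sheet_name=sheet_name, index=False, startrow=1)
    sh = book[sheet_name]
    sh['A1'] = '表头合并'  # text of the merged header
    sh.merge_cells('A1:H1')  # spans columns A to H; the sample data fills A to J
3. Conclusion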
The article demonstrates how to solve common Excel data‑processing problems in Python by using pandas for reading and concatenating data and openpyxl for fine‑grained formatting such as merged headers.