Fundamentals 8 min read

How to Merge XLS and XLSX Files by Name Using Python Pandas

This guide explains how to combine a master XLS file with a detailed XLSX file by matching the "Name" column, handling non‑standard headers, skipping index columns, inserting new fields at a specific position, and exporting the merged result using Python, pandas, xlrd, and openpyxl.

Python Programming Learning Circle
Python Programming Learning Circle
Python Programming Learning Circle
How to Merge XLS and XLSX Files by Name Using Python Pandas

1. Requirement Breakdown

Goal: merge a master XLS file with a detailed info XLSX file by matching the "Name" column, handling non‑standard headers, skipping the first column if it contains serial numbers, and inserting the new fields at a specific position (e.g., after column I).

2. Generic Excel Reading Function (XLS/XLSX)

Because pandas cannot directly read merged cells in old .xls files, the solution uses xlrd for .xls and pandas.read_excel with the openpyxl engine for .xlsx. The function read_excel_any accepts parameters to skip header rows and optionally drop the first column.

def read_excel_any(path, sheet_name=None, skip_header_rows=0, skip_first_column=False):
    """Support XLS/XLSX universal reading.
    - skip_header_rows: number of header rows to skip
    - skip_first_column: whether to drop the first column (serial number)"""
    if path.lower().endswith('.xls'):
        book = xlrd.open_workbook(path)
        sheet = book.sheet_by_name(sheet_name) if sheet_name else book.sheet_by_index(0)
        data = []
        for r in range(sheet.nrows):
            if r < skip_header_rows:
                continue
            row = sheet.row_values(r)
            if skip_first_column:
                row = row[1:]
            data.append(row)
        df = pd.DataFrame(data[1:], columns=data[0])
        return df
    else:
        df = pd.read_excel(path, sheet_name=sheet_name, engine="openpyxl", skiprows=skip_header_rows)
        if skip_first_column:
            df = df.iloc[:, 1:]
        return df

3. Overall Matching Workflow

Read the master table (XLS or XLSX).

Read the info table, skipping the header row(s) and the first column if it contains an index.

Set the "Name" column of the info table as the index with df_info.set_index("姓名").

Iterate over each name in the master table, retrieve the corresponding row from the info index, or fill with None when missing.

Combine the matched rows into a new DataFrame.

Insert the new columns after the desired position (e.g., column 8) while preserving the original column order.

Export the result to a new XLSX file.

4. Key Technical Points

4.1 Why a universal read function matters

Reusable script across projects.

Only file names need to change when swapping Excel sources.

Clear structure simplifies maintenance.

4.2 Using set_index for fast matching

Setting the "Name" column as the index turns the lookup into a dictionary‑like operation, giving O(1) access time.

4.3 Preserving original column order

By constructing new_cols that concatenates the original columns before and after the insertion point, the final spreadsheet keeps the template layout intact.

5. Complete Example Code (sanitized)

import pandas as pd
import xlrd
import openpyxl

# Configuration (example)
main_file = "主表.xls"
main_sheet = "数据表"
info_file = "信息表.xlsx"
info_sheet = "全部信息"
info_skip_header = 1
info_skip_first_col = True

# Read tables
df_main = read_excel_any(main_file, sheet_name=main_sheet)
df_info = read_excel_any(info_file, sheet_name=info_sheet,
                         skip_header_rows=info_skip_header,
                         skip_first_column=info_skip_first_col)

# Verify "姓名" column exists
if "姓名" not in df_main.columns or "姓名" not in df_info.columns:
    raise ValueError("Both tables must contain a '姓名' column!")

# Index info table by name
df_info_index = df_info.set_index("姓名")

# Match rows
matched_data = []
for name in df_main["姓名"]:
    if name in df_info_index.index:
        matched_data.append(df_info_index.loc[name].to_dict())
    else:
        matched_data.append({col: None for col in df_info.columns if col != "姓名"})

df_match = pd.DataFrame(matched_data)

# Insert matched columns after column 8 (example)
insert_pos = 8
cols_main = df_main.columns.tolist()
new_cols = cols_main[:insert_pos + 1] + df_match.columns.tolist() + cols_main[insert_pos + 1:]
df_out = pd.concat([df_main, df_match], axis=1)[new_cols]

# Export
df_out.to_excel("输出结果.xlsx", index=False)

6. Result

Original master columns remain in their original order.

New fields from the info table appear after the specified column.

Rows are matched by name; missing names receive blank values.

The whole process is fully automated, eliminating manual copy‑paste.

Exceldata-mergepandasopenpyxlxlrd
Python Programming Learning Circle
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.