Fundamentals 14 min read

Master Python File & Excel Automation: Walk Directories, Merge Sheets, and Manage Data

This guide walks through using Python's os and pathlib modules to traverse directories, demonstrates how to combine file paths, shows practical examples of merging multiple Excel workbooks and sheets with pandas, xlrd, and xlsxwriter, and explains splitting a workbook by column values, providing complete code snippets and results.

Python Crawling & Data Mining

Feb 16, 2020

Master Python File & Excel Automation: Walk Directories, Merge Sheets, and Manage Data

1. Related Knowledge Points

1) Required Libraries

import numpy as np
import pandas as pd
import os

2) os.walk(pwd)

Example: list contents of "G:\a".

Code:

pwd = "G:\\a"
print(os.walk(pwd))
for i in os.walk(pwd):
    print(i)
for path, dirs, files in os.walk(pwd):
    print(files)

Result:

<generator object walk at 0x0000029BB5AEAB88>
('G:\\a', [], ['aa.txt', 'bb.xlsx', 'cc.txt', 'dd.docx'])
['aa.txt', 'bb.xlsx', 'cc.txt', 'dd.docx']

3) os.path.join(path1, path2…)

Purpose: combine multiple path components.

path1 = 'G:\\a'
path2 = 'aa.txt'
print(os.path.join(path1, path2))

Result:

G:\a\aa.txt

4) Case Study

Example code to collect all file paths:

pwd = "G:\\a"
file_path_list = []
for path, dirs, files in os.walk(pwd):
    for file in files:
        file_path_list.append(os.path.join(pwd, file))
print(file_path_list)

Result:

['G:\\a\\aa.txt','G:\\a\\bb.xlsx','G:\\a\\cc.txt','G:\\a\\dd.docx']

5) Storing Multiple DataFrames in a List

# create two DataFrames
import numpy as np
xx = np.arange(15).reshape(5,3)
yy = np.arange(1,16).reshape(5,3)
xx = pd.DataFrame(xx, columns=["语文","数学","外语"])
yy = pd.DataFrame(yy, columns=["语文","数学","外语"])
print(xx)
print(yy)

Concatenate them:

concat_list = []
concat_list.append(xx)
concat_list.append(yy)
z = pd.concat(concat_list, ignore_index=True)
print(z)

2. Multi-Workbook Merging (Part 1)

1) Merge multiple single‑sheet Excel files into one

import pandas as pd
import os
pwd = "G:\\b"
df_list = []
for path, dirs, files in os.walk(pwd):
    for file in files:
        file_path = os.path.join(path, file)
        df = pd.read_excel(file_path)
        df_list.append(df)
result = pd.concat(df_list)
print(result)
result.to_excel('G:\\b\\result.xlsx', index=False)

3. Multi-Workbook Merging (Part 2)

1) Related Knowledge Points

Using xlsxwriter to create workbooks and worksheets.

# create a workbook
import xlsxwriter
workbook = xlsxwriter.Workbook("demo.xlsx")
workbook.close()
# add a worksheet named "2018 Sales"
workbook = xlsxwriter.Workbook("cc.xlsx")
worksheet = workbook.add_worksheet("2018 Sales")
workbook.close()
# write header and a row
workbook = xlsxwriter.Workbook("demo.xlsx")
worksheet = workbook.add_worksheet("2018 Sales")
headings = ['Product','Sales','Price']
worksheet.write_row('A1', headings)
data = ["Apple", 500, 8.9]
for i in range(len(headings)):
    worksheet.write(1, i, data[i])
workbook.close()

Other usage reference: https://www.cnblogs.com/brightbrother/p/8671077.html

xlrd usage

import xlrd
file = r"G:\Jupyter\test.xlsx"
fh = xlrd.open_workbook(file)
print(fh.sheet_names())
print(fh.sheets())
print(fh.sheets()[0].nrows)
print(fh.sheets()[0].ncols)
for row in range(fh.sheets()[0].nrows):
    value = fh.sheets()[0].row_values(row)
    print(value)

2) Merge multiple Excel files with multiple sheets

import xlrd, xlsxwriter, os

def open_xlsx(file):
    return xlrd.open_workbook(file)

def get_sheet_num(fh):
    return len(fh.sheets())

def get_file_content(file, shnum):
    fh = open_xlsx(file)
    table = fh.sheets()[shnum]
    for row in range(table.nrows):
        rdata = table.row_values(row)
        if rdata == ['姓名','性别','年龄','家庭住址']:
            pass
        else:
            datavalue.append(rdata)
    return datavalue
# (Class definitions omitted for brevity)

4. Merge Multiple Sheets in One Workbook

1) Combine all sheets of an Excel file into a single sheet

import xlrd, pandas as pd, openpyxl
excel_name = r"D:\pp.xlsx"
wb = xlrd.open_workbook(excel_name)
sheets = wb.sheet_names()
alldata = pd.DataFrame()
for i in range(len(sheets)):
    df = pd.read_excel(excel_name, sheet_name=i, index=False, encoding='utf8')
    alldata = alldata.append(df)
writer = pd.ExcelWriter(r"C:\Users\Administrator\Desktop\score.xlsx", engine='openpyxl')
book = load_workbook(writer.path)
writer.book = book
alldata.to_excel(writer, sheet_name="ALLDATA")
writer.save()
writer.close()

5. Split a Table by Column

1) Split an Excel sheet into multiple files based on a column

import pandas as pd, xlsxwriter
data = pd.read_excel(r"C:\Users\Administrator\Desktop\chaifen.xlsx", encoding='gbk')
area_list = list(set(data['店铺']))
writer = pd.ExcelWriter(r"C:\Users\Administrator\Desktop\拆好的表1.xlsx", engine='xlsxwriter')
data.to_excel(writer, sheet_name="总表", index=False)
for j in area_list:
    df = data[data['店铺'] == j]
    df.to_excel(writer, sheet_name=j, index=False)
writer.save()

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Python File I/O Excel xlsxwriter xlrd

Written by

Python Crawling & Data Mining

Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.