Master Python File & Excel Automation: Walk Directories, Merge Sheets, and Manage Data
This guide walks through using Python's os and pathlib modules to traverse directories, demonstrates how to combine file paths, shows practical examples of merging multiple Excel workbooks and sheets with pandas, xlrd, and xlsxwriter, and explains splitting a workbook by column values, providing complete code snippets and results.
1. Related Knowledge Points
1) Required Libraries
import numpy as np
import pandas as pd
import os2) os.walk(pwd)
Example: list contents of "G:\a".
Code:
pwd = "G:\\a"
print(os.walk(pwd))
for i in os.walk(pwd):
print(i)
for path, dirs, files in os.walk(pwd):
print(files)Result:
<generator object walk at 0x0000029BB5AEAB88>
('G:\\a', [], ['aa.txt', 'bb.xlsx', 'cc.txt', 'dd.docx'])
['aa.txt', 'bb.xlsx', 'cc.txt', 'dd.docx']3) os.path.join(path1, path2…)
Purpose: combine multiple path components.
path1 = 'G:\\a'
path2 = 'aa.txt'
print(os.path.join(path1, path2))Result:
G:\a\aa.txt4) Case Study
Example code to collect all file paths:
pwd = "G:\\a"
file_path_list = []
for path, dirs, files in os.walk(pwd):
for file in files:
file_path_list.append(os.path.join(pwd, file))
print(file_path_list)Result:
['G:\\a\\aa.txt','G:\\a\\bb.xlsx','G:\\a\\cc.txt','G:\\a\\dd.docx']5) Storing Multiple DataFrames in a List
# create two DataFrames
import numpy as np
xx = np.arange(15).reshape(5,3)
yy = np.arange(1,16).reshape(5,3)
xx = pd.DataFrame(xx, columns=["语文","数学","外语"])
yy = pd.DataFrame(yy, columns=["语文","数学","外语"])
print(xx)
print(yy)Concatenate them:
concat_list = []
concat_list.append(xx)
concat_list.append(yy)
z = pd.concat(concat_list, ignore_index=True)
print(z)2. Multi-Workbook Merging (Part 1)
1) Merge multiple single‑sheet Excel files into one
import pandas as pd
import os
pwd = "G:\\b"
df_list = []
for path, dirs, files in os.walk(pwd):
for file in files:
file_path = os.path.join(path, file)
df = pd.read_excel(file_path)
df_list.append(df)
result = pd.concat(df_list)
print(result)
result.to_excel('G:\\b\\result.xlsx', index=False)3. Multi-Workbook Merging (Part 2)
1) Related Knowledge Points
Using xlsxwriter to create workbooks and worksheets.
# create a workbook
import xlsxwriter
workbook = xlsxwriter.Workbook("demo.xlsx")
workbook.close()
# add a worksheet named "2018 Sales"
workbook = xlsxwriter.Workbook("cc.xlsx")
worksheet = workbook.add_worksheet("2018 Sales")
workbook.close()
# write header and a row
workbook = xlsxwriter.Workbook("demo.xlsx")
worksheet = workbook.add_worksheet("2018 Sales")
headings = ['Product','Sales','Price']
worksheet.write_row('A1', headings)
data = ["Apple", 500, 8.9]
for i in range(len(headings)):
worksheet.write(1, i, data[i])
workbook.close()Other usage reference: https://www.cnblogs.com/brightbrother/p/8671077.html
xlrd usage
import xlrd
file = r"G:\Jupyter\test.xlsx"
fh = xlrd.open_workbook(file)
print(fh.sheet_names())
print(fh.sheets())
print(fh.sheets()[0].nrows)
print(fh.sheets()[0].ncols)
for row in range(fh.sheets()[0].nrows):
value = fh.sheets()[0].row_values(row)
print(value)2) Merge multiple Excel files with multiple sheets
import xlrd, xlsxwriter, os
def open_xlsx(file):
return xlrd.open_workbook(file)
def get_sheet_num(fh):
return len(fh.sheets())
def get_file_content(file, shnum):
fh = open_xlsx(file)
table = fh.sheets()[shnum]
for row in range(table.nrows):
rdata = table.row_values(row)
if rdata == ['姓名','性别','年龄','家庭住址']:
pass
else:
datavalue.append(rdata)
return datavalue
# (Class definitions omitted for brevity)4. Merge Multiple Sheets in One Workbook
1) Combine all sheets of an Excel file into a single sheet
import xlrd, pandas as pd, openpyxl
excel_name = r"D:\pp.xlsx"
wb = xlrd.open_workbook(excel_name)
sheets = wb.sheet_names()
alldata = pd.DataFrame()
for i in range(len(sheets)):
df = pd.read_excel(excel_name, sheet_name=i, index=False, encoding='utf8')
alldata = alldata.append(df)
writer = pd.ExcelWriter(r"C:\Users\Administrator\Desktop\score.xlsx", engine='openpyxl')
book = load_workbook(writer.path)
writer.book = book
alldata.to_excel(writer, sheet_name="ALLDATA")
writer.save()
writer.close()5. Split a Table by Column
1) Split an Excel sheet into multiple files based on a column
import pandas as pd, xlsxwriter
data = pd.read_excel(r"C:\Users\Administrator\Desktop\chaifen.xlsx", encoding='gbk')
area_list = list(set(data['店铺']))
writer = pd.ExcelWriter(r"C:\Users\Administrator\Desktop\拆好的表1.xlsx", engine='xlsxwriter')
data.to_excel(writer, sheet_name="总表", index=False)
for j in area_list:
df = data[data['店铺'] == j]
df.to_excel(writer, sheet_name=j, index=False)
writer.save()Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Python Crawling & Data Mining
Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
