Fundamentals 22 min read

Python xlwings & pandas tutorials for batch sorting, summarizing, and statistical analysis of Excel workbooks

This article presents a series of Python examples using xlwings and pandas to batch‑sort worksheets, filter and aggregate data across multiple workbooks, compute summary statistics, perform correlation, ANOVA, regression, and generate pivot tables and visualizations, illustrating practical Excel automation and data‑analysis techniques.

Python Programming Learning Circle

Jan 15, 2022

Python xlwings & pandas tutorials for batch sorting, summarizing, and statistical analysis of Excel workbooks

The guide demonstrates how to automate Excel tasks with xlwings and pandas. It starts with a script that opens a workbook, reads each sheet into a DataFrame, sorts the "销售利润" column in ascending order, writes the result back, and saves the file:

import xlwings as xw
import pandas as pd
app = xw.App(visible=True, add_book=False)
workbook = app.books.open(r'C:\...\产品销售统计表.xlsx')
for i in workbook.sheets:
    values = i.range('A1').expand().options(pd.DataFrame).value
    result = values.sort_values(by='销售利润')
    i.range('A1').value = result
workbook.save()
workbook.close()
app.quit()

Subsequent examples extend the concept to multiple workbooks, filtering data, grouping by product, and writing aggregated results to new sheets. One script merges all sheets, groups by "采购物品", and writes each group to a separate worksheet with a sum row.

import os, xlwings as xw, pandas as pd
app = xw.App(visible=True, add_book=False)
file_path = r'C:\...\采购表'
for file in os.listdir(file_path):
    if file.endswith('.xlsx'):
        wb = app.books.open(os.path.join(file_path, file))
        # process each sheet ...
        wb.save(); wb.close()
app.quit()

Statistical analysis sections show how to compute descriptive statistics, correlation matrices, and perform ANOVA using pandas, statsmodels, and sklearn. For example, a correlation matrix is obtained with:

import pandas as pd
df = pd.read_excel(r'C:\...\相关性分析.xlsx', index_col='代理商编号')
result = df.corr()
print(result)

ANOVA is performed by reshaping the data with melt(), fitting an OLS model, and calling anova_lm:

from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
model = ols('Value~C(Treat)', data=df_melt).fit()
anova_table = anova_lm(model, typ=3)
print(anova_table)

Regression examples use sklearn.linear_model.LinearRegression to fit a model predicting car sales from advertising spend, then compute the regression equation and a forecast for given ad budgets.

from sklearn import linear_model
model = linear_model.LinearRegression()
model.fit(X, y)
print('y = {:.2f} + {:.2f}*x1 + {:.2f}*x2'.format(model.intercept_, *model.coef_))
forecast = model.intercept_ + 20*model.coef_[0] + 30*model.coef_[1]
print('Forecast:', forecast)

Visualization snippets illustrate creating pivot tables, box plots, and histograms with matplotlib, then inserting the figures back into Excel worksheets via xlwings.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

statistics Pandas machine-learning data-analysis xlwings excel-automation

Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.