Using Python pandas to Replicate Excel Functions: VLOOKUP, Data I/O, Pivot Tables, and Plotting
This article demonstrates how to replace Excel's common data‑analysis tasks—such as VLOOKUP, data import/export, pivot tables, and charting—by using Python's pandas library together with Plotly for visualization, providing code examples and performance tips for each scenario.
Historically, business analysis relied on Excel, but as data volumes grew, analysts shifted to Python and R, adopting the term "Business Analytics". Both tools follow the DRY principle, aiming to replace repetitive manual operations.
Python data analysis centers on the pandas package, whose ecosystem now rivals Excel. The most frequently used I/O functions are read_csv , read_excel , to_csv , and to_excel , each offering extensive parameters for customized import and export.
Python‑Excel Interaction
Python and Excel interaction
VLOOKUP equivalent
Pivot tables
Plotting
Typical usage begins with importing pandas and configuring display options:
import numpy as np
import pandas as pd
pd.set_option('max_columns', 10)
pd.set_option('max_rows', 20)
pd.set_option('display.float_format', lambda x: '%.2f' % x) # disable scientific notationVLOOKUP Cases
Case 1 : Map numeric scores to letter grades using a custom function and apply :
df = pd.read_excel("test.xlsx", sheet_name=0)
def grade_to_point(x):
if x >= 90:
return 'A'
elif x >= 80:
return 'B'
elif x >= 70:
return 'C'
elif x >= 60:
return 'D'
else:
return 'E'
df['等级'] = df['语文'].apply(grade_to_point)
print(df)Case 2 : Cross‑sheet lookup for depreciation amounts using merge :
df1 = pd.read_excel("test.xlsx", sheet_name='折旧明细表')
df2 = pd.read_excel("test.xlsx", sheet_name=1)
result = df2.merge(df1[['编号', '月折旧额']], how='left', on='编号')
print(result)Case 3 : Approximate lookup with custom logic and pandas indexing.
df1 = pd.read_excel("test.xlsx", sheet_name='折旧明细表')
df3 = pd.read_excel("test.xlsx", sheet_name=3)
df3['月折旧额'] = 0
for i in range(len(df3['资产名称'])):
df3['月折旧额'][i] = df1[df1['资产名称'].map(lambda x: df3['资产名称'][i] in x)]['月折旧额']
print(df3)Case 4 : Auto‑populate employee information based on employee ID using a merge:
df4 = pd.read_excel("test.xlsx", sheet_name='员工基本信息表')
df5 = pd.read_excel("test.xlsx", sheet_name='请假统计表')
merged = df5.merge(df4[['工号', '姓名', '部门', '职务', '入职日期']], on='工号')
print(merged)Case 5 : Retrieve all consumption records for a specific person:
df6 = pd.read_excel("test.xlsx", sheet_name='消费额')
result = df6[df6['姓名'] == '张一'][['姓名', '消费额']]
print(result)Pivot Tables and Groupby
Excel pivot tables can be reproduced with pandas groupby or pivot_table . Example using groupby to aggregate sales, cost, and profit by month and region:
df = pd.read_excel('test.xlsx', sheet_name='销售统计表')
df['订购月份'] = df['订购日期'].apply(lambda x: x.month)
agg = df.groupby(['订购月份', '所属区域'])[['销售额', '成本']].sum()
agg['利润'] = agg['销售额'] - agg['成本']
print(agg)The same result can be achieved with pivot_table :
pivot = pd.pivot_table(df, values=['销售额', '成本'], index=['订购月份', '所属区域'], aggfunc='sum')
pivot['利润'] = pivot['销售额'] - pivot['成本']
print(pivot)Visualization with Plotly
Bar charts and radar charts are created using Plotly's graph_objs interface. Example bar chart code:
import plotly.offline as off
import plotly.graph_objs as go
off.init_notebook_mode()
df = pd.read_excel('plot.xlsx', sheet_name='高等教育入学率')
trace1 = go.Bar(x=df['国家'], y=df[1995], name='1995', opacity=0.6, marker=dict(color='powderblue'))
trace2 = go.Bar(x=df['国家'], y=df[2005], name='2005', opacity=0.6, marker=dict(color='aliceblue'))
trace3 = go.Bar(x=df['国家'], y=df[2014], name='2014', opacity=0.6, marker=dict(color='royalblue'))
layout = go.Layout(barmode='group')
fig = go.Figure(data=[trace1, trace2, trace3], layout=layout)
off.plot(fig)Radar chart example:
df = pd.read_excel('plot.xlsx', sheet_name='政治治理')
theta = df.columns.tolist()
theta.append(theta[0])
names = df.index.tolist()
# Convert to numpy array for easier indexing
arr = np.array(df)
trace1 = go.Scatterpolar(r=arr[0], theta=theta, name=names[0])
trace2 = go.Scatterpolar(r=arr[1], theta=theta, name=names[1])
trace3 = go.Scatterpolar(r=arr[2], theta=theta, name=names[2])
trace4 = go.Scatterpolar(r=arr[3], theta=theta, name=names[3])
layout = go.Layout(polar=dict(radialaxis=dict(visible=True, range=[0,1])), showlegend=True)
fig = go.Figure(data=[trace1, trace2, trace3, trace4], layout=layout)
off.plot(fig)Overall, for simple charts Excel remains the quickest tool, but for advanced, customizable visualizations and large‑scale data processing, Python with pandas and Plotly offers far greater flexibility and performance.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.