Fundamentals 15 min read

Using Python pandas to Replicate Excel Functions: VLOOKUP, Data I/O, Pivot Tables, and Plotting

This article demonstrates how to replace Excel's common data‑analysis tasks—such as VLOOKUP, data import/export, pivot tables, and charting—by using Python's pandas library together with Plotly for visualization, providing code examples and performance tips for each scenario.

Python Programming Learning Circle

Oct 10, 2022

Using Python pandas to Replicate Excel Functions: VLOOKUP, Data I/O, Pivot Tables, and Plotting

Historically, business analysis relied on Excel, but as data volumes grew, analysts shifted to Python and R, adopting the term "Business Analytics". Both tools follow the DRY principle, aiming to replace repetitive manual operations.

Python data analysis centers on the pandas package, whose ecosystem now rivals Excel. The most frequently used I/O functions are read_csv, read_excel, to_csv, and to_excel, each offering extensive parameters for customized import and export.

Python‑Excel Interaction

Python and Excel interaction

VLOOKUP equivalent

Pivot tables

Plotting

Typical usage begins with importing pandas and configuring display options:

import numpy as np
import pandas as pd
pd.set_option('max_columns', 10)
pd.set_option('max_rows', 20)
pd.set_option('display.float_format', lambda x: '%.2f' % x)  # disable scientific notation

VLOOKUP Cases

Case 1 : Map numeric scores to letter grades using a custom function and apply:

df = pd.read_excel("test.xlsx", sheet_name=0)

def grade_to_point(x):
    if x >= 90:
        return 'A'
    elif x >= 80:
        return 'B'
    elif x >= 70:
        return 'C'
    elif x >= 60:
        return 'D'
    else:
        return 'E'

df['等级'] = df['语文'].apply(grade_to_point)
print(df)

Case 2 : Cross‑sheet lookup for depreciation amounts using merge:

df1 = pd.read_excel("test.xlsx", sheet_name='折旧明细表')
df2 = pd.read_excel("test.xlsx", sheet_name=1)
result = df2.merge(df1[['编号', '月折旧额']], how='left', on='编号')
print(result)

Case 3 : Approximate lookup with custom logic and pandas indexing.

df1 = pd.read_excel("test.xlsx", sheet_name='折旧明细表')
df3 = pd.read_excel("test.xlsx", sheet_name=3)
df3['月折旧额'] = 0
for i in range(len(df3['资产名称'])):
    df3['月折旧额'][i] = df1[df1['资产名称'].map(lambda x: df3['资产名称'][i] in x)]['月折旧额']
print(df3)

Case 4 : Auto‑populate employee information based on employee ID using a merge:

df4 = pd.read_excel("test.xlsx", sheet_name='员工基本信息表')
df5 = pd.read_excel("test.xlsx", sheet_name='请假统计表')
merged = df5.merge(df4[['工号', '姓名', '部门', '职务', '入职日期']], on='工号')
print(merged)

Case 5 : Retrieve all consumption records for a specific person:

df6 = pd.read_excel("test.xlsx", sheet_name='消费额')
result = df6[df6['姓名'] == '张一'][['姓名', '消费额']]
print(result)

Pivot Tables and Groupby

Excel pivot tables can be reproduced with pandas groupby or pivot_table. Example using groupby to aggregate sales, cost, and profit by month and region:

df = pd.read_excel('test.xlsx', sheet_name='销售统计表')
df['订购月份'] = df['订购日期'].apply(lambda x: x.month)
agg = df.groupby(['订购月份', '所属区域'])[['销售额', '成本']].sum()
agg['利润'] = agg['销售额'] - agg['成本']
print(agg)

The same result can be achieved with pivot_table:

pivot = pd.pivot_table(df, values=['销售额', '成本'], index=['订购月份', '所属区域'], aggfunc='sum')
pivot['利润'] = pivot['销售额'] - pivot['成本']
print(pivot)

Visualization with Plotly

Bar charts and radar charts are created using Plotly's graph_objs interface. Example bar chart code:

import plotly.offline as off
import plotly.graph_objs as go
off.init_notebook_mode()

df = pd.read_excel('plot.xlsx', sheet_name='高等教育入学率')
trace1 = go.Bar(x=df['国家'], y=df[1995], name='1995', opacity=0.6, marker=dict(color='powderblue'))
trace2 = go.Bar(x=df['国家'], y=df[2005], name='2005', opacity=0.6, marker=dict(color='aliceblue'))
trace3 = go.Bar(x=df['国家'], y=df[2014], name='2014', opacity=0.6, marker=dict(color='royalblue'))
layout = go.Layout(barmode='group')
fig = go.Figure(data=[trace1, trace2, trace3], layout=layout)
off.plot(fig)

Radar chart example:

df = pd.read_excel('plot.xlsx', sheet_name='政治治理')
theta = df.columns.tolist()
theta.append(theta[0])
names = df.index.tolist()
# Convert to numpy array for easier indexing
arr = np.array(df)
trace1 = go.Scatterpolar(r=arr[0], theta=theta, name=names[0])
trace2 = go.Scatterpolar(r=arr[1], theta=theta, name=names[1])
trace3 = go.Scatterpolar(r=arr[2], theta=theta, name=names[2])
trace4 = go.Scatterpolar(r=arr[3], theta=theta, name=names[3])
layout = go.Layout(polar=dict(radialaxis=dict(visible=True, range=[0,1])), showlegend=True)
fig = go.Figure(data=[trace1, trace2, trace3, trace4], layout=layout)
off.plot(fig)

Overall, for simple charts Excel remains the quickest tool, but for advanced, customizable visualizations and large‑scale data processing, Python with pandas and Plotly offers far greater flexibility and performance.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

groupby data-analysis pivot table VLOOKUP

Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.