Fundamentals 8 min read

Using Python Pandas for Excel‑Like Data Processing: 14 Common Operations

This article demonstrates how to replace typical Excel tasks such as VLOOKUP, pivot tables, duplicate removal, missing‑value handling, multi‑condition filtering, and grouping with concise Python pandas code, providing step‑by‑step examples and practical tips for efficient data analysis.

Python Programming Learning Circle

Jun 14, 2022

Using Python Pandas for Excel‑Like Data Processing: 14 Common Operations

The author, motivated by learning Python, shows how to handle Excel sales data entirely with pandas to strengthen Python skills and data‑processing capability.

1. VLOOKUP equivalent – Split the original table into two DataFrames and use df_c=pd.merge(df1,df2,on="订单明细号",how="left") to bring the profit column from the second table into the first.

2. Pivot table – Summarise profit by region and salesperson with

pd.pivot_table(sale, index="地区名称", columns="业务员名称", values="利润", aggfunc=[np.sum, np.mean])

3. Compare two columns – Create a duplicate column, modify the first ten rows, and find mismatches using

result=sale.loc[sale["订单明细号"].isin(sale["订单明细号2"])==False]

4. Remove duplicates – Drop duplicate salesperson codes with sale.drop_duplicates("业务员编码", inplace=True).

5. Missing‑value handling – Detect missing values, fill them with zero using sale["客户名称"]=sale["客户名称"].fillna(0), or drop rows lacking a customer code with sale.dropna(subset=["客户编码"]).

6. Multi‑condition filtering – Example: find orders in Beijing by salesperson "张爱" with amount > 5000 using

sale.loc[(sale["地区名称"]=="北京") & (sale["业务员名称"]=="张爱") & (sale["订单金额"]>5000)]

7. Fuzzy filtering – Select rows where the product name contains "三星" or "索尼" with sale.loc[sale["存货名称"].str.contains("三星|索尼")].

8. Group‑by aggregation – Compute total profit per region and salesperson:

sale.groupby(["地区名称","业务员名称"])["利润"].sum()

9. Binning (categorical grouping) – Create profit bins and labels, then assign groups using

sale_area["分组"]=pd.cut(sale_area["利润"], bins, labels=groups)

10. Business‑logic labeling – Tag high‑margin products as "优质商品" and low‑margin ones as "一般商品" with

sale.loc[(sale["利润"]/sale["订单金额"])>0.3, "label"]="优质商品"

and similar for <0.05.

The article concludes that both Excel and Python are valuable tools; the choice depends on the specific task, and a good analyst selects the most efficient method.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Excel Pandas data-analysis VLOOKUP data-cleaning data-manipulation

Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.