Unlock Excel‑Like Data Power with Pandas: A Hands‑On Python Guide

This comprehensive tutorial translates the official Pandas documentation into Chinese, then walks readers through DataFrames, Series, indexing, I/O operations, data manipulation, string handling, merging, and advanced tips, providing clear code examples for each concept.

Python Crawling & Data Mining

Aug 31, 2021

Importing Libraries

Typical imports for Pandas tutorials:

import pandas as pd
import numpy as np

Data Structures

DataFrame

A pandas DataFrame is analogous to an Excel worksheet. Unlike worksheets, which live inside a workbook, each DataFrame exists independently.

Series

A Series represents a single column of a DataFrame, similar to a spreadsheet column.

Index

Both DataFrames and Series have an index labeling rows. If not specified, a default RangeIndex (0, 1, 2, …) is used.
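
As a minimal sketch of the default and an explicit index (the column name and the labels "a", "b", "c" are made up for illustration):

```python
import pandas as pd

# No index given: pandas assigns a RangeIndex starting at 0.
default_df = pd.DataFrame({"x": [1, 3, 5]})
print(default_df.index)

# Explicit (hypothetical) labels replace the default row numbers.
labeled_df = pd.DataFrame({"x": [1, 3, 5]}, index=["a", "b", "c"])
print(labeled_df.loc["b", "x"])
```

Label-based lookups such as loc use these index values rather than row positions.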

Copy vs. In‑place Operations

Most pandas operations return a new object and leave the original unchanged. To keep the result, assign it to a new variable, overwrite the existing one, or pass inplace=True to a method that supports it.

sorted_df = df.sort_values("col1")
# or overwrite
df = df.sort_values("col1")
# in‑place
df.sort_values("col1", inplace=True)
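
The difference is easiest to see on a tiny frame. The following is a sketch with made-up values, not part of the original tutorial:

```python
import pandas as pd

df = pd.DataFrame({"col1": [3, 1, 2]})

# sort_values returns a new DataFrame; df itself keeps its original order.
sorted_df = df.sort_values("col1")
print(df["col1"].tolist())
print(sorted_df["col1"].tolist())

# With inplace=True the method mutates its target and returns None.
scratch = df.copy()
result = scratch.sort_values("col1", inplace=True)
print(result, scratch["col1"].tolist())
```

Because the in-place call returns None, writing `df = df.sort_values("col1", inplace=True)` would discard the data.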

Data Input and Output

Constructing a DataFrame from values

For small datasets, a Python dictionary works well:

df = pd.DataFrame({"x": [1, 3, 5], "y": [2, 4, 6]})
print(df)

Reading external data

Read a CSV file from a URL or local path:

url = "https://raw.github.com/pandas-dev/pandas/master/pandas/tests/io/data/csv/tips.csv"
tips = pd.read_csv(url)
print(tips)

Read a file with a custom separator (here a tab) and no header row:

tips = pd.read_csv("tips.csv", sep="\t", header=None)
# read_table is the same reader with tab as the default separator
tips = pd.read_table("tips.csv", header=None)
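
To try these options without a file on disk, the text can be fed through io.StringIO. This is a sketch; the two data rows and the column names passed to names= are invented for illustration:

```python
import io
import pandas as pd

# Hypothetical tab-separated text with no header row.
raw = "16.99\t1.01\tFemale\n10.34\t1.66\tMale\n"

# header=None keeps the first line as data and numbers the columns 0, 1, 2.
no_header = pd.read_csv(io.StringIO(raw), sep="\t", header=None)
print(no_header)

# names= supplies column labels when the file has none.
named = pd.read_csv(
    io.StringIO(raw), sep="\t", header=None, names=["total_bill", "tip", "sex"]
)
print(named)
```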

Excel files

Write a DataFrame to Excel and read it back:

tips.to_excel("./tips.xlsx")
tips_df = pd.read_excel("./tips.xlsx", index_col=0)
print(tips_df)

Limiting output

Use head() or tail() to control displayed rows:

tips.head(5)
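
A small sketch of both methods on a made-up ten-row frame:

```python
import pandas as pd

df = pd.DataFrame({"n": range(10)})

first_three = df.head(3)  # rows 0, 1, 2
last_two = df.tail(2)     # rows 8, 9
print(first_three)
print(last_two)
```

Both return new DataFrames, so the full data stays available in df.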

Exporting data

DataFrames can be saved as Excel, CSV, or many other formats.
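
As an illustration of CSV export, the sketch below writes to a string instead of a file and reads it back. The sample values are the same small frame used earlier; passing a path such as "out.csv" (a hypothetical name) to to_csv would write a file instead:

```python
import io
import pandas as pd

df = pd.DataFrame({"x": [1, 3, 5], "y": [2, 4, 6]})

# With no path, to_csv returns the CSV text; index=False omits the row labels.
csv_text = df.to_csv(index=False)
print(csv_text)

# Reading the text back reproduces the original frame.
restored = pd.read_csv(io.StringIO(csv_text))
print(restored.equals(df))
```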

Data Manipulation

Column operations

Perform vectorized operations on entire columns and assign new columns:

tips["total_bill"] = tips["total_bill"] - 2
tips["new_bill"] = tips["total_bill"] / 2
print(tips)

Filtering

Use boolean indexing to filter rows:

tips[tips["total_bill"] > 10]

Count values and filter by condition:

is_dinner = tips["time"] == "Dinner"
print(is_dinner.value_counts())
print(tips[is_dinner])
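
Conditions can also be combined. The sketch below uses a small hypothetical sample shaped like the tips data rather than the real file:

```python
import pandas as pd

sample = pd.DataFrame({
    "total_bill": [8.5, 24.0, 15.2, 31.7],
    "time": ["Lunch", "Dinner", "Dinner", "Lunch"],
})

# Wrap each condition in parentheses; & means "and", | means "or".
big_dinner = sample[(sample["time"] == "Dinner") & (sample["total_bill"] > 20)]
lunch_or_cheap = sample[(sample["time"] == "Lunch") | (sample["total_bill"] < 10)]

print(big_dinner)
print(lunch_or_cheap)
```

Python's plain `and` and `or` do not work element-wise on Series, which is why the operators & and | are used here.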

If/Then logic

Create a new column based on a condition using np.where:

tips["bucket"] = np.where(tips["total_bill"] < 10, "low", "high")
print(tips)
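
np.where handles a two-way split. For more than two buckets, one option (not used in the original tutorial) is np.select, sketched here with made-up bills and thresholds:

```python
import numpy as np
import pandas as pd

bills = pd.DataFrame({"total_bill": [5.0, 12.5, 40.0]})

# Conditions are checked in order; the first one that is True wins.
conditions = [bills["total_bill"] < 10, bills["total_bill"] < 30]
labels = ["low", "medium"]

# default= is used for rows where no condition matches.
bills["bucket"] = np.select(conditions, labels, default="high")
print(bills)
```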

Date functions

Extract date parts and do date arithmetic, similar to spreadsheet date functions:

tips["date1"] = pd.Timestamp("2013-01-15")
tips["date2"] = pd.Timestamp("2015-02-15")
tips["date1_year"] = tips["date1"].dt.year
tips["date2_month"] = tips["date2"].dt.month
tips["date1_next"] = tips["date1"] + pd.offsets.MonthBegin()
tips["months_between"] = tips["date2"].dt.to_period("M") - tips["date1"].dt.to_period("M")
print(tips[["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"]])
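
When dates arrive as text, they need parsing first. The following sketch assumes a hypothetical "placed" column of ISO-formatted strings and also shows how to get the month gap as a plain integer:

```python
import pandas as pd

orders = pd.DataFrame({"placed": ["2013-01-15", "2015-02-15"]})

# Convert strings to datetime64 values so the .dt accessor becomes available.
orders["placed"] = pd.to_datetime(orders["placed"])
orders["year"] = orders["placed"].dt.year

# MonthBegin() rolls each date forward to the first day of the next month.
orders["month_start"] = orders["placed"] + pd.offsets.MonthBegin()
print(orders)

# Subtracting two monthly Periods yields an offset; .n is the number of months.
gap = pd.Timestamp("2015-02-15").to_period("M") - pd.Timestamp("2013-01-15").to_period("M")
months_between = gap.n
print(months_between)
```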

Column selection

Select, rename, or drop columns:

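# keep specific columns
selected = tips[["sex", "total_bill", "tip"]]
# drop a column (assigned to a new name so later examples can still use "sex")
without_sex = tips.drop("sex", axis=1)
# rename a column (again leaving tips itself unchanged)
renamed = tips.rename(columns={"total_bill": "total_bill_2"})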

Sorting

Sort by one or more columns:

tips = tips.sort_values(["sex", "total_bill"])
print(tips)
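
Sort direction can be set per column. A sketch on a small hypothetical sample:

```python
import pandas as pd

sample = pd.DataFrame({
    "sex": ["Male", "Female", "Male", "Female"],
    "total_bill": [20.0, 10.0, 30.0, 15.0],
})

# One flag per sort key: sex ascending, total_bill descending within each sex.
ordered = sample.sort_values(["sex", "total_bill"], ascending=[True, False])
print(ordered)
```

The original row labels travel with the rows; reset_index(drop=True) renumbers them if needed.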

String Handling

Length

Get the length of each string; chain rstrip() first to exclude trailing spaces:

tips["time"].str.len()
tips["time"].str.rstrip().str.len()

Find substring position

tips["sex"].str.find("ale")

Extract substring by position

tips["sex"].str[0:1]

Extract nth word

firstlast = pd.DataFrame({"String": ["John Smith", "Jane Cook"]})
firstlast["First_Name"] = firstlast["String"].str.split(" ", expand=True)[0]
firstlast["Last_Name"] = firstlast["String"].str.rsplit(" ", expand=True)[1]
print(firstlast)

Case conversion

firstlast["upper"] = firstlast["String"].str.upper()
firstlast["lower"] = firstlast["String"].str.lower()
firstlast["title"] = firstlast["String"].str.title()
print(firstlast)

Merging

Example DataFrames:

df1 = pd.DataFrame({"key": ["A", "B", "C", "D"], "value": np.random.randn(4)})
df2 = pd.DataFrame({"key": ["B", "D", "D", "E"], "value": np.random.randn(4)})

Merge operations:

inner_join = df1.merge(df2, on=["key"], how="inner")
left_join = df1.merge(df2, on=["key"], how="left")
right_join = df1.merge(df2, on=["key"], how="right")
outer_join = df1.merge(df2, on=["key"], how="outer")
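
With random values it is hard to see which rows each join keeps. The sketch below repeats the same keys with fixed, made-up values and prints the row count per join type:

```python
import pandas as pd

left = pd.DataFrame({"key": ["A", "B", "C", "D"], "value": [1, 2, 3, 4]})
right = pd.DataFrame({"key": ["B", "D", "D", "E"], "value": [5, 6, 7, 8]})

# Run the same merge with each join type and keep the results by name.
results = {
    how: left.merge(right, on="key", how=how)
    for how in ["inner", "left", "right", "outer"]
}

for how, merged in results.items():
    print(how, len(merged), merged["key"].tolist())
```

Key "D" appears twice on the right, so it produces two rows in every join. The shared "value" column is split into value_x (from the left frame) and value_y (from the right frame).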

Other Tips

Fill handle (auto‑fill series)

df = pd.DataFrame({"AAA": [1]*8, "BBB": list(range(8))})
series = list(range(1,5))
df.loc[2:5, "AAA"] = series
print(df)

Drop duplicates

df = pd.DataFrame({"class": ["A","A","A","B","C","D"],
                   "student_count": [42,35,42,50,47,45],
                   "all_pass": ["Yes","Yes","Yes","No","No","Yes"]})
print(df.drop_duplicates())
print(df.drop_duplicates(["class", "student_count"]))

Pivot table

pivot = pd.pivot_table(tips, values="tip", index=["size"], columns=["sex"], aggfunc="mean")
print(pivot)
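
To check the result by hand, here is a sketch of the same pivot on five hypothetical rows:

```python
import pandas as pd

sample = pd.DataFrame({
    "size": [2, 2, 2, 3, 3],
    "sex": ["Female", "Male", "Male", "Female", "Male"],
    "tip": [1.0, 2.0, 4.0, 3.0, 5.0],
})

# Rows are party sizes, columns are sexes, cells hold the mean tip.
pivot = pd.pivot_table(sample, values="tip", index="size", columns="sex", aggfunc="mean")
print(pivot)
```

The cell for size 2 and "Male" averages the tips 2.0 and 4.0.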

Add a row

new_row = pd.DataFrame([{"class": "E", "student_count": 51, "all_pass": True}])
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
df = pd.concat([df, new_row], ignore_index=True)
print(df)

Find and replace

# find rows containing "S" in the "day" column
mask = tips["day"].str.contains("S")
print(tips[mask])
# replace values
tips = tips.replace("Thu", "Thursday")
print(tips)
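
Replacement can also be limited to one column and given several mappings at once. A sketch on a made-up "day" column:

```python
import pandas as pd

days = pd.DataFrame({"day": ["Thu", "Sat", "Sun", "Fri"]})

# Find: keep rows whose day contains a capital "S".
weekend = days[days["day"].str.contains("S")]
print(weekend)

# Replace: a dict maps each old value to its new value, for this column only.
days["day"] = days["day"].replace({"Thu": "Thursday", "Fri": "Friday"})
print(days)
```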
