Unlock Excel‑Like Data Power with Pandas: A Hands‑On Python Guide
This tutorial is adapted from a Chinese translation of the official Pandas documentation. It walks readers through DataFrames, Series, indexing, I/O operations, data manipulation, string handling, merging, and other tips, with a code example for each concept.
Importing Libraries
Typical imports for Pandas tutorials:
import pandas as pd
import numpy as np
Data Structures
DataFrame
A Pandas DataFrame is analogous to an Excel worksheet. Whereas an Excel workbook can hold several worksheets, each DataFrame exists independently.
Series
A Series represents a single column of a DataFrame, similar to a spreadsheet column.
Index
Both DataFrames and Series have an index labeling rows. If not specified, a default RangeIndex (0, 1, 2, …) is used.
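A minimal sketch of the three structures, using a small made-up frame (the column names x and y are illustrative):

```python
import pandas as pd

# A DataFrame: a table of labeled columns, like a worksheet
df = pd.DataFrame({"x": [1, 3, 5], "y": [2, 4, 6]})

# Selecting one column returns a Series that shares the DataFrame's index
col = df["x"]
print(type(col).__name__)  # Series

# No index was given, so pandas assigned a default RangeIndex 0, 1, 2
print(df.index)

# Any column can be promoted to the row index instead
by_x = df.set_index("x")
print(by_x.loc[3, "y"])  # the y value in the row labeled 3
```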
Copy vs. In‑place Operations
Most Pandas operations return a new object instead of modifying the original. To keep the result, assign it to a variable (a new name or the same one), or pass inplace=True where the method supports it.
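A quick way to see the difference, on a hypothetical one-column frame whose col1 name matches the snippets that follow:

```python
import pandas as pd

df = pd.DataFrame({"col1": [3, 1, 2]})

# sort_values returns a new, sorted DataFrame; df itself is untouched
sorted_df = df.sort_values("col1")
print(df["col1"].tolist())         # [3, 1, 2]
print(sorted_df["col1"].tolist())  # [1, 2, 3]

# With inplace=True the method modifies df directly and returns None
result = df.sort_values("col1", inplace=True)
print(result is None, df["col1"].tolist())  # True [1, 2, 3]
```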
sorted_df = df.sort_values("col1")
# or overwrite
df = df.sort_values("col1")
# in‑place
df.sort_values("col1", inplace=True)
Data Input and Output
Constructing a DataFrame from values
For small datasets, a Python dictionary works well:
df = pd.DataFrame({"x": [1, 3, 5], "y": [2, 4, 6]})
print(df)
Reading external data
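The examples in this section need network access or a local tips.csv file. Because read_csv also accepts any file-like object, the sep and header options can be tried offline on an in-memory buffer. The tab-separated rows below are made up, and the column names passed through names= are an assumption about what each column holds:

```python
import io

import pandas as pd

# Tab-separated text with no header row, held in memory instead of a file
raw = "10.34\t1.66\tSun\n21.01\t3.5\tSat\n"

df = pd.read_csv(io.StringIO(raw), sep="\t", header=None)
# With header=None, pandas numbers the columns 0, 1, 2
print(df.columns.tolist())

# Supply column names explicitly when the file has none
named = pd.read_csv(io.StringIO(raw), sep="\t", header=None,
                    names=["total_bill", "tip", "day"])
print(named)
```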
Read a CSV file from a URL or local path:
url = "https://raw.github.com/pandas-dev/pandas/master/pandas/tests/io/data/csv/tips.csv"
tips = pd.read_csv(url)
print(tips)
Read a CSV with custom separator and no header:
tips = pd.read_csv("tips.csv", sep="\t", header=None)
# read_table is equivalent to read_csv with sep="\t" as the default
tips = pd.read_table("tips.csv", header=None)
Excel files
Write a DataFrame to Excel and read it back:
# .xlsx files need an Excel engine such as openpyxl installed
tips.to_excel("./tips.xlsx")
tips_df = pd.read_excel("./tips.xlsx", index_col=0)
print(tips_df)
Limiting output
Use head() or tail() to control displayed rows:
tips.head(5)  # first five rows
tips.tail(5)  # last five rows
Exporting data
DataFrames can be saved as Excel, CSV, or many other formats.
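A CSV round trip illustrates the pattern. This sketch writes to an in-memory buffer so it leaves no file behind (pass a path such as "tips.csv" instead to write to disk); index=False keeps the row labels out of the output, which is a choice made here rather than the default:

```python
import io

import pandas as pd

df = pd.DataFrame({"x": [1, 3, 5], "y": [2, 4, 6]})

buf = io.StringIO()
df.to_csv(buf, index=False)  # omit the row index from the output
print(buf.getvalue())

# Read it back and confirm nothing was lost
buf.seek(0)
restored = pd.read_csv(buf)
print(restored.equals(df))
```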
Data Manipulation
Column operations
Perform vectorized operations on entire columns and assign new columns:
tips["total_bill"] = tips["total_bill"] - 2
tips["new_bill"] = tips["total_bill"] / 2
print(tips)
Filtering
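The tips snippets in this section depend on the downloaded CSV. An offline sketch of the same mask-then-index pattern, on a hypothetical five-row frame that borrows two of the tips column names, also shows how to combine conditions: with & and |, each comparison needs its own parentheses.

```python
import pandas as pd

tips = pd.DataFrame({
    "total_bill": [8.5, 16.99, 24.59, 10.34, 30.0],
    "time": ["Lunch", "Dinner", "Dinner", "Lunch", "Dinner"],
})

# A comparison yields a boolean Series (the mask) aligned to the rows
is_dinner = tips["time"] == "Dinner"
print(is_dinner.value_counts())  # how many True / False

# Indexing with the mask keeps only the rows where it is True
print(tips[is_dinner])

# Combine conditions with & (and) or | (or); parenthesize each comparison
big_dinners = tips[is_dinner & (tips["total_bill"] > 20)]
print(big_dinners)
```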
Use boolean indexing to filter rows:
tips[tips["total_bill"] > 10]
Count values and filter by condition:
is_dinner = tips["time"] == "Dinner"
print(is_dinner.value_counts())
print(tips[is_dinner])
If/Then logic
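np.where(condition, a, b) takes a where the condition is True and b elsewhere, much like a spreadsheet IF(). A minimal sketch on made-up bills; a bill of exactly 10 lands in "high" because the test is strictly < 10:

```python
import numpy as np
import pandas as pd

tips = pd.DataFrame({"total_bill": [3.07, 10.0, 16.99]})

# Vectorized IF(total_bill < 10, "low", "high") over the whole column
tips["bucket"] = np.where(tips["total_bill"] < 10, "low", "high")
print(tips)
```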
Create a new column based on a condition using np.where:
tips["bucket"] = np.where(tips["total_bill"] < 10, "low", "high")
print(tips)
Date functions
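One detail matters before the tips example: subtracting two monthly periods (the to_period("M") step behind months_between) yields a month offset object, not a plain integer. A minimal sketch on two made-up date pairs that reads the integer count from each offset's n attribute:

```python
import pandas as pd

df = pd.DataFrame({
    "date1": pd.to_datetime(["2013-01-15", "2013-06-30"]),
    "date2": pd.to_datetime(["2015-02-15", "2013-08-01"]),
})

# The .dt accessor exposes date parts, like YEAR() in a spreadsheet
df["date1_year"] = df["date1"].dt.year

# MonthBegin() rolls each date forward to the first day of the next month
df["date1_next"] = df["date1"] + pd.offsets.MonthBegin()

# Period subtraction gives month offsets; .n holds the number of months
diff = df["date2"].dt.to_period("M") - df["date1"].dt.to_period("M")
df["months_between"] = diff.apply(lambda offset: offset.n)
print(df)
```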
Parse and output dates, similar to spreadsheet date handling:
tips["date1"] = pd.Timestamp("2013-01-15")
tips["date2"] = pd.Timestamp("2015-02-15")
tips["date1_year"] = tips["date1"].dt.year
tips["date2_month"] = tips["date2"].dt.month
tips["date1_next"] = tips["date1"] + pd.offsets.MonthBegin()
tips["months_between"] = tips["date2"].dt.to_period("M") - tips["date1"].dt.to_period("M")
print(tips[["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"]])
Column selection
Select, rename, or drop columns:
# keep specific columns
selected = tips[["sex", "total_bill", "tip"]]
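# drop a column (returns a new DataFrame; tips keeps "sex" for the later examples)
without_sex = tips.drop("sex", axis=1)
# rename a column (also returns a new DataFrame)
renamed = tips.rename(columns={"total_bill": "total_bill_2"})
Sorting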
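For multi-column sorts, ascending accepts one flag per column. A sketch on made-up data that orders sex alphabetically and, within each group, total_bill from largest to smallest:

```python
import pandas as pd

tips = pd.DataFrame({
    "sex": ["Male", "Female", "Male", "Female"],
    "total_bill": [10.0, 25.0, 30.0, 5.0],
})

# First key ascending, second key descending
ordered = tips.sort_values(["sex", "total_bill"], ascending=[True, False])

# The original row labels travel with the rows; reset them if needed
ordered = ordered.reset_index(drop=True)
print(ordered)
```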
Sort by one or more columns:
tips = tips.sort_values(["sex", "total_bill"])
print(tips)
String Handling
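String methods live under the .str accessor and apply element-wise to a whole column. A minimal offline sketch of the operations in this section, on a made-up Series whose first value carries a trailing space:

```python
import pandas as pd

s = pd.Series(["Dinner ", "Lunch", "Female"])

print(s.str.len().tolist())               # length, trailing space included
print(s.str.rstrip().str.len().tolist())  # length after trimming the right end
print(s.str.find("ale").tolist())         # position of "ale", or -1 if absent
print(s.str[0:1].tolist())                # first character, by slicing

# Split once on the first space: column 0 is the first word, 1 the rest
names = pd.Series(["John Smith", "Jane Cook"])
parts = names.str.split(" ", n=1, expand=True)
print(parts)
```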
Length
Find string length and trim trailing spaces:
tips["time"].str.len()
tips["time"].str.rstrip().str.len()
Find substring position
tips["sex"].str.find("ale")
Extract substring by position
tips["sex"].str[0:1]
Extract nth word
firstlast = pd.DataFrame({"String": ["John Smith", "Jane Cook"]})
firstlast["First_Name"] = firstlast["String"].str.split(" ", expand=True)[0]
# rsplit with n=1 splits once from the right; column 1 holds the last word
firstlast["Last_Name"] = firstlast["String"].str.rsplit(" ", n=1, expand=True)[1]
print(firstlast)
Case conversion
firstlast["upper"] = firstlast["String"].str.upper()
firstlast["lower"] = firstlast["String"].str.lower()
firstlast["title"] = firstlast["String"].str.title()
print(firstlast)
Merging
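The example frames in this section are filled with np.random.randn, so their values change on every run. With fixed, made-up numbers the effect of how is easy to check: key D appears twice in df2, so it matches twice. Both frames also have a non-key column called value, which pandas disambiguates with its default suffixes _x and _y:

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["A", "B", "C", "D"], "value": [1, 2, 3, 4]})
df2 = pd.DataFrame({"key": ["B", "D", "D", "E"], "value": [5, 6, 7, 8]})

# Number of rows each join type produces
sizes = {how: len(df1.merge(df2, on="key", how=how))
         for how in ["inner", "left", "right", "outer"]}
print(sizes)  # {'inner': 3, 'left': 5, 'right': 4, 'outer': 6}

# Unmatched keys (A, C from df1; E from df2) get NaN on the other side
outer = df1.merge(df2, on="key", how="outer")
print(outer)
```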
Example DataFrames:
df1 = pd.DataFrame({"key": ["A", "B", "C", "D"], "value": np.random.randn(4)})
df2 = pd.DataFrame({"key": ["B", "D", "D", "E"], "value": np.random.randn(4)})
Merge operations:
inner_join = df1.merge(df2, on=["key"], how="inner")
left_join = df1.merge(df2, on=["key"], how="left")
right_join = df1.merge(df2, on=["key"], how="right")
outer_join = df1.merge(df2, on=["key"], how="outer")
Other Tips
Fill handle (auto‑fill series)
df = pd.DataFrame({"AAA": [1]*8, "BBB": list(range(8))})
series = list(range(1,5))
# .loc slices by label and includes both endpoints: rows 2, 3, 4, 5 get the values 1 to 4
df.loc[2:5, "AAA"] = series
print(df)
Drop duplicates
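drop_duplicates keeps the first copy of each repeated row by default; the keep and subset parameters change which copy survives and what counts as a duplicate. A sketch on the same class data as the example in this section:

```python
import pandas as pd

df = pd.DataFrame({
    "class": ["A", "A", "A", "B", "C", "D"],
    "student_count": [42, 35, 42, 50, 47, 45],
    "all_pass": ["Yes", "Yes", "Yes", "No", "No", "Yes"],
})

first = df.drop_duplicates()              # keep="first" (default): drops row 2
last = df.drop_duplicates(keep="last")    # keeps row 2, drops row 0
no_dupes = df.drop_duplicates(keep=False) # drops every copy of a repeated row

# Compare only some columns with subset
by_class = df.drop_duplicates(subset=["class"])
print(first, last, no_dupes, by_class, sep="\n\n")
```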
df = pd.DataFrame({"class": ["A", "A", "A", "B", "C", "D"],
                   "student_count": [42, 35, 42, 50, 47, 45],
                   "all_pass": ["Yes", "Yes", "Yes", "No", "No", "Yes"]})
print(df.drop_duplicates())
print(df.drop_duplicates(["class", "student_count"]))
Pivot table
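pivot_table groups rows by the index labels, spreads the columns labels across, and aggregates values with aggfunc, much like a spreadsheet PivotTable. A minimal sketch on made-up tips-style rows, using the string alias "mean":

```python
import pandas as pd

tips = pd.DataFrame({
    "size": [2, 2, 3, 3, 2],
    "sex": ["Male", "Female", "Male", "Male", "Female"],
    "tip": [1.0, 2.0, 3.0, 5.0, 4.0],
})

# Rows: party size; columns: sex; cells: average tip (NaN where no data)
pivot = pd.pivot_table(tips, values="tip", index="size", columns="sex", aggfunc="mean")
print(pivot)
```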
pivot = pd.pivot_table(tips, values="tip", index=["size"], columns=["sex"], aggfunc="mean")
print(pivot)
Add a row
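DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so on current versions a new row is added with pd.concat. A minimal sketch, starting from a made-up two-row frame, that wraps the row dict in a one-row DataFrame before concatenating:

```python
import pandas as pd

df = pd.DataFrame({
    "class": ["A", "B"],
    "student_count": [42, 50],
    "all_pass": ["Yes", "No"],
})
new_row = {"class": "E", "student_count": 51, "all_pass": "Yes"}

# concat expects DataFrames, so turn the dict into a one-row frame first;
# ignore_index=True renumbers the result 0..n-1
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
print(df)
```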
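new_row = {"class": "E", "student_count": 51, "all_pass": "Yes"}
# DataFrame.append was removed in pandas 2.0; concatenate a one-row frame instead
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
print(df)
Find and replace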
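DataFrame.replace matches whole cell values, whereas .str.contains and .str.replace work on substrings. A sketch on a made-up day column that uses the abbreviation "Thur", as the tips data does, shows why replacing "Thu" changes nothing:

```python
import pandas as pd

tips = pd.DataFrame({"day": ["Sun", "Sat", "Thur", "Fri"]})

# Substring search: True where "S" appears anywhere in the value
mask = tips["day"].str.contains("S")

# Whole-value replace: only cells exactly equal to "Thur" change
whole = tips.replace("Thur", "Thursday")

# "Thu" is not a complete cell value here, so nothing is replaced
unchanged = tips.replace("Thu", "Thursday")

# Substring replace inside each string (regex=False: plain text match)
partial = tips["day"].str.replace("S", "s", regex=False)
print(tips[mask], whole, partial, sep="\n\n")
```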
# find rows containing "S" in the "day" column
mask = tips["day"].str.contains("S")
print(tips[mask])
# replace whole cell values (the tips data spells Thursday as "Thur")
tips = tips.replace("Thur", "Thursday")
print(tips)
This article has been distilled and summarized from source material, then republished for learning and reference.
Python Crawling & Data Mining
