Comprehensive Guide to Pandas Data Processing in Python
This tutorial provides a detailed overview of Pandas, covering its core data structures, data import/export, selection, cleaning, aggregation, merging, and a practical sales analysis example, with complete code snippets for each operation.
Pandas is one of the most widely used data processing libraries in Python, with tools for data cleaning, transformation, analysis, and basic plotting. This tutorial covers the core data processing techniques: reading data, filtering, aggregation, merging, and handling missing values.
1. Introduction to Pandas Data Structures
Pandas mainly provides two data structures:
Series: a one-dimensional labeled array, similar to a list whose items carry index labels.
DataFrame: a two-dimensional table structure, similar to an Excel sheet or an SQL table, and the most commonly used structure.
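To make the relationship between the two structures concrete, here is a minimal sketch. The variable names and values are illustrative only. It shows that a Series pairs values with index labels, and that selecting one column of a DataFrame yields a Series sharing the DataFrame's index.

```python
import pandas as pd

# A Series: one-dimensional values plus an index of labels (illustrative data)
ages = pd.Series([25, 30, 35], index=["alice", "bob", "charlie"], name="Age")

# Values can be looked up by label, much like a dictionary
bob_age = ages["bob"]

# A DataFrame: several columns that share one index
people = pd.DataFrame(
    {"Age": [25, 30, 35], "City": ["New York", "London", "Tokyo"]},
    index=["alice", "bob", "charlie"],
)

# Selecting a single column returns a Series with the same index
age_column = people["Age"]
print(type(age_column).__name__)
print(list(age_column.index))
```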
1.1 Creating a DataFrame
import pandas as pd
# Create from a dictionary
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "London", "Tokyo"]
}
df = pd.DataFrame(data)
print(df)
Output:
      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   35     Tokyo
2. Data Reading and Export
Pandas can read and write many data formats, including CSV, Excel, and SQL databases.
2.1 Reading CSV/Excel/SQL Data
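For trying the SQL path without an existing database file, the following sketch uses an in-memory SQLite database. The table contents and the lowercase column names are made up for illustration. It also passes the filter value through params instead of formatting it into the SQL string.

```python
import sqlite3
import pandas as pd

# An in-memory SQLite database, so no database.db file is needed
conn = sqlite3.connect(":memory:")

# Hypothetical users table written from a DataFrame
users = pd.DataFrame({"name": ["Alice", "Bob", "Charlie"], "age": [25, 30, 35]})
users.to_sql("users", conn, index=False)

# The ? placeholder is filled from params by the SQLite driver
over_28 = pd.read_sql(
    "SELECT name, age FROM users WHERE age > ?", conn, params=(28,)
)
conn.close()
print(over_28)
```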
# Read CSV
df = pd.read_csv("data.csv")
# Read Excel (.xlsx files require an Excel engine such as openpyxl to be installed)
df = pd.read_excel("data.xlsx")
# Read from an SQLite database
import sqlite3
conn = sqlite3.connect("database.db")
df = pd.read_sql("SELECT * FROM users", conn)
2.2 Exporting Data
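A quick way to see what index=False does is a round trip through an in-memory buffer. This is a sketch with made-up data; io.StringIO stands in for a file on disk.

```python
import io
import pandas as pd

people = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# Write CSV text to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
people.to_csv(buffer, index=False)
header_line = buffer.getvalue().splitlines()[0]

# Read the text back; without the index column the frame has the same columns
buffer.seek(0)
restored = pd.read_csv(buffer)
print(header_line)
print(restored)
```

With the default index=True, the row labels would be written as an extra unnamed first column and would come back as a column named "Unnamed: 0" when the file is read again.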
# Save as CSV
df.to_csv("output.csv", index=False)
# Save as Excel
df.to_excel("output.xlsx", index=False)
3. Data Selection and Query
3.1 Selecting Columns and Rows
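Row filters are often built from more than one condition. The following sketch reuses the sample people from section 1.1 (the variable names are illustrative) and shows three common forms: combining boolean masks, membership tests with isin, and the string-based query method.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "London", "Tokyo"],
})

# Combine conditions with & (and) or | (or); each condition needs parentheses
young_or_tokyo = people[(people["Age"] < 30) | (people["City"] == "Tokyo")]

# isin tests membership in a list of values
in_city_list = people[people["City"].isin(["London", "Paris"])]

# query expresses the same kind of filter as a string
thirty_plus = people.query("Age >= 30")

print(young_or_tokyo["Name"].tolist())
```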
# Select a single column
ages = df["Age"]
# Select multiple columns
subset = df[["Name", "City"]]
# Filter rows by condition
young_people = df[df["Age"] < 30]
3.2 Using loc and iloc
loc: label-based selection.
iloc: integer-position-based selection.
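The difference only becomes visible when the index labels are not 0, 1, 2. The sketch below uses a hypothetical frame (df_demo) with labels 10, 20, 30 to contrast the two accessors.

```python
import pandas as pd

# Hypothetical frame whose index labels are NOT 0, 1, 2
df_demo = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]},
    index=[10, 20, 30],
)

by_label = df_demo.loc[10]      # row whose index label is 10
by_position = df_demo.iloc[0]   # row at position 0 (the same row here)

# Slices differ: loc includes the end label, iloc excludes the end position
label_slice = df_demo.loc[10:20]     # labels 10 and 20
position_slice = df_demo.iloc[0:2]   # positions 0 and 1

# loc also accepts a boolean mask together with a column label
older_names = df_demo.loc[df_demo["Age"] > 28, "Name"]
print(older_names.tolist())
```

On this frame, df_demo.loc[0] would raise a KeyError because no row carries the label 0, while df_demo.iloc[0] still returns the first row.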
# Select the row labeled 0 (the first row when the default 0, 1, 2, ... index is used)
row = df.loc[0]
# Select the first two rows (position based)
rows = df.iloc[0:2]
4. Data Cleaning and Processing
4.1 Handling Missing Values
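Dropping every incomplete row or filling everything with 0 is often too coarse. As a sketch with made-up data (the frame and its values are hypothetical), dropna can be limited to specific columns through subset, and fillna accepts a dictionary that maps each column to its own fill value.

```python
import pandas as pd

# Hypothetical frame with one missing value per column
people = pd.DataFrame({
    "Age": [25.0, None, 35.0],
    "City": ["New York", "London", None],
})

# Number of missing values in each column
missing_per_column = people.isnull().sum()

# Drop a row only when Age is missing
has_age = people.dropna(subset=["Age"])

# Fill each column with a value that suits its type
filled = people.fillna({"Age": people["Age"].mean(), "City": "Unknown"})
print(filled)
```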
# Check missing values
print(df.isnull().sum())
# Drop missing values
df_cleaned = df.dropna()
# Fill missing values with 0
df_filled = df.fillna(0)
4.2 Removing Duplicates
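By default a row counts as a duplicate only when every column matches an earlier row. The sketch below uses hypothetical data to show that default alongside the subset and keep parameters, which control which columns are compared and which occurrence survives.

```python
import pandas as pd

# Hypothetical data: row 2 repeats row 0 exactly; row 3 repeats only the name
raw = pd.DataFrame({
    "Name": ["Alice", "Bob", "Alice", "Bob"],
    "City": ["New York", "London", "New York", "Paris"],
})

# Default: all columns must match for a row to be dropped
exact = raw.drop_duplicates()

# Compare only the Name column and keep the last occurrence of each name
by_name = raw.drop_duplicates(subset=["Name"], keep="last")

print(len(exact), len(by_name))
```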
df.drop_duplicates(inplace=True)
4.3 Data Transformation
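As a worked example of standardizing a numeric column (subtract the mean, divide by the standard deviation), the sketch below applies the formula to the three sample ages from section 1.1. Note that Series.std() uses the sample standard deviation (ddof=1) by default, so the results can differ slightly from tools that default to the population form.

```python
import pandas as pd

ages = pd.Series([25, 30, 35], name="Age")  # sample ages from section 1.1

mean_age = ages.mean()   # 30.0
std_age = ages.std()     # 5.0, the sample standard deviation (ddof=1)

# z-score: how many standard deviations each value lies from the mean
z_scores = (ages - mean_age) / std_age
print(z_scores.tolist())  # [-1.0, 0.0, 1.0]
```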
# Convert strings to lowercase
df["Name"] = df["Name"].str.lower()
# Standardize a numeric column (z-score: subtract the mean, divide by the standard deviation)
df["Age"] = (df["Age"] - df["Age"].mean()) / df["Age"].std()
5. Data Aggregation and Grouping
5.1 groupby Aggregation
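A groupby splits the rows by key, applies a function to each group, and combines the results. The sketch below uses made-up data with two people per city and computes several statistics in one pass with agg.

```python
import pandas as pd

# Hypothetical data with two people per city
people = pd.DataFrame({
    "City": ["London", "London", "Tokyo", "Tokyo"],
    "Age": [20, 30, 40, 50],
})

# Split rows by City, then apply several aggregations to the Age column
summary = people.groupby("City")["Age"].agg(["mean", "min", "max", "count"])
print(summary)
```

The result has one row per city and one column per aggregation, so summary.loc["London", "mean"] gives the average age in London.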
# Group by city and compute average age
grouped = df.groupby("City")["Age"].mean()
print(grouped)
5.2 Pivot Table
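With only an index argument, a pivot table gives the same numbers as a groupby. Its distinguishing feature is the columns argument, which spreads a second key across the columns. The following sketch uses made-up rows; the names sales_demo, Product, and Revenue are illustrative.

```python
import pandas as pd

# Hypothetical sales rows
sales_demo = pd.DataFrame({
    "City": ["London", "London", "Tokyo", "Tokyo"],
    "Product": ["A", "B", "A", "A"],
    "Revenue": [100, 150, 200, 50],
})

# Rows = City, columns = Product, cells = summed Revenue.
# fill_value=0 replaces City/Product pairs that never occur.
table = sales_demo.pivot_table(
    index="City",
    columns="Product",
    values="Revenue",
    aggfunc="sum",
    fill_value=0,
)
print(table)
```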
pivot_table = df.pivot_table(index="City", values="Age", aggfunc="mean")
print(pivot_table)
6. Data Merging and Joining
6.1 concat Merge
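The effect of ignore_index and of the axis argument is easiest to see side by side. A minimal sketch with two small illustrative frames (df_a and df_b are hypothetical names):

```python
import pandas as pd

df_a = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df_b = pd.DataFrame({"A": [5, 6], "B": [7, 8]})

# Stack rows and keep the original labels, so labels repeat
stacked = pd.concat([df_a, df_b])

# Stack rows and renumber them 0..n-1
renumbered = pd.concat([df_a, df_b], ignore_index=True)

# Place the frames next to each other, aligning rows on the index
side_by_side = pd.concat([df_a, df_b], axis=1)

print(stacked.index.tolist())
print(renumbered.index.tolist())
print(side_by_side.shape)
```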
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"A": [5, 6], "B": [7, 8]})
combined = pd.concat([df1, df2], ignore_index=True)
6.2 merge Join (SQL-like)
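As in SQL, the join type decides what happens to keys that appear in only one table. The sketch below uses two hypothetical tables (orders and prices, with made-up values) to compare the default inner join with how="left" and how="outer". The indicator=True option adds a _merge column recording where each row came from.

```python
import pandas as pd

# Hypothetical tables: key "C" exists only on the left, "D" only on the right
orders = pd.DataFrame({"key": ["A", "B", "C"], "qty": [1, 2, 3]})
prices = pd.DataFrame({"key": ["A", "B", "D"], "price": [10.0, 20.0, 40.0]})

# Inner join (the default): only keys present in both tables
inner = pd.merge(orders, prices, on="key")

# Left join: every row of orders; price is NaN where no match exists
left_join = pd.merge(orders, prices, on="key", how="left")

# Outer join: all keys from both sides, with the source recorded in _merge
outer = pd.merge(orders, prices, on="key", how="outer", indicator=True)
print(outer)
```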
left = pd.DataFrame({"key": ["A", "B"], "value": [1, 2]})
right = pd.DataFrame({"key": ["A", "B"], "value": [3, 4]})
merged = pd.merge(left, right, on="key", suffixes=("_left", "_right"))
7. Practical Example: Sales Data Analysis
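The file sales.csv used in this section is not provided, so the following is a minimal sketch with made-up rows for trying the same steps. The column names Date, Product, and Revenue are taken from the analysis code in this section; the values are invented. For the monthly totals it groups on dt.to_period("M") rather than calling resample, which is a swapped-in technique that yields one total per calendar month.

```python
import pandas as pd

# Made-up rows standing in for sales.csv
sales_demo = pd.DataFrame({
    "Date": ["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-14"],
    "Product": ["Widget", "Gadget", "Widget", "Widget"],
    "Revenue": [100.0, 250.0, 150.0, 50.0],
})
sales_demo["Date"] = pd.to_datetime(sales_demo["Date"])

# Total revenue per product, largest first
per_product = (
    sales_demo.groupby("Product")["Revenue"].sum().sort_values(ascending=False)
)

# Monthly totals, grouping on the calendar month of each date
per_month = sales_demo.groupby(sales_demo["Date"].dt.to_period("M"))["Revenue"].sum()

print(per_product)
print(per_month)
```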
Assume there is a sales data file, sales.csv, with Date, Product, and Revenue columns. We can then perform the following analysis:
sales = pd.read_csv("sales.csv")
# Total revenue per product
product_sales = sales.groupby("Product")["Revenue"].sum().sort_values(ascending=False)
# Monthly sales trend (pandas 2.2 and later prefer the alias "ME" over "M" for month-end)
sales["Date"] = pd.to_datetime(sales["Date"])
monthly_sales = sales.resample("M", on="Date")["Revenue"].sum()
# Visualization
import matplotlib.pyplot as plt
monthly_sales.plot(kind="line", title="Monthly Sales Trend")
plt.show()
Conclusion
Pandas core operations: data reading, selection, cleaning, aggregation, merging.
Key functions: groupby, pivot_table, merge, dropna.
Applicable scenarios: data analysis, data cleaning, business intelligence (BI), machine‑learning preprocessing.
Mastering Pandas data processing techniques can greatly improve data analysis efficiency, providing a solid foundation for subsequent visualization (Matplotlib/Seaborn) and machine learning (Scikit‑learn).