Master Pandas: From Import to Data Cleaning in One Comprehensive Guide
This tutorial walks through essential pandas operations: importing modules, building a sample shopping dataset, reading and writing CSV files, inspecting data structures, and cleaning data. The cleaning section covers missing values, whitespace trimming, case conversion, value replacement, row deletion, duplicate removal, type casting, and column renaming, each with a code snippet.
It collects the most commonly used pandas functions for these tasks, with short explanations and a mind-map for quick reference.
1. Import Modules
import pandas as pd # pandas
import numpy as np # numpy
2. Create Dataset and Read/Write
2.1 Create Dataset
A sample supermarket shopping dataset is constructed with columns: id, date, money, product, department, origin.
# A list or a dict can be passed to DataFrame; here a dict is used:
data = pd.DataFrame({
    "id": np.arange(101, 111),
    "date": pd.date_range(start="20200310", periods=10),
    "money": [5, 4, 65, -10, 15, 20, 35, 16, 6, 20],
    "product": ['Soda water', 'Cola', 'Beef jerky', 'Lao Gan Ma', 'Pineapple', 'Ice cream', 'Facial cleanser', 'Onion', 'Toothpaste', 'Potato chips'],
    "department": ['Beverages', 'Beverages', 'Snacks', 'Condiments', 'Fruit', np.nan, 'Household goods', 'Vegetables', 'Household goods', 'Snacks'],
    "origin": ['China', ' China', 'America', 'China', 'Thailand', 'China', 'america', 'China', 'China', 'Japan']
})
data # display the dataset
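The comment in the example notes that a list can be passed to DataFrame as well as a dict. A minimal sketch of the list form, assuming a hypothetical two-row sample rather than the full shopping dataset:

```python
import pandas as pd

# Hypothetical two-row sample, shaped like a slice of the shopping dataset.
records = [
    {"id": 101, "money": 5, "origin": "China"},
    {"id": 102, "money": 4, "origin": "Japan"},
]

# Each dict becomes one row; the keys become the column names.
from_records = pd.DataFrame(records)

# The equivalent column-oriented dict, the form used in the main example.
from_columns = pd.DataFrame({
    "id": [101, 102],
    "money": [5, 4],
    "origin": ["China", "Japan"],
})

# Both constructions should describe the same table.
print(from_records.equals(from_columns))
```

The row-oriented form tends to suit data that arrives one record at a time, while the column-oriented form suits data that is already held as columns.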
2.2 Write and Read CSV
data.to_csv("shopping.csv", index=False) # Do not write index
data = pd.read_csv("shopping.csv")
3. Data Inspection
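One thing worth inspecting right after a CSV round trip is the date column: CSV stores no type information, so read_csv returns dates as plain text unless asked to parse them. A minimal sketch, assuming a hypothetical two-row frame and an in-memory buffer standing in for shopping.csv:

```python
import io

import pandas as pd

# Hypothetical two-row frame with a datetime column, mirroring the "date" column.
df = pd.DataFrame({
    "id": [101, 102],
    "date": pd.date_range(start="20200310", periods=2),
})

# Write to an in-memory buffer instead of a file so the sketch leaves nothing on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Plain read: the date column comes back as text.
buffer.seek(0)
plain = pd.read_csv(buffer)

# With parse_dates, the column is converted back to a datetime type.
buffer.seek(0)
parsed = pd.read_csv(buffer, parse_dates=["date"])

print(plain.dtypes)
print(parsed.dtypes)
```

Checking data.dtypes after loading, as the next subsection does, is the quickest way to notice this kind of silent type change.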
3.1 Basic Information
data.shape # rows, columns
data.dtypes # data types of all columns
data['id'].dtype # data type of a specific column
data.ndim # number of dimensions
data.index # row index
data.columns # column index
data.values # underlying numpy array
3.2 Overall View
data.head() # first 5 rows
data.tail() # last 5 rows
data.info() # summary of index, dtypes, non‑null counts, memory usage
data.describe() # statistical summary
4. Data Cleaning
4.1 Detect Anomalies
Iterate through columns to list unique values and spot issues such as negative money, missing department, and inconsistent case in origin.
for col in data:
    print(col + ": " + str(data[col].unique())) # show unique values
The result shows a negative value in money, a NaN in department, and inconsistent casing plus a stray leading space in origin.
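Reading unique values by eye works for ten rows but not for larger tables. The same three problems can be checked programmatically; the following is a minimal sketch on a hypothetical three-row subset, not the author's method:

```python
import numpy as np
import pandas as pd

# Hypothetical subset of the shopping data carrying the same three problems.
df = pd.DataFrame({
    "money": [5, -10, 20],
    "department": ["Beverages", np.nan, "Snacks"],
    "origin": ["China", " China", "america"],
})

# Rows whose amount is negative.
negative_rows = df[df["money"] < 0]

# Number of missing values per column.
missing_counts = df.isnull().sum()

# Values that change once stripped and title-cased, i.e. inconsistent spellings.
normalized = df["origin"].str.strip().str.title()
inconsistent = df.loc[df["origin"] != normalized, "origin"]

print(negative_rows)
print(missing_counts)
print(inconsistent.tolist())
```

The title-case comparison assumes that every valid origin is a single capitalized word, which holds for this dataset but would need adjusting for other naming conventions.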
4.2 Missing‑Value Handling
4.2.1 Detection
data.isnull() # whole DataFrame
data['department'].isnull() # specific column
4.2.2 Summarize Missing Values
data.isnull().sum().sort_values(ascending=False)
4.2.3 Fill Missing Values
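The difference between a forward fill, a backward fill, and a constant fill is easiest to see on a tiny Series. A minimal sketch, assuming a hypothetical three-element Series with one gap and a hypothetical "Unknown" label:

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one gap in the middle.
s = pd.Series(["Fruit", np.nan, "Household goods"])

forward = s.ffill()             # the gap takes the value before it
backward = s.bfill()            # the gap takes the value after it
constant = s.fillna("Unknown")  # the gap takes a fixed label

print(forward.tolist())
print(backward.tolist())
print(constant.tolist())
```

Forward and backward fills depend on row order, so they are mainly meaningful when neighboring rows are related; a constant label makes the imputation explicit.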
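# Choose one of the strategies below; once the gap is filled, the later calls have nothing left to fill.
# Assigning the result back avoids the deprecated fillna(method=...) and column-level inplace=True.
# Forward fill for department
data['department'] = data['department'].ffill()
# Backward fill for department
data['department'] = data['department'].bfill()
# Fill with a specific value
data['department'] = data['department'].fillna("Frozen food")
4.3 Trim Spaces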
for col in data:
    if pd.api.types.is_object_dtype(data[col]):
        data[col] = data[col].str.strip()
# Verify
data['origin'].unique()
Result: array(['China', 'America', 'Thailand', 'america', 'Japan'], dtype=object)
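The loop also runs over department, which may still contain a NaN. That is safe because the .str accessor passes missing values through unchanged; a minimal sketch on a hypothetical Series:

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing padded strings and a missing value.
s = pd.Series([" China", "Japan ", np.nan])

# strip() removes leading and trailing whitespace; the NaN stays missing.
stripped = s.str.strip()

print(stripped.tolist())
print(stripped.isnull().sum())
```

If only one side should be trimmed, str.lstrip() and str.rstrip() behave the same way on the left and right ends respectively.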
4.4 Case Conversion
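# Each call returns a new Series; assign it back to data['origin'] to keep the change
data['origin'].str.title() # Capitalize the first letter of every word
data['origin'].str.capitalize() # Capitalize only the first letter of the string
data['origin'].str.upper() # Upper case
data['origin'].str.lower() # Lower case
4.5 Replace Values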
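Series.replace matches whole values by default, while substring replacement goes through the .str accessor. A minimal sketch of the difference, assuming a hypothetical three-element Series:

```python
import pandas as pd

# Hypothetical values: one exact match, one containing the text as a substring.
s = pd.Series(["america", "South america", "China"])

# Whole-value match: only the element equal to "america" changes.
exact = s.replace("america", "America")

# Dict form: several corrections in one call ("china" is absent here, so it changes nothing).
mapped = s.replace({"america": "America", "china": "China"})

# Substring replacement uses the .str accessor instead.
substring = s.str.replace("america", "America", regex=False)

print(exact.tolist())
print(mapped.tolist())
print(substring.tolist())
```

The whole-value form is the one this section relies on for correcting the origin column.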
# Correct case in origin (assign back rather than calling replace with inplace=True on a column)
data['origin'] = data['origin'].replace("america", "America")
# Replace negative money with NaN, then fill with the mean of the remaining values
data['money'] = data['money'].replace(-10, np.nan)
data['money'] = data['money'].fillna(data['money'].mean())
4.6 Delete Rows
Method 1 – filter rows:
# Keep rows whose origin is not 'America'
data1 = data[data.origin != 'America']
# Keep rows in which no column equals 'Japan'
data2 = data[(data != 'Japan').all(axis=1)]
Method 2 – drop duplicates:
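# Keep first occurrence
data['origin'].drop_duplicates()
# Keep last occurrence
data['origin'].drop_duplicates(keep='last')
# The calls above return a deduplicated Series; to drop whole rows whose origin repeats:
data.drop_duplicates(subset='origin')
4.7 Type Conversion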
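astype returns a new Series and raises an error if a value cannot be converted. For columns that may hold unparseable text, pd.to_numeric with errors="coerce" is a common alternative. A minimal sketch, assuming a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical frame: numeric ids and a money column that arrived as text.
df = pd.DataFrame({"id": [101, 102], "money": ["5", "n/a"]})

# astype returns a new Series; assign it back to change the column.
df["id"] = df["id"].astype(str)

# errors="coerce" turns unparseable text into NaN instead of raising.
df["money"] = pd.to_numeric(df["money"], errors="coerce")

print(df["id"].tolist())
print(df["money"].tolist())
```

Coercion hides bad input as NaN, so it pairs naturally with the missing-value checks from section 4.2.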
data['id'] = data['id'].astype('str') # convert id column to string; astype returns a new Series, so assign it back
4.8 Rename Columns
data.rename(columns={'id':'ID', 'origin':'place_of_origin'}, inplace=True)
Mind-Map Overview
