Master Pandas: From Data Loading to Advanced Manipulation
This comprehensive Pandas tutorial walks you through loading CSV and Excel files, creating Series and DataFrames, performing basic operations, cleaning data, handling missing values, working with hierarchical indexes, grouping, merging, concatenating, and applying time‑series techniques, all illustrated with clear code examples and screenshots.
Pandas Introduction
This article presents a step‑by‑step Pandas tutorial covering data loading, creation, basic manipulation, cleaning, time‑series handling, and summary operations, using the sample file zlJob.csv as the source dataset.
Generating DataFrames
1.1 Reading Data
Read CSV files with pd.read_csv() and Excel files with pd.read_excel().
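As a minimal sketch of the reading step, the snippet below parses CSV text from an in-memory buffer so it runs without a file on disk. The column names and values are hypothetical and are not taken from zlJob.csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file on disk; the columns are illustrative.
csv_text = "name,salary,address\nfrontend,20000,Shanghai\nbackend,25000,Beijing\n"

# read_csv accepts a path or any file-like object.
data = pd.read_csv(io.StringIO(csv_text))

# usecols limits parsing to the listed columns.
subset = pd.read_csv(io.StringIO(csv_text), usecols=["name", "salary"])

print(data.shape)
print(list(subset.columns))
```

With a real file, the buffer would simply be replaced by the file path.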
import pandas as pd
data = pd.read_csv('path/to/file.csv')  # later sections refer to the loaded frame as data
pd.read_excel('path/to/file.xlsx')
1.2 Creating Data
Create a Series (one‑dimensional) and a DataFrame (two‑dimensional).
import numpy as np
s = pd.Series([1, 2, 3, 4, 5])
df2 = pd.DataFrame({
    "A": 1.0,
    "B": pd.Timestamp("20130102"),
    "C": pd.Series(1, index=list(range(4)), dtype="float32"),
    "D": np.array([3] * 4, dtype="int32"),
    "E": pd.Categorical(["test", "train", "test", "train"]),
    "F": "foo"
})
Basic DataFrame Operations
2.1 Viewing Data
Show the first five rows with data.head(). Get shape with data.shape and data types with data.dtypes. Check for nulls using data['name'].isnull().
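The inspection calls can be tried on a small stand-in frame. This is only a sketch: the frame below is hypothetical and much smaller than the real dataset.

```python
import pandas as pd

# Hypothetical miniature of the job dataset; one name is missing.
data = pd.DataFrame({
    "name": ["frontend", None, "backend"],
    "salary": [20000, 18000, 25000],
})

print(data.head())    # first rows (up to five)
print(data.shape)     # (rows, columns)
print(data.dtypes)    # dtype of each column

# isnull() returns a boolean mask; summing it counts the missing entries.
missing_names = int(data["name"].isnull().sum())
print(missing_names)
```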
data.head()
data.shape
data.dtypes
data['name'].isnull()
2.2 Row and Column Operations
Add a new row by appending a Series, delete a row with data.drop([990]), add a column with data['xx'] = range(len(data)), and delete a column with data.drop('序号', axis=1).
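The four row and column operations can be sketched on a tiny hypothetical frame. DataFrame.append was removed in pandas 2.0, so this sketch adds the row with pd.concat instead. The column names and labels here are illustrative only.

```python
import pandas as pd

# Hypothetical frame; the default labels 0..2 act as the row index.
data = pd.DataFrame({
    "name": ["frontend", "backend", "tester"],
    "address": ["Shanghai", "Beijing", "Shenzhen"],
})

# Build the new row as a one-row DataFrame whose index label is 3.
new_row = pd.DataFrame([{"name": "analyst", "address": "Hangzhou"}], index=[3])
data = pd.concat([data, new_row])      # add a row

data = data.drop([1])                  # delete the row labelled 1
data["xx"] = range(len(data))          # add a column
data = data.drop("address", axis=1)    # delete a column

print(data)
```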
dic = {
'name': '前端开发',
'salary': '2万-2.5万',
'company': '上海科技有限公司',
'address': '上海',
'eduBack': '本科',
'companyType': '民营',
'scale': '1000-10000人',
'info': '小程序'
}
new_row = pd.Series(dic)
new_row.name = 38738
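# DataFrame.append was removed in pandas 2.0; concatenate a one-row frame instead
data = pd.concat([data, new_row.to_frame().T])
data = data.drop([990])
data['xx'] = range(len(data))
data = data.drop('序号', axis=1)  # '序号' is the serial-number column
2.3 Indexing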
Label‑based indexing with .loc and position‑based indexing with .iloc. Examples include selecting a single cell, a column slice, and multi‑column slices.
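The difference between the two indexers is easiest to see when the row labels are not 0, 1, 2. The frame below is a hypothetical example with labels 10, 20 and 30.

```python
import pandas as pd

# Hypothetical frame with non-default integer labels.
data = pd.DataFrame(
    {"name": ["frontend", "backend", "tester"], "salary": [20000, 25000, 15000]},
    index=[10, 20, 30],
)

by_label = data.loc[10, "salary"]       # row labelled 10
by_position = data.iloc[2]              # third row, whatever its label is
label_slice = data.loc[10:20, "name"]   # label slices include the end label
pos_slice = data.iloc[0:2, 0]           # position slices exclude the end position

print(by_label)
print(by_position["name"])
```

Both slices select the same two rows here, even though one ends at label 20 and the other at position 2.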
data.loc[10, 'salary']
data.loc[:, 'name'][:5]
data.iloc[2]       # third row
data.iloc[:5, :4]  # first 5 rows, first 4 columns
2.4 Hierarchical Indexing
Create a multi‑level Series and DataFrame.
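Once a multi-level index exists, the usual next step is selecting by level. The sketch below uses a shortened, hypothetical Series to show a few common selections.

```python
import numpy as np
import pandas as pd

# Two-level index: outer letters, inner integers.
s = pd.Series(
    np.arange(1, 7),
    index=[list("aabbcc"), [1, 2, 1, 2, 1, 2]],
)

outer = s.loc["a"]          # every entry under outer label "a"
single = s.loc[("b", 2)]    # one entry addressed by both levels
inner = s.xs(1, level=1)    # cross-section: inner label 1 across all outer labels
wide = s.unstack()          # inner level becomes columns

print(wide)
```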
s = pd.Series(np.arange(1, 10), index=[list('aaabbccdd'), [1, 2, 3, 1, 2, 3, 1, 2, 3]])
df = pd.DataFrame(np.arange(12).reshape(4, 3),
                  index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
                  columns=[["X", "X", "Y"], ["m", "n", "t"]])
Data Pre‑processing
3.1 Handling Missing Values
Create a simple DataFrame, detect missing values with .isnull(), fill them with .fillna(0) (assigning the result back to the column), and drop rows that are entirely null with .dropna(how='all'); pass axis=1 to drop all-null columns instead.
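To make the "entirely null" case concrete, here is a small sketch on a hypothetical frame that contains one all-null column and one all-null row once that column is gone.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "empty" is all-null, and row 2 has no usable values.
df = pd.DataFrame({
    "state": ["a", "b", None],
    "pop": [6.0, np.nan, np.nan],
    "empty": [np.nan, np.nan, np.nan],
})

missing_per_column = df.isnull().sum()            # count of nulls per column

no_empty_cols = df.dropna(axis=1, how="all")      # drops the "empty" column
no_empty_rows = no_empty_cols.dropna(how="all")   # drops row 2, now entirely null
filled = no_empty_rows.fillna({"pop": 0})         # fill only the "pop" column

print(filled)
```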
df = pd.DataFrame({
    'state': ['a', 'b', 'c', 'd'],
    'year': [1991, 1992, 1993, 1994],
    'pop': [6.0, 7.0, 8.0, np.nan]  # np.NaN was removed in NumPy 2.0
})
df['pop'].isnull()
df['pop'] = df['pop'].fillna(0)  # assign back instead of a chained inplace=True
df.dropna(how='all')
3.2 String Processing
Trim spaces and convert case.
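A short sketch of the two string steps chained together, using a hypothetical column "A" that also contains a missing value:

```python
import pandas as pd

# Hypothetical column with stray spaces, mixed case, and one missing value.
df = pd.DataFrame({"A": ["  Shanghai ", "BEIJING", None]})

# .str methods are vectorised and leave missing values missing.
df["A"] = df["A"].str.strip().str.lower()

print(df["A"].tolist())
```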
df['A'] = df['A'].str.strip()
df['A'] = df['A'].str.lower()
3.3 Duplicate Handling
Remove duplicate values, keeping either the last or first occurrence, and replace specific values.
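The effect of keep is easier to see with a second column that identifies which row survived. The frame below is a hypothetical example.

```python
import pandas as pd

# Hypothetical frame: "sh" appears twice in column A.
df = pd.DataFrame({"A": ["sh", "bj", "sh"], "B": [1, 2, 3]})

first = df.drop_duplicates(subset="A")               # keeps the first "sh" row
last = df.drop_duplicates(subset="A", keep="last")   # keeps the last "sh" row

# replace returns a new Series, so assign it back to keep the change.
df["A"] = df["A"].replace("sh", "shanghai")

print(first["B"].tolist(), last["B"].tolist())
```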
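df = df.drop_duplicates(subset='A')               # keep the first occurrence (default)
df = df.drop_duplicates(subset='A', keep='last')  # or keep the last occurrence
df['A'] = df['A'].replace('sh', 'shanghai')       # replace returns a new Series
DataFrame Operations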
Grouping
Group by a column (e.g., job name) and perform aggregate calculations.
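A grouped object does nothing until an aggregation is applied. The sketch below assumes a numeric salary column, which is a hypothetical simplification for illustration.

```python
import pandas as pd

# Hypothetical data with a numeric salary column.
data = pd.DataFrame({
    "name": ["frontend", "backend", "frontend", "backend"],
    "salary": [20000, 25000, 22000, 27000],
})

group = data.groupby("name")

mean_salary = group["salary"].mean()                     # one value per job name
summary = group["salary"].agg(["min", "max", "count"])   # several aggregates at once

print(summary)
```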
group = data.groupby('name')
Concatenation
Combine multiple DataFrames vertically.
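One detail worth seeing in a runnable form is what happens to the row index. The two frames below are hypothetical.

```python
import pandas as pd

# Two hypothetical frames with the same columns.
df1 = pd.DataFrame({"key": ["a", "b"], "value": [1, 2]})
df2 = pd.DataFrame({"key": ["c", "d"], "value": [3, 4]})

stacked = pd.concat([df1, df2])                        # original labels are kept
renumbered = pd.concat([df1, df2], ignore_index=True)  # labels are rebuilt as 0..n-1

print(stacked.index.tolist())
print(renumbered.index.tolist())
```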
frames = [df1, df2, df3]
result = pd.concat(frames)
Merge
Merge two DataFrames on a common key.
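By default pd.merge performs an inner join, so keys missing from either side are dropped. The sketch below uses hypothetical left and right frames to contrast that with an outer join.

```python
import pandas as pd

# Hypothetical frames sharing the column "key"; k0 and k3 appear on one side only.
left = pd.DataFrame({"key": ["k0", "k1", "k2"], "A": [1, 2, 3]})
right = pd.DataFrame({"key": ["k1", "k2", "k3"], "B": [20, 30, 40]})

inner = pd.merge(left, right, on="key")               # only keys present in both
outer = pd.merge(left, right, on="key", how="outer")  # all keys, gaps become NaN

print(inner)
print(outer)
```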
result = pd.merge(left, right, on='key')
Time Series
5.1 Generating a Period Range
date = pd.period_range(start='20210913', end='20210919')
5.2 Using Time Series in a DataFrame
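A period index is mainly useful for selecting rows by date. The following is a minimal sketch, assuming a daily frequency and the same six-day range; the variable names one_day and span are illustrative.

```python
import numpy as np
import pandas as pd

# Six daily periods used as the row index.
index = pd.period_range(start="2021-09-13", end="2021-09-18", freq="D")
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=index)

# Select one row by its period.
one_day = df.loc[pd.Period("2021-09-15", freq="D")]

# Slicing by date strings includes both endpoints.
span = df.loc["2021-09-14":"2021-09-16"]

print(one_day.tolist())
print(span.shape)
```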
index = pd.period_range(start='20210913', end='20210918')
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=index)
Summary
The tutorial demonstrates common Pandas operations—loading data, creating Series/DataFrames, basic inspection, cleaning, hierarchical indexing, grouping, merging, concatenating, and time‑series handling—using the sample zlJob.csv. For detailed explanations, refer to the official Pandas documentation.
Python Crawling & Data Mining