Master Pandas: From CSV to Advanced Data Manipulation in Python
This tutorial walks through pandas fundamentals with clear examples and code snippets: reading CSV/Excel files, creating Series and DataFrames, basic operations, data cleaning, indexing, grouping, concatenation, merging, and time series handling.
Pandas Introduction
This article provides a detailed overview of basic pandas operations using the sample file zlJob.csv.
Pandas Basics
1.1 Reading Data
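Because read_csv accepts any file-like object, its common options can be tried without zlJob.csv on disk. The following is a minimal sketch; the inline CSV text and its id/name/salary columns are hypothetical stand-ins for the real file.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk
csv_text = """id,name,salary
1,Frontend Developer,20000
2,Data Analyst,15000
3,Backend Developer,
"""

# index_col turns 'id' into the row index; usecols keeps only the listed columns
jobs = pd.read_csv(io.StringIO(csv_text), index_col="id", usecols=["id", "name", "salary"])

print(jobs.shape)            # (3, 2): 'id' is now the index, not a column
print(jobs["salary"].dtype)  # float64, because the empty field is read as NaN
```

pd.read_excel takes a similar set of options. If a Chinese-language CSV is not UTF-8 encoded, passing an explicit encoding (for example encoding='gbk') may be necessary.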
Read CSV and Excel files:
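import pandas as pd
import numpy as np
data = pd.read_csv('zlJob.csv')                # the sample dataset used in the examples below
excel_data = pd.read_excel('path/to/file.xlsx')  # placeholder path; .xlsx files need the openpyxl package
1.2 Creating Data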
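Two behaviors are worth seeing in isolation before the examples below: a Series receives a default 0..n-1 index unless one is given, and scalar values in a dict-of-columns DataFrame are broadcast to every row. A minimal sketch with made-up values:

```python
import numpy as np
import pandas as pd

# A Series gets a default RangeIndex 0..n-1 unless an index is supplied
s = pd.Series([1, 2, 3, 4, 5])
labeled = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Scalars are repeated to the length implied by the array-like columns (3 rows here)
frame = pd.DataFrame({
    "x": np.array([1, 2, 3], dtype="int32"),
    "y": 0.5,                               # scalar, broadcast to every row
    "z": pd.Categorical(["lo", "hi", "lo"]),
})

print(list(s.index))   # [0, 1, 2, 3, 4]
print(labeled["b"])    # 20
print(frame.dtypes)    # one dtype per column: int32, float64, category
```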
Create Series and DataFrames:
s = pd.Series([1, 2, 3, 4, 5])
df2 = pd.DataFrame({
    "A": 1.0,
    "B": pd.Timestamp("20130102"),
    "C": pd.Series(1, index=list(range(4)), dtype="float32"),
    "D": np.array([3] * 4, dtype="int32"),
    "E": pd.Categorical(["test", "train", "test", "train"]),
    "F": "foo"
})
Basic DataFrame Operations
2.1 Viewing Data
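Calling isnull() on a single column (as in the snippet below) shows missing values row by row. Summing the whole boolean frame instead gives a per-column count, which is often the quicker overview. A minimal sketch on a small hypothetical frame:

```python
import numpy as np
import pandas as pd

# Small hypothetical frame with a few missing cells
df = pd.DataFrame({
    "name": ["a", None, "c", "d"],
    "salary": [1.0, 2.0, np.nan, np.nan],
})

print(df.head(2))              # first 2 rows instead of the default 5
print(df.shape)                # (4, 2): rows, columns
missing = df.isnull().sum()    # True counts as 1, so this is missing values per column
print(missing)
```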
Show first rows, shape, dtypes, and check for null values:
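data.head()            # first 5 rows by default
data.shape             # (number of rows, number of columns)
data.dtypes            # data type of each column
data['name'].isnull()  # True where 'name' is missing
2.2 Row and Column Operations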
Add and delete rows/columns:
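dic = {
    'name': '前端开发',             # "front-end developer"
    'salary': '2万-2.5万',           # "20,000-25,000"
    'company': '上海科技有限公司',   # "Shanghai Technology Co., Ltd."
    'adress': '上海',                # "Shanghai"
    'eduBack': '本科',               # "bachelor's degree"
    'companyType': '民营',           # "private company"
    'scale': '1000-10000人',         # "1,000-10,000 employees"
    'info': '小程序'                 # "mini program"
}
row = pd.Series(dic)
row.name = 38738                            # the Series name becomes the new row's index label
data = pd.concat([data, row.to_frame().T])  # DataFrame.append was removed in pandas 2.0
data.tail()                                 # check that the new row is at the end
data = data.drop([990])                     # drop the row whose index label is 990
data["xx"] = range(len(data))               # add a new column
data = data.drop('序号', axis=1)             # drop the '序号' ("serial number") column
Note: axis=1 makes drop() remove columns; the default, axis=0, removes rows by index label.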
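The same add/drop sequence can be checked end to end on a tiny hypothetical frame, without the zlJob data. This sketch shows that the appended Series' name becomes its index label and that axis selects rows or columns:

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "city": ["x", "y"]}, index=[10, 11])

# Append one row: the Series name (12) becomes the new row's index label
new_row = pd.Series({"name": "c", "city": "z"}, name=12)
df = pd.concat([df, new_row.to_frame().T])

df = df.drop([10])              # axis=0 (default): drop the row labelled 10
df["rank"] = range(len(df))     # add a column with one value per remaining row
df = df.drop("city", axis=1)    # axis=1: drop a column
print(df)
```

For a single row, assigning with df.loc[new_label] = values is an alternative that modifies the frame in place.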
2.3 Indexing
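The difference between loc and iloc is easiest to see when index labels do not match positions. A minimal sketch on hypothetical data; note that label slices include their end point while position slices exclude it:

```python
import pandas as pd

# Index labels deliberately differ from row positions
df = pd.DataFrame(
    {"name": ["a", "b", "c", "d"], "salary": [100, 200, 300, 400]},
    index=[10, 20, 30, 40],
)

by_label = df.loc[20, "salary"]   # row label 20, column 'salary' -> 200
by_position = df.iloc[1, 1]       # second row, second column -> 200
label_slice = df.loc[10:30]       # label slice includes the end label: 3 rows
position_slice = df.iloc[0:2]     # position slice excludes the end: 2 rows
```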
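Label-based loc and position-based iloc examples:
data.loc[10, 'salary']               # value at row label 10, column 'salary'
data.loc[:, 'name'][:5]              # first 5 values of the 'name' column
data.loc[:, ['name', 'salary']][:5]  # first 5 rows of two columns
data.iloc[2]                         # third row
data.iloc[:5]                        # first 5 rows
data.iloc[:5, :4]                    # first 5 rows, first 4 columns
2.4 Hierarchical Indexing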
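Once the multi-level objects below are built, they can be selected by outer label, by a full (outer, inner) tuple, or reshaped. This sketch reuses the same constructors; the selections and the unstack() call are illustrative additions:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(1, 10),
              index=[list("aaabbccdd"), [1, 2, 3, 1, 2, 3, 1, 2, 3]])

outer = s["a"]            # every row under outer label 'a' -> values 1, 2, 3
single = s.loc[("b", 2)]  # one value via an (outer, inner) tuple -> 5
table = s.unstack()       # inner level becomes columns; absent pairs become NaN

df = pd.DataFrame(np.arange(12).reshape(4, 3),
                  index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
                  columns=[["X", "X", "Y"], ["m", "n", "t"]])
x_cols = df["X"]          # the columns under top-level 'X': 'm' and 'n'
```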
Series and DataFrame multi-index examples:
s = pd.Series(np.arange(1, 10), index=[list('aaabbccdd'), [1, 2, 3, 1, 2, 3, 1, 2, 3]])
df = pd.DataFrame(np.arange(12).reshape(4, 3),
                  index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
                  columns=[["X", "X", "Y"], ["m", "n", "t"]])
Data Preprocessing
3.1 Handling Missing Values
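The dropna options used below behave quite differently, and a tiny hypothetical frame makes the contrast visible. In this sketch, how='all' removes only fully empty rows or columns, subset restricts the check to chosen columns, and fillna accepts a per-column dict:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "state": ["a", "b", None, "d"],
    "pop": [6.0, np.nan, np.nan, 9.0],
    "note": [np.nan, np.nan, np.nan, np.nan],
})

all_missing_rows = df.dropna(how="all")        # drops row 2, whose values are all missing
no_empty_cols = df.dropna(axis=1, how="all")   # drops only the all-NaN 'note' column
complete_pop = df.dropna(subset=["pop"])       # keeps rows that have a 'pop' value
filled = df.fillna({"pop": 0.0})               # fill value chosen per column
```

By contrast, dropna(axis=1) with the default how='any' removes every column containing at least one missing value, which on this frame would remove all three columns.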
df = pd.DataFrame({
'state': ['a','b','c','d'],
'year': [1991,1992,1993,1994],
    'pop': [6.0, 7.0, 8.0, np.nan]
})
df['pop'].isnull()
df['pop'] = df['pop'].fillna(0)  # replace missing values with 0 and assign the result back
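data.dropna(how='all')    # drop rows whose values are all missing
data.dropna(axis=1)       # drop every column that contains any missing value
data.dropna(axis=0, subset=['Age', 'Sex'])  # drop rows missing 'Age' or 'Sex' (illustrative column names, not from zlJob.csv)
Note: dropna() returns a new DataFrame; assign the result (for example data = data.dropna(how='all')) to keep it.
3.2 String Processing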
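The .str accessor applies string methods element-wise, so cleaning steps can be chained in one expression. A minimal sketch on a hypothetical column 'A'; str.contains is an extra illustration of a substring check:

```python
import pandas as pd

df = pd.DataFrame({"A": ["  SH ", "Beijing", " sh"]})

# Strip surrounding whitespace, then lowercase, in a single chain
df["A"] = df["A"].str.strip().str.lower()

# Boolean mask of values containing the substring 'sh'
has_sh = df["A"].str.contains("sh", na=False)
print(df["A"].tolist())   # ['sh', 'beijing', 'sh']
```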
df['A'] = df['A'].str.strip()  # remove leading/trailing whitespace ('A' stands for any string column)
df['A'] = df['A'].str.lower()  # convert to lowercase
3.3 Duplicate Handling
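Called on the whole DataFrame with subset, drop_duplicates removes entire rows, and keep chooses which occurrence survives. A minimal sketch on hypothetical data, also contrasting whole-value replace with substring matching:

```python
import pandas as pd

df = pd.DataFrame({"A": ["sh", "bj", "sh", "gz"], "B": [1, 2, 3, 4]})

first = df.drop_duplicates(subset=["A"])              # keeps rows 0, 1, 3
last = df.drop_duplicates(subset=["A"], keep="last")  # keeps rows 1, 2, 3
dupes = df.duplicated(subset=["A"])                   # True for repeats after the first

# Series.replace matches whole cell values; .str.replace would match substrings
whole = df["A"].replace("sh", "shanghai")
```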
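df = df.drop_duplicates(subset=['A'])                  # keep the first row for each value in 'A'
# df = df.drop_duplicates(subset=['A'], keep='last')   # alternative: keep the last occurrence
df['A'] = df['A'].replace('sh', 'shanghai')            # replace cells whose whole value is 'sh'
Grouping and Aggregation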
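A GroupBy object only does work once an aggregation is applied. This minimal sketch uses a hypothetical numeric min_salary column; the real zlJob salary column holds strings such as '2万-2.5万' and would need to be parsed into numbers first:

```python
import pandas as pd

jobs = pd.DataFrame({
    "name": ["frontend", "backend", "frontend", "backend", "frontend"],
    "min_salary": [20000, 18000, 22000, 25000, 15000],  # hypothetical numeric column
})

group = jobs.groupby("name")
counts = group.size()                                     # rows per job title
mean_salary = group["min_salary"].mean()                  # average per job title
summary = group["min_salary"].agg(["min", "max", "mean"]) # several statistics at once
```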
group = data.groupby('name')  # group rows by job title
The resulting GroupBy object supports aggregations such as size() (rows per group) and mean() or sum() on numeric columns.
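Concatenation stacks DataFrames, while merging joins them on key columns. A self-contained sketch on small hypothetical frames, showing ignore_index for concat and the effect of the how parameter for merge:

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["a", "b"], "v": [1, 2]})
df2 = pd.DataFrame({"key": ["c"], "v": [3]})

# concat stacks along rows; ignore_index renumbers the result 0..n-1
stacked = pd.concat([df1, df2], ignore_index=True)

left = pd.DataFrame({"key": ["a", "b", "c"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "c", "d"], "rval": [10, 30, 40]})

inner = pd.merge(left, right, on="key")               # default how="inner": keys a, c
outer = pd.merge(left, right, on="key", how="outer")  # keys a, b, c, d; gaps become NaN
```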
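Concatenate and merge DataFrames (df1, df2, df3, left and right stand for existing DataFrames):
pd.concat([df1, df2, df3])       # stack DataFrames vertically (row-wise)
pd.merge(left, right, on='key')  # join on the shared 'key' column (inner join by default)
Time Series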
5.1 Generating a Period Range
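When no freq is given, period_range produces daily periods, and other frequencies are chosen with freq. A minimal sketch; the monthly range and the to_timestamp() conversion are illustrative additions:

```python
import pandas as pd

date = pd.period_range(start="20210913", end="20210919")
print(len(date), date.freqstr)   # 7 daily periods; freqstr is 'D'

# Monthly periods instead of daily ones
months = pd.period_range(start="2021-01", end="2021-06", freq="M")

# Convert each Period to the Timestamp at its start
starts = date.to_timestamp()
```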
date = pd.period_range(start='20210913', end='20210919')  # seven daily periods, 2021-09-13 to 2021-09-19
5.2 Time Series DataFrame
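A PeriodIndex lets rows be selected with date strings, and label slices include both ends. This minimal sketch rebuilds the DataFrame described below; the column names a-d are illustrative assumptions:

```python
import numpy as np
import pandas as pd

index = pd.period_range(start="20210913", end="20210918")
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=index,
                  columns=["a", "b", "c", "d"])   # column names are illustrative

row = df.loc["2021-09-14"]                    # one day selected by its string label
window = df.loc["2021-09-14":"2021-09-16"]    # label slice, end included -> 3 rows
as_dates = df.to_timestamp()                  # PeriodIndex -> DatetimeIndex of period starts
```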
index = pd.period_range(start='20210913', end='20210918')       # six daily periods
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=index)  # 6 rows x 4 columns, indexed by day
Conclusion
This article demonstrates common pandas data-processing operations based on the zlJob.csv dataset; for more detailed explanations, refer to the official pandas documentation.
Python Crawling & Data Mining
