Master Pandas: From Data Import to Advanced Manipulation in Python
This tutorial walks you through pandas fundamentals—including reading CSV/Excel files, creating Series and DataFrames, performing basic operations, cleaning data, using loc/iloc indexing, grouping, concatenating, merging, and handling time series—providing code examples and visual outputs for each step.
Pandas Introduction
This article provides a comprehensive tutorial on using pandas for data manipulation in Python, covering data import, creation of Series and DataFrames, basic operations, cleaning, indexing, grouping, aggregation, concatenation, merging, and time‑series handling.
Data Import
Read CSV and Excel files:
import numpy as np
import pandas as pd
pd.read_csv('file.csv')
pd.read_excel('file.xlsx')
Contents
Generate data tables
Basic table operations
Data cleaning
Time series
1. Generating Data Tables
1.1 Data Reading
Typical data sources are CSV or Excel files. Example for CSV:
data = pd.read_csv('zlJob.csv')  # keep the result in data, which the later sections use
1.2 Creating Data
Create a Series (1‑D) and a DataFrame (2‑D):
s = pd.Series([1, 2, 3, 4, 5])
df2 = pd.DataFrame({
    "A": 1.0,
    "B": pd.Timestamp("20130102"),
    "C": pd.Series(1, index=list(range(4)), dtype="float32"),
    "D": np.array([3] * 4, dtype="int32"),
    "E": pd.Categorical(["test", "train", "test", "train"]),
    "F": "foo"
})
2. Basic Table Operations
2.1 Viewing Data
Show first five rows:
data.head()  # default is 5 rows
Basic information:
data.shape  # (990, 9)
data.dtypes
Check for missing values in a column:
data['name'].isnull()
2.2 Row and Column Operations
Add a new row:
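dic = {
    'name': '前端开发',  # front-end development
    'salary': '2万-2.5万',  # 20,000-25,000
    'company': '上海科技有限公司',  # Shanghai Technology Co., Ltd.
    'adress': '上海',  # Shanghai
    'eduBack': '本科',  # bachelor's degree
    'companyType': '民营',  # privately owned
    'scale': '1000-10000人',  # 1,000-10,000 employees
    'info': '小程序'  # mini programs
}
row = pd.Series(dic, name=38738)  # name becomes the new row's index label
data = pd.concat([data, row.to_frame().T])  # pd.concat adds the row (DataFrame.append was removed in pandas 2.0)
data.tail()
Delete a row:
data = data.drop([990])  # drops the row whose index label is 990
Add a new column:
data['xx'] = range(len(data))
Delete a column (axis=1 means column):
data = data.drop('序号', axis=1)  # '序号' is the serial-number column
2.3 Indexing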
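As a minimal sketch of how label-based and position-based selection differ, the example below uses a small made-up frame. The data and the index labels 10, 20, 30 are assumptions for illustration and are not taken from zlJob.csv. With a non-default index, loc[10] and iloc[0] address the same row.

```python
import pandas as pd

# Hypothetical data; the index labels are deliberately not 0, 1, 2
jobs = pd.DataFrame(
    {"name": ["frontend", "backend", "data"], "salary": [20, 25, 30]},
    index=[10, 20, 30],
)

by_label = jobs.loc[10, "salary"]   # row labeled 10, column 'salary'
by_position = jobs.iloc[0, 1]       # first row, second column
print(by_label, by_position)        # 20 20

# loc slices include the end label; iloc slices exclude the end position
print(len(jobs.loc[10:20]))         # 2
print(len(jobs.iloc[0:1]))          # 1
```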
Label‑based indexing with loc:
data.loc[10, 'salary']  # returns salary at index 10
data.loc[:, 'name'][:5]  # first five values of the 'name' column
Position-based indexing with iloc:
data.iloc[2]  # third row
data.iloc[:5, :4]  # first 5 rows, first 4 columns
2.4 Hierarchical Indexing
Series with multi‑level index:
s = pd.Series(np.arange(1, 10), index=[list('aaabbccdd'), [1,2,3,1,2,3,1,2,3]])
DataFrame with multi-level index:
df = pd.DataFrame(np.arange(12).reshape(4,3), index=[["a","a","b","b"],[1,2,1,2]], columns=[["X","X","Y"],["m","n","t"]])
3. Data Pre-processing
3.1 Handling Missing Values
Create a simple table with a missing value:
df = pd.DataFrame({
    'state': ['a', 'b', 'c', 'd'],
    'year': [1991, 1992, 1993, 1994],
    'pop': [6.0, 7.0, 8.0, np.nan]
})
print(df)
Check missing values:
df['pop'].isnull()
Fill missing values with 0:
df['pop'] = df['pop'].fillna(0)  # assign the filled column back to the frame
print(df)
Drop rows where all values are missing:
data.dropna(how='all')  # returns a new DataFrame; data itself is unchanged
3.2 String Processing
df['A'] = df['A'].str.strip()  # remove leading and trailing whitespace
df['A'] = df['A'].str.lower()  # convert to lowercase
3.3 Duplicate Handling
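# drop rows whose value in column A has already appeared (keeps the first occurrence)
df = df.drop_duplicates(subset='A')
# alternatively, keep the last occurrence
df = df.drop_duplicates(subset='A', keep='last')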
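One point worth checking when deduplicating on a single column: Series.drop_duplicates returns a shorter Series, so assigning it back to the column realigns on the index and leaves NaN in the dropped positions, whereas DataFrame.drop_duplicates with subset removes whole rows. A small sketch with made-up values (the column names A and B are placeholders):

```python
import pandas as pd

df = pd.DataFrame({"A": ["sh", "bj", "sh"], "B": [1, 2, 3]})

# Row-level deduplication on column A
first = df.drop_duplicates(subset="A")               # keeps rows 0 and 1
last = df.drop_duplicates(subset="A", keep="last")   # keeps rows 1 and 2
print(first.index.tolist())  # [0, 1]
print(last.index.tolist())   # [1, 2]

# Column-level deduplication assigned back: the frame keeps all 3 rows,
# and the duplicate position becomes NaN after index alignment
col = df.copy()
col["A"] = col["A"].drop_duplicates()
print(col["A"].isna().tolist())  # [False, False, True]
```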
df['A'] = df['A'].replace('sh', 'shanghai')  # replace the value 'sh' with 'shanghai'
4. Table Operations
Grouping
Group by a column (e.g., job name):
group = data.groupby('name')
print(group)  # prints a DataFrameGroupBy object, not the grouped rows
A GroupBy object does not compute anything until an aggregation such as mean, sum, or size is applied to it.
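As a rough illustration of that aggregation step, the sketch below groups a small invented table by name and computes per-group statistics. The monthly_salary column and its numbers are assumptions for the example; the salary column in zlJob.csv holds text ranges and would need parsing into numbers first.

```python
import pandas as pd

# Hypothetical data, not taken from zlJob.csv
jobs = pd.DataFrame({
    "name": ["frontend", "backend", "frontend", "backend", "data"],
    "monthly_salary": [20000, 25000, 22000, 27000, 30000],
})

# Select one column of the grouped frame, then aggregate it
grouped = jobs.groupby("name")["monthly_salary"]
print(grouped.mean())                        # one mean per job name
print(grouped.agg(["count", "min", "max"]))  # several statistics at once
```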
Concatenation (concat)
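pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False)
The parameters used most often are objs (the list of Series or DataFrames to combine), axis (0 stacks rows, 1 places objects side by side), join ('outer' keeps the union of labels on the other axis, 'inner' keeps the intersection), and ignore_index (True renumbers the result from 0). The remaining parameters are described in the pandas documentation.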
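A minimal sketch of the two most common uses, stacking rows and placing frames side by side, with two made-up frames (the names and values are illustrative only):

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
b = pd.DataFrame({"x": [5, 6], "z": [7, 8]})

# axis=0 stacks rows; join='outer' keeps the union of columns (gaps become NaN)
rows_outer = pd.concat([a, b], axis=0, join="outer", ignore_index=True)
print(rows_outer.shape)             # (4, 3)

# join='inner' keeps only the columns shared by every frame
rows_inner = pd.concat([a, b], join="inner", ignore_index=True)
print(rows_inner.columns.tolist())  # ['x']

# axis=1 aligns on the index and places the frames side by side
side_by_side = pd.concat([a, b], axis=1)
print(side_by_side.shape)           # (2, 4)
```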
Merge
pd.merge(left, right, how='inner', on='key')
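To make the call concrete, here is a small sketch with two invented tables that share a key column. The column names lval and rval and the data are assumptions for illustration.

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "rval": [20, 30, 40]})

# Inner join: only keys present in both tables survive
inner = pd.merge(left, right, how="inner", on="key")
print(inner["key"].tolist())   # ['b', 'c']

# Outer join: all keys from either table; missing values become NaN
outer = pd.merge(left, right, how="outer", on="key")
print(len(outer))              # 4
```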
# left and right are DataFrames with a common column 'key'
5. Time Series
5.1 Generating a Date Range
date = pd.period_range(start='20210913', end='20210919')
print(date)
# Output: PeriodIndex(['2021-09-13', '2021-09-14', '2021-09-15', '2021-09-16', '2021-09-17', '2021-09-18', '2021-09-19'], dtype='period[D]', freq='D')
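Note that period_range yields a PeriodIndex (spans of time) rather than a DatetimeIndex (points in time). When timestamps are needed instead, pd.date_range is the counterpart, and PeriodIndex.to_timestamp converts between the two. A brief sketch over the same dates, offered as an illustration rather than as part of the original walkthrough:

```python
import pandas as pd

periods = pd.period_range(start="2021-09-13", end="2021-09-19")
stamps = pd.date_range(start="2021-09-13", end="2021-09-19", freq="D")

print(type(periods).__name__)     # PeriodIndex
print(type(stamps).__name__)      # DatetimeIndex
print(len(periods), len(stamps))  # 7 7

# Convert each daily period to the timestamp at its start
converted = periods.to_timestamp()
print(converted[0])
```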
5.2 Using Time Series in pandas
index = pd.period_range(start='20210913', end='20210918')
df = pd.DataFrame(np.arange(24).reshape((6,4)), index=index)
print(df)
6. Conclusion
This article demonstrated common pandas operations on the sample file zlJob.csv, including data import, creation, inspection, cleaning, indexing, grouping, concatenation, merging, and time‑series handling. For more detailed explanations, refer to the official pandas documentation.
https://pandas.pydata.org/pandas-docs/stable/getting_started/index.html
