Master Pandas: Essential Data Manipulation Techniques for Beginners
This comprehensive tutorial walks you through pandas basics, including reading CSV and Excel files, creating Series and DataFrames, performing data inspection, cleaning, indexing, hierarchical indexing, time‑series handling, grouping, aggregation, concatenation, merging, and practical code examples with visual outputs.
Pandas Introduction
The article presents a step‑by‑step guide to using pandas for data analysis, based on the sample file zlJob.csv. It covers data import, creation, basic operations, cleaning, and advanced features.
Data Import
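pd.read_csv accepts a file path or any file-like object. The sketch below reads from an in-memory string instead of zlJob.csv, because the article does not show the file's contents; the column names and rows here are invented stand-ins, not the real data.

```python
import io

import pandas as pd

# Invented stand-in for zlJob.csv; the real file's contents are not shown in the article.
csv_text = """name,salary,company
data analyst,1.5,Example Co
web developer,2.0,Sample Ltd
"""

# read_csv takes a path or a file-like object, so StringIO is enough for a demo.
jobs = pd.read_csv(io.StringIO(csv_text))

print(jobs.shape)   # (rows, columns)
print(jobs.dtypes)  # salary is parsed as float64
```

pd.read_excel follows the same pattern with a workbook path, but it typically needs an Excel engine such as openpyxl installed.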
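The snippets in this article assume the usual aliases: import pandas as pd and import numpy as np.
Read CSV: pd.read_csv()
Read Excel: pd.read_excel()
Creating Data Structures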
Create a Series:
s = pd.Series([1, 2, 3, 4, 5])
Create a DataFrame:
df2 = pd.DataFrame({
    "A": 1.0,
    "B": pd.Timestamp("20130102"),
    "C": pd.Series(1, index=list(range(4)), dtype="float32"),
    "D": np.array([3] * 4, dtype="int32"),
    "E": pd.Categorical(["test", "train", "test", "train"]),
    "F": "foo",
})
Basic Data Operations
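The inspection calls in this section can be tried on any DataFrame. A minimal sketch, assuming a tiny invented frame in place of the article's data (only the column names name and salary are taken from the article):

```python
import numpy as np
import pandas as pd

# Invented stand-in for the article's `data` frame, with one missing name.
data = pd.DataFrame({
    "name": ["analyst", None, "developer"],
    "salary": [1.5, 2.0, np.nan],
})

print(data.head(2))   # first two rows
print(data.shape)     # (3, 2)
print(data.dtypes)    # dtype of each column

null_mask = data["name"].isnull()  # boolean Series, True where the value is missing
print(null_mask.sum())             # 1
```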
View first rows: data.head()
Shape and dtypes: data.shape and data.dtypes
Check for nulls: data['name'].isnull()
Row and Column Manipulation
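Add a row (DataFrame.append was removed in pandas 2.0, so build a one-row frame and use pd.concat):
# Values mirror the Chinese-language sample data, e.g. '前端开发' is "front-end developer".
dic = {'name':'前端开发','salary':2.0,'company':'上海科技有限公司','adress':'上海','eduBack':'本科','companyType':'民营','scale':1000,'info':'小程序'}
row = pd.DataFrame([dic], index=[38738])  # one-row frame whose index label is 38738
data = pd.concat([data, row])
Delete a row: data = data.drop([990])
Add a column: data['xx'] = range(len(data))
Delete a column (here '序号', the serial-number column): data = data.drop('序号', axis=1)
Indexing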
loc selects by label: a single label, a list of labels, or a label slice (both endpoints are included).
iloc selects by integer position: a single integer, a list or array of integers, or a slice (the end position is excluded).
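The article lists the loc and iloc selection forms without showing them. A minimal sketch of each form, assuming a small hypothetical frame with string row labels so that labels and positions cannot be confused:

```python
import pandas as pd

# Hypothetical frame; string row labels keep labels and positions distinct.
df = pd.DataFrame(
    {"salary": [1.5, 2.0, 1.2, 3.0], "scale": [100, 1000, 50, 500]},
    index=["a", "b", "c", "d"],
)

# loc: selection by label
row_b = df.loc["b"]              # single label -> Series
rows_ac = df.loc[["a", "c"]]     # list of labels -> DataFrame
rows_b_to_d = df.loc["b":"d"]    # label slice, end label included

# iloc: selection by integer position
first_row = df.iloc[0]           # single integer -> Series
rows_0_2 = df.iloc[[0, 2]]       # list of positions -> DataFrame
rows_1_to_2 = df.iloc[1:3]       # position slice, end position excluded

print(rows_b_to_d.index.tolist())  # ['b', 'c', 'd']
print(rows_1_to_2.index.tolist())  # ['b', 'c']
```

The two slices start at the same row but stop at different ones, which is the difference most likely to surprise newcomers.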
Hierarchical Indexing
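With a multi-level index, a selection can name the outer level alone or give a full tuple of labels. The sketch below uses the same Series and DataFrame constructions as this section; the selection lines are illustrative additions rather than part of the original article.

```python
import numpy as np
import pandas as pd

# Two parallel lists form the two index levels (same construction as in this section).
s = pd.Series(np.arange(1, 10),
              index=[list("aaabbccdd"), [1, 2, 3, 1, 2, 3, 1, 2, 3]])

outer_a = s["a"]          # all entries under outer label 'a', indexed by the inner level
single = s.loc[("b", 2)]  # one value addressed by a full (outer, inner) tuple

# Columns can be hierarchical too; selecting outer label 'X' keeps its sub-columns.
df = pd.DataFrame(np.arange(12).reshape(4, 3),
                  index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
                  columns=[["X", "X", "Y"], ["m", "n", "t"]])
x_block = df["X"]                    # DataFrame with columns 'm' and 'n'
cell = df.loc[("b", 1), ("Y", "t")]  # row tuple and column tuple together

print(outer_a.tolist())  # [1, 2, 3]
print(single)            # 5
print(cell)              # 8
```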
Series with multi‑level index:
s = pd.Series(np.arange(1,10), index=[list('aaabbccdd'), [1,2,3,1,2,3,1,2,3]])
DataFrame with multi‑level index:
df = pd.DataFrame(np.arange(12).reshape(4,3), index=[["a","a","b","b"],[1,2,1,2]], columns=[["X","X","Y"],["m","n","t"]])
Data Pre‑processing
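Missing value handling:
df['pop'].isnull()
df['pop'] = df['pop'].fillna(0)  # assign the result back; inplace=True on a selected column may not update df in recent pandas
Drop rows/columns with missing values:
data.dropna(how='all')  # returns a new frame without rows in which every value is missing
data.dropna(axis=1)     # returns a new frame without columns that contain any missing value
String cleaning:
df['A'] = df['A'].str.strip()
df['A'] = df['A'].str.lower()
Duplicate removal:
# Assigning df['A'].drop_duplicates() back to df['A'] only turns the duplicates into NaN; drop the rows instead.
df = df.drop_duplicates(subset='A')               # keep the first row for each value of A
df = df.drop_duplicates(subset='A', keep='last')  # alternative: keep the last row instead
Value replacement:
df['A'] = df['A'].replace('sh', 'shanghai')  # replace returns a new Series, so assign it back
Grouping and Aggregation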
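Grouping splits the rows by key, and nothing is computed until an aggregation is applied to the resulting GroupBy object. A minimal sketch, assuming a made-up frame (only the column names name and salary come from the article; the rows are invented):

```python
import pandas as pd

# Made-up rows; only the column names 'name' and 'salary' come from the article.
data = pd.DataFrame({
    "name": ["analyst", "developer", "analyst", "developer", "tester"],
    "salary": [1.0, 2.0, 3.0, 4.0, 1.5],
})

group = data.groupby("name")  # same grouping as data.groupby(data['name'])

mean_salary = group["salary"].mean()                    # one aggregate per group
summary = group["salary"].agg(["count", "min", "max"])  # several aggregates at once

print(mean_salary)
print(summary)
```

Passing the column name as a string is the more common spelling; passing the Series itself, as the article does, groups by the same values.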
Group by a column and perform aggregate calculations:
group = data.groupby(data['name'])
Concatenation
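pd.concat glues pandas objects together along an axis. As a minimal sketch of the main options in the signature listed in this section, assuming two small made-up frames (df_a and df_b are hypothetical names):

```python
import pandas as pd

# Two hypothetical frames that share only column 'B'.
df_a = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df_b = pd.DataFrame({"B": [5, 6], "C": [7, 8]})

# axis=0 stacks rows; join="outer" keeps the union of columns and fills gaps with NaN.
outer = pd.concat([df_a, df_b], axis=0, join="outer", ignore_index=True)

# join="inner" keeps only the columns present in every input.
inner = pd.concat([df_a, df_b], join="inner", ignore_index=True)

# keys labels each input, producing a two-level row index.
keyed = pd.concat([df_a, df_b], keys=["first", "second"])

print(outer)
print(inner.columns.tolist())  # ['B']
print(keyed.index.nlevels)     # 2
```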
pd.concat(objs, axis=0, join="outer", ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, copy=True)
Merge
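pd.merge performs database-style joins on key columns. The sketch below shows the how, on, and indicator parameters from the signature in this section; the two tables and their values are hypothetical, with column names borrowed from the article's sample data.

```python
import pandas as pd

# Hypothetical tables: job postings and company details, joined on 'company'.
jobs = pd.DataFrame({
    "company": ["Acme", "Globex", "Initech"],
    "salary": [1.5, 2.0, 1.2],
})
companies = pd.DataFrame({
    "company": ["Acme", "Globex", "Umbrella"],
    "scale": [100, 1000, 500],
})

# how="inner" (the default) keeps only keys found in both tables.
inner = pd.merge(jobs, companies, how="inner", on="company")

# how="left" keeps every row of the left table; indicator=True adds a '_merge'
# column recording whether each row found a match.
left = pd.merge(jobs, companies, how="left", on="company", indicator=True)

print(inner)
print(left[["company", "_merge"]])
```

Rows without a match on the right side get NaN in the right-hand columns, which is why the indicator column is handy when checking a join.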
pd.merge(left, right, how="inner", on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=("_x", "_y"), copy=True, indicator=False, validate=None)
Time Series
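A PeriodIndex supports selection by period and by date string. A minimal sketch, built on the same period range and frame as this section; the selection and conversion lines are illustrative additions rather than part of the original article.

```python
import numpy as np
import pandas as pd

# Six periods; no freq is given, so period_range falls back to daily frequency.
index = pd.period_range(start="20210913", end="20210918")
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=index)

one_day = df.loc[pd.Period("2021-09-14", freq="D")]  # single row as a Series
three_days = df.loc["2021-09-14":"2021-09-16"]       # date-string slice, both ends included

# to_timestamp converts the PeriodIndex to a DatetimeIndex at each period's start.
stamped = df.to_timestamp()

print(len(index))        # 6
print(one_day.tolist())  # [4, 5, 6, 7]
print(len(three_days))   # 3
```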
Generate a period range:
date = pd.period_range(start='20210913', end='20210919')
Create a DataFrame indexed by periods:
index = pd.period_range(start='20210913', end='20210918')
df = pd.DataFrame(np.arange(24).reshape((6,4)), index=index)
Conclusion
The tutorial demonstrates common pandas operations using the sample dataset, and directs readers to the official pandas documentation for deeper exploration.
Python Crawling & Data Mining
