
Master Pandas: Essential Data Manipulation Techniques for Beginners

This comprehensive tutorial walks you through pandas basics, including reading CSV and Excel files, creating Series and DataFrames, performing data inspection, cleaning, indexing, hierarchical indexing, time‑series handling, grouping, aggregation, concatenation, merging, and practical code examples with visual outputs.

Python Crawling & Data Mining

Pandas Introduction

The article presents a step‑by‑step guide to using pandas for data analysis, based on the sample file zlJob.csv. It covers data import, creation, basic operations, cleaning, and advanced features.

Data Import

Read CSV:

pd.read_csv()

Read Excel:

pd.read_excel()
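As a minimal sketch of the import step, the snippet below reads CSV text from an in-memory buffer so that it runs without zlJob.csv on disk. The column names and rows are invented for illustration and are not taken from the sample file; with the real file you would pass its path to pd.read_csv() instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file; the real zlJob.csv is not reproduced here.
csv_text = "name,salary,city\nanalyst,1.5,Shanghai\nengineer,2.0,Beijing\n"

# read_csv accepts a file path or any file-like object.
jobs = pd.read_csv(io.StringIO(csv_text))
print(jobs)
```

pd.read_excel() is called in the same way, but it needs an Excel engine such as openpyxl installed to read .xlsx files.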

Creating Data Structures

Create a Series:

import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

Create a DataFrame:

# Scalar values ("A", "B", "F") are broadcast to the length of the array-like columns.
df2 = pd.DataFrame({
    "A": 1.0,
    "B": pd.Timestamp("20130102"),
    "C": pd.Series(1, index=list(range(4)), dtype="float32"),
    "D": np.array([3] * 4, dtype="int32"),
    "E": pd.Categorical(["test", "train", "test", "train"]),
    "F": "foo"
})

Basic Data Operations

View first rows:

data.head()

Shape and dtypes:

data.shape
data.dtypes

Check for nulls:

data['name'].isnull()
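The inspection calls above can be tried on a small frame. This is only a sketch: the rows are hypothetical, and only the column names name and salary echo the ones used in this article.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
data = pd.DataFrame({
    "name": ["analyst", None, "engineer"],
    "salary": [1.5, 2.0, np.nan],
})

print(data.head(2))   # first two rows
print(data.shape)     # (number of rows, number of columns)
print(data.dtypes)    # dtype of each column

# isnull() returns a boolean mask; summing it counts missing values per column.
missing_per_column = data.isnull().sum()
print(missing_per_column)
```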

Row and Column Manipulation

Add a row:

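# DataFrame.append() was removed in pandas 2.0, so wrap the record in a one-row DataFrame and use pd.concat().
# The values read: front-end development, a Shanghai technology company, Shanghai, bachelor's degree, privately owned, mini-programs.
dic = {'name':'前端开发','salary':2.0,'company':'上海科技有限公司','adress':'上海','eduBack':'本科','companyType':'民营','scale':1000,'info':'小程序'}
new_row = pd.DataFrame([dic], index=[38738])  # 38738 is the index label of the new row
data = pd.concat([data, new_row])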

Delete a row:

data = data.drop([990])

Add a column:

data['xx'] = range(len(data))

Delete a column (here '序号', the serial-number column):

data = data.drop('序号', axis=1)
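The four operations can be run end to end on a small frame. The sketch below uses hypothetical rows and a hypothetical rank column, and adds the row with pd.concat(), which works on current pandas versions.

```python
import pandas as pd

# Hypothetical two-row frame standing in for the sample dataset.
data = pd.DataFrame(
    {"name": ["analyst", "engineer"], "salary": [1.5, 2.0]},
    index=[0, 1],
)

# Add a row: wrap the new record in a one-row DataFrame and concatenate.
new_row = pd.DataFrame([{"name": "designer", "salary": 1.8}], index=[2])
data = pd.concat([data, new_row])

# Delete the row labelled 0.
data = data.drop([0])

# Add a column, then delete another one.
data["rank"] = range(len(data))
data = data.drop("salary", axis=1)
print(data)
```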

Indexing

loc selects by label. It accepts a single label, a list of labels, or a label slice.
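A short sketch of label-based selection, using a hypothetical frame with string row labels:

```python
import pandas as pd

# Hypothetical frame; the row labels "a", "b", "c" are chosen for illustration.
df = pd.DataFrame(
    {"salary": [1.5, 2.0, 1.8], "city": ["Shanghai", "Beijing", "Shenzhen"]},
    index=["a", "b", "c"],
)

one_row = df.loc["a"]              # single label: returns a Series
two_rows = df.loc[["a", "c"]]      # list of labels: returns a DataFrame
row_slice = df.loc["a":"b"]        # label slice: both endpoints are included
one_cell = df.loc["b", "salary"]   # row label plus column label: returns a scalar
```

Note that a label slice includes its end label, unlike ordinary Python slicing.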

iloc selects by integer position. It accepts a single integer, a list or array of integers, or an integer slice.
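A matching sketch of position-based selection, again on a hypothetical frame:

```python
import pandas as pd

# Hypothetical frame; positions are counted from 0 regardless of the labels.
df = pd.DataFrame(
    {"salary": [1.5, 2.0, 1.8], "city": ["Shanghai", "Beijing", "Shenzhen"]},
    index=["a", "b", "c"],
)

first_row = df.iloc[0]        # single integer: returns a Series
picked = df.iloc[[0, 2]]      # list of integers: returns a DataFrame
head_slice = df.iloc[0:2]     # integer slice: the end position is excluded
cell = df.iloc[1, 0]          # row position plus column position: returns a scalar
```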

Hierarchical Indexing

Series with multi‑level index:

s = pd.Series(np.arange(1,10), index=[list('aaabbccdd'), [1,2,3,1,2,3,1,2,3]])

DataFrame with multi‑level index:

df = pd.DataFrame(np.arange(12).reshape(4,3), index=[["a","a","b","b"],[1,2,1,2]], columns=[["X","X","Y"],["m","n","t"]])
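Once a multi-level index exists, values can be selected by outer label or by a full (outer, inner) pair. The sketch below uses a smaller, hypothetical Series than the one above so the results are easy to follow.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with two index levels: outer letters and inner numbers.
s = pd.Series(
    np.arange(1, 7),
    index=[list("aaabbb"), [1, 2, 3, 1, 2, 3]],
)

outer = s.loc["a"]          # all values under outer label "a", indexed by the inner level
single = s.loc[("b", 2)]    # one value addressed by an (outer, inner) pair
wide = s.unstack()          # move the inner level into columns, giving a DataFrame
print(wide)
```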

Data Pre‑processing

Missing value handling:

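df['pop'].isnull()               # boolean mask marking the missing values
df['pop'] = df['pop'].fillna(0)  # assign the result back; inplace=True on a selected column is unreliable under copy-on-write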

Drop rows/columns with missing values:

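data.dropna(how='all')  # returns a copy without the rows in which every value is missing
data.dropna(axis=1)     # returns a copy without the columns that contain any missing value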
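To see how these calls differ, here is a minimal sketch on a hypothetical frame. Only the column name pop comes from the snippet above; area, density, and all the values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the last row is missing everywhere, and "pop" has one extra gap.
df = pd.DataFrame({
    "pop": [1.0, np.nan, 3.0, np.nan],
    "area": [10.0, 20.0, 30.0, np.nan],
    "density": [0.1, 0.2, 0.3, np.nan],
})

missing_mask = df["pop"].isnull()             # True where "pop" is missing
filled = df["pop"].fillna(0)                  # new Series with gaps replaced by 0
no_empty_rows = df.dropna(how="all")          # drops only the all-missing last row
complete_cols = no_empty_rows.dropna(axis=1)  # then drops "pop", which still has a gap
```

Each call returns a new object, so the original df is unchanged until a result is assigned back.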

String cleaning:

df['A'] = df['A'].str.strip()
df['A'] = df['A'].str.lower()

Duplicate removal:

df = df.drop_duplicates(subset='A')               # keep the first row for each value of A
df = df.drop_duplicates(subset='A', keep='last')  # alternatively, keep the last row for each value of A

Value replacement:

df['A'] = df['A'].replace('sh', 'shanghai')  # replace() returns a new Series, so assign it back
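The string cleaning, duplicate removal, and value replacement steps can be chained on a small example. The frame below is hypothetical; column B is added only to show which rows survive.

```python
import pandas as pd

# Hypothetical frame with inconsistent spacing and casing in column "A".
df = pd.DataFrame({"A": [" SH ", "sh", "bj", "BJ "], "B": [1, 2, 3, 4]})

# Normalize the strings first so that equal values compare as duplicates.
df["A"] = df["A"].str.strip().str.lower()

# Drop whole rows whose "A" value repeats, keeping the first or the last occurrence.
deduped = df.drop_duplicates(subset="A")
deduped_last = df.drop_duplicates(subset="A", keep="last")

# Replace exact matches of "sh"; assign() returns a new frame with the updated column.
cleaned = deduped.assign(A=deduped["A"].replace("sh", "shanghai"))
print(cleaned)
```

Cleaning before deduplicating matters here: without the strip and lower steps, " SH " and "sh" would count as different values.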

Grouping and Aggregation

Group by a column and perform aggregate calculations:

group = data.groupby('name')
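The grouped object does nothing until an aggregation is applied. As a sketch, assuming a hypothetical frame with name and salary columns:

```python
import pandas as pd

# Hypothetical data: two job titles with two salary observations each.
data = pd.DataFrame({
    "name": ["analyst", "engineer", "analyst", "engineer"],
    "salary": [1.0, 2.0, 3.0, 4.0],
})

group = data.groupby("name")

# A single aggregate per group.
mean_salary = group["salary"].mean()

# Several aggregates at once; each becomes a column in the result.
summary = group["salary"].agg(["count", "min", "max"])
print(summary)
```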

Concatenation

pd.concat(objs, axis=0, join="outer", ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, copy=True)
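A brief sketch of how axis, join, and ignore_index change the result. The two frames are hypothetical and share only column A.

```python
import pandas as pd

# Hypothetical frames that share column "A" but differ in their second column.
left = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
right = pd.DataFrame({"A": [5, 6], "C": [7, 8]})

# Stack rows; the outer join keeps every column and fills the gaps with NaN.
stacked = pd.concat([left, right], axis=0, ignore_index=True)

# Stack rows but keep only the columns present in both frames.
shared = pd.concat([left, right], axis=0, join="inner", ignore_index=True)

# Place the frames side by side, aligning on the row index.
side_by_side = pd.concat([left, right], axis=1)
```

With ignore_index=True the result gets a fresh 0..n-1 index instead of repeating the original row labels.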

Merge

pd.merge(left, right, how="inner", on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=("_x", "_y"), copy=True, indicator=False, validate=None)
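The how, on, and indicator parameters are the ones most often needed. The sketch below joins two hypothetical frames on a company column; the company names and values are invented.

```python
import pandas as pd

# Hypothetical frames that share the key column "company".
jobs = pd.DataFrame({
    "company": ["acme", "globex", "initech"],
    "salary": [1.5, 2.0, 1.8],
})
sizes = pd.DataFrame({
    "company": ["acme", "globex", "umbrella"],
    "scale": [100, 1000, 50],
})

# Inner join: only companies present in both frames.
inner = pd.merge(jobs, sizes, how="inner", on="company")

# Left join: every row of jobs; indicator=True adds a "_merge" column
# recording whether each row matched.
left_join = pd.merge(jobs, sizes, how="left", on="company", indicator=True)
print(left_join)
```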

Time Series

Generate a period range:

date = pd.period_range(start='20210913', end='20210919')

Create a DataFrame indexed by periods:

index = pd.period_range(start='20210913', end='20210918')
df = pd.DataFrame(np.arange(24).reshape((6,4)), index=index)
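A period-indexed frame can then be queried by period. The following sketch rebuilds the frame above and selects one day and a short window; the variable names one_day and window are illustrative.

```python
import numpy as np
import pandas as pd

# Same construction as above: six daily periods (daily is the default frequency).
index = pd.period_range(start="20210913", end="20210918")
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=index)

# Select a single day with an explicit Period label.
one_day = df.loc[pd.Period("2021-09-14", freq="D")]

# Select a range of days with a string slice; both endpoints are included.
window = df.loc["2021-09-14":"2021-09-16"]
print(window)
```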

Conclusion

The tutorial demonstrates common pandas operations using the sample dataset, and directs readers to the official pandas documentation for deeper exploration.

Written by

Python Crawling & Data Mining
