Master Pandas: From CSV to Advanced Data Manipulation in Python

This tutorial walks through pandas fundamentals with short code snippets: reading CSV and Excel files, creating Series and DataFrames, viewing and editing data, indexing, cleaning, grouping, concatenating, merging, and working with time series.

Python Crawling & Data Mining

Apr 27, 2022

Pandas Introduction

This article provides a detailed overview of basic pandas operations using the sample file zlJob.csv.

Pandas Basics

1.1 Reading Data

Read CSV and Excel files:

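import numpy as np
import pandas as pd

data = pd.read_csv('zlJob.csv')  # the sample file used throughout this article

# pd.read_excel('jobs.xlsx') reads an Excel sheet the same way ('jobs.xlsx' is a placeholder name)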
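As a minimal sketch of what read_csv does, the snippet below parses a small in-memory CSV instead of a file on disk. The column names and the two rows are invented for illustration and are not taken from zlJob.csv.

```python
import io

import pandas as pd

# Hypothetical two-row CSV standing in for a real file on disk.
csv_text = "name,salary,city\nfrontend,20k,Shanghai\nbackend,25k,Beijing\n"

# read_csv accepts a path or any file-like object;
# usecols keeps only the listed columns.
jobs = pd.read_csv(io.StringIO(csv_text), usecols=["name", "salary"])
print(jobs.shape)
print(jobs.columns.tolist())
```

Passing a file path instead of the StringIO object works the same way.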

1.2 Creating Data

Create Series and DataFrames:

s = pd.Series([1, 2, 3, 4, 5])

df2 = pd.DataFrame({
    "A": 1.0,
    "B": pd.Timestamp("20130102"),
    "C": pd.Series(1, index=list(range(4)), dtype="float32"),
    "D": np.array([3] * 4, dtype="int32"),
    "E": pd.Categorical(["test", "train", "test", "train"]),
    "F": "foo"
})
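The dictionary constructor repeats scalar values to match the length of the array-like columns. The sketch below uses a shortened version of the same dictionary (the name demo is only for illustration) to make the resulting shape and dtypes visible.

```python
import numpy as np
import pandas as pd

# Scalars ("A", "F") are repeated to match the 4-element columns.
demo = pd.DataFrame({
    "A": 1.0,
    "D": np.array([3] * 4, dtype="int32"),
    "E": pd.Categorical(["test", "train", "test", "train"]),
    "F": "foo",
})
print(demo.shape)
print(demo.dtypes)
```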

Basic DataFrame Operations

2.1 Viewing Data

Show first rows, shape, dtypes, and check for null values:

data.head()  # default first 5 rows

data.shape

data.dtypes

data['name'].isnull()
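isnull() returns a boolean mask, so a common follow-up is to sum it and get a count of missing values per column. A small sketch on a made-up frame (toy is not part of the article's dataset):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"name": ["a", None, "c"], "salary": [1.0, 2.0, np.nan]})

# True counts as 1, so summing the mask counts missing cells per column.
missing_per_column = toy.isnull().sum()
print(missing_per_column)
```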

2.2 Row and Column Operations

Add and delete rows/columns:

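dic = {
    'name': '前端开发',  # front-end developer
    'salary': '2万-2.5万',  # 20k-25k
    'company': '上海科技有限公司',  # Shanghai Technology Co., Ltd.
    'adress': '上海',  # Shanghai
    'eduBack': '本科',  # bachelor's degree
    'companyType': '民营',  # privately owned
    'scale': '1000-10000人',  # 1,000-10,000 employees
    'info': '小程序'  # mini program
}
row = pd.Series(dic, name=38738)  # the Series name becomes the new row's index label
data = pd.concat([data, row.to_frame().T])  # DataFrame.append was removed in pandas 2.0
data.tail()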

data = data.drop([990])

data["xx"] = range(len(data))

data = data.drop('序号', axis=1)  # '序号' is the serial-number column

Note: axis=1 makes drop remove a column; the default, axis=0, removes rows by index label.
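The row and column operations in this section can be tried end to end on a small frame. This is a sketch with invented data; it adds the row with pd.concat, which works on current pandas releases.

```python
import pandas as pd

toy = pd.DataFrame({"name": ["a", "b"], "salary": [10, 20]})

# Add one row: turn the Series into a one-row DataFrame, then concatenate.
new_row = pd.Series({"name": "c", "salary": 30}, name=2)
toy = pd.concat([toy, new_row.to_frame().T])

toy["rank"] = range(len(toy))     # add a column
toy = toy.drop(0)                 # drop the row labelled 0 (axis=0 is the default)
toy = toy.drop("salary", axis=1)  # drop a column
print(toy)
```

Because the transposed row has object dtype, numeric columns may need an astype call after the concatenation.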

2.3 Indexing

Label‑based loc and position‑based iloc examples:

data.loc[10, 'salary']

data.loc[:, 'name'][:5]

data.loc[:, ['name', 'salary']][:5]

data.iloc[2]  # third row

data.iloc[:5]

data.iloc[:5, :4]  # first 5 rows, first 4 columns
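The difference between loc and iloc is easiest to see when the index labels do not start at 0. A minimal sketch with a made-up frame:

```python
import pandas as pd

toy = pd.DataFrame({"salary": [10, 20, 30]}, index=[10, 11, 12])

by_label = toy.loc[10, "salary"]  # row whose index label is 10
by_position = toy.iloc[2, 0]      # third row, first column
label_slice = toy.loc[10:11]      # label slices include the end label
position_slice = toy.iloc[0:1]    # position slices exclude the end
print(by_label, by_position, len(label_slice), len(position_slice))
```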

2.4 Hierarchical Indexing

Series and DataFrame multi‑index examples:

s = pd.Series(np.arange(1, 10), index=[list('aaabbccdd'), [1,2,3,1,2,3,1,2,3]])

df = pd.DataFrame(np.arange(12).reshape(4,3),
    index=[["a","a","b","b"],[1,2,1,2]],
    columns=[["X","X","Y"],["m","n","t"]])
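Once a frame has hierarchical row and column labels, selections can use the outer level alone or a full tuple of keys. The sketch below rebuilds the same DataFrame and shows three such selections; the variable names outer, cell, and block are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
    columns=[["X", "X", "Y"], ["m", "n", "t"]],
)

outer = df.loc["a"]                  # all rows under outer row label "a"
cell = df.loc[("b", 2), ("X", "n")]  # one cell, addressed by full row and column keys
block = df["X"]                      # both columns under outer column label "X"
print(outer.shape, cell, block.columns.tolist())
```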

Data Preprocessing

3.1 Handling Missing Values

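df = pd.DataFrame({
    'state': ['a','b','c','d'],
    'year': [1991,1992,1993,1994],
    'pop': [6.0,7.0,8.0, np.nan]  # np.NaN was removed in NumPy 2.0
})
df['pop'].isnull()  # True only for the last row
df['pop'] = df['pop'].fillna(0)  # assign back; inplace=True on a selected column is unreliable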

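data.dropna(how='all')  # drop rows in which every value is missing

data.dropna(axis=1)  # drop columns that contain any missing value

data.dropna(axis=0, subset=['name', 'salary'])  # drop rows missing 'name' or 'salary'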
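The dropna variants differ only in which rows they consider incomplete. A small sketch on invented data shows the effect of each; note that every call returns a new DataFrame and leaves the original untouched.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "name": ["a", "b", None],
    "salary": [1.0, np.nan, np.nan],
})

any_missing = toy.dropna()               # drop rows with at least one missing value
all_missing = toy.dropna(how="all")      # drop rows where every value is missing
by_column = toy.dropna(subset=["name"])  # only check the "name" column
print(len(any_missing), len(all_missing), len(by_column))
```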

3.2 String Processing

df['A'] = df['A'].str.strip()
df['A'] = df['A'].str.lower()
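The .str accessor applies a string method to every element and can be chained. A minimal sketch with made-up values, including a missing one:

```python
import pandas as pd

cities = pd.Series(["  Shanghai ", "BEIJING", None])

# .str methods pass missing values through instead of raising.
clean = cities.str.strip().str.lower()
print(clean.tolist())
```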

3.3 Duplicate Handling

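df = df.drop_duplicates(subset='A')  # keep the first row for each value of A
df = df.drop_duplicates(subset='A', keep='last')  # or keep the last row instead
df['A'] = df['A'].replace('sh', 'shanghai')  # replace returns a new Series, so assign it back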
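To see which rows survive de-duplication, it helps to run it on a frame small enough to check by eye. This sketch uses invented values and de-duplicates whole rows by column A:

```python
import pandas as pd

toy = pd.DataFrame({"A": ["sh", "bj", "sh"], "B": [1, 2, 3]})

first = toy.drop_duplicates(subset="A")              # keeps rows 0 and 1
last = toy.drop_duplicates(subset="A", keep="last")  # keeps rows 1 and 2
renamed = toy["A"].replace("sh", "shanghai")         # returns a new Series
print(first.index.tolist(), last.index.tolist(), renamed.tolist())
```

Dropping rows on the DataFrame, rather than assigning a de-duplicated column back, avoids index alignment turning the removed entries into NaN.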

Grouping and Aggregation

group = data.groupby('name')  # group by job title

The resulting GroupBy object computes nothing until an aggregation such as mean(), sum(), or size() is called on it.
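A short sketch of aggregating a grouped column follows. The pay column and its numbers are made up, since the salary values in the sample file are text ranges rather than numbers.

```python
import pandas as pd

toy = pd.DataFrame({
    "name": ["frontend", "backend", "frontend"],
    "pay": [20, 30, 24],
})

mean_pay = toy.groupby("name")["pay"].mean()                # one value per group
summary = toy.groupby("name")["pay"].agg(["count", "max"])  # several statistics at once
print(mean_pay)
print(summary)
```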

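Concatenation and merging combine several DataFrames:

pd.concat([df1, df2, df3])  # stack DataFrames vertically (row-wise)

pd.merge(left, right, on='key')  # SQL-style join on the shared column 'key'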
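The two calls can be tried on small frames to see what each returns. The frames and their values below are assumptions made for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["a", "b"], "x": [1, 2]})
df2 = pd.DataFrame({"key": ["c"], "x": [3]})
stacked = pd.concat([df1, df2], ignore_index=True)  # renumber rows 0..n-1

left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "b", "d"], "y": [10, 20, 40]})
inner = pd.merge(left, right, on="key")               # default how="inner"
outer = pd.merge(left, right, on="key", how="outer")  # keep unmatched keys too
print(len(stacked), inner["key"].tolist(), len(outer))
```

With how="outer", keys found on only one side are kept and the missing cells are filled with NaN.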

Time Series

5.1 Generating a Period Range

date = pd.period_range(start='20210913', end='20210919')

5.2 Time Series DataFrame

index = pd.period_range(start='20210913', end='20210918')
df = pd.DataFrame(np.arange(24).reshape((6,4)), index=index)
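A frame indexed by periods can be selected by a single period or sliced by date strings. The sketch below rebuilds the same frame with the daily frequency written out explicitly; the names ts, one_day, span, and stamped are only for illustration.

```python
import numpy as np
import pandas as pd

index = pd.period_range(start="2021-09-13", end="2021-09-18", freq="D")
ts = pd.DataFrame(np.arange(24).reshape((6, 4)), index=index)

one_day = ts.loc[pd.Period("2021-09-14", freq="D")]  # a single row, returned as a Series
span = ts.loc["2021-09-14":"2021-09-16"]             # label slice, end inclusive
stamped = ts.to_timestamp()                          # PeriodIndex -> DatetimeIndex
print(one_day.tolist(), len(span), type(stamped.index).__name__)
```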

Conclusion

This article demonstrates common pandas data‑processing operations based on the zlJob.csv dataset; for more detailed explanations refer to the official pandas documentation.
