
Pandas Data Objects: Series, DataFrame Creation, Indexing, CRUD Operations, and Common Functions

This tutorial introduces pandas' two core data objects—Series and DataFrame—demonstrates how to create, index, query, modify, add, delete, sort, merge, and copy them, and shows common parameters, functions, I/O operations, plotting, and a practical log‑analysis example using Python.

Python Programming Learning Circle

This article provides a comprehensive guide to pandas, a Python library for data manipulation, focusing on its two primary data structures: Series and DataFrame.

Series is a one‑dimensional labeled array. A simple creation example is:

# Initialize a Series by passing a sequence, such as a list, to pd.Series
import pandas as pd
s1 = pd.Series(list("1234"))
print(s1)
0    1
1    2
2    3
3    4
dtype: object
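
Because list("1234") yields the strings '1' to '4', the dtype shown above is object rather than an integer type. The following is a minimal sketch, assuming numeric values and custom labels are wanted instead; the names s_str, s_num, and s_labeled are illustrative and do not come from the original article.

```python
import pandas as pd

# list("1234") produces the strings "1".."4", not integers
s_str = pd.Series(list("1234"))

# Convert the strings to integers to get a numeric dtype
s_num = s_str.astype("int64")

# Pass an explicit index to label each element (labels are illustrative)
s_labeled = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(s_num.sum())      # 10
print(s_labeled["b"])   # 20
```

Depending on the pandas version, the string Series may report a dedicated string dtype instead of object, so the sketch avoids relying on that detail.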

DataFrame is a two‑dimensional table‑like structure. It can be created from a NumPy array or a dictionary:

# Initialize a DataFrame by passing a two-dimensional NumPy array or a dict to pd.DataFrame
import numpy as np
df1 = pd.DataFrame(np.random.randn(6,4))
print(df1)
# From a dict; scalar values are broadcast to every row
df2 = pd.DataFrame({
    'A' : 1.,
    'B' : pd.Timestamp('20130102'),
    'C' : pd.Series(1, index=list(range(4)), dtype='float32'),
    'D' : np.array([3] * 4, dtype='int32'),
    'E' : pd.Categorical(["test","train","test","train"]),
    'F' : 'foo'
})
print(df2)
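
When a DataFrame is built from a dict, each key becomes a column and each column keeps its own dtype. A small sketch of that behavior, using a reduced version of the dict above (the name df_mixed is an assumption made for this example):

```python
import numpy as np
import pandas as pd

df_mixed = pd.DataFrame({
    "A": 1.0,                                             # scalar, broadcast to all rows
    "C": pd.Series(1, index=range(4), dtype="float32"),   # sets the row index 0..3
    "D": np.array([3] * 4, dtype="int32"),
    "F": "foo",                                           # scalar string, also broadcast
})

print(df_mixed.shape)    # (4, 4)
print(df_mixed.dtypes)   # one dtype per column
```

Inspecting df_mixed.dtypes right after construction is a cheap way to catch columns that were read as text when numbers were expected.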

Both objects have an index; for Series the index labels each element, while for DataFrame it labels rows. The index can be inspected with print(s1.index) or print(df.index).
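
As a brief illustration of the difference between row labels and column labels, here is a sketch with made-up data (the column name price and the date strings are assumptions for this example only):

```python
import pandas as pd

# Without an explicit index, pandas assigns a default integer range
s = pd.Series([1.5, 2.5, 3.5])
print(s.index)      # RangeIndex(start=0, stop=3, step=1)

# A DataFrame has row labels (index) and column labels (columns)
df = pd.DataFrame({"price": [10.0, 11.0]},
                  index=["2013-01-02", "2013-01-03"])
print(df.index)
print(df.columns)
```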

Selection and Query – you can view columns, rows, or subsets using .head(), .tail(), slicing, .loc (label-based), and .iloc (position-based). The examples assume df is a DataFrame of stock data with columns such as date, open, and code:

# View the first 5 rows
df.head()
# View the date and code columns of the row labeled 10
df.loc[10, ["date", "code"]]
# View the first row by position
df.iloc[0]
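
The distinction between .loc and .iloc is easiest to see when the row labels differ from the row positions. The sketch below uses a small hypothetical stock table whose index starts at 10; the values are invented for illustration.

```python
import pandas as pd

# Illustrative data; integer labels deliberately differ from positions
df = pd.DataFrame(
    {"date": ["2013-01-02", "2013-01-03", "2013-01-04"],
     "open": [10.2, 10.5, 9.8],
     "code": ["000001", "000001", "000001"]},
    index=[10, 11, 12],
)

by_label = df.loc[10, ["date", "code"]]   # row whose label is 10
by_position = df.iloc[0]                  # first row, whatever its label
label_slice = df.loc[10:11]               # label slices include both ends
position_slice = df.iloc[0:1]             # position slices exclude the end
```

Here df.loc[10] and df.iloc[0] return the same row, while df.iloc[10] would raise an IndexError because the table has only three rows.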

Conditional filtering uses boolean masks:

# View the first 5 rows where the open column is greater than 10
df[df.open > 10].head()
# Require both conditions at once
df[(df.open > 10) & (df.open < 10.6)].head()
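
A boolean mask is itself a Series of True/False values aligned with the rows. The following sketch, on invented data, shows the mask explicitly and adds an OR condition; each comparison needs its own parentheses because & and | bind more tightly than > and <.

```python
import pandas as pd

df = pd.DataFrame({"open": [9.8, 10.2, 10.5, 10.9]})

# Elementwise AND of two comparisons
mask = (df["open"] > 10) & (df["open"] < 10.6)
print(mask.tolist())   # [False, True, True, False]

filtered = df[mask]

# Elementwise OR uses | in the same way
either = df[(df["open"] < 10) | (df["open"] > 10.8)]
```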

Modification – change index, rename columns, or apply functions:

# Set the index of df to the date column, converted to datetime
df.index = pd.to_datetime(df.date)
# Rename the columns
df.columns = ["Date","Open","Close","High","Low","Volume","Code"]
# Add 1 to every value in the Open column
df.Open = df.Open.apply(lambda x: x+1)
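
Two alternatives are worth knowing here, shown as a sketch on invented data: rename() changes only the columns named in a mapping, and plain arithmetic on a column produces the same result as apply with a simple lambda.

```python
import pandas as pd

df = pd.DataFrame({"date": ["2013-01-02", "2013-01-03"],
                   "open": [10.0, 10.5]})

# Rename selected columns by name instead of replacing the whole list
df = df.rename(columns={"date": "Date", "open": "Open"})

# Use the parsed dates as the index
df.index = pd.to_datetime(df["Date"])

# Vectorized arithmetic; same result as df["Open"].apply(lambda x: x + 1)
df["Open"] = df["Open"] + 1
```

Assigning a full list to df.columns fails if the list length does not match the number of columns, which is one reason to prefer the mapping form when only a few names change.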

Deletion – use .drop() with inplace=True if you want to modify the original object:

# Delete the Open column
df.drop("Open", axis=1, inplace=True)
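
Without inplace=True, .drop() leaves the original untouched and returns a new DataFrame. A minimal sketch on invented data, also showing how to drop a row by its label:

```python
import pandas as pd

df = pd.DataFrame({"Open": [10.0, 10.5], "Close": [10.4, 10.1]})

without_open = df.drop(columns="Open")   # returns a new DataFrame
without_first = df.drop(index=0)         # drop a row by its label

print(list(df.columns))   # ['Open', 'Close'] (df itself is unchanged)
```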

Common parameters and functions – control display format, obtain descriptive statistics, and handle missing values:

# Set the display format for floating-point numbers
pd.options.display.float_format = '{:.4f}'.format
# Descriptive statistics
df.describe()
# Drop rows that contain any NaN
df.dropna(how='any')
# Fill NaN with a fixed value
df.fillna(value=5)
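
To make the effect of dropna and fillna concrete, here is a sketch on a small frame with two missing values (the data is invented). Both methods return new objects, so the original keeps its NaN entries.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Open": [10.0, np.nan, 10.5],
                   "Close": [10.4, 10.1, np.nan]})

dropped = df.dropna(how="any")   # keep only rows with no NaN
filled = df.fillna(value=5)      # replace every NaN with 5
nan_counts = df.isna().sum()     # number of NaN values per column
```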

Sorting and Merging – sort by index or values, concatenate rows/columns, or merge datasets:

# Sort by column labels
df.sort_index(axis=1).head()
# Sort by the Open column in ascending order
df.sort_values(by="Open").head()
# Concatenate along rows
pd.concat([df.iloc[0:2,:], df.iloc[2:4,:], df.iloc[4:9]])
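
The snippet above covers sorting and row-wise concatenation but not key-based merging. The following sketch shows pd.merge on two hypothetical tables that share a code column; the table names, company names, and values are assumptions made for illustration.

```python
import pandas as pd

prices = pd.DataFrame({"code": ["000001", "000002", "000003"],
                       "open": [10.2, 25.1, 7.4]})
names = pd.DataFrame({"code": ["000001", "000002"],
                      "name": ["Alpha", "Beta"]})

# Inner join (the default): only codes present in both tables
inner = pd.merge(prices, names, on="code")

# Left join: keep every row of prices, fill missing names with NaN
left = pd.merge(prices, names, on="code", how="left")

# Column-wise concatenation aligns on the row index, not on a key
side_by_side = pd.concat([prices, names], axis=1)
```

merge matches rows by key values, whereas concat with axis=1 simply places frames next to each other by index, so the two are not interchangeable.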

Copying – assigning a DataFrame to another name does not copy it; both names refer to the same mutable object. Use .copy() to create an independent copy before making changes.
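
A short sketch of the difference between a second name for the same object and an independent copy (the variable names are illustrative):

```python
import pandas as pd

original = pd.DataFrame({"Open": [10.0, 10.5]})

alias = original                # same object under a second name
independent = original.copy()   # separate data (deep copy by default)

alias.loc[0, "Open"] = 99.0         # also visible through `original`
independent.loc[1, "Open"] = -1.0   # does not touch `original`

print(original["Open"].tolist())    # [99.0, 10.5]
```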

Plotting – pandas integrates with matplotlib. Simple line and area plots can be generated with:

df[["Open","Low","High","Close"]].plot()
df[["Open","Low","High","Close"]].plot(kind="area")

Data I/O – read and write common file formats such as CSV, Excel, JSON, or even the system clipboard:

# Save as CSV
df.to_csv("stock.csv")
# Read from CSV, using the first column as the index
df2 = pd.read_csv("stock.csv", index_col=0)
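
The same round trip can be tried without touching the file system by writing to an in-memory buffer. This is a sketch on invented data; parse_dates=True is added here so that the index comes back as datetimes rather than strings.

```python
import io
import pandas as pd

df = pd.DataFrame({"Open": [10.0, 10.5]},
                  index=pd.to_datetime(["2013-01-02", "2013-01-03"]))
df.index.name = "Date"

# Write CSV text into a buffer instead of a file
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)

# Read it back, using the first column as a datetime index
restored = pd.read_csv(buffer, index_col=0, parse_dates=True)
```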

Practical Example: Web Log Analysis – the article demonstrates parsing an Apache access log into a DataFrame, converting the timestamp, casting numeric fields, and performing analyses such as status‑code counts, top IP addresses, and time‑series visualisations.

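# Parse the log with a regular expression. The fragments (HOST, SPACE, IDENTITY, ...)
# and the parsed rows in field_lis are not shown in this excerpt.
REGEX = HOST+SPACE+IDENTITY+SPACE+USER+SPACE+TIME+SPACE+REQUEST+SPACE+STATUS+SPACE+SIZE+SPACE+IDENTITY+USER_AGENT+SPACE
log_df = pd.DataFrame(field_lis, columns=["Host","Time","Method","Path","Protocol","Status","Size","User_Agent"])
# Replace the first ":" in Time with a space, convert Time to datetime, and set it as the index
log_df.Time = log_df.Time.apply(lambda x: x.replace(":", " ", 1))
log_df.Time = pd.to_datetime(log_df.Time)
log_df.set_index('Time', inplace=True)
# Count how often each status code occurs
log_df.Status.value_counts()
# Draw a pie chart of the status codes
log_df.Status.value_counts().plot(kind="pie", figsize=(10,8))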
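
Since the regex fragments and the parsing loop are not shown above, the following is a self-contained sketch of the same workflow under stated assumptions: the sample lines are invented, the log is assumed to be in Apache combined log format, and PATTERN is a stand-in written for this example rather than the article's REGEX.

```python
import re
import pandas as pd

# Hypothetical sample lines in Apache combined log format
lines = [
    '192.168.0.1 - - [10/Oct/2016:13:55:36 +0800] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '192.168.0.2 - - [10/Oct/2016:13:56:01 +0800] "GET /missing HTTP/1.1" 404 209 "-" "curl/7.47"',
    '192.168.0.1 - - [10/Oct/2016:13:57:12 +0800] "POST /login HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

# Assumed pattern for this sketch, with one named group per column
PATTERN = re.compile(
    r'(?P<Host>\S+) \S+ \S+ \[(?P<Time>[^\]]+)\] '
    r'"(?P<Method>\S+) (?P<Path>\S+) (?P<Protocol>[^"]+)" '
    r'(?P<Status>\d{3}) (?P<Size>\d+|-) "[^"]*" "(?P<User_Agent>[^"]*)"'
)

# Keep only the lines that match, one dict per row
rows = [m.groupdict() for m in map(PATTERN.match, lines) if m]
log_df = pd.DataFrame(rows)

# Parse the timestamp with an explicit format, then use it as the index
log_df["Time"] = pd.to_datetime(log_df["Time"], format="%d/%b/%Y:%H:%M:%S %z")
log_df = log_df.set_index("Time")

# Cast numeric fields; a "-" size is coerced to NaN
log_df["Status"] = log_df["Status"].astype(int)
log_df["Size"] = pd.to_numeric(log_df["Size"], errors="coerce")

status_counts = log_df["Status"].value_counts()      # requests per status code
top_hosts = log_df["Host"].value_counts().head(10)   # most active client IPs
```

Passing an explicit format to pd.to_datetime is an alternative to replacing the first colon before parsing. The month abbreviation is matched according to the active locale, so this format assumes English month names.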

Overall, the guide equips readers with the essential pandas operations needed for data cleaning, transformation, analysis, and visualization, making it a solid reference for anyone working with tabular data in Python.
