
Master Essential Pandas Functions with Practical Code Examples

This guide introduces Pandas, shows how to set up the environment, and provides detailed explanations and runnable code for common DataFrame operations such as reading CSV files, inspecting data, selecting columns, descriptive statistics, filtering, grouping, merging, and handling missing values.

Test Development Learning Exchange

Sep 7, 2024


1. Introduction to Pandas

Pandas is an open‑source data‑processing and analysis library built on NumPy. It offers two primary data structures—DataFrame and Series—that make working with structured data simple and efficient.

2. Environment Preparation

Make sure Pandas is installed in your Python environment. If it isn't, install it with:

pip install pandas

3. Common Functions and Examples

Reading Data

import pandas as pd
data = pd.read_csv('data.csv')
print(data.head())

Output example:

     A    B    C
0  1.0  2.0  3.0
1  4.0  5.0  6.0
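If you don't have data.csv at hand, read_csv also accepts any file-like object. This sketch assumes the file contains the values shown in the output above and reads them from an in-memory buffer instead of disk:

```python
import io

import pandas as pd

# Assumed contents of data.csv, matching the sample output above
csv_text = """A,B,C
1.0,2.0,3.0
4.0,5.0,6.0
"""

# read_csv accepts any file-like object, so io.StringIO stands in for a file on disk
data = pd.read_csv(io.StringIO(csv_text))
print(data.head())
print(data.dtypes)  # all three columns are parsed as float64
```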

Inspecting Basic Information

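data.info()

info() prints its report directly and returns None, so wrapping it in print() only adds a stray "None" at the end. The report shows the index range, column count, non‑null counts per column, and data types.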

Selecting Columns

print(data[['A', 'C']])

Output example:

     A    C
0  1.0  3.0
1  4.0  6.0
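Whether you pass one label or a list of labels changes the type you get back. A minimal sketch with the sample values:

```python
import pandas as pd

data = pd.DataFrame({"A": [1.0, 4.0], "B": [2.0, 5.0], "C": [3.0, 6.0]})

# Single brackets with one label return a Series
col_a = data["A"]
# Double brackets (a list of labels) always return a DataFrame, even for one column
frame_a = data[["A"]]

print(type(col_a).__name__)    # Series
print(type(frame_a).__name__)  # DataFrame
```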

Descriptive Statistics

print(data.describe())

Provides count, mean, std, min, quartiles, and max for numeric columns.
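To make the statistics concrete, this sketch reads a few of them back for the two-row sample. Note that std is the sample standard deviation (ddof=1), so for column A (values 1.0 and 4.0) it is √4.5 rather than 1.5:

```python
import pandas as pd

data = pd.DataFrame({"A": [1.0, 4.0], "B": [2.0, 5.0], "C": [3.0, 6.0]})

# describe() returns a DataFrame: one row per statistic, one column per numeric column
stats = data.describe()
print(stats.loc[["count", "mean", "50%"], "A"])

# Sample standard deviation: sqrt(((1 - 2.5)**2 + (4 - 2.5)**2) / (2 - 1))
print(round(stats.loc["std", "A"], 4))  # 2.1213
```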

Filtering Data

print(data.query('A > 2'))

Output example:

     A    B    C
1  4.0  5.0  6.0
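query() is shorthand for boolean-mask indexing, and it can reference Python variables with an @ prefix. A sketch, assuming a hypothetical threshold variable, that checks both forms return the same rows:

```python
import pandas as pd

data = pd.DataFrame({"A": [1.0, 4.0], "B": [2.0, 5.0], "C": [3.0, 6.0]})

threshold = 2  # hypothetical variable for illustration
# query() can reference local Python variables with the @ prefix
via_query = data.query("A > @threshold")
# The equivalent boolean-mask form
via_mask = data[data["A"] > threshold]

print(via_query.equals(via_mask))  # True
```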

Adding New Columns

new_data = data.assign(D=data['A'] + data['C'])
print(new_data)

Output example:

     A    B    C     D
0  1.0  2.0  3.0   4.0
1  4.0  5.0  6.0  10.0
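assign also accepts callables. Each one receives the DataFrame being built, and keyword arguments are processed in order, so a later column can refer to one created earlier in the same call. A minimal sketch reusing the sample values (column E is hypothetical):

```python
import pandas as pd

data = pd.DataFrame({"A": [1.0, 4.0], "B": [2.0, 5.0], "C": [3.0, 6.0]})

# Each lambda receives the intermediate DataFrame, so E can use the new D column
new_data = data.assign(
    D=lambda df: df["A"] + df["C"],
    E=lambda df: df["D"] * 2,
)
print(new_data)
# assign returns a new DataFrame; the original is left unchanged
print(list(data.columns))  # ['A', 'B', 'C']
```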

Sorting

sorted_data = data.sort_values(by='B', ascending=False)
print(sorted_data)

Output example:

     A    B    C
1  4.0  5.0  6.0
0  1.0  2.0  3.0

Group‑by Aggregation

grouped_data = data.groupby('A').agg({'B': ['mean', 'sum'], 'C': ['min', 'max']})
print(grouped_data)

Output example:

        B         C
     mean  sum  min  max
A
1.0   2.0  2.0  3.0  3.0
4.0   5.0  5.0  6.0  6.0
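The two-level column labels above can be awkward to work with. Pandas (0.25 and later) also supports named aggregation, which produces flat column names; a minimal sketch with the sample values, where the output names such as B_mean are arbitrary choices:

```python
import pandas as pd

data = pd.DataFrame({"A": [1.0, 4.0], "B": [2.0, 5.0], "C": [3.0, 6.0]})

# Named aggregation: output_column=(input_column, function) gives flat column names
grouped = data.groupby("A").agg(
    B_mean=("B", "mean"),
    B_sum=("B", "sum"),
    C_min=("C", "min"),
    C_max=("C", "max"),
)
print(grouped)
print(list(grouped.columns))
```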

Merging DataFrames

df2 = pd.DataFrame({'A': [1, 4], 'E': [7, 8]})
merged_data = pd.merge(data, df2, on='A')
print(merged_data)

Output example:

     A    B    C  E
0  1.0  2.0  3.0  7
1  4.0  5.0  6.0  8
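pd.merge defaults to how='inner', so rows whose key has no match in the other frame are dropped silently. A sketch with a hypothetical lookup table that lacks one key, using how='left' to keep every row and indicator=True to flag where each row came from:

```python
import pandas as pd

data = pd.DataFrame({"A": [1.0, 4.0], "B": [2.0, 5.0], "C": [3.0, 6.0]})
# Hypothetical lookup table with no entry for A == 4
lookup = pd.DataFrame({"A": [1.0, 9.0], "E": [7, 9]})

# how="left" keeps every row of data; indicator=True adds a _merge column
merged = pd.merge(data, lookup, on="A", how="left", indicator=True)
print(merged)  # the A == 4 row has E = NaN and _merge = "left_only"
```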

Handling Missing Values

clean_data = data.dropna()
print(clean_data)

In this example the dataset has no missing values, so the output matches the original.
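Since the sample has no gaps, dropna() has nothing to remove. A hypothetical frame with missing values shows how isna() counts them and how subset narrows which columns dropna() checks:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps, since the sample data has none
df = pd.DataFrame({"A": [1.0, np.nan, 7.0], "B": [2.0, 5.0, np.nan]})

print(df.isna().sum())          # missing values per column: A 1, B 1
print(df.dropna())              # drop rows with any NaN: only row 0 survives
print(df.dropna(subset=["A"]))  # only check column A: rows 0 and 2 survive
```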

4. Additional Frequently Used Functions

Renaming Columns

renamed_data = data.rename(columns={'A': 'Alpha', 'B': 'Beta'})
print(renamed_data)

Output example:

   Alpha  Beta    C
0    1.0   2.0  3.0
1    4.0   5.0  6.0

Type Conversion

data['A'] = data['A'].astype(int)
print(data)
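astype(int) raises an error if the column contains NaN or text that isn't a number. A sketch, assuming a hypothetical column read in as strings, of pd.to_numeric with errors='coerce' followed by the nullable "Int64" dtype, which keeps integers while allowing missing values:

```python
import pandas as pd

# Hypothetical column read as strings, including one value that is not a number
raw = pd.Series(["1", "4", "n/a"])

# raw.astype(int) would raise ValueError on "n/a"; to_numeric turns it into NaN instead
numbers = pd.to_numeric(raw, errors="coerce")
print(numbers)

# Nullable integer dtype: 1, 4, <NA>
as_int = numbers.astype("Int64")
print(as_int)
```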

Duplicate Handling

duplicated_rows = data.duplicated()
print(duplicated_rows)
cleaned_data = data.drop_duplicates()
print(cleaned_data)

Assuming no duplicates, the boolean mask is [False, False] and the data remains unchanged.
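Because the sample data has no duplicates, the calls above change nothing. A small hypothetical frame with repeats shows what they do, along with the subset and keep parameters:

```python
import pandas as pd

# Hypothetical data with one fully repeated row and repeated ids
df = pd.DataFrame({"id": [1, 1, 2, 2], "value": [10, 10, 20, 30]})

print(df.duplicated().tolist())  # [False, True, False, False]
print(df.drop_duplicates())      # removes the exact repeat (row 1)
# subset= compares only the listed columns; keep="last" keeps the final occurrence
print(df.drop_duplicates(subset=["id"], keep="last"))
```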

Conditional Replacement

data.replace({1: 100, 4: 400}, inplace=True)
print(data)

Output example:

     A    B    C
0  100  2.0  3.0
1  400  5.0  6.0

Conditional Selection

print(data.loc[data['A'] == 100])  # label/boolean-based selection
print(data.iloc[0])                # position-based selection: the first row
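loc selects by label and iloc by integer position. With the default 0..n-1 index the two look alike, so this sketch uses a hypothetical index of 10, 20, 30 to show the difference, including how slice end points are treated:

```python
import pandas as pd

# Hypothetical frame whose labels differ from its positions
df = pd.DataFrame({"A": [100, 400, 700]}, index=[10, 20, 30])

print(df.loc[20, "A"])  # label-based: row labelled 20 -> 400
print(df.iloc[0, 0])    # position-based: first row, first column -> 100
print(df.loc[10:20])    # label slices include the end label: rows 10 and 20
print(df.iloc[0:2])     # position slices exclude the end: positions 0 and 1
```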

Concatenation

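other_data = pd.DataFrame({'A': [100, 400], 'D': [7, 8]})
# axis=1 places the frames side by side, aligned on the row index;
# both frames contain 'A', so the result has two columns named 'A'
concatenated_data = pd.concat([data, other_data], axis=1)
print(concatenated_data)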
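With axis=0 (the default), concat stacks rows instead of columns. A sketch with a hypothetical extra row; ignore_index=True renumbers the result so the index stays unique:

```python
import pandas as pd

data = pd.DataFrame({"A": [100, 400], "B": [2.0, 5.0]})
more = pd.DataFrame({"A": [700], "B": [8.0]})  # hypothetical extra row

# axis=0 stacks rows; ignore_index=True gives the result a fresh 0..n-1 index
stacked = pd.concat([data, more], ignore_index=True)
print(stacked)
```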

Slice Operations

print(data.head(1))
print(data.tail(1))

Aggregation Functions

print(data.sum())
print(data.mean())

Value Counts

print(data['A'].value_counts())

Pivot Table

pivot_data = pd.pivot_table(data, values='C', index=['A'], aggfunc='sum')
print(pivot_data)

Time‑Series Resampling

data['date'] = pd.date_range(start='1/1/2020', periods=len(data), freq='D')
resampled_data = data.set_index('date').resample('ME').sum()  # 'ME' = month end; pandas versions before 2.2 use 'M'
print(resampled_data)
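With only two consecutive sample dates, resampling to months collapses everything into a single row. A sketch on a hypothetical daily series that crosses a month boundary makes the bucketing visible; it picks the month-end alias by version because pandas 2.2 deprecated "M" in favour of "ME":

```python
import pandas as pd

# Hypothetical daily series spanning the end of January 2020
s = pd.Series([1, 2, 3], index=pd.date_range("2020-01-30", periods=3, freq="D"))

# Choose the month-end alias that the installed pandas version expects
major, minor = (int(part) for part in pd.__version__.split(".")[:2])
month_end = "ME" if (major, minor) >= (2, 2) else "M"

monthly = s.resample(month_end).sum()
print(monthly)  # 2020-01-31 -> 3 (1 + 2), 2020-02-29 -> 3
```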

Row‑wise Computation with apply

def add(a, b):
    return a + b
# axis=1 calls the function once per row; this is a Python-level loop, not a vectorized operation
result = data.apply(lambda row: add(row['A'], row['B']), axis=1)
print(result)
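The apply call above runs Python code once per row. For simple arithmetic, a column-level expression gives the same result without the per-row loop and is typically much faster on large data; a minimal sketch on a hypothetical 1,000-row frame:

```python
import numpy as np
import pandas as pd

# Hypothetical larger frame to make the comparison meaningful
df = pd.DataFrame({"A": np.arange(1000), "B": np.arange(1000) * 2.0})

# Row-wise apply: calls the lambda once for each of the 1,000 rows
row_wise = df.apply(lambda row: row["A"] + row["B"], axis=1)
# Vectorized: a single column-level addition executed by NumPy
vectorized = df["A"] + df["B"]

print((row_wise == vectorized).all())
```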

Multi‑Index

multi_index_data = data.set_index(['A', 'B'])
print(multi_index_data)
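Rows of a MultiIndex frame can be selected by the outer level with loc, or by an inner level with xs(). A sketch on a hypothetical frame where A repeats, so one outer key groups several rows:

```python
import pandas as pd

# Hypothetical data where A = 100 appears twice
data = pd.DataFrame({"A": [100, 100, 400], "B": [2.0, 3.0, 5.0], "C": [3.0, 4.0, 6.0]})
multi_index_data = data.set_index(["A", "B"])

# Selecting on the outer level returns the matching rows, indexed by the inner level
print(multi_index_data.loc[100])
# xs() selects on an inner level by name
print(multi_index_data.xs(5.0, level="B"))
```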

Exporting Data

data.to_csv('output.csv', index=False)

Conditional Filtering with where

filtered_data = data.where(data['A'] > 200).dropna()
print(filtered_data)

Standardization

numeric = data.select_dtypes('number')  # skip non-numeric columns such as 'date'
normalized_data = (numeric - numeric.mean()) / numeric.std()  # z-score of each column
print(normalized_data)

Categorizing Rows

def split_row(row):
    return 'low' if row['A'] < 300 else 'high'
split_data = data.apply(split_row, axis=1)
print(split_data.value_counts())
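split_row labels each row through a row-wise apply. The same labels can be computed on the whole column at once; a sketch assuming the A values 100 and 400 from the replacement step, using numpy.where for two labels and pd.cut when more bins are needed:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"A": [100, 400]})

# Vectorized two-way labelling, equivalent to split_row
labels = pd.Series(np.where(data["A"] < 300, "low", "high"), index=data.index)
print(labels.value_counts())

# pd.cut generalizes this to several bins; right=False makes them [-inf, 300) and [300, inf)
binned = pd.cut(data["A"], bins=[-np.inf, 300, np.inf], labels=["low", "high"], right=False)
print(binned.tolist())  # ['low', 'high']
```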

Data Type Inspection

print(data.dtypes)

Index Sorting

sorted_index_data = data.sort_index()
print(sorted_index_data)

Filling Missing Values

filled_data = data.fillna(0)
print(filled_data)
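fillna(0) uses one value everywhere. A sketch, assuming a hypothetical frame with gaps, of a per-column fill via a dict and of forward-filling with ffill():

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps
df = pd.DataFrame({"A": [1.0, np.nan, 7.0], "B": [np.nan, 5.0, np.nan]})

# A dict fills each column with its own value (here 0 for A, the column mean for B)
per_column = df.fillna({"A": 0.0, "B": df["B"].mean()})
# ffill() carries the last valid value forward; a leading gap stays NaN
forward = df.ffill()
print(per_column)
print(forward)
```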

