Master Essential Pandas Functions with Practical Code Examples
This guide introduces Pandas, shows how to set up the environment, and provides detailed explanations and runnable code for common DataFrame operations such as reading CSV files, inspecting data, selecting columns, descriptive statistics, filtering, grouping, merging, and handling missing values.
1. Introduction to Pandas
Pandas is an open‑source data‑processing and analysis library built on NumPy. It offers two primary data structures, Series (a one-dimensional labeled array) and DataFrame (a two-dimensional labeled table), that make working with structured data simple and efficient.
2. Environment Preparation
Ensure Pandas is installed in your Python environment. Install it via the following command:
pip install pandas
3. Common Functions and Examples
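The file data.csv is not provided with this guide; the sample outputs suggest it holds two rows with float columns A, B, and C. The following sketch writes such a file so the examples can be reproduced. The values are inferred from the outputs shown, so treat them as an assumption rather than the original data.

```python
import pandas as pd

# Hypothetical sample data, inferred from the example outputs in this guide
sample = pd.DataFrame({'A': [1.0, 4.0], 'B': [2.0, 5.0], 'C': [3.0, 6.0]})

# index=False keeps the row index out of the file, so read_csv sees only A, B, C
sample.to_csv('data.csv', index=False)

data = pd.read_csv('data.csv')
print(data.dtypes)  # all three columns are read back as float64
```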
Reading Data
import pandas as pd
data = pd.read_csv('data.csv')
print(data.head())
Output example (assuming data.csv holds two rows with numeric columns A, B, and C):
     A    B    C
0  1.0  2.0  3.0
1  4.0  5.0  6.0
Inspecting Basic Information
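data.info()
info() prints its report directly and returns None, so wrapping it in print() only adds a stray "None" line. The report shows the index range, the number of columns, the non-null count and data type of each column, and memory usage.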
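If the report is needed as a string (for logging, for example), info() accepts a buf argument. A small sketch, assuming an inline DataFrame that mirrors the sample data:

```python
import io

import pandas as pd

data = pd.DataFrame({'A': [1.0, 4.0], 'B': [2.0, 5.0], 'C': [3.0, 6.0]})

# info() writes to sys.stdout by default and returns None
buffer = io.StringIO()
returned = data.info(buf=buffer)  # redirect the report into a string buffer
report = buffer.getvalue()

print(returned)  # None
print(report)
```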
Selecting Columns
print(data[['A', 'C']])
Output example:
     A    C
0  1.0  3.0
1  4.0  6.0
Descriptive Statistics
print(data.describe())
describe() reports the count, mean, standard deviation, minimum, quartiles (25%, 50%, 75%), and maximum of each numeric column.
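The result of describe() is itself a DataFrame indexed by statistic name, so single values can be read with .loc, and the percentiles argument changes which quantiles are reported. A minimal sketch on inline sample data:

```python
import pandas as pd

data = pd.DataFrame({'A': [1.0, 4.0], 'B': [2.0, 5.0], 'C': [3.0, 6.0]})

summary = data.describe()
print(summary)

# Individual statistics are addressed by row label and column name
print(summary.loc['mean', 'A'])  # 2.5
print(summary.loc['50%', 'C'])   # median of C: 4.5

# percentiles= replaces the default quartiles (the median is always included)
print(data.describe(percentiles=[0.1, 0.9]))
```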
Filtering Data
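query() takes the condition as a string expression. The same filter can be written as a boolean mask, and query() can refer to Python variables with the @ prefix. A minimal sketch with an inline DataFrame (the threshold variable is illustrative):

```python
import pandas as pd

data = pd.DataFrame({'A': [1.0, 4.0], 'B': [2.0, 5.0], 'C': [3.0, 6.0]})

threshold = 2  # illustrative local variable
via_query = data.query('A > @threshold')  # @ refers to a Python variable
via_mask = data[data['A'] > threshold]    # equivalent boolean indexing

# Combine masks with & and |, wrapping each condition in parentheses
both = data[(data['A'] > 0) & (data['C'] < 5)]

print(via_query)
print(both)
```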
print(data.query('A > 2'))
Output example:
     A    B    C
1  4.0  5.0  6.0
Adding New Columns
new_data = data.assign(D=data['A'] + data['C'])
print(new_data)
Output example:
     A    B    C     D
0  1.0  2.0  3.0   4.0
1  4.0  5.0  6.0  10.0
Sorting
sorted_data = data.sort_values(by='B', ascending=False)
print(sorted_data)
Output example:
     A    B    C
1  4.0  5.0  6.0
0  1.0  2.0  3.0
Group‑by Aggregation
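The nested-dictionary form in this section produces two-level column labels. Named aggregation, supported by agg() since pandas 0.25, produces flat column names instead. A sketch on a hypothetical frame in which key A repeats, so that each group actually contains more than one row:

```python
import pandas as pd

# Hypothetical data: key 'A' repeats so group 1 has two rows
df = pd.DataFrame({'A': [1, 1, 4], 'B': [2.0, 4.0, 5.0], 'C': [3.0, 7.0, 6.0]})

# Named aggregation: output_column=(input_column, function)
summary = df.groupby('A').agg(
    B_mean=('B', 'mean'),
    B_sum=('B', 'sum'),
    C_max=('C', 'max'),
)
print(summary)
```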
grouped_data = data.groupby('A').agg({'B': ['mean', 'sum'], 'C': ['min', 'max']})
print(grouped_data)
Output example:
       B         C
    mean  sum  min  max
A
1.0  2.0  2.0  3.0  3.0
4.0  5.0  5.0  6.0  6.0
Merging DataFrames
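pd.merge performs an inner join by default, so keys missing from either side are dropped. The how parameter selects the join type, and indicator=True adds a _merge column recording where each row came from. A sketch with hypothetical frames whose keys only partly overlap:

```python
import pandas as pd

left = pd.DataFrame({'A': [1, 4, 7], 'B': [2.0, 5.0, 8.0]})
right = pd.DataFrame({'A': [1, 4, 9], 'E': [7, 8, 9]})

inner = pd.merge(left, right, on='A')  # only keys present in both: 1 and 4
outer = pd.merge(left, right, on='A', how='outer', indicator=True)  # union of keys

print(inner)
print(outer)  # missing values become NaN; _merge shows both/left_only/right_only
```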
df2 = pd.DataFrame({'A': [1, 4], 'E': [7, 8]})
merged_data = pd.merge(data, df2, on='A')
print(merged_data)
Output example (E keeps the integer dtype it has in df2):
     A    B    C  E
0  1.0  2.0  3.0  7
1  4.0  5.0  6.0  8
Handling Missing Values
clean_data = data.dropna()
print(clean_data)
In this example the dataset has no missing values, so the output matches the original.
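Because the sample data contains no gaps, dropna() and fillna() leave it unchanged. A sketch on a hypothetical frame with missing values shows what they actually do, including the subset argument and per-column fill values:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps so that dropna/fillna have something to do
df = pd.DataFrame({'A': [1.0, np.nan, 4.0], 'B': [2.0, 5.0, np.nan]})

print(df.isna().sum())                # missing values per column
dropped = df.dropna()                 # keep only rows with no NaN at all
partial = df.dropna(subset=['A'])     # drop rows only where A is missing
filled = df.fillna({'A': 0, 'B': df['B'].mean()})  # per-column fill values

print(dropped)
print(filled)
```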
4. Additional Frequently Used Functions
Renaming Columns
renamed_data = data.rename(columns={'A': 'Alpha', 'B': 'Beta'})
print(renamed_data)
Output example:
   Alpha  Beta    C
0    1.0   2.0  3.0
1    4.0   5.0  6.0
Type Conversion
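astype(int) raises an error when a column contains NaN, and plain casts fail on non-numeric text. A sketch of two common workarounds, the nullable 'Int64' dtype and pd.to_numeric with errors='coerce', on hypothetical data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, np.nan, 4.0], 'raw': ['7', 'oops', '9']})

# Nullable integer dtype: missing values become <NA> instead of raising
df['A_int'] = df['A'].astype('Int64')

# Unparseable strings become NaN rather than raising ValueError
df['raw_num'] = pd.to_numeric(df['raw'], errors='coerce')

print(df.dtypes)
print(df)
```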
data['A'] = data['A'].astype(int)
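print(data)
Output example:
   A    B    C
0  1  2.0  3.0
1  4  5.0  6.0
This assignment changes data itself, so the examples that follow see A as an integer column.
Duplicate Handling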
duplicated_rows = data.duplicated()
print(duplicated_rows)
cleaned_data = data.drop_duplicates()
print(cleaned_data)
Assuming no duplicates, duplicated() returns a boolean Series of all False values, and drop_duplicates() leaves the data unchanged.
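duplicated() and drop_duplicates() compare entire rows by default and keep the first occurrence. The subset and keep parameters change both behaviors, as this sketch on a hypothetical frame shows:

```python
import pandas as pd

# Hypothetical frame: rows 0 and 2 are identical; all rows share the same A
df = pd.DataFrame({'A': [1, 1, 1], 'B': [2, 3, 2]})

print(df.duplicated())                   # True for repeats after the first
print(df.duplicated(keep=False))         # True for every member of a duplicate group
by_a = df.drop_duplicates(subset=['A'])  # compare column A only
last = df.drop_duplicates(keep='last')   # keep the final occurrence instead

print(by_a)
print(last)
```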
Replacing Values
data.replace({1: 100, 4: 400}, inplace=True)
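print(data)
Output example:
     A    B    C
0  100  2.0  3.0
1  400  5.0  6.0
replace() maps matching values in every column, not just A; here only column A contains 1 or 4. To restrict the mapping to one column, pass a nested dictionary such as {'A': {1: 100, 4: 400}}.
Conditional Selection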
print(data.loc[data['A'] == 100])
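print(data.iloc[0])  # first row, selected by integer position
loc selects rows by label or boolean condition, while iloc selects them by integer position; data.iloc on its own, without brackets, is only the indexer object.
Concatenation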
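pd.concat stacks rows by default (axis=0), aligning columns by name and filling gaps with NaN; axis=1 places frames side by side. A sketch with hypothetical frames contrasting the two axes:

```python
import pandas as pd

top = pd.DataFrame({'A': [1, 4], 'B': [2.0, 5.0]})
bottom = pd.DataFrame({'A': [100, 400], 'D': [7, 8]})

# axis=0: stack rows; ignore_index=True renumbers the result 0..n-1
rows = pd.concat([top, bottom], ignore_index=True)

# axis=1: side by side, aligned on the row index (column A appears twice)
cols = pd.concat([top, bottom], axis=1)

print(rows)
print(cols)
```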
other_data = pd.DataFrame({'A': [100, 400], 'D': [7, 8]})
concatenated_data = pd.concat([data, other_data], axis=1)
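print(concatenated_data)
With axis=1 the frames are aligned on the row index and placed side by side, so the result contains two columns named A.
Slice Operations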
print(data.head(1))
print(data.tail(1))
Aggregation Functions
print(data.sum())
print(data.mean())
Value Counts
print(data['A'].value_counts())
Pivot Table
pivot_data = pd.pivot_table(data, values='C', index=['A'], aggfunc='sum')  # string alias, no NumPy import needed
print(pivot_data)
Time‑Series Resampling
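With only two consecutive dates, monthly resampling collapses everything into a single row. A sketch over hypothetical daily values spanning January and February 2020; it uses the 'MS' (month start) alias, which sidesteps the 'M' to 'ME' month-end rename introduced in pandas 2.2:

```python
import pandas as pd

# Hypothetical daily series: 31 days of January + 29 days of February 2020
dates = pd.date_range(start='2020-01-01', end='2020-02-29', freq='D')
daily = pd.DataFrame({'value': 1}, index=dates)

# One row per month, labelled by the first day of the month
monthly = daily.resample('MS').sum()

print(monthly)
```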
data['date'] = pd.date_range(start='1/1/2020', periods=len(data), freq='D')
resampled_data = data.set_index('date').resample('ME').sum()  # 'ME' = month end (pandas >= 2.2; older versions use 'M')
print(resampled_data)
Row‑wise Computation with apply
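DataFrame.apply with axis=1 calls a Python function once per row. For simple arithmetic the same result can be computed on whole columns at once; a quick sketch on hypothetical data checks that the two agree:

```python
import pandas as pd

df = pd.DataFrame({'A': [100, 400], 'B': [2.0, 5.0]})

def add(a, b):
    return a + b

row_wise = df.apply(lambda row: add(row['A'], row['B']), axis=1)  # Python loop over rows
vectorized = df['A'] + df['B']                                    # whole-column arithmetic

# add() uses only +, so it also works directly on whole columns
direct = add(df['A'], df['B'])

print(row_wise)
```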
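def add(a, b):
    return a + b
result = data.apply(lambda row: add(row['A'], row['B']), axis=1)
print(result)
apply with axis=1 calls the function once per row in Python, so it is convenient but not vectorized.
Multi‑Index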
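After set_index with two columns, rows are addressed by tuples of labels. A sketch on hypothetical data showing .loc with a full and a partial key, xs() for selecting on the inner level, and reset_index() to undo the operation:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [100, 100, 400],
    'B': [2.0, 3.0, 5.0],
    'C': [3.0, 4.0, 6.0],
})
indexed = df.set_index(['A', 'B'])

print(indexed.loc[(100, 3.0), 'C'])  # full key: a single value
print(indexed.loc[100])              # partial key: all rows with A == 100
print(indexed.xs(5.0, level='B'))    # select on the inner level
print(indexed.reset_index())         # move the levels back into columns
```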
multi_index_data = data.set_index(['A', 'B'])
print(multi_index_data)
Exporting Data
data.to_csv('output.csv', index=False)  # index=False omits the row index column
Conditional Filtering with where
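where() keeps values where the condition holds and fills the rest (NaN by default, or the value given as other); mask() is its inverse. A sketch on hypothetical data comparing these with plain boolean indexing:

```python
import pandas as pd

df = pd.DataFrame({'A': [100, 400], 'B': [2.0, 5.0]})
cond = df['A'] > 200

kept = df.where(cond)             # rows failing cond become NaN (shape unchanged)
zeroed = df.where(cond, other=0)  # ...or are replaced with a given value
hidden = df.mask(cond)            # inverse: NaN where cond is True
filtered = df[cond]               # boolean indexing drops failing rows outright

print(kept)
print(filtered)
```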
filtered_data = data.where(data['A'] > 200).dropna()
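print(filtered_data)
where() keeps values where the condition is True and sets the rest to NaN; dropna() then removes those rows (and would also drop a matching row that already contained a missing value).
Standardization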
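numeric = data.select_dtypes('number')  # skip the datetime column added above
normalized_data = (numeric - numeric.mean()) / numeric.std()  # z-score per column
print(normalized_data)
applymap() passes individual scalar values, which have no .name, so a per-element lambda cannot look up its column's mean and standard deviation; column-wise arithmetic computes the z-scores directly.
Categorizing Rows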
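Labelling rows by a threshold does not require a row-wise apply. A sketch on hypothetical data using NumPy's where for two labels and pd.cut for several bins (the bin edges and labels are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [100, 250, 400, 900]})

# Two categories: the same low/high rule, evaluated on the whole column at once
df['level'] = np.where(df['A'] < 300, 'low', 'high')

# Several categories: bins are right-inclusive by default: (0, 300], (300, 600], (600, 1000]
df['band'] = pd.cut(df['A'], bins=[0, 300, 600, 1000], labels=['low', 'mid', 'high'])

print(df)
print(df['level'].value_counts())
```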
def split_row(row):
    return 'low' if row['A'] < 300 else 'high'
split_data = data.apply(split_row, axis=1)
print(split_data.value_counts())
Data Type Inspection
print(data.dtypes)
Index Sorting
sorted_index_data = data.sort_index()
print(sorted_index_data)
Filling Missing Values
filled_data = data.fillna(0)
print(filled_data)
