Top 25 Pandas Tricks for DataFrame Manipulation and Analysis

This tutorial showcases a comprehensive set of pandas techniques—including reading data from the clipboard, random sampling, multi‑condition filtering, handling missing values, string splitting, list expansion, multi‑function aggregation, slicing, descriptive statistics, categorical conversion, DataFrame styling, and profiling—to efficiently explore and transform DataFrames in Python.

Python Programming Learning Circle

Feb 12, 2025

The article presents a collection of practical pandas tricks for working with DataFrames, covering data import, sampling, filtering, missing‑value handling, aggregation, reshaping, styling, and profiling.

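Data can be quickly loaded from the clipboard with pd.read_clipboard(), which is a top-level pandas function rather than a DataFrame method. Because the result depends on whatever happens to be copied at the time, it is not recommended for reproducible pipelines.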

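To split a DataFrame into two random subsets, draw a 75% sample with sample = df.sample(frac=0.75) and take the remaining rows with df.drop(sample.index). This relies on the index labels being unique; pass random_state to sample() to make the split repeatable.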
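The split described above can be sketched as follows. The column name, the eight-row frame, and the random_state value are illustrative assumptions, not part of the original article.

```python
import pandas as pd

# Hypothetical data: eight rows with the default (unique) integer index.
df = pd.DataFrame({"value": range(8)})

# random_state is an assumption added here so the split is repeatable.
sample = df.sample(frac=0.75, random_state=0)

# Drop the sampled labels to get the remaining 25% of the rows.
rest = df.drop(sample.index)

print(len(sample), len(rest))
```

Because drop() works on index labels, duplicated labels would remove more rows than intended; calling df.reset_index(drop=True) first is one way to guard against that.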

Multiple categorical filters can be expressed with boolean logic or more cleanly with df[df['genre'].isin(['Action', 'Drama', 'Western'])]. The inverse filter uses the tilde operator: df[~df['genre'].isin([...])].
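As a minimal sketch of both filters, assuming a small made-up movie table (the titles and genres are placeholders):

```python
import pandas as pd

# Hypothetical movie data for illustration only.
df = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "genre": ["Action", "Comedy", "Drama", "Western", "Horror"],
})

wanted = ["Action", "Drama", "Western"]

# Keep rows whose genre is in the list.
selected = df[df["genre"].isin(wanted)]

# The tilde negates the boolean mask, keeping everything else.
others = df[~df["genre"].isin(wanted)]

print(selected["genre"].tolist())
print(others["genre"].tolist())
```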

To keep only the most frequent categories, store the top counts with top_counts = df['genre'].value_counts().nlargest(3), then filter with df[df['genre'].isin(top_counts.index)].
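A short sketch of the top-category filter, using invented genre counts chosen so that there are no ties:

```python
import pandas as pd

# Hypothetical data: four genres with distinct frequencies.
genres = ["Drama"] * 4 + ["Comedy"] * 3 + ["Action"] * 2 + ["Western"]
df = pd.DataFrame({"genre": genres})

# Series of the three largest counts, indexed by genre name.
top_counts = df["genre"].value_counts().nlargest(3)

# The index of top_counts holds the genre labels to keep.
top_only = df[df["genre"].isin(top_counts.index)]

print(top_counts)
print(len(top_only))
```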

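Missing values are counted per column with df.isna().sum(). df.dropna() removes every row that contains a missing value, and df.dropna(axis='columns') does the same for columns. The thresh argument sets the minimum number of non-missing values needed to keep a row or column, so df.dropna(thresh=int(len(df)*0.9), axis='columns') keeps only the columns that are roughly 90% populated or better.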
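The following sketch illustrates counting and dropping missing values on a made-up ten-row frame. The threshold is applied along the column axis here, since it is derived from the number of rows.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "a" is complete, "b" has 1 gap, "c" has 5 gaps.
df = pd.DataFrame({
    "a": list(range(10)),
    "b": [1.0, np.nan] + [2.0] * 8,
    "c": [np.nan] * 5 + [1.0] * 5,
})

# Number of missing values per column.
missing = df.isna().sum()

# Keep columns that have at least 90% of the row count as non-missing values.
min_non_null = int(len(df) * 0.9)
kept = df.dropna(thresh=min_non_null, axis="columns")

# Drop every row that contains at least one missing value.
complete_rows = df.dropna()

print(missing)
print(list(kept.columns))
```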

String columns can be split into multiple columns using df['name'].str.split(' ', expand=True), optionally assigning the result back to the original DataFrame.
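For example, assuming every name has exactly three space-separated parts (the names and new column labels are placeholders):

```python
import pandas as pd

# Hypothetical names, each with exactly three parts.
df = pd.DataFrame({"name": ["John Arthur Doe", "Jane Ann Smith"]})

# expand=True returns a DataFrame with one column per split piece,
# which can be assigned to several new columns at once.
df[["first", "middle", "last"]] = df["name"].str.split(" ", expand=True)

print(df)
```

If the rows had different numbers of parts, the assignment to a fixed list of columns would no longer line up, so this sketch only fits uniformly shaped strings.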

List‑like Series can be expanded into a DataFrame via df['list_col'].apply(pd.Series) and concatenated with the original using pd.concat([df, new_df], axis=1).
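A minimal sketch of the expansion, assuming every list has the same length. The column names x, y, and z are invented for readability.

```python
import pandas as pd

# Hypothetical data: each row holds a list of three numbers.
df = pd.DataFrame({
    "id": ["a", "b"],
    "list_col": [[1, 2, 3], [4, 5, 6]],
})

# Each list becomes a row of a new DataFrame with columns 0, 1, 2.
new_df = df["list_col"].apply(pd.Series)
new_df.columns = ["x", "y", "z"]  # illustrative names

# Place the expanded columns next to the original ones.
wide = pd.concat([df, new_df], axis=1)

print(wide)
```

On large frames, apply(pd.Series) can be slow; building the frame with pd.DataFrame(df['list_col'].tolist(), index=df.index) is a common faster alternative.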

Aggregate multiple functions with df.groupby('order_id')['item_price'].agg(['sum', 'count']). To retain the original shape after aggregation, use df.groupby('order_id')['item_price'].transform('sum') and assign the result to a new column.
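The difference between agg and transform can be seen on a small invented order table. The order_total and share column names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical order lines: two orders with several items each.
orders = pd.DataFrame({
    "order_id": [1, 1, 2, 2, 2],
    "item_price": [2.0, 3.0, 1.0, 4.0, 5.0],
})

# agg returns one row per order with one column per function.
summary = orders.groupby("order_id")["item_price"].agg(["sum", "count"])

# transform returns one value per original row, so it fits as a new column.
orders["order_total"] = orders.groupby("order_id")["item_price"].transform("sum")

# With the total on every row, per-item shares are a simple division.
orders["share"] = orders["item_price"] / orders["order_total"]

print(summary)
print(orders)
```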

Row and column slicing is performed with df.loc[row_slice, col_slice], and descriptive statistics can be limited to a five‑number summary using df.describe().loc[['min','25%','50%','75%','max']].
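A brief sketch of both operations on made-up data. Note that loc slices by label and includes both endpoints.

```python
import pandas as pd

# Hypothetical data with two numeric columns and one text column.
df = pd.DataFrame({
    "price": [1.0, 2.0, 3.0, 4.0, 5.0],
    "qty": [10, 20, 30, 40, 50],
    "note": list("abcde"),
})

# Rows labelled 1 through 3 and columns from "price" through "qty".
subset = df.loc[1:3, "price":"qty"]

# describe() covers numeric columns; select only the five-number summary rows.
five_num = df.describe().loc[["min", "25%", "50%", "75%", "max"]]

print(subset)
print(five_num)
```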

Continuous variables can be binned into categories with pd.cut(df['Age'], bins=[0,18,25,99], labels=['child','young adult','adult']), producing an ordered categorical Series.
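To illustrate the binning with the same edges and labels, here is a sketch on a few invented ages. By default each bin includes its right edge, so 18 falls in 'child' and 25 in 'young adult'.

```python
import pandas as pd

# Hypothetical ages chosen to land on and between the bin edges.
df = pd.DataFrame({"Age": [5, 18, 22, 25, 40]})

# Bins are (0, 18], (18, 25], (25, 99] with the default right=True.
df["age_group"] = pd.cut(
    df["Age"],
    bins=[0, 18, 25, 99],
    labels=["child", "young adult", "adult"],
)

print(df)
print(df["age_group"].cat.ordered)
```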

DataFrames can be styled for better readability: df.style.format({'Close': '${:,.2f}', 'Volume': '{:,}'}) formats numbers for display, highlight_min() and highlight_max() mark the extreme values, background_gradient() applies a color gradient, and df.style.bar() renders in-cell bar charts.

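For rapid exploratory analysis, pandas_profiling.ProfileReport(df) builds an interactive HTML report covering an overview, per-variable statistics, correlations, missing-value analysis, and sample rows. The pandas-profiling package has since been renamed ydata-profiling, so recent installations import ProfileReport from ydata_profiling instead.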
