25 Essential Pandas Tricks Every Data Scientist Should Know
This comprehensive tutorial by data‑science instructor Kevin Markham presents 25 practical pandas techniques—including data loading, cleaning, transformation, aggregation, visualization, and performance optimization—demonstrated with real‑world datasets such as drinks, movies, Titanic, Chipotle orders, UFO sightings, and stock prices.
Introduction
Kevin Markham, a data-science instructor, shares his 25 favorite pandas tricks, covering data loading, cleaning, transformation, aggregation, and visualization.
Datasets
The tutorial uses several public datasets such as drinks, IMDB movies, Titanic, Chipotle orders, UFO sightings, and stock prices.
Key Tricks
Check the installed pandas version with pd.__version__, and print the versions of its dependencies (along with Python and system details) with pd.show_versions().
Create DataFrames from dictionaries, NumPy arrays, or CSV files.
Rename columns with rename(), by assigning a new list of names to df.columns, or with df.columns.str.replace().
Reverse the row order with loc[::-1], then renumber the index with reset_index(drop=True).
Reverse column order similarly with loc[:, ::-1].
Select columns by data type with select_dtypes(include=...), or leave out specific types with its exclude parameter.
Convert string columns to numeric with astype() or pd.to_numeric(); pass errors='coerce' to turn invalid entries into NaN, then replace them with fillna().
Reduce memory usage by reading only the columns you need (usecols) and converting object columns with few distinct values to the category dtype.
Combine multiple files into a single DataFrame using glob and pd.concat() (by rows or columns).
Create a DataFrame from the clipboard with pd.read_clipboard().
Split a DataFrame into random training and test subsets: draw the training rows with sample(), then drop() their index labels to get the test rows.
Filter rows by multiple categories with isin() and invert selection with ~.
Keep only the n most frequent categories by passing value_counts().nlargest(n).index to isin().
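Count missing values with isna().sum(), measure their share with isna().mean(), and drop rows or columns with dropna(), using its thresh parameter to require a minimum number of non-missing values.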
Split a string column into multiple columns with str.split(expand=True).
Expand a Series of lists into a DataFrame using apply(pd.Series) and pd.concat().
Apply several aggregation functions at once with agg(), and broadcast group-level results back to the original rows with transform().
Build pivot tables with pivot_table() and add row and column totals with margins=True.
Convert continuous variables to categorical bins with pd.cut().
Adjust display options such as float formatting with pd.set_option() and reset them with pd.reset_option().
Style DataFrames using style.format(), style.background_gradient(), and style.bar() for visual emphasis.
Generate an automatic data-profile report with pandas_profiling.ProfileReport() (the package is now published under the name ydata-profiling).
Each trick includes code examples and visual output to illustrate the concept.
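The first two tricks (checking the version and creating DataFrames) can be sketched as follows. The column names and random values are placeholders rather than the tutorial's datasets, and drinks.csv is a hypothetical path, so that line is left commented out.

```python
import numpy as np
import pandas as pd

print(pd.__version__)   # installed pandas version
# pd.show_versions()    # uncomment for a full dependency and system report

# From a dictionary: keys become the column names
df_from_dict = pd.DataFrame({"col one": [100, 200], "col two": [300, 400]})

# From a NumPy array: column names are supplied separately
rng = np.random.default_rng(seed=0)
df_from_array = pd.DataFrame(rng.random((4, 3)), columns=list("abc"))

# From a CSV file (hypothetical path, so commented out)
# df_from_csv = pd.read_csv("drinks.csv")
```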
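A minimal sketch of the three renaming approaches, assuming a small made-up DataFrame whose column names contain spaces. Each option starts from the same original frame so they can be compared side by side.

```python
import pandas as pd

df = pd.DataFrame({"col one": [100, 200], "col two": [300, 400]})

# Option 1: rename() with an explicit old -> new mapping
renamed = df.rename(columns={"col one": "col_one", "col two": "col_two"})

# Option 2: assign a complete list of new names (its length must match)
assigned = df.copy()
assigned.columns = ["col_one", "col_two"]

# Option 3: a vectorised string method on the column Index
replaced = df.copy()
replaced.columns = replaced.columns.str.replace(" ", "_")
```

rename() is the safest choice when only a few columns change; str.replace() is convenient when the same fix applies to every name.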
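Reversing rows and columns and selecting columns by type might look like this on a tiny stand-in for the drinks data (the values are invented). The sketch uses exclude="number" rather than excluding "object", since newer pandas versions may store strings in a dedicated string dtype.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["A", "B", "C"],
    "beer": [1, 2, 3],
    "wine": [0.5, 1.5, 2.5],
})

# Reverse the row order, then renumber the index from 0
rows_reversed = df.loc[::-1].reset_index(drop=True)

# Reverse the column order
cols_reversed = df.loc[:, ::-1]

# Keep only numeric columns, or drop them and keep everything else
numeric_only = df.select_dtypes(include="number")
non_numeric = df.select_dtypes(exclude="number")
```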
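The string-to-numeric conversion can be sketched like this, assuming one column contains a "-" placeholder that cannot be parsed. Filling with 0 is an arbitrary choice for illustration; the right default depends on the data.

```python
import pandas as pd

df = pd.DataFrame({
    "col_one": ["1.1", "2.2", "3.3"],
    "col_two": ["4.4", "5.5", "6.6"],
    "col_three": ["7.7", "8.8", "-"],   # "-" is not a valid number
})

# astype() is fine when every value parses cleanly
df = df.astype({"col_one": "float", "col_two": "float"})

# astype(float) would raise on "-"; errors="coerce" turns it into NaN instead,
# and fillna(0) then replaces that NaN with 0
df["col_three"] = pd.to_numeric(df["col_three"], errors="coerce").fillna(0)
```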
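A sketch of the memory-saving trick, assuming an in-memory CSV (built with io.StringIO) as a stand-in for a large file. The gain comes from the continent column having only three distinct values repeated across 3,000 rows; exact byte counts vary by pandas version.

```python
import io
import pandas as pd

# Hypothetical CSV: 3,000 rows, only 3 distinct continents
continents = ["Asia", "Europe", "Africa"]
lines = ["country,continent,beer_servings"]
lines += [f"country{i},{continents[i % 3]},{i}" for i in range(3000)]
csv_text = "\n".join(lines)

# usecols: parse only the columns we need
df = pd.read_csv(io.StringIO(csv_text), usecols=["continent", "beer_servings"])
before = df.memory_usage(deep=True).sum()

# category stores each distinct string once plus a small integer code per row
df["continent"] = df["continent"].astype("category")
after = df.memory_usage(deep=True).sum()

print(f"{before:,} bytes -> {after:,} bytes")
```

The conversion can also happen at load time by passing dtype={"continent": "category"} to read_csv.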
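Combining files with glob and pd.concat() might look like this. To stay self-contained, the sketch first writes small CSVs into a temporary directory; the file names and the ticker-like symbols are made up.

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Three files with identical columns (hypothetical stock data)
    for i, symbol in enumerate(["AAA", "BBB", "CCC"]):
        pd.DataFrame({"Symbol": [symbol], "Close": [100.0 + i]}).to_csv(
            os.path.join(tmp, f"stocks{i + 1}.csv"), index=False)

    # sorted() makes the file order deterministic
    files = sorted(glob.glob(os.path.join(tmp, "stocks*.csv")))

    # Row-wise: stack the files and renumber the index 0..n-1
    stocks = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)

    # Column-wise: files that hold different columns for the same rows
    pd.DataFrame({"a": [1, 2]}).to_csv(os.path.join(tmp, "part1.csv"), index=False)
    pd.DataFrame({"b": [3, 4]}).to_csv(os.path.join(tmp, "part2.csv"), index=False)
    parts = sorted(glob.glob(os.path.join(tmp, "part*.csv")))
    wide = pd.concat((pd.read_csv(f) for f in parts), axis="columns")
```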
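The sample()/drop() split can be sketched on an invented 20-row frame. It relies on the index labels being unique; otherwise drop() could remove more rows than intended.

```python
import pandas as pd

movies = pd.DataFrame({"title": [f"movie {i}" for i in range(20)],
                       "rating": range(20)})

# Randomly draw 75% of the rows; random_state makes the split reproducible
train = movies.sample(frac=0.75, random_state=1234)

# Every row whose index label is not in train becomes the test set
test = movies.drop(train.index)
```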
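Filtering by several categories, inverting the filter, and keeping only the most frequent categories could look like this, assuming a small invented movies table.

```python
import pandas as pd

movies = pd.DataFrame({
    "title": list("ABCDEFGHIJ"),
    "genre": ["Drama", "Drama", "Comedy", "Action", "Drama",
              "Comedy", "Action", "Crime", "Western", "Drama"],
})

wanted = ["Action", "Drama", "Western"]

# Rows whose genre is any of several values
picked = movies[movies["genre"].isin(wanted)]

# ~ inverts the boolean mask: every other genre
others = movies[~movies["genre"].isin(wanted)]

# Rows belonging to the 3 most frequent genres
counts = movies["genre"].value_counts()
top3 = movies[movies["genre"].isin(counts.nlargest(3).index)]
```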
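A sketch of the missing-value tools on a tiny invented stand-in for the UFO data. The threshold of 3 is arbitrary: dropna(thresh=...) counts non-missing values, so a column survives only if it has at least that many.

```python
import pandas as pd

ufo = pd.DataFrame({
    "City": ["Ithaca", None, "Holyoke", "Abilene"],
    "Colors": [None, None, "RED", None],
    "Shape": ["TRIANGLE", "OTHER", None, "DISK"],
    "State": ["NY", "NJ", "CO", "KS"],
})

missing_counts = ufo.isna().sum()    # number of missing values per column
missing_share = ufo.isna().mean()    # fraction missing per column

# Drop every column that contains any missing value
complete_cols = ufo.dropna(axis="columns")

# Keep only columns with at least 3 non-missing values
kept = ufo.dropna(thresh=3, axis="columns")
```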
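Splitting a string column and expanding a Series of lists might be sketched as follows, with invented names and values. The add_prefix() call is an addition for readability; without it the expanded columns are simply named 0 and 1.

```python
import pandas as pd

df = pd.DataFrame({"name": ["John Arthur Doe", "Jane Ann Smith"],
                   "location": ["Los Angeles, CA", "Washington, DC"]})

# expand=True returns a DataFrame with one column per piece
df[["first", "middle", "last"]] = df["name"].str.split(" ", expand=True)

# n=1 allows a single split; [0] keeps only the part before the comma
df["city"] = df["location"].str.split(", ", n=1, expand=True)[0]

# A Series whose values are lists -> one column per list position
lists = pd.DataFrame({"col_one": ["a", "b"], "col_two": [[10, 40], [20, 50]]})
expanded = lists["col_two"].apply(pd.Series).add_prefix("col_two_")
combined = pd.concat([lists, expanded], axis="columns")
```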
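The difference between agg() and transform() shows up in the shape of the result. This sketch uses a few invented order lines loosely modeled on the Chipotle data: agg() returns one row per group, while transform() returns one value per original row.

```python
import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 1, 2, 2, 2, 3],
    "item_price": [2.39, 3.39, 10.98, 1.69, 8.75, 4.45],
})

# Several aggregations per group in one call (one row per order_id)
summary = orders.groupby("order_id")["item_price"].agg(["sum", "count"])

# transform() aligns the result with the original rows,
# so every line item receives its order's total
orders["total_price"] = orders.groupby("order_id")["item_price"].transform("sum")

# which makes per-row ratios a one-liner
orders["percent_of_total"] = orders["item_price"] / orders["total_price"]
```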
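Pivot tables with margins and binning with pd.cut() can be sketched on a six-row stand-in for the Titanic data (values invented). The bin edges and labels are illustrative choices; pd.cut() uses right-closed intervals by default, so 18 falls in "child".

```python
import pandas as pd

titanic = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male", "female"],
    "Pclass": [1, 1, 3, 3, 2, 2],
    "Survived": [0, 1, 1, 0, 1, 1],
    "Age": [22, 38, 4, 35, 70, 15],
})

# Mean survival by sex (rows) and class (columns); margins=True adds "All" totals
table = titanic.pivot_table(index="Sex", columns="Pclass", values="Survived",
                            aggfunc="mean", margins=True)

# Bin a continuous column into labelled ranges: (0, 18], (18, 25], (25, 99]
titanic["AgeGroup"] = pd.cut(titanic["Age"], bins=[0, 18, 25, 99],
                             labels=["child", "young adult", "adult"])
```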
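Changing and then restoring a display option might look like this. The two-decimal format is an arbitrary example; the option affects only how values are printed, not the stored data.

```python
import pandas as pd

df = pd.DataFrame({"ratio": [0.123456, 2.5]})

# Render every float with two decimals (display only; the data is unchanged)
pd.set_option("display.float_format", "{:.2f}".format)
rendered = df.to_string()
print(rendered)

# Restore the default formatter
pd.reset_option("display.float_format")
```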
