8 Powerful Pandas Tricks to Master Data Selection
This article presents eight practical pandas data‑selection techniques—including boolean indexing, loc/iloc, isin, str.contains, where/mask, query, filter, and any/all—illustrated with code examples and visual outputs to help Python users efficiently extract and analyze data.
The author introduces a series of eight handy pandas data‑selection tricks, using the Boston housing dataset as a running example.
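The article does not show how df is loaded. As a stand-in, the sketch below builds a tiny DataFrame with the two columns used in the first three sections and shows the boolean mask that row filtering relies on. The values are invented for illustration and are not taken from the Boston housing data; above_mean and selected are hypothetical names.

```python
import pandas as pd

# Hypothetical stand-in rows: the column names match the article (NOX, CHAS),
# but the values are invented and are NOT the real dataset.
df = pd.DataFrame({
    "NOX": [0.538, 0.469, 0.713, 0.437, 0.871, 0.585],
    "CHAS": [0, 0, 1, 0, 1, 0],
})

# A comparison on a column yields a boolean Series, one flag per row.
above_mean = df["NOX"] > df["NOX"].mean()

# Passing that Series back into [] keeps only the rows flagged True.
selected = df[above_mean]
print(selected)
```

Every technique in the article is a variation on this pattern: build a True/False flag per row (or per label), then hand it to the DataFrame.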
1. Boolean Indexing with []
Filter rows directly inside the DataFrame brackets. For example, select rows where the NOX column is greater than its mean and sort descending:
df[df['NOX'] > df['NOX'].mean()].sort_values(by='NOX', ascending=False).head()
Combine conditions with the logical operators & and |, wrapping each condition in parentheses because & and | bind more tightly than comparisons such as > and ==:
df[(df['NOX'] > df['NOX'].mean()) & (df['CHAS'] == 1)].sort_values(by='NOX', ascending=False).head()
2. loc / iloc
loc accesses data by label (row/column names), while iloc uses integer positions. Both support single values or slices; loc also accepts a boolean mask, as in this example, which assigns a value to CHAS for the selected rows:
df.loc[(df['NOX'] > df['NOX'].mean()), ['CHAS']] = 2
3. isin
Use isin to filter rows whose column values belong to a specific list. Example selecting rows where NOX is one of three values:
df.loc[df['NOX'].isin([0.538, 0.713, 0.437]), :].sample(5)
Negate the condition with ~ to get rows not matching the list:
df.loc[~df['NOX'].isin([0.538, 0.713, 0.437]), :].sample(5)
4. str.contains
For string matching, pandas provides .str.contains(), similar to SQL LIKE. Example using the Titanic dataset to find names containing "Mrs" or "Lily":
train.loc[train['Name'].str.contains('Mrs|Lily'), :].head()
Additional options include case, na, flags, and regex for fine-grained control.
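A minimal sketch of how three of those options change the result, using a small hypothetical Series of names rather than the Titanic data (names, mask_default, mask_nocase, and mask_literal are illustrative):

```python
import pandas as pd

# Hypothetical passenger names, including one missing value.
names = pd.Series(["Smith, Mrs. Anna", "Brown, Mr. John", "lily, Miss. Rose", None])

# Default behaviour: the pattern is a regex and matching is case-sensitive.
# na=False counts missing values as "no match", giving a clean boolean mask.
mask_default = names.str.contains("Mrs|Lily", na=False)

# case=False ignores letter case, so "lily" now matches as well.
mask_nocase = names.str.contains("Mrs|Lily", case=False, na=False)

# regex=False treats the pattern as a literal string, so "|" no longer means "or".
mask_literal = names.str.contains("Mrs|Lily", regex=False, na=False)

print(names[mask_nocase])
```

Setting na explicitly is worth the habit: without it, a missing name produces a missing value in the mask, which cannot be used directly for row selection.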
5. where / mask
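where keeps values that satisfy a boolean condition and replaces the others with NaN (or with the value passed as other). The result is assigned back to the column here rather than relying on inplace=True, which may not update the parent DataFrame on recent pandas versions. Example:
cond = train['Sex'] == 'male'
train['Sex'] = train['Sex'].where(cond)
Using other to assign a custom value:
train['Sex'] = train['Sex'].where(cond, other='FEMALE')
mask works the opposite way, replacing values where the condition is True.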
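To see the two methods side by side, here is a small sketch on a hypothetical Series (sex, kept, relabeled, and masked are illustrative names, not from the article's dataset):

```python
import pandas as pd

# Hypothetical values standing in for the Sex column.
sex = pd.Series(["male", "female", "male", "female"])
cond = sex == "male"

# where: keep values where cond is True, replace the rest (NaN by default).
kept = sex.where(cond)
relabeled = sex.where(cond, other="FEMALE")

# mask: the mirror image, replacing values where cond is True.
masked = sex.mask(cond, other="MALE")

print(pd.DataFrame({"kept": kept, "relabeled": relabeled, "masked": masked}))
```

Both calls return a new Series of the same length, which is what distinguishes them from boolean indexing: no rows are dropped, only values are swapped.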
train['quality'] = train['quality'].mask(cond1 & cond2, other='Low-quality male')
6. query
The query method offers a readable string-based filtering syntax. Simple example:
train.query('Age > 25')
Complex conditions can combine str.contains with Python variables, which are referenced using @:
name = 'William'
train.query("Name.str.contains(@name) & Age > 25")
7. filter
filter selects subsets of rows or columns by their labels, not by the values they contain. It supports items (explicit labels), regex, like (substring match), and the axis argument.
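A minimal sketch of what each argument matches, using a hypothetical three-row frame with integer index labels (frame and the by_* names are illustrative):

```python
import pandas as pd

# Hypothetical frame; the index labels 2, 12, 30 are chosen to show label matching.
frame = pd.DataFrame(
    {"Age": [22, 38, 26], "Sex": ["male", "female", "female"], "SibSp": [1, 1, 0]},
    index=[2, 12, 30],
)

# items: exact column labels (columns are the default axis for a DataFrame).
by_items = frame.filter(items=["Age", "Sex"])

# regex on axis=1: column labels containing "S".
by_regex = frame.filter(regex="S", axis=1)

# like on axis=0: index labels whose text contains "2" (labels are compared as strings).
by_like = frame.filter(like="2", axis=0)

# regex on axis=0: index labels that start with "2".
by_start = frame.filter(regex="^2", axis=0)

print(by_like)
```

Note that the cell values play no part in the match: row 30 is excluded by like='2' even though nothing about its contents was inspected.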
train.filter(items=['Age', 'Sex'])
train.filter(regex='S', axis=1)
train.filter(like='2', axis=0)
train.filter(regex='^2', axis=0).filter(like='S', axis=1)
8. any / all
any returns True if at least one element is True; all returns True only if every element is True. They are often combined with isnull() to inspect missing data:
train['Cabin'].any()
train['Cabin'].all()
train.isnull().any(axis=0)
train.isnull().any(axis=1).sum()
These eight techniques cover most common pandas filtering scenarios, enabling concise and efficient data extraction for analysis.
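As a closing illustration of the any / all checks from section 8, the sketch below runs them on a small hypothetical frame with known gaps (frame, cols_with_missing, rows_with_missing, and complete_cols are illustrative names):

```python
import numpy as np
import pandas as pd

# Hypothetical rows with deliberately missing Age and Cabin values.
frame = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0],
    "Cabin": ["C85", None, None],
    "Fare": [7.25, 71.28, 7.92],
})

# axis=0 reduces down each column: does the column contain any missing value?
cols_with_missing = frame.isnull().any(axis=0)

# axis=1 reduces across each row; summing the flags counts the incomplete rows.
rows_with_missing = frame.isnull().any(axis=1).sum()

# all() answers the complementary question: which columns are fully populated?
complete_cols = frame.notnull().all(axis=0)

print(cols_with_missing)
print(rows_with_missing)
```

The per-row mask from any(axis=1) can also be passed straight back into [] or loc to pull out the incomplete rows themselves.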
Python Crawling & Data Mining
