
15 Essential Pandas Filtering Tricks Every Data Analyst Should Know

This article presents a step-by-step tutorial on fifteen practical pandas filtering techniques, showing how to select, exclude, and query supermarket operations data with comparison operators, logical expressions, and pandas-specific methods, each illustrated with a short code example.

Python Crawling & Data Mining

pandas is an essential Python data analysis library with powerful data-cleaning capabilities; complex processing often takes only a few lines of code.

The author summarizes 15 common pandas filtering techniques, organized into five key concepts:

Comparison operators

Range operations

String filtering

Logical operations

Comparison functions and apply/isin

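All examples use a supermarket operations dataset whose columns include 门店编号 (store ID), 日期 (date), 类别ID (category ID), 商品ID (product ID), 单价 (unit price), and 销量 (sales volume).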

1. Filter rows where store ID equals 'CDXL'

Method 1: Using the == operator

data[data.门店编号=='CDXL']

Method 2: Using the eq function

data[data['门店编号'].eq('CDXL')]
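A minimal, self-contained sketch of both methods, using a few hypothetical toy rows in place of the real supermarket data (the store codes other than 'CDXL' and the variable names `by_operator` and `by_method` are made up for illustration). It checks that `==` and `Series.eq` select the same rows. Bracket access such as `data['门店编号']` is used here because it works for any column name, while attribute access like `data.门店编号` only works when the name is a valid Python identifier that does not clash with a DataFrame attribute.

```python
import pandas as pd

# Hypothetical toy rows standing in for the supermarket dataset
data = pd.DataFrame({
    '门店编号': ['CDXL', 'CDNL', 'CDXL', 'CDLQ'],
    '单价': [8.5, 12.0, 10.0, 3.2],
})

# Method 1: boolean mask built with the == operator
by_operator = data[data['门店编号'] == 'CDXL']

# Method 2: the same mask built with Series.eq
by_method = data[data['门店编号'].eq('CDXL')]

print(by_operator.equals(by_method))  # True
print(by_operator.index.tolist())     # [0, 2]
```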

2. Filter rows with price ≤ 10

Method 1: Using the <= operator

data[data.单价<=10]

Method 2: Using the le function

data[data['单价'].le(10)]
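A similar sketch for the price filter, again on hypothetical toy rows. Besides the operator and `Series.le`, it shows one practical reason to prefer the method form: it slots into a method chain through a callable passed to `.loc` (the chaining variant is an addition for illustration, not part of the original list).

```python
import pandas as pd

# Hypothetical toy rows; only the price column matters here
data = pd.DataFrame({
    '门店编号': ['CDXL', 'CDNL', 'CDXL', 'CDLQ'],
    '单价': [8.5, 12.0, 10.0, 3.2],
})

cheap_op = data[data['单价'] <= 10]   # Method 1: the <= operator
cheap_fn = data[data['单价'].le(10)]  # Method 2: Series.le

# le() also works inside a method chain, via a callable passed to .loc
cheap_chain = data.loc[lambda d: d['单价'].le(10)]

print(cheap_op.index.tolist())  # [0, 2, 3]
```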

3. Filter rows with sales volume > 2

Method 1: Using the > operator

data[data.销量>2]

Method 2: Using the gt function

data[data['销量'].gt(2)]
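Because `>` is a strict comparison, its method counterpart is `Series.gt`, while `Series.ge` corresponds to `>=`. The two only disagree on rows that sit exactly on the threshold, as this small sketch with hypothetical sales values shows.

```python
import pandas as pd

# Hypothetical sales volumes, including one value exactly on the threshold
data = pd.DataFrame({'销量': [1, 2, 3, 5]})

gt_rows = data[data['销量'].gt(2)]  # same as data['销量'] > 2
ge_rows = data[data['销量'].ge(2)]  # same as data['销量'] >= 2

print(gt_rows['销量'].tolist())  # [3, 5]
print(ge_rows['销量'].tolist())  # [2, 3, 5]
```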

4. Exclude store 'CDXL'

Method 1: Using the != operator

data[data.门店编号!='CDXL']

Method 2: Using the ne function

data[data['门店编号'].ne('CDXL')]
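One behaviour worth checking before relying on `!=` or `ne`: a missing store ID compares as "not equal" to 'CDXL', so those rows stay in the result. The sketch below, on hypothetical rows with one `NaN` store ID, shows this and adds `notna()` for the case where missing IDs should be dropped as well (that extra mask is an assumption about what you want, not part of the original recipe).

```python
import numpy as np
import pandas as pd

# Hypothetical store IDs, one of them missing
data = pd.DataFrame({'门店编号': ['CDXL', 'CDNL', np.nan, 'CDLQ']})

excl_op = data[data['门店编号'] != 'CDXL']     # Method 1: the != operator
excl_fn = data[data['门店编号'].ne('CDXL')]    # Method 2: Series.ne

# NaN != 'CDXL' evaluates to True, so row 2 survives both filters above;
# combining with notna() drops rows whose store ID is missing
excl_known = data[data['门店编号'].ne('CDXL') & data['门店编号'].notna()]

print(excl_op.index.tolist())     # [1, 2, 3]
print(excl_known.index.tolist())  # [1, 3]
```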

5. Filter data for May 2020

First ensure the date column is of datetime type, then define start and end dates.

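import datetime
import pandas as pd

# Parse the date column into datetime64 values
data['日期']=pd.to_datetime(data['日期'])
# Exclusive bounds: the day before May 1 and the day after May 31
s_date=datetime.datetime.strptime('2020-04-30','%Y-%m-%d').date()
e_date=datetime.datetime.strptime('2020-06-01','%Y-%m-%d').date()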
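A self-contained sketch of this setup step, assuming the dates were read in as strings (the toy rows are hypothetical). It uses `pd.to_datetime` for the conversion and, as a variation on the code above, builds the bounds directly as `pd.Timestamp` objects, which avoids the `strptime`, `.date()`, `pd.Timestamp` round-trip.

```python
import pandas as pd

# Hypothetical raw rows whose dates were read in as strings
data = pd.DataFrame({'日期': ['2020-04-30', '2020-05-01', '2020-05-31', '2020-06-01']})

# Convert the column to datetime64 before comparing it with timestamps
data['日期'] = pd.to_datetime(data['日期'])

# Bounds built directly as Timestamps (exclusive on both sides)
s_date = pd.Timestamp('2020-04-30')
e_date = pd.Timestamp('2020-06-01')

may = data[(data['日期'] > s_date) & (data['日期'] < e_date)]
print(may['日期'].dt.strftime('%Y-%m-%d').tolist())  # ['2020-05-01', '2020-05-31']
```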

Method 1: Using logical operators

data[(data.日期>pd.Timestamp(s_date)) & (data.日期<pd.Timestamp(e_date))]

Method 2: Using gt/lt functions

data[(data['日期'].lt(pd.Timestamp(e_date))) & (data['日期'].gt(pd.Timestamp(s_date)))]

Method 3: Using apply with a lambda

id_a=data.日期.apply(lambda x: x.year==2020 and x.month==5)
data[id_a]
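`apply` runs the lambda once per row in Python, which is fine for small tables but slow on large ones. A vectorized alternative (not one of the original methods) reads the year and month through the `.dt` accessor; this sketch on hypothetical dates checks that both masks agree.

```python
import pandas as pd

# Hypothetical dates, including one from May of a different year
data = pd.DataFrame({'日期': pd.to_datetime(['2020-04-30', '2020-05-01', '2020-05-31', '2021-05-15'])})

# Method 3 as written: a Python lambda evaluated once per row
id_a = data['日期'].apply(lambda x: x.year == 2020 and x.month == 5)

# Vectorized alternative through the .dt accessor
id_dt = (data['日期'].dt.year == 2020) & (data['日期'].dt.month == 5)

print(id_dt.tolist())  # [False, True, True, False]
```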

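Method 4: Using between

# between() includes both bounds by default; inclusive='neither' (pandas >= 1.3) keeps only dates strictly inside them
id_b=data.日期.between(pd.Timestamp(s_date), pd.Timestamp(e_date), inclusive='neither')
data[id_b]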
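`Series.between` includes both bounds unless told otherwise, so with the exclusive bounds 2020-04-30 and 2020-06-01 the default call would also pick up those two boundary days. This sketch with hypothetical dates compares the default, `inclusive='neither'` (the string form of this parameter requires pandas 1.3 or later), and inclusive bounds moved to the first and last day of May.

```python
import pandas as pd

# Hypothetical dates sitting on and just inside the bounds
data = pd.DataFrame({'日期': pd.to_datetime(['2020-04-30', '2020-05-01', '2020-05-31', '2020-06-01'])})
s, e = pd.Timestamp('2020-04-30'), pd.Timestamp('2020-06-01')

# Default inclusive='both' keeps rows equal to either bound
both_ends = data['日期'].between(s, e)
# inclusive='neither' keeps only dates strictly between the bounds
open_range = data['日期'].between(s, e, inclusive='neither')
# Alternative: inclusive bounds on the first and last day of May
may_days = data['日期'].between(pd.Timestamp('2020-05-01'), pd.Timestamp('2020-05-31'))

print(both_ends.tolist())   # [True, True, True, True]
print(open_range.tolist())  # [False, True, True, False]
```

The last variant only matches the others when the dates carry no time of day; a timestamp such as 2020-05-31 18:00 falls after the right bound `pd.Timestamp('2020-05-31')`.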

6. Filter rows where category ID contains '000'

data['类别ID']=data['类别ID'].astype(str)
id_c=data.类别ID.str.contains('000',na=False)
data[id_c]
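A sketch of the substring filter on hypothetical category IDs. It passes `regex=False` so that '000' is matched literally (the result is the same either way here, since the pattern contains no regex metacharacters), and shows where `na=False` actually matters: on a string column that still holds a real `NaN`, which, depending on the pandas version and dtype, may otherwise come back as a missing value in the mask rather than `False`.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric category IDs
data = pd.DataFrame({'类别ID': [15000001, 12345601, 30001002, 22010203]})

# .str methods need strings, so convert first
data['类别ID'] = data['类别ID'].astype(str)
# regex=False treats '000' as a literal substring rather than a pattern
id_c = data['类别ID'].str.contains('000', regex=False, na=False)
print(id_c.tolist())  # [True, False, True, False]

# na=False turns a genuinely missing value into False in the mask
raw = pd.Series(['15000001', np.nan])
mask_raw = raw.str.contains('000', na=False)
print(mask_raw.tolist())  # [True, False]
```

One caveat: if a numeric ID column has missing values, pandas stores it as float, so `astype(str)` yields strings such as `'15000001.0'` and `'nan'` rather than missing values.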

7. Filter rows where product ID starts with '301'

data['商品ID']=data['商品ID'].astype(str)
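# str.match anchors the pattern at the start of the string; str.contains would also accept '301' in the middle
id_c2=data.商品ID.str.match(r'301\d{5}',na=False)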
data[id_c2]
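`str.contains` looks for the pattern anywhere in the string, so `r'301\d{5}'` alone would also accept an ID with '301' in the middle. This sketch on hypothetical product IDs compares it with three anchored options: `str.match` (anchored at the start), `str.fullmatch` (whole string, pandas 1.1+), and the regex-free `str.startswith`. Which one fits depends on whether product IDs are always exactly eight digits, which is an assumption here.

```python
import pandas as pd

# Hypothetical product IDs: a valid 8-digit ID, an ID with '301' in the middle, a short ID
data = pd.DataFrame({'商品ID': ['30112345', '99301123456', '3019']})
ids = data['商品ID']

anywhere = ids.str.contains(r'301\d{5}', na=False)  # pattern may occur anywhere
at_start = ids.str.match(r'301\d{5}', na=False)     # pattern anchored at the start
whole = ids.str.fullmatch(r'301\d{5}', na=False)    # entire string must match
prefix = ids.str.startswith('301', na=False)        # plain prefix test, no regex

print(anywhere.tolist())  # [True, True, False]
print(at_start.tolist())  # [True, False, False]
print(prefix.tolist())    # [True, False, True]
```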
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
