15 Essential Pandas Filtering Tricks Every Data Analyst Should Know
This article presents a step‑by‑step tutorial on fifteen practical pandas filtering techniques, demonstrating how to select, exclude, and query supermarket operation data using comparison operators, logical expressions, and pandas-specific functions, all illustrated with clear code examples and screenshots.
pandas is an essential Python data analysis tool with powerful data-cleaning capabilities; it often handles complex processing in very little code.
The author summarizes 15 common pandas filtering techniques, organized into five key concepts:
Comparison operators
Range operations
String filtering
Logical operations
Comparison functions and apply/isin
All examples use a supermarket operation dataset.
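The dataset itself is not reproduced here. As a stand-in, the sketch below builds a small hypothetical DataFrame with the column names used in the examples (门店编号 store ID, 单价 unit price, 销量 sales volume, 日期 date, 类别ID category ID, 商品ID product ID). Every value is invented for illustration and is not taken from the real data. The sketch also checks that a comparison operator and its method counterpart produce the same Boolean mask.

```python
import pandas as pd

# Hypothetical stand-in for the supermarket dataset; every value is invented.
data = pd.DataFrame({
    "门店编号": ["CDXL", "CDXL", "CDNL", "CDLG"],
    "单价": [8.5, 12.0, 10.0, 3.2],
    "销量": [2500, 1800, 2000, 40],
    "日期": ["2020-04-30", "2020-05-01", "2020-05-31", "2020-06-01"],
    "类别ID": [910000, 915000, 922201, 920001],
    "商品ID": [30100123, 29989105, 30101234, 930100123],
})

# A comparison operator and its method form build the same Boolean mask.
mask_op = data["门店编号"] == "CDXL"
mask_fn = data["门店编号"].eq("CDXL")
assert mask_op.equals(mask_fn)

# Indexing the frame with the mask keeps only the rows where it is True.
print(data[mask_op])
```

The method forms (eq, ne, le, gt and so on) are convenient when chaining calls, while the operator forms are shorter to type; both return a Boolean Series aligned with the frame's index.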
1. Filter rows where store ID equals 'CDXL'
Method 1: Using the == operator
data[data.门店编号=='CDXL']
Method 2: Using the eq function
data[data['门店编号'].eq('CDXL')]
2. Filter rows with price ≤ 10
Method 1: Using the <= operator
data[data.单价<=10]
Method 2: Using the le function
data[data['单价'].le(10)]
3. Filter rows with sales > 2000
Method 1: Using the > operator
data[data.销量>2000]
Method 2: Using the gt function
data[data['销量'].gt(2000)]
4. Exclude store 'CDXL'
Method 1: Using the != operator
data[data.门店编号!='CDXL']
Method 2: Using the ne function
data[data['门店编号'].ne('CDXL')]
5. Filter data for May 2020
First ensure the date column is of datetime type, then define start and end dates.
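import datetime
import pandas as pd
# Convert the date column to datetime64; pd.to_datetime parses strings and leaves existing datetimes unchanged
data['日期']=pd.to_datetime(data['日期'])
# Exclusive bounds: the day before and the day after May 2020
s_date=datetime.datetime.strptime('2020-04-30','%Y-%m-%d').date()
e_date=datetime.datetime.strptime('2020-06-01','%Y-%m-%d').date()
Method 1: Using logical operators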
data[(data.日期>pd.Timestamp(s_date)) & (data.日期<pd.Timestamp(e_date))]
Method 2: Using gt/lt functions
data[(data['日期'].lt(pd.Timestamp(e_date))) & (data['日期'].gt(pd.Timestamp(s_date)))]
Method 3: Using apply with a lambda
id_a=data.日期.apply(lambda x: x.year==2020 and x.month==5)
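data[id_a]
Method 4: Using between
# between includes both endpoints by default, so pass the first and last day of May (assumes date-only values)
id_b=data.日期.between(pd.Timestamp('2020-05-01'), pd.Timestamp('2020-05-31'))
data[id_b]
6. Filter rows where category ID contains '000'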
data['类别ID']=data['类别ID'].astype(str)
id_c=data.类别ID.str.contains('000',na=False)
data[id_c]
7. Filter rows where product ID starts with '301'
data['商品ID']=data['商品ID'].astype(str)
# str.contains matches anywhere in the string, so anchor the pattern with ^ to require the prefix
id_c2=data.商品ID.str.contains(r'^301\d{5}',na=False)
data[id_c2]
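Because `str.contains` searches anywhere in the string, a prefix filter needs either a `^` anchor in the pattern or `str.startswith`. The following is a minimal sketch on a hypothetical Series of product IDs (the values are invented, not taken from the dataset) that compares the three variants:

```python
import pandas as pd

# Hypothetical product IDs; the third one contains '301' plus five digits
# but does not start with '301', and the last one is missing.
ids = pd.Series(["30100123", "29989105", "930100123", None])

# Unanchored pattern: matches '301' followed by five digits anywhere.
unanchored = ids.str.contains(r"301\d{5}", na=False)

# Anchored pattern: '^' requires the match to begin at the first character.
anchored = ids.str.contains(r"^301\d{5}", na=False)

# Plain prefix test without a regular expression.
prefix = ids.str.startswith("301", na=False)

print(pd.DataFrame({"id": ids, "unanchored": unanchored,
                    "anchored": anchored, "prefix": prefix}))
```

Note that the anchored pattern also requires five digits after the prefix, whereas `str.startswith("301")` accepts any string beginning with those three characters; which one fits depends on how strictly the ID format should be checked.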