How to Accurately Count Monthly Retail Orders with Pandas: Three Efficient Methods
This article demonstrates how to deduplicate invoice numbers and compute monthly order counts in retail data using pandas, comparing three approaches—unique with explode, groupby with value_counts, and drop_duplicates—while highlighting performance differences and providing complete code examples.
Rescuing pandas plan (20) – Counting Monthly Retail Orders
Many users hesitate to adopt pandas, preferring other data‑manipulation libraries. To encourage pandas adoption, this tutorial shows how to calculate the number of orders per month for an online retail dataset.
Data Requirements
The source data comes from a 2020 book example where each row records a single product in an order, resulting in duplicate invoice numbers. The goal is to count orders per month.
import pandas as pd
df = pd.read_csv('Online_Retail.csv.zip', parse_dates=['InvoiceDate'])
df_new = df.dropna().copy()
# Extract YearMonth as 100*year + month
df_new['YearMonth'] = df_new['InvoiceDate'].map(lambda x: 100 * x.year + x.month)Requirement Processing
Because only the invoice number matters, duplicate invoice numbers must be removed before counting.
Method 1: Use unique() then count.
df_new.groupby('InvoiceNo')['YearMonth'].unique().value_counts().sort_index()This fails in recent pandas versions because unique() returns a list per row, which value_counts cannot handle.
df_new.groupby('InvoiceNo')['YearMonth'].unique().explode().value_counts().sort_index()By exploding the lists, the method works correctly.
Method 2: Apply value_counts on the groupby result and then deduplicate.
df_new.groupby('InvoiceNo')['YearMonth'].value_counts().reset_index(name='count')['YearMonth'].value_counts().sort_index()The first value_counts deduplicates YearMonth, and after resetting the index the second value_counts tallies the monthly order counts.
Method 3: Use drop_duplicates before counting.
df_new[['InvoiceNo', 'YearMonth']].drop_duplicates()['YearMonth'].value_counts().sort_index()This approach yields shorter code and faster execution compared to the first two methods.
Summary
The article reproduces the book example, adapts it to current pandas versions, and presents three practical solutions for counting monthly orders, highlighting their differences in simplicity and performance. The original dataset can be obtained as described at the beginning of the article.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Python Crawling & Data Mining
Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
