
How to Accurately Count Monthly Retail Orders with Pandas: Three Efficient Methods

This article shows how to deduplicate invoice numbers and compute monthly order counts in retail data with pandas. It compares three approaches (unique with explode, groupby with value_counts, and drop_duplicates), notes how they differ in brevity and speed, and gives complete code for each.

Python Crawling & Data Mining

Rescuing pandas plan (20) – Counting Monthly Retail Orders

Many users hesitate to adopt pandas and reach for other data-manipulation libraries instead. To make the case for pandas, this tutorial calculates the number of orders per month for an online retail dataset.

Data Requirements

The source data comes from an example in a 2020 book. Each row records a single product within an order, so an invoice number repeats once for every product on that invoice. The goal is to count distinct orders per month, not rows.

import pandas as pd

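# pandas infers zip compression from the file extension
df = pd.read_csv('Online_Retail.csv.zip', parse_dates=['InvoiceDate'])
# Drop rows that have any missing value, then copy to avoid chained-assignment warnings
df_new = df.dropna().copy()
# Encode YearMonth as 100*year + month, e.g. 201012 for December 2010
df_new['YearMonth'] = df_new['InvoiceDate'].dt.year * 100 + df_new['InvoiceDate'].dt.month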
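To make the duplication problem concrete, here is a minimal sketch on a hypothetical four-row table (the invoice numbers and dates are invented for illustration and are not taken from the dataset). It applies the same YearMonth encoding and shows that counting rows overstates the number of orders.

```python
import pandas as pd

# Hypothetical toy data: invoice A1 has two product rows, so its number repeats.
toy = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "A3"],
    "InvoiceDate": pd.to_datetime(
        ["2011-01-05", "2011-01-05", "2011-01-20", "2011-02-03"]
    ),
})

# Same YearMonth encoding as in the article: 100 * year + month.
toy["YearMonth"] = toy["InvoiceDate"].dt.year * 100 + toy["InvoiceDate"].dt.month

# Counting rows per month gives 3 for January, although only
# two distinct invoices (A1 and A2) were issued that month.
rows_per_month = toy["YearMonth"].value_counts().sort_index()
print(rows_per_month)
```

The methods in the next section all remove this overcount in different ways.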

Requirement Processing

An order should be counted once no matter how many product rows it has, so repeated invoice numbers must be collapsed before counting.

Method 1: Use unique() then count.

df_new.groupby('InvoiceNo')['YearMonth'].unique().value_counts().sort_index()

This fails in recent pandas versions: unique() returns an array of values for each invoice rather than a single scalar, and value_counts cannot hash those arrays. Adding explode() fixes it:

df_new.groupby('InvoiceNo')['YearMonth'].unique().explode().value_counts().sort_index()

explode() expands each per-invoice array into one row per value, so value_counts receives plain YearMonth values and the method works correctly.
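The intermediate results are easier to follow on a small example. The sketch below uses a hypothetical four-row table (invented values, not the real dataset) and keeps each step of Method 1 in its own variable.

```python
import pandas as pd

# Hypothetical toy data: invoice A1 appears twice because it has two product rows.
toy = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "A3"],
    "YearMonth": [201101, 201101, 201101, 201102],
})

# One entry per invoice; each entry is an array of that invoice's distinct months.
per_invoice = toy.groupby("InvoiceNo")["YearMonth"].unique()

# explode() gives each array element its own row, leaving scalar values.
flat = per_invoice.explode()

# Now every row is one (invoice, month) pair, so counting months counts orders.
monthly_orders = flat.value_counts().sort_index()
print(monthly_orders)
```

Here January ends up with two orders and February with one, because the repeated A1 row was collapsed by unique() before counting.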

Method 2: Apply value_counts to the grouped column to deduplicate, then count again.

df_new.groupby('InvoiceNo')['YearMonth'].value_counts().reset_index(name='count')['YearMonth'].value_counts().sort_index()

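The first value_counts produces one row per (InvoiceNo, YearMonth) pair, which deduplicates the invoices. After the index is reset, the second value_counts tallies how many of those pairs fall in each month, giving the monthly order counts.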
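Splitting the chain into two steps shows what each value_counts call contributes. This is a sketch on hypothetical toy data (the values are invented for illustration); the variable names pairs and monthly_orders are not from the original article.

```python
import pandas as pd

# Hypothetical toy data: invoice A1 appears twice because it has two product rows.
toy = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "A3"],
    "YearMonth": [201101, 201101, 201101, 201102],
})

# Step 1: one row per (InvoiceNo, YearMonth) pair.
# The "count" column holds the number of product rows and is not used afterwards.
pairs = (
    toy.groupby("InvoiceNo")["YearMonth"]
    .value_counts()
    .reset_index(name="count")
)
print(pairs)

# Step 2: each remaining row is one order in one month, so count rows per month.
monthly_orders = pairs["YearMonth"].value_counts().sort_index()
print(monthly_orders)
```

Passing name='count' to reset_index keeps the counts column from colliding with the YearMonth index level when both are turned into columns.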

Method 3: Use drop_duplicates before counting.

df_new[['InvoiceNo', 'YearMonth']].drop_duplicates()['YearMonth'].value_counts().sort_index()

This is the shortest of the three and runs faster than the first two methods, since it skips the per-invoice groupby entirely.
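One way to check both claims on your own machine is to run all three methods on the same data, confirm they agree, and time them. The sketch below does this on synthetic data generated with NumPy. The table size, the rule that assigns a month to each invoice, and the function names method_1 to method_3 are assumptions made for illustration, and the timings will vary with pandas version and hardware.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for the retail data (assumption: 20,000 rows, up to 2,000 invoices).
rng = np.random.default_rng(0)
invoice_ids = rng.integers(0, 2_000, size=20_000)
# Derive the month from the invoice id so that all rows of one invoice share a month.
df = pd.DataFrame({
    "InvoiceNo": invoice_ids,
    "YearMonth": 201101 + invoice_ids % 12,
})


def method_1(data):
    """unique() per invoice, then explode and count."""
    return (data.groupby("InvoiceNo")["YearMonth"].unique()
            .explode().value_counts().sort_index())


def method_2(data):
    """value_counts per invoice to deduplicate, then count again."""
    return (data.groupby("InvoiceNo")["YearMonth"].value_counts()
            .reset_index(name="count")["YearMonth"].value_counts().sort_index())


def method_3(data):
    """drop_duplicates on the two relevant columns, then count."""
    return (data[["InvoiceNo", "YearMonth"]].drop_duplicates()["YearMonth"]
            .value_counts().sort_index())


# All three should report the same number of orders for every month.
results = [m(df).tolist() for m in (method_1, method_2, method_3)]
assert results[0] == results[1] == results[2]

# Rough timing; absolute numbers depend on the environment.
for method in (method_1, method_2, method_3):
    seconds = timeit.timeit(lambda: method(df), number=5)
    print(f"{method.__name__}: {seconds:.4f} s for 5 runs")
```

Comparing the count lists rather than the Series themselves sidesteps differences in index dtype and Series name between the three results.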

Summary

This article reproduces the book example, adapts it to current pandas versions, and presents three practical ways to count monthly orders that differ in brevity and performance. The original dataset can be obtained as described at the beginning of the article.
