Identify Top Merchants with Pandas: First‑Order Analysis Tutorial

This article walks through a real‑world Pandas solution that calculates each merchant's total orders and first‑order counts to reveal which merchants are most popular among new customers, using AI‑assisted code generation and step‑by‑step explanation.

Python Crawling & Data Mining

Mar 2, 2024

Hello, I am a Python enthusiast.

1. Introduction

A community member asked how to determine which merchants are most popular among new customers. The task calls for two figures per merchant: the total number of orders, and the number of first orders (a first order being the first purchase a customer ever made). Merchants that have received no orders are excluded.
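To make the two metrics concrete, here is a minimal sketch on a toy table. The data and the column names (order_id, customer_id, merchant_id, order_timestamp) are assumptions made up for illustration, not the community member's actual dataset.

```python
import pandas as pd

# Hypothetical toy orders; all values and column names are illustrative assumptions.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5],
    "customer_id": [10, 10, 20, 30, 30],
    "merchant_id": ["A", "B", "A", "B", "A"],
    "order_timestamp": pd.to_datetime([
        "2024-01-01", "2024-01-05", "2024-01-02", "2024-01-03", "2024-01-04",
    ]),
})

# Row label of the earliest order for each customer
first_idx = orders.groupby("customer_id")["order_timestamp"].idxmin()
first_orders = orders.loc[first_idx]

# Total orders per merchant versus first orders per merchant
total_per_merchant = orders["merchant_id"].value_counts().sort_index()
first_per_merchant = first_orders["merchant_id"].value_counts().sort_index()

print(total_per_merchant)
print(first_per_merchant)
```

In this toy table, merchant A receives three orders, two of which are some customer's first purchase, while merchant B receives two orders, one of which is a first purchase.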

2. Implementation

I used the ChatGLM model to generate a Python solution. The code loads the Excel file, sorts the data by customer ID and order timestamp, extracts each customer's first order, aggregates total orders and first‑order counts per merchant, and merges the results with merchant names.

import pandas as pd

# Load the Excel file
df = pd.read_excel("./data.xlsx")

# Sort by customer and order time so each customer's earliest order comes first
df_sorted = df.sort_values(by=['customer_id', 'order_timestamp'])

# Keep only the first order for each customer
df_first_orders = df_sorted.drop_duplicates(subset='customer_id', keep='first')

# Aggregate total orders and distinct customers per merchant
merchant_stats = df.groupby('merchant_id').agg(
    total_orders=pd.NamedAgg(column='id_x', aggfunc='count'),
    total_customers=pd.NamedAgg(column='customer_id', aggfunc='nunique')
).reset_index()

# Count first orders per merchant
first_order_stats = df_first_orders.groupby('merchant_id').agg(
    first_orders=pd.NamedAgg(column='id_x', aggfunc='count')
).reset_index()

# Left merge so merchants with orders but no first orders are kept (with a count of 0)
result = pd.merge(merchant_stats, first_order_stats, on='merchant_id', how='left')
result['first_orders'] = result['first_orders'].fillna(0).astype(int)

# Attach merchant names
merchant_names = df[['id_y', 'name']].drop_duplicates()
result = pd.merge(result, merchant_names, left_on='merchant_id', right_on='id_y').drop(columns='id_y')

# Rank merchants by popularity among new customers
result = result.sort_values('first_orders', ascending=False)

print(result.head())

The script runs successfully on local data, producing the expected merchant statistics.
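One way to sanity-check the pipeline without the Excel file is to run the same steps on a small in-memory frame. The sketch below assumes the sheet has the columns the script references (id_x as the order ID, id_y as the merchant ID, name as the merchant name); the rows themselves are hypothetical. It deliberately includes a merchant that has orders but no first orders, since joining with how="left" keeps such a merchant with a count of 0, whereas an inner join would silently drop it.

```python
import pandas as pd

# Hypothetical rows shaped like the sheet the script reads (column names assumed).
df = pd.DataFrame({
    "id_x": [1, 2, 3, 4],
    "customer_id": [10, 10, 20, 20],
    "merchant_id": [1, 2, 1, 2],
    "order_timestamp": pd.to_datetime(
        ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"]
    ),
    "id_y": [1, 2, 1, 2],
    "name": ["Alpha", "Beta", "Alpha", "Beta"],
})

# Earliest order per customer
first = (
    df.sort_values(["customer_id", "order_timestamp"])
      .drop_duplicates(subset="customer_id", keep="first")
)

# Per-merchant counts as named Series indexed by merchant_id
total_orders = df.groupby("merchant_id").size().rename("total_orders")
first_orders = first.groupby("merchant_id").size().rename("first_orders")

# Left join keeps every merchant that has at least one order
stats = (
    total_orders.to_frame()
    .join(first_orders, how="left")
    .fillna({"first_orders": 0})
    .astype({"first_orders": int})
    .reset_index()
)

print(stats)
```

Here both customers made their first purchase at merchant 1, so merchant 2 ends up with two total orders and zero first orders, yet still appears in the result.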

3. Conclusion

This article demonstrates a practical Pandas workflow for answering a common business‑analytics question, illustrating how AI can assist in generating functional Python code to solve real‑world data problems.
