Fundamentals 5 min read

Implementing RFM Customer Value Analysis with Python and Pandas

This article demonstrates how to use Python's time, NumPy, and Pandas libraries to build an RFM (Recency, Frequency, Monetary) model for customer value segmentation, covering data preparation, score calculation, weighting strategies, and exporting the results for marketing insights.

Python Programming Learning Circle
Python Programming Learning Circle
Python Programming Learning Circle
Implementing RFM Customer Value Analysis with Python and Pandas

Python’s extensive libraries make it a popular choice for data mining and analysis; this tutorial walks through building an RFM (Recency, Frequency, Monetary) customer value model using Python.

RFM combines three core commercial metrics—recency, purchase frequency, and monetary value—to assign a weighted score that can be used for precise marketing, acquisition, and retention strategies.

The technical stack includes Python’s time , NumPy , and Pandas packages, with Pandas providing data manipulation and basic visualization capabilities.

Given a raw dataset containing user IDs, order dates, order IDs, and order amounts, the script performs the following steps:

Import the Excel file into a DataFrame.

Remove rows with missing values and filter out orders with amounts ≤ 1.

Aggregate data by user ID to compute the most recent order date (recency), order count (frequency), and total spend (monetary).

Define a reference date (2020‑05‑01) and calculate the days since the last purchase for each user.

Score each dimension into five quantiles using pd.cut , with recency scores reversed (higher score for more recent purchases).

Combine the three scores either by a weighted sum (0.2 R + 0.2 F + 0.6 M) or by concatenating the categorical scores into an RFM code.

Export the final DataFrame to simple.csv .

<code>import time
import numpy as np
import pandas as pd

# 导入数据
df_raw = pd.DataFrame(pd.read_excel('test.xlsx', index_col='USERID'))

# 缺失值处理
sales_data = df_raw.dropna()
sales_data = sales_data[sales_data['AMOUNTINFO'] > 1]

# 数据转换 (按用户id去重归总)
recency_value = sales_data['ORDERDATE'].groupby(sales_data.index).max()
frequency_value = sales_data['ORDERDATE'].groupby(sales_data.index).count()
monetary_value = sales_data['AMOUNTINFO'].groupby(sales_data.index).sum()

# 分别计算R,F,M得分
deadline_date = pd.datetime(2020, 5, 1)
r_interval = (deadline_date - recency_value).dt.days
r_score = pd.cut(r_interval, 5, labels=[5,4,3,2,1])
f_score = pd.cut(frequency_value, 5, labels=[1,2,3,4,5])
m_score = pd.cut(monetary_value, 5, labels=[1,2,3,4,5])

# R,F,M数据合并
rfm_list = [r_score, f_score, m_score]
rfm_cols = ['r_score', 'f_score', 'm_score']
rfm_pd = pd.DataFrame(np.array(rfm_list).transpose(), dtype=np.int32, columns=rfm_cols, index=frequency_value.index)

# 策略1:加权得分 定义用户价值
rfm_pd['rfm_wscore'] = rfm_pd['r_score']*0.2 + rfm_pd['f_score']*0.2 + rfm_pd['m_score']*0.6

# 策略2:RFM组合 直接输出三维度值
rfm_pd_tmp = rfm_pd.copy()
rfm_pd_tmp['r_score'] = rfm_pd_tmp['r_score'].astype('str')
rfm_pd_tmp['f_score'] = rfm_pd_tmp['f_score'].astype('str')
rfm_pd_tmp['m_score'] = rfm_pd_tmp['m_score'].astype('str')
rfm_pd['rfm_comb'] = rfm_pd_tmp['r_score'].str.cat(rfm_pd_tmp['f_score']).str.cat(rfm_pd_tmp['m_score'])

# 导出数据
rfm_pd.to_csv('simple.csv')
</code>

Running the script generates a CSV file containing each user’s R, F, M scores, the weighted RFM score, and the combined RFM code, illustrating a complete data‑mining workflow with Pandas.

data analysispandascustomer segmentationrfm
Python Programming Learning Circle
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.