Data Mining Techniques for Marketing: Customer Segmentation, Purchase Prediction, Recommendation, and More with Python
This article introduces ten data-mining applications for marketing: customer segmentation, purchase forecasting, market-basket analysis, churn prediction, sentiment analysis, response modeling, recommendation systems, brand reputation, competitive analysis, and public-opinion monitoring. Each is illustrated with a concise Python code example.
Data mining techniques are widely used in marketing to discover potential customers, optimize ad placement, and provide personalized recommendations.
1. Customer Segmentation groups customers based on behavior and attributes to enable targeted marketing.
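To make the grouping step concrete, here is a minimal pure-Python sketch of the assign-and-update loop that k-means clustering is built on, applied to the same five (age, purchase amount) pairs used in the scikit-learn snippet for this section. The starting centroids (the first two customers) and the helper name `assign` are assumptions chosen for illustration; a library chooses its starting points more carefully, and in practice the features would usually be scaled first.

```python
# Illustrative k-means loop; the starting centroids are an arbitrary assumption.
data = [(25, 100), (30, 150), (20, 80), (40, 200), (35, 180)]
centroids = [data[0], data[1]]

def assign(points, centers):
    """Return, for each point, the index of its nearest center (squared Euclidean distance)."""
    return [
        min(range(len(centers)),
            key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centers[k])))
        for point in points
    ]

labels = assign(data, centroids)
while True:
    # Update step: move each centroid to the mean of its assigned points
    # (this sketch assumes no cluster ever ends up empty).
    new_centroids = []
    for k in range(len(centroids)):
        members = [point for point, label in zip(data, labels) if label == k]
        new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
    centroids = new_centroids
    new_labels = assign(data, centroids)
    if new_labels == labels:  # stop once the assignment no longer changes
        break
    labels = new_labels

print("Segments:", labels)
print("Centroids:", centroids)
```

On this data the loop settles after one update, separating the two lower-spending customers from the three higher-spending ones.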
from sklearn.cluster import KMeans
# Assume customer data with two features per customer: age and purchase amount
data = [[25, 100], [30, 150], [20, 80], [40, 200], [35, 180]]
# Fix n_init and random_state so the cluster labels are reproducible
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(data)
labels = kmeans.labels_
print("Customer segments:", labels)
2. Purchase Prediction forecasts future buying behavior from historical purchase data, which helps with inventory management and personalized strategies.
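For a single predictor, the line that a linear regression fits can be computed by hand, which shows what the forecast is based on. The sketch below is a simplification: it uses only the month index as input (the scikit-learn snippet in this section also uses the purchase amount) and reuses the same five target values. The variable names are illustrative.

```python
# Ordinary least squares with one feature, written out by hand.
months = [1, 2, 3, 4, 5]
amounts = [120, 160, 100, 220, 200]

mean_x = sum(months) / len(months)
mean_y = sum(amounts) / len(amounts)

# slope = covariance(x, y) / variance(x); the shared 1/n factor cancels out
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(months, amounts))
         / sum((x - mean_x) ** 2 for x in months))
intercept = mean_y - slope * mean_x

forecast_month_6 = intercept + slope * 6
print(f"amount = {intercept:.1f} + {slope:.1f} * month")
print("Forecast for month 6:", forecast_month_6)
```

With only five points the fitted trend is very sensitive to each observation, so the forecast should be read as an illustration of the mechanics rather than a usable estimate.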
from sklearn.linear_model import LinearRegression
# Assume purchase history with two features per row: purchase amount and purchase time (month index)
X = [[100, 1], [150, 2], [80, 3], [200, 4], [180, 5]]
# Target value to learn for each row
y = [120, 160, 100, 220, 200]
regression = LinearRegression()
regression.fit(X, y)
# Predict the purchase amount for month 6
new_X = [[250, 6]]
prediction = regression.predict(new_X)
print("Predicted purchase amount:", prediction)
3. Market-Basket Analysis discovers product associations (e.g., customers who bought A also bought B) to support cross-selling and recommendation systems.
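The three numbers that association-rule mining reports (support, confidence, and lift) are simple ratios over the order list. This sketch computes them for the candidate rule A → C on the same three orders as the mlxtend snippet in this section. The helper name `support` is illustrative, not a library function.

```python
# Orders from the example data: order ID -> set of products
orders = {1: {"A", "B"}, 2: {"A", "C", "D"}, 3: {"B"}}

def support(items):
    """Fraction of orders that contain every product in `items`."""
    items = set(items)
    return sum(items <= basket for basket in orders.values()) / len(orders)

antecedent, consequent = {"A"}, {"C"}
rule_support = support(antecedent | consequent)
confidence = rule_support / support(antecedent)  # share of A-orders that also contain C
lift = confidence / support(consequent)          # above 1: A and C co-occur more than chance

print(f"support={rule_support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 is what the `metric="lift", min_threshold=1` filter in the library call selects for.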
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
# Assume transaction data with an order ID and a product name per row
data = pd.DataFrame({'OrderID': [1, 1, 2, 2, 2, 3],
                     'Product': ['A', 'B', 'A', 'C', 'D', 'B']})
# One row per order and one boolean column per product, the input format apriori expects
basket = pd.crosstab(data['OrderID'], data['Product']).astype(bool)
# With only three orders, min_support=0.5 keeps single products only and yields no rules,
# so a lower threshold is used here
frequent_itemsets = apriori(basket, min_support=0.3, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
print("Association rules:")
print(rules)
4. Churn Prediction analyzes user behavior and attributes to identify customers likely to leave, enabling proactive retention actions.
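Churners are usually a minority of the customer base, so it helps to decide how a churn model will be judged before fitting one: accuracy alone can look good while many churners are missed. A minimal sketch of accuracy, precision, and recall follows. The two label lists are made-up values for illustration and do not come from the model in this section.

```python
# Hypothetical labels for eight customers: 1 = churned, 0 = retained
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # churners correctly flagged
fp = sum(t == 0 and p == 1 for t, p in pairs)  # retained customers flagged by mistake
fn = sum(t == 1 and p == 0 for t, p in pairs)  # churners the model missed

accuracy = sum(t == p for t, p in pairs) / len(pairs)
precision = tp / (tp + fp)  # of the customers flagged, how many really churned
recall = tp / (tp + fn)     # of the real churners, how many were flagged
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

For retention campaigns, recall is often the number to watch, since a missed churner cannot be targeted at all.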
from sklearn.ensemble import RandomForestClassifier
# Assume user behavior features with labels: 1 = churned, 0 = retained
X = [[1, 100], [0, 150], [0, 80], [1, 200], [0, 180]]
y = [1, 0, 0, 1, 0]
# Fix random_state so repeated runs build the same forest
classifier = RandomForestClassifier(random_state=0)
classifier.fit(X, y)
# Predict whether a new user is likely to churn
new_X = [[1, 120]]
prediction = classifier.predict(new_X)
print("Churn prediction:", prediction)
5. Sentiment Analysis evaluates user comments on social media to determine emotional polarity, helping gauge product or service satisfaction.
from transformers import pipeline
# With no model specified, the pipeline loads its default sentiment model, which is trained on English text
nlp = pipeline("sentiment-analysis")
comments = ["This product is fantastic!", "The service is poor; I would not recommend buying."]
for comment in comments:
    result = nlp(comment)[0]
    print("Comment:", comment)
    print("Sentiment:", result['label'])
    print("Confidence:", result['score'])
6. Marketing Response Modeling builds predictive models to estimate the effect and ROI of marketing campaigns, guiding budget allocation.
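ROI itself is a simple ratio once a sales figure has been attributed to a campaign. The sketch below computes it per campaign from the same ad cost and sales figures as the snippet in this section, under the strong assumption that all recorded sales are attributable to the ad spend. The `roi` helper is illustrative.

```python
# Same figures as the example data: (ad cost, sales) per campaign
campaigns = [(100, 1000), (150, 1200), (80, 800), (200, 1500), (180, 1400)]

def roi(ad_cost, sales):
    """Return on investment: net gain divided by cost."""
    return (sales - ad_cost) / ad_cost

results = [(cost, roi(cost, sales)) for cost, sales in campaigns]
for cost, value in results:
    print(f"ad cost {cost}: ROI {value:.2f}")

# max() keeps the first campaign when several share the highest ROI
best_cost, best_roi = max(results, key=lambda pair: pair[1])
print("Highest ROI at ad cost", best_cost)
```

In this toy data the larger budgets show lower ROI, the kind of diminishing return a response model is meant to quantify.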
import pandas as pd
from sklearn.linear_model import LinearRegression
# Assume historical campaign data with ad cost and resulting sales
data = pd.DataFrame({'AdCost': [100, 150, 80, 200, 180],
                     'Sales': [1000, 1200, 800, 1500, 1400]})
X = data[['AdCost']]
y = data['Sales']
# Sales is a continuous target, so linear regression is used here in place of
# LogisticRegression, which is a classifier
model = LinearRegression()
model.fit(X, y)
# Predict sales for an ad cost of 120
new_X = pd.DataFrame({'AdCost': [120]})
prediction = model.predict(new_X)
print("Predicted sales:", prediction)
7. Recommendation System suggests relevant products or content based on users' historical interactions and preferences.
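Item-based collaborative filtering rests on a similarity score between products. The following sketch computes cosine similarity between item rating vectors for the same five ratings used in this section, treating a missing rating as 0. That zero-filling is a simplifying assumption made here for illustration and may differ from how a library such as Surprise handles pairs that were not rated by the same users.

```python
import math

# Same ratings as the example data: (user, product, rating)
ratings = [(1, "A", 5), (1, "B", 4), (2, "A", 3), (2, "C", 2), (3, "B", 5)]
users = sorted({user for user, _, _ in ratings})

# One vector per product, with one slot per user (0 where the user gave no rating)
vectors = {}
for user, product, rating in ratings:
    vectors.setdefault(product, [0] * len(users))[users.index(user)] = rating

def cosine(a, b):
    """Cosine of the angle between two rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

for first, second in [("A", "B"), ("A", "C"), ("B", "C")]:
    print(first, second, round(cosine(vectors[first], vectors[second]), 3))
```

Products B and C share no raters, so their similarity is 0 and neither can be used to recommend the other.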
import pandas as pd
from surprise import Dataset, KNNBasic, Reader
# Assume user rating data with user ID, product ID, and rating
data = pd.DataFrame({'UserID': [1, 1, 2, 2, 3],
                     'ProductID': ['A', 'B', 'A', 'C', 'B'],
                     'Rating': [5, 4, 3, 2, 5]})
reader = Reader(rating_scale=(1, 5))
dataset = Dataset.load_from_df(data[['UserID', 'ProductID', 'Rating']], reader)
trainset = dataset.build_full_trainset()
# Item-based collaborative filtering with cosine similarity
sim_options = {'name': 'cosine', 'user_based': False}
algorithm = KNNBasic(sim_options=sim_options)
algorithm.fit(trainset)
user_id = 1
n_recommendations = 3
# The trainset is indexed by inner IDs, so convert the raw user ID before looking up ratings
inner_uid = trainset.to_inner_uid(user_id)
rated_items = {inner_iid for inner_iid, _ in trainset.ur[inner_uid]}
# Candidates are the items this user has not rated, converted back to raw IDs for predict()
candidates = [trainset.to_raw_iid(inner_iid) for inner_iid in trainset.all_items()
              if inner_iid not in rated_items]
predictions = [algorithm.predict(user_id, raw_iid) for raw_iid in candidates]
top_n = sorted(predictions, key=lambda x: x.est, reverse=True)[:n_recommendations]
print("Recommended products for user", user_id)
for prediction in top_n:
    print(prediction.iid, "predicted rating:", prediction.est)
8. Brand Reputation Analysis examines social-media comments and mentions to assess public perception of a brand.
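Individual comment labels become a reputation signal only once they are aggregated. A minimal sketch follows, assuming classifier outputs shaped like the `label`/`score` dictionaries printed by this article's sentiment snippets. The sample results, the confidence cutoff, and the net-sentiment formula (positive share minus negative share) are illustrative choices, not part of any library.

```python
from collections import Counter

# Hypothetical classifier outputs for six brand mentions
results = [
    {"label": "POSITIVE", "score": 0.98},
    {"label": "NEGATIVE", "score": 0.91},
    {"label": "POSITIVE", "score": 0.87},
    {"label": "POSITIVE", "score": 0.66},
    {"label": "NEGATIVE", "score": 0.95},
    {"label": "POSITIVE", "score": 0.99},
]

min_confidence = 0.7  # ignore low-confidence predictions (arbitrary cutoff)
counts = Counter(r["label"] for r in results if r["score"] >= min_confidence)
total = sum(counts.values())

# Net sentiment ranges from -1 (all negative) to +1 (all positive)
net_sentiment = (counts["POSITIVE"] - counts["NEGATIVE"]) / total
print("Counts:", dict(counts))
print("Net sentiment:", net_sentiment)
```

Tracking this number per week or per product line gives a trend that is easier to act on than a stream of individual labels.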
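from transformers import pipeline
# With no model specified, the pipeline loads its default sentiment model, which is trained on English text
nlp = pipeline("sentiment-analysis")
comments = ["This brand's product quality is very good!", "This brand's after-sales service is terrible."]
for comment in comments:
    result = nlp(comment)[0]
    print("Comment:", comment)
    print("Sentiment:", result['label'])
    print("Confidence:", result['score'])
9. Competitive Analysis studies competitors' market share, pricing, and product features to identify strengths and weaknesses.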
import pandas as pd
from matplotlib import pyplot as plt
# Assume data on competitors' market share, price, and product features
data = pd.DataFrame({'Competitor': ['A', 'B', 'C', 'D'],
                     'MarketShare': [0.3, 0.2, 0.4, 0.1],
                     'Price': [100, 120, 80, 150],
                     'Feature': ['High', 'Low', 'Medium', 'Low']})
# Bar chart of market share per competitor
plt.bar(data['Competitor'], data['MarketShare'])
plt.xlabel('Competitor')
plt.ylabel('Market Share')
plt.title('Competitor Market Share')
plt.show()
print("Competitor product features:")
print(data[['Competitor', 'Price', 'Feature']])
10. Public-Opinion Monitoring tracks social media and news to detect emerging sentiment about a company or product, enabling timely response to crises.
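Timely response depends on noticing when mention volume departs from its usual level. The sketch below flags a day whose negative-mention count exceeds the mean of the preceding window by more than two standard deviations. The daily counts, the window length, and the two-sigma threshold are all hypothetical values picked for illustration.

```python
import statistics

# Hypothetical daily counts of negative mentions
daily_negative = [3, 4, 2, 5, 3, 4, 15]
window = 5

alerts = []
for day in range(window, len(daily_negative)):
    history = daily_negative[day - window:day]
    # Alert threshold: mean of the preceding window plus two sample standard deviations
    threshold = statistics.mean(history) + 2 * statistics.stdev(history)
    if daily_negative[day] > threshold:
        alerts.append(day)  # zero-based index of the flagged day

print("Days flagged for review:", alerts)
```

In a real pipeline the counts would come from collected posts after sentiment classification, such as the tweets retrieved in the snippet for this section.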
import tweepy
# Assume Twitter API credentials have already been set up
consumer_key = 'YOUR_CONSUMER_KEY'
consumer_secret = 'YOUR_CONSUMER_SECRET'
access_token = 'YOUR_ACCESS_TOKEN'
access_token_secret = 'YOUR_ACCESS_TOKEN_SECRET'
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
keyword = "brand name"
# Tweepy 4.x names this method search_tweets; releases before 4.0 call it api.search
tweets = api.search_tweets(q=keyword, count=10)
print("Latest related tweets:")
for tweet in tweets:
    print(tweet.text)
These Python snippets provide a foundational toolbox for extracting actionable insights from textual and transactional data across a range of marketing scenarios. Real-world deployments will also require data cleaning, parameter tuning, and integration with existing systems.
Test Development Learning Exchange