How to Uncover Hidden Sales Insights with Python: A Complete Cosmetics Data Analysis
This article walks through a full Python‑based sales analysis for a cosmetics company, covering data loading, cleaning, aggregation, visualization, and business‑focused conclusions such as product demand cycles, regional performance, and customer retention strategies.
This article demonstrates a step‑by‑step Python workflow for analyzing the sales data of a cosmetics company, from raw Excel files to actionable business insights.
Business Requirement
Provide the company's leadership with a comprehensive view of overall sales operations, product performance, and recommendations for marketing and sales strategies.
Analysis Process
1. Scenario (Diagnose Current State)
Objects: users and sales. Focus: identify growth factors affecting sales and propose solutions.
2. Requirement Breakdown
Analyze overall sales trends to pinpoint months and products driving revenue.
Compare product sales by category (e.g., cosmetics vs. skincare) and region.
Examine user characteristics, purchase frequency, repeat purchase rate, and cohort analysis.
Implementation
Data Loading
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams['font.family'] = 'SimHei'
import numpy as np
import warnings
warnings.filterwarnings('ignore')
data = pd.read_excel('C:/Users/cherich/Desktop/日化.xlsx', encoding='gbk')
data.head()Load the product information sheet:
data_info = pd.read_excel('C:/Users/cherich/Desktop/日化.xlsx', encoding='gbk', sheet_name='商品信息表')
data_infoData Cleaning and Processing
data = data.dropna()
# Remove trailing Chinese characters in quantity and price columns
data['订购数量'] = data['订购数量'].apply(lambda x: str(x)[:-1] if str(x)[-1] == '个' else x).astype(int)
data['订购单价'] = data['订购单价'].apply(lambda x: str(x)[:-1] if str(x)[-1] == '元' else x).astype(int)
# Standardize date format
def process_date(df):
pos = str(df).find('#')
if pos != -1:
parts = str(df).split('#')
return f"{parts[0]}-{parts[1]}-{parts[2]}"
return df
data['订单日期'] = data['订单日期'].apply(process_date)
data['订单日期'] = data['订单日期'].apply(lambda x: x.replace('年', '-').replace('月', '-') if '年' in str(x) else x)
data['订单日期'] = pd.to_datetime(data['订单日期'])
# Remove duplicates
data = data[data.duplicated() == False]
# Extract month for later grouping
data['月份'] = data['订单日期'].apply(lambda x: str(x).split('-')[1])Data Merging
total_data = pd.merge(data, data_info, on='商品编号', how='left')Visualization Examples
Monthly sales trend:
# Aggregate monthly sales and order counts
groups = data.groupby('月份')
months = [g[0] for g in groups]
amounts = [g[1].金额.sum() for g in groups]
counts = [g[1].金额.count() for g in groups]
money_mean = data.金额.sum() / 9
order_mean = data.金额.count() / 9
plt.figure(figsize=(18,10), dpi=80)
plt.subplot(221)
plt.plot(months, amounts, linewidth=2)
plt.axvspan('07', '08', color='#EE7621', alpha=0.3)
plt.axhline(money_mean, color='#EE7621', linestyle='--')
plt.title('每月销售额趋势图', fontsize=24, color='#4A708B')
plt.ylabel('金额/(亿)', fontsize=16)
plt.subplot(222)
plt.plot(months, counts, linewidth=2, color='#EE7621')
plt.axvline('07', color='#4A708B', linestyle='--')
plt.axhline(order_mean, color='#4A708B', linestyle='--')
plt.title('每月订单量趋势图', fontsize=24, color='#4A708B')
plt.ylabel('订单/(单)', fontsize=16)
plt.show()Product category comparison (彩妆 vs. 护肤品):
groups_category = total_data.groupby(['月份','商品大类'])
category1 = [] # 彩妆
category2 = [] # 护肤品
for (month, cat), df in groups_category:
if cat == '彩妆':
category1.append(df.金额.sum())
else:
category2.append(df.金额.sum())
labels = months
x = np.arange(len(labels))
width = 0.5
fig, ax = plt.subplots(figsize=(18,8))
ax.bar(x - width/2, category1, width, label='彩妆', color='#FFEC8B')
ax.bar(x + width/2, category2, width, label='护肤品', color='#4A708B')
ax.set_ylabel('销售额/(亿)')
ax.set_title('每月护肤品和彩妆的销售额对比图(大类)')
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()
plt.show()Regional analysis (East vs. West):
# Standardize region names
total_data['所在区域'] = total_data['所在区域'].apply(lambda x: str(x).replace('男区','南区').replace('西 区','西区'))
groups_area = total_data.groupby(['所在区域','商品小类'])
results = {}
for (region, subcat), df in groups_area:
money = int(df.金额.sum())
results.setdefault(region, {})[subcat] = moneyCustomer purchase frequency distribution:
data_user_buy = total_data.groupby('客户编码')['订单编码'].count()
plt.figure(figsize=(10,4), dpi=80)
plt.hist(data_user_buy, color='#FFEC8B')
plt.title('用户购买次数分布', fontsize=16)
plt.xlabel('购买次数')
plt.ylabel('用户数')
plt.show()Repeat purchase rate calculation:
date_rebuy = total_data.groupby('客户编码')['订单日期'].apply(lambda x: len(x.unique()))
rebuy_rate = round(date_rebuy[date_rebuy >= 2].count() / date_rebuy.count(), 4)
print('复购率:', rebuy_rate)Conclusions & Recommendations
Overall sales show an upward trend but dip sharply after August, likely due to seasonal slowdown.
Skincare products, especially masks, toners, and creams, drive the highest revenue; makeup (e.g., powder) has lower demand.
Product demand is seasonal: eye creams, toners, and masks peak in months 4‑8, while foundations and sunscreens peak in months 1‑3.
Eastern provinces (Jiangsu, Guangdong, Zhejiang) dominate sales; consider increasing marketing spend or establishing warehouses there.
Focus retention efforts on customers with 10‑35 purchases.
Retention rate exceeds 99%, indicating strong product loyalty.
New user acquisition is declining; explore channels such as influencer live‑streaming to attract fresh customers.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Python Crawling & Data Mining
Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
