Game Industry User Data Analysis: Registration Distribution, Payment Metrics, and Consumption Patterns
This article presents a comprehensive Python-based analysis of a large game dataset (2.29 million records, 109 fields), covering user registration trends, payment rates, ARPU/ARPPU calculations, level-based spending behavior, and consumption patterns of resources and acceleration items, with visualizations and actionable conclusions.
Field Description
The dataset contains 2,290,000 records and 109 fields; key fields include user_id, bd_stronghold_level (account level), resource consumption values (wood, stone, ivory, meat, magic), acceleration consumption values, battle counts, average online minutes, pay_price, and pay_count.
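The analysis steps below read only a handful of these fields, so a quick schema check can fail early if one is missing. This is a minimal sketch. The column list is an assumption assembled from the field names the code in this article uses; it is not the full 109-field schema. The helper name is hypothetical.

```python
import pandas as pd

# Hypothetical list: the fields read by the analysis steps in this article
REQUIRED_COLUMNS = [
    'user_id', 'register_time', 'bd_stronghold_level',
    'pay_price', 'pay_count',
]


def missing_columns(df: pd.DataFrame, required=REQUIRED_COLUMNS) -> list:
    """Return the required columns absent from df, in the order given."""
    present = set(df.columns)
    return [col for col in required if col not in present]
```

For example, `missing_columns(df0)` should return an empty list before the later steps are run.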
Analysis Approach
User registration time distribution
Payment metrics (payment rate, ARPU, ARPPU)
Payment behavior by level
Consumption habits of different player groups
Analysis Process
1. Import data
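import numpy as np
import pandas as pd
from pandas import read_csv
import matplotlib.pyplot as plt
import pylab as pl
from matplotlib.font_manager import FontProperties

# Show all columns when printing wide frames (109 fields)
pd.set_option('display.max_columns', None)
# df0 is the raw DataFrame loaded from the dataset file; the loading line and
# file path are not shown in the original (e.g. df0 = read_csv(<path>))
# Work on a copy so the raw data stays untouched
df = df0.copy()
# Check for null values anywhere in the frame
print(df.isnull().any().any())
# Preview data
print(df.head())
2. Clean data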
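Before dropping duplicates, it can help to know how many rows are affected. The sketch below does that. The helper name `dedupe_users` is hypothetical. Like the `drop_duplicates` call in the step below, it keeps the first row for each `user_id`.

```python
import pandas as pd


def dedupe_users(df: pd.DataFrame, key: str = 'user_id'):
    """Drop rows with a repeated key (keeping the first) and report how many were removed."""
    deduped = df.drop_duplicates(subset=key, keep='first')
    n_removed = len(df) - len(deduped)
    return deduped, n_removed


toy = pd.DataFrame({'user_id': [1, 2, 2, 3], 'pay_price': [0.0, 1.0, 5.0, 0.0]})
deduped, n_removed = dedupe_users(toy)
print(n_removed)                       # 1
print(deduped['pay_price'].tolist())   # [0.0, 1.0, 0.0]
```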
# Remove duplicate user_id entries, keeping the first occurrence
df = df.drop_duplicates(subset='user_id')
print('Total users:', len(df['user_id']))
3. Compute registration distribution
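The step below truncates `register_time` by slicing characters 5 to 10. That only works if every value is a string in a fixed 'YYYY-MM-DD HH:MM:SS' layout, and it discards the year. The alternative sketched here lets pandas parse the values and counts registrations per calendar day. It assumes the column holds parseable timestamps, and the function name is hypothetical.

```python
import pandas as pd


def daily_registrations(register_time: pd.Series) -> pd.Series:
    """Count registrations per calendar day from raw timestamp values."""
    # Parse to datetime64; unparseable values become NaT
    ts = pd.to_datetime(register_time, errors='coerce')
    # Normalize to midnight so all timestamps of one day share a key;
    # value_counts drops NaT by default
    return ts.dt.normalize().value_counts().sort_index()
```

Keeping real dates, rather than 'MM-DD' strings, also keeps the x axis in chronological order if the data spans a year boundary.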
# Truncate register_time to the day: characters 5-9 of a
# 'YYYY-MM-DD HH:MM:SS' string are the 'MM-DD' part
df['register_time'] = df['register_time'].str[5:10]
# Count registrations per day (a Series indexed by date)
df_register = df.groupby('register_time').size()
df_register.index.name = 'date'
print(df_register)
# Plot registration trend
plt.plot(df_register)
plt.grid(True)
plt.xticks(rotation=90)
plt.title('User Registration Distribution')
plt.show()
4. Payment analysis
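The payment metrics in the step below are simple ratios. The sketch here makes the definitions explicit on a toy frame. It assumes the caller supplies a boolean `active` mask, because the original does not state how active users are defined. With these definitions, ARPU always equals the payment rate multiplied by ARPPU: (paying / active) × (revenue / paying) = revenue / active.

```python
import pandas as pd


def payment_metrics(df: pd.DataFrame, active: pd.Series) -> dict:
    """Payment rate, ARPU and ARPPU as defined in this step.

    `active` is a boolean mask marking active users; its definition is an
    assumption left to the caller.
    """
    paying = df[df['pay_price'] > 0]
    n_active = int(active.sum())
    n_paying = len(paying)
    revenue = float(paying['pay_price'].sum())
    return {
        'pay_rate': n_paying / n_active,   # paying users / active users
        'arpu': revenue / n_active,        # total payment / active users
        'arppu': revenue / n_paying,       # total payment / paying users
    }


toy = pd.DataFrame({'user_id': [1, 2, 3, 4], 'pay_price': [0.0, 10.0, 30.0, 0.0]})
m = payment_metrics(toy, active=pd.Series([True, True, True, True]))
print(m)  # {'pay_rate': 0.5, 'arpu': 10.0, 'arppu': 20.0}
```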
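# NOTE: the original never defines df_active_user. As an ASSUMPTION, active
# users are taken here as users with avg_online_minutes >= 30; replace this
# with your own definition of an active user
df_active_user = df[df['avg_online_minutes'] >= 30]
# Paying users: pay_price > 0
df_pay_user = df[df['pay_price'] > 0]
# Payment rate (paying users / active users)
pay_rate = df_pay_user['user_id'].count() / df_active_user['user_id'].count()
print('Payment rate: %.2f' % pay_rate)
# ARPU (total payment / active users)
arpu = df_pay_user['pay_price'].sum() / df_active_user['user_id'].count()
print('ARPU: %.2f' % arpu)
# ARPPU (total payment / paying users)
arppu = df_pay_user['pay_price'].sum() / df_pay_user['user_id'].count()
print('ARPPU: %.2f' % arppu)
5. Payment behavior by level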
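Per-level conversion can also be computed in one pass, because the mean of a boolean "has paid" flag is the share of payers. Here is a compact sketch on toy data. The function name is hypothetical. Its paying-user counts and conversion rates should match those computed in the step below. Levels where nobody pays appear with 0 rather than being dropped.

```python
import pandas as pd


def conversion_by_level(df: pd.DataFrame) -> pd.DataFrame:
    """Users, paying users and conversion rate per bd_stronghold_level."""
    paid = df['pay_price'] > 0
    return (
        df.assign(paid=paid)
          .groupby('bd_stronghold_level')
          .agg(users=('user_id', 'count'),
               paying_users=('paid', 'sum'),   # count of True flags
               pay_rate=('paid', 'mean'))      # share of True flags
    )


toy = pd.DataFrame({
    'user_id': [1, 2, 3, 4, 5],
    'bd_stronghold_level': [1, 1, 10, 10, 13],
    'pay_price': [0.0, 0.0, 0.0, 6.0, 30.0],
})
print(conversion_by_level(toy))
```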
df_user = df[['user_id', 'bd_stronghold_level', 'pay_price', 'pay_count']]
# Aggregate per stronghold level. Named aggregation fixes the column names,
# so nothing depends on the column order a pivot table would produce
df_stronghold_pay = df_user.groupby('bd_stronghold_level').agg(
    total_users=('user_id', 'count'),
    total_pay_price=('pay_price', 'sum'),
    total_pay_count=('pay_count', 'sum'),
)
# Paying users per level; both frames are indexed by level, so values align
# (levels without any payer get 0)
df_stronghold_pay['pay_num'] = (
    df_user[df_user['pay_price'] > 0]
    .groupby('bd_stronghold_level')['user_id'].count()
    .reindex(df_stronghold_pay.index, fill_value=0)
)
# Conversion rate per level
df_stronghold_pay['pay_rate'] = df_stronghold_pay['pay_num'] / df_stronghold_pay['total_users']
# Average payment per user at each level
df_stronghold_pay['avg_pay_price'] = df_stronghold_pay['total_pay_price'] / df_stronghold_pay['total_users']
# Average payment count per user at each level
df_stronghold_pay['avg_pay_count'] = df_stronghold_pay['total_pay_count'] / df_stronghold_pay['total_users']
# Move the level into a column, then order and rename columns for display
df_stronghold_pay = df_stronghold_pay.reset_index()
df_stronghold_pay = df_stronghold_pay[['bd_stronghold_level', 'total_users', 'pay_num', 'pay_rate',
                                       'total_pay_price', 'avg_pay_price', 'total_pay_count', 'avg_pay_count']]
df_stronghold_pay.columns = ['Stronghold level', 'Total users', 'Paying users', 'Pay conversion rate',
                             'Total payment', 'Avg payment per user', 'Total payment count', 'Avg payment count per user']
df_stronghold_pay = df_stronghold_pay.round(2)
print(df_stronghold_pay)
6. Consumption habits of different player groups
# Define high-value players (level >= 10 and pay_price >= 500) and normal
# players (level >= 10 and pay_price < 500), using the same level cut-off
df_eli_user = df[(df['pay_price'] >= 500) & (df['bd_stronghold_level'] >= 10)]
df_nor_user = df[(df['pay_price'] < 500) & (df['bd_stronghold_level'] >= 10)]
# Average resource consumption for each group
wood_avg = [df_eli_user['wood_reduce_value'].mean(), df_nor_user['wood_reduce_value'].mean()]
stone_avg = [df_eli_user['stone_reduce_value'].mean(), df_nor_user['stone_reduce_value'].mean()]
ivory_avg = [df_eli_user['ivory_reduce_value'].mean(), df_nor_user['ivory_reduce_value'].mean()]
meat_avg = [df_eli_user['meat_reduce_value'].mean(), df_nor_user['meat_reduce_value'].mean()]
magic_avg = [df_eli_user['magic_reduce_value'].mean(), df_nor_user['magic_reduce_value'].mean()]
props_data = {'high_value_player':[wood_avg[0], stone_avg[0], ivory_avg[0], meat_avg[0], magic_avg[0]],
'normal_player':[wood_avg[1], stone_avg[1], ivory_avg[1], meat_avg[1], magic_avg[1]]}
df_props = pd.DataFrame(props_data, index=['wood','stone','ivory','meat','magic']).round(2)
print(df_props)
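The per-resource lists above repeat the same pattern five times. The more compact sketch below builds the same kind of comparison table directly with `DataFrame.mean`. It assumes the same `*_reduce_value` column names, and the helper name is hypothetical. The same call would work for the acceleration columns.

```python
import pandas as pd


def compare_group_means(high: pd.DataFrame, normal: pd.DataFrame,
                        columns: dict) -> pd.DataFrame:
    """Mean of each column for two player groups.

    `columns` maps a display label (e.g. 'wood') to a source column name.
    """
    cols = list(columns.values())
    # .mean() returns values in the order of cols, matching the labels below
    table = pd.DataFrame({
        'high_value_player': high[cols].mean().to_numpy(),
        'normal_player': normal[cols].mean().to_numpy(),
    }, index=list(columns.keys()))
    return table.round(2)


resource_columns = {
    'wood': 'wood_reduce_value', 'stone': 'stone_reduce_value',
    'ivory': 'ivory_reduce_value', 'meat': 'meat_reduce_value',
    'magic': 'magic_reduce_value',
}
```

With the groups defined above, `compare_group_means(df_eli_user, df_nor_user, resource_columns)` yields a table shaped like `df_props`.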
# Plot resource consumption
df_props.plot(kind='bar', title='Props Reduce', grid=True, legend=True)
plt.show()
# Acceleration item consumption
general_avg = [df_eli_user['general_acceleration_reduce_value'].mean(), df_nor_user['general_acceleration_reduce_value'].mean()]
building_avg = [df_eli_user['building_acceleration_reduce_value'].mean(), df_nor_user['building_acceleration_reduce_value'].mean()]
research_avg = [df_eli_user['reaserch_acceleration_reduce_value'].mean(), df_nor_user['reaserch_acceleration_reduce_value'].mean()]
training_avg = [df_eli_user['training_acceleration_reduce_value'].mean(), df_nor_user['training_acceleration_reduce_value'].mean()]
treatment_avg = [df_eli_user['treatment_acceleration_reduce_value'].mean(), df_nor_user['treatment_acceleration_reduce_value'].mean()]
acceleration_data = {'high_value_player':[general_avg[0], building_avg[0], research_avg[0], training_avg[0], treatment_avg[0]],
'normal_player':[general_avg[1], building_avg[1], research_avg[1], training_avg[1], treatment_avg[1]]}
df_acceleration = pd.DataFrame(acceleration_data, index=['general','building','researching','training','treatment']).round(2)
print(df_acceleration)
# Plot acceleration consumption
df_acceleration.plot(kind='bar', title='Acceleration Reduce', grid=True, legend=True)
plt.show()
Conclusion
1. The game has a large user base; new registrations are strongly influenced by events and version updates.
2. An ARPU of 8.55 (total payment divided by active users) indicates strong monetization.
3. Payment propensity rises sharply once users reach level 10, with the conversion rate approaching 100% at level 13. However, most users stay below level 10, so strategies that help players level up are critical.
4. High‑value players consume significantly more ivory and general acceleration items than normal players.
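Conclusion 4 compares the two groups by eye from the bar charts. One way to put a number on "significantly more" is the ratio of group means per item. The minimal sketch below computes it. It assumes a table shaped like `df_props` or `df_acceleration`, with items as rows and `high_value_player` and `normal_player` as columns; the function name is hypothetical. The ratio describes the size of the gap only and is not a statistical significance test.

```python
import numpy as np
import pandas as pd


def consumption_ratio(table: pd.DataFrame) -> pd.Series:
    """High-value / normal mean consumption per item, largest gap first."""
    # A zero normal-player mean would give inf; map it to NaN instead
    normal = table['normal_player'].replace(0, np.nan)
    ratio = table['high_value_player'] / normal
    # NaN ratios are placed last by sort_values
    return ratio.sort_values(ascending=False)


toy = pd.DataFrame(
    {'high_value_player': [300.0, 50.0, 10.0],
     'normal_player': [10.0, 25.0, 0.0]},
    index=['ivory', 'wood', 'magic'],
)
print(consumption_ratio(toy))
```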
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.