
Game Industry User Data Analysis: Registration Distribution, Payment Metrics, and Consumption Patterns

This article presents a comprehensive Python-based analysis of a large game dataset (2.29 million records, 109 fields), covering user registration trends, payment rates, ARPU/ARPPU calculations, level‑based spending behavior, and consumption patterns of resources and acceleration items, with visualizations and actionable conclusions.

Python Programming Learning Circle

Aug 17, 2022

Field Description

The dataset contains 2,290,000 records and 109 fields; key fields include user_id, bd_stronghold_level (account level), resource consumption values (wood, stone, ivory, meat, magic), acceleration consumption values, battle counts, average online minutes, pay_price, and pay_count.

Analysis Approach

User registration time distribution

Payment metrics (payment rate, ARPU, ARPPU)

Payment behavior by level

Consumption habits of different player groups

Analysis Process

1. Import data

import numpy as np
import pandas as pd
from pandas import read_csv
import matplotlib.pyplot as plt
import pylab as pl
from matplotlib.font_manager import FontProperties

pd.set_option('display.max_columns', None)

# Load the raw data. The source does not give the file path;
# 'game_data.csv' is a placeholder to replace with the real location.
df0 = read_csv('game_data.csv')

# Work on a copy so the raw data stays untouched
df = df0.copy()

# Check whether any value is missing
print(df.isnull().any().any())

# Preview the first rows
print(df.head())
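
With 2.29 million rows and 109 fields, reading only the columns the analysis uses can reduce memory use noticeably. The following is a minimal sketch on a tiny in-memory CSV rather than the real file: the sample rows, the `extra_field` column, and the `load_subset` helper are hypothetical, while the other column names are the ones used later in this article.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file: a few rows with the article's column names
SAMPLE_CSV = io.StringIO(
    "user_id,register_time,bd_stronghold_level,pay_price,pay_count,extra_field\n"
    "1,2018-01-26 00:01:05,3,0.0,0,x\n"
    "2,2018-01-26 00:02:10,10,0.99,1,y\n"
    "3,2018-01-27 10:00:00,12,599.0,4,z\n"
)

def load_subset(source, columns):
    """Read only the requested columns from a CSV source (hypothetical helper)."""
    return pd.read_csv(source, usecols=columns)

wanted = ['user_id', 'register_time', 'bd_stronghold_level', 'pay_price', 'pay_count']
df_small = load_subset(SAMPLE_CSV, wanted)

print(df_small.dtypes)
print(df_small.memory_usage(deep=True).sum(), 'bytes')
```

On the full dataset the same `usecols` argument would be passed to the real `read_csv` call; whether it pays off depends on how many of the 109 fields are actually needed.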

2. Clean data

# Remove duplicate user_id entries
df = df.drop_duplicates(subset='user_id')
print('Total users:', len(df['user_id']))
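
Before dropping duplicates it can be useful to know how many rows are affected. A minimal sketch on toy data (the values are made up for illustration):

```python
import pandas as pd

# Toy data (hypothetical): user 2 appears twice
toy = pd.DataFrame({'user_id': [1, 2, 2, 3],
                    'pay_price': [0.0, 0.99, 0.99, 5.0]})

# Rows whose user_id repeats an earlier row
n_dupes = toy.duplicated(subset='user_id').sum()

# Keep the first occurrence of each user_id
deduped = toy.drop_duplicates(subset='user_id', keep='first')

print('duplicate rows:', n_dupes)
print('unique users:', deduped['user_id'].nunique())
```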

3. Compute registration distribution

# Keep only the month and day ('MM-DD', characters 5-9) of each registration timestamp
df['register_time'] = df['register_time'].str[5:10]

# Count registrations per day
df_register = df.groupby('register_time').size().rename('registrations')
df_register.index.name = 'date'
print(df_register)

# Plot registration trend
plt.plot(df_register)
plt.grid(True)
plt.xticks(rotation=90)
plt.title('User Registration Distribution')
plt.show()
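
Slicing characters out of the timestamp string works only while every value has the same layout. As an alternative sketch, the timestamps can be parsed into real dates before counting. This assumes `register_time` looks like `YYYY-MM-DD HH:MM:SS`, which is consistent with the slice used above but not confirmed by the source; the sample values are made up.

```python
import pandas as pd

# Toy timestamps (hypothetical values, assumed 'YYYY-MM-DD HH:MM:SS' layout)
toy = pd.DataFrame({'register_time': [
    '2018-01-26 00:01:05',
    '2018-01-26 13:20:00',
    '2018-01-27 10:00:00',
]})

# Parse the strings, reduce each timestamp to its calendar day, then count rows per day
reg_day = pd.to_datetime(toy['register_time']).dt.date
daily = reg_day.value_counts().sort_index()
daily.index.name = 'date'

print(daily)
```

Parsed dates also sort chronologically across a year boundary, which plain `MM-DD` strings do not.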

4. Payment analysis

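# Active users: assumed here to be users with a positive average online time.
# The source never defines df_active_user, and the column name
# 'avg_online_minutes' is an assumption based on the field description.
df_active_user = df[df['avg_online_minutes'] > 0]

# Payment rate (paying users / active users)
df_pay_user = df[df['pay_price'] > 0]
pay_rate = df_pay_user['user_id'].count() / df_active_user['user_id'].count()
print('Payment rate: %.2f' % pay_rate)

# ARPU (total payment / active users)
arpu = df_pay_user['pay_price'].sum() / df_active_user['user_id'].count()
print('ARPU: %.2f' % arpu)

# ARPPU (total payment / paying users)
arppu = df_pay_user['pay_price'].sum() / df_pay_user['user_id'].count()
print('ARPPU: %.2f' % arppu)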
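
The three metrics are linked: ARPU equals the payment rate multiplied by ARPPU, which makes a handy sanity check. The sketch below wraps the calculation in a hypothetical `payment_metrics` helper and runs it on made-up numbers. It assumes the table passed in contains active users only and at least one paying user.

```python
import pandas as pd

def payment_metrics(active: pd.DataFrame) -> dict:
    """Payment rate, ARPU and ARPPU for a table of active users (hypothetical helper)."""
    n_active = len(active)
    payers = active[active['pay_price'] > 0]
    revenue = payers['pay_price'].sum()
    return {
        'pay_rate': len(payers) / n_active,   # paying users / active users
        'arpu': revenue / n_active,           # revenue / active users
        'arppu': revenue / len(payers),       # revenue / paying users
    }

# Toy data: 4 active users, 2 of them pay, 40 in total revenue
toy = pd.DataFrame({'user_id': [1, 2, 3, 4],
                    'pay_price': [0.0, 0.0, 10.0, 30.0]})
m = payment_metrics(toy)

# Expected values: pay_rate 0.5, ARPU 10.0, ARPPU 20.0
print('pay_rate=%.2f arpu=%.2f arppu=%.2f' % (m['pay_rate'], m['arpu'], m['arppu']))

# Identity check: ARPU = pay_rate * ARPPU
identity_holds = abs(m['arpu'] - m['pay_rate'] * m['arppu']) < 1e-9
print('ARPU == pay_rate * ARPPU:', identity_holds)
```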

5. Payment behavior by level

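df_user = df[['user_id', 'bd_stronghold_level', 'pay_price', 'pay_count']]
# Per level: user count, total payment amount, total payment count
df_table = pd.pivot_table(df_user, index=['bd_stronghold_level'],
    values=['user_id', 'pay_price', 'pay_count'],
    aggfunc={'user_id':'count','pay_price':'sum','pay_count':'sum'})

df_stronghold_pay = df_table.reset_index()
# Paying users per level, matched on the level value (levels without payers get 0)
pay_num = df_user[df_user['pay_price'] > 0].groupby('bd_stronghold_level')['user_id'].count()
df_stronghold_pay['pay_num'] = df_stronghold_pay['bd_stronghold_level'].map(pay_num).fillna(0).astype(int)
# Conversion rate per level
df_stronghold_pay['pay_rate'] = df_stronghold_pay['pay_num'] / df_stronghold_pay['user_id']
# Average payment amount per user at each level
df_stronghold_pay['avg_pay_price'] = df_stronghold_pay['pay_price'] / df_stronghold_pay['user_id']
# Average payment count per user at each level
df_stronghold_pay['avg_pay_count'] = df_stronghold_pay['pay_count'] / df_stronghold_pay['user_id']
# Rename by column name, not by position: pivot_table does not keep the order of `values`
df_stronghold_pay = df_stronghold_pay.rename(columns={
    'bd_stronghold_level': 'stronghold_level', 'user_id': 'total_users',
    'pay_num': 'paying_users', 'pay_rate': 'conversion_rate',
    'pay_price': 'total_pay_amount', 'avg_pay_price': 'avg_pay_amount',
    'pay_count': 'total_pay_count', 'avg_pay_count': 'avg_pay_count'})
df_stronghold_pay = df_stronghold_pay[['stronghold_level', 'total_users', 'paying_users', 'conversion_rate',
                                       'total_pay_amount', 'avg_pay_amount', 'total_pay_count', 'avg_pay_count']]
df_stronghold_pay = df_stronghold_pay.round(2)
print(df_stronghold_pay)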
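
The same per-level table can also be built in a single `groupby` with named aggregation, which avoids aligning a separately computed paying-user count. This is an alternative sketch on toy data, not the original author's method; the numbers and the output column names are made up for illustration.

```python
import pandas as pd

# Toy data (hypothetical): two users at level 9, three at level 10
toy = pd.DataFrame({
    'user_id': [1, 2, 3, 4, 5],
    'bd_stronghold_level': [9, 9, 10, 10, 10],
    'pay_price': [0.0, 0.0, 5.0, 0.0, 15.0],
    'pay_count': [0, 0, 1, 0, 3],
})

by_level = (
    toy.assign(is_payer=toy['pay_price'] > 0)       # True for users who paid anything
       .groupby('bd_stronghold_level')
       .agg(users=('user_id', 'count'),
            payers=('is_payer', 'sum'),             # summing booleans counts the payers
            total_pay=('pay_price', 'sum'),
            total_count=('pay_count', 'sum'))
)
by_level['pay_rate'] = by_level['payers'] / by_level['users']
by_level['avg_pay_price'] = by_level['total_pay'] / by_level['users']

print(by_level.round(2))
```

Because the payer count is computed inside the same grouping, levels with no paying users get 0 rather than a missing value.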

6. Consumption habits of different player groups

# Define high-value players (level >= 10 and pay_price >= 500) and
# normal players (level >= 10 and pay_price < 500), so both groups share the same level threshold

df_eli_user = df[(df['pay_price']>=500) & (df['bd_stronghold_level']>=10)]
df_nor_user = df[(df['pay_price']<500) & (df['bd_stronghold_level']>=10)]

# Average resource consumption for each group

wood_avg = [df_eli_user['wood_reduce_value'].mean(), df_nor_user['wood_reduce_value'].mean()]
stone_avg = [df_eli_user['stone_reduce_value'].mean(), df_nor_user['stone_reduce_value'].mean()]
ivory_avg = [df_eli_user['ivory_reduce_value'].mean(), df_nor_user['ivory_reduce_value'].mean()]
meat_avg = [df_eli_user['meat_reduce_value'].mean(), df_nor_user['meat_reduce_value'].mean()]
magic_avg = [df_eli_user['magic_reduce_value'].mean(), df_nor_user['magic_reduce_value'].mean()]
props_data = {'high_value_player':[wood_avg[0], stone_avg[0], ivory_avg[0], meat_avg[0], magic_avg[0]],
              'normal_player':[wood_avg[1], stone_avg[1], ivory_avg[1], meat_avg[1], magic_avg[1]]}

df_props = pd.DataFrame(props_data, index=['wood','stone','ivory','meat','magic']).round(2)
print(df_props)
# Plot resource consumption

df_props.plot(kind='bar', title='Props Reduce', grid=True, legend=True)
plt.show()

# Acceleration item consumption

general_avg = [df_eli_user['general_acceleration_reduce_value'].mean(), df_nor_user['general_acceleration_reduce_value'].mean()]
building_avg = [df_eli_user['building_acceleration_reduce_value'].mean(), df_nor_user['building_acceleration_reduce_value'].mean()]
research_avg = [df_eli_user['reaserch_acceleration_reduce_value'].mean(), df_nor_user['reaserch_acceleration_reduce_value'].mean()]
training_avg = [df_eli_user['training_acceleration_reduce_value'].mean(), df_nor_user['training_acceleration_reduce_value'].mean()]
treatment_avg = [df_eli_user['treatment_acceleration_reduce_value'].mean(), df_nor_user['treatment_acceleration_reduce_value'].mean()]
acceleration_data = {'high_value_player':[general_avg[0], building_avg[0], research_avg[0], training_avg[0], treatment_avg[0]],
                    'normal_player':[general_avg[1], building_avg[1], research_avg[1], training_avg[1], treatment_avg[1]]}

df_acceleration = pd.DataFrame(acceleration_data, index=['general','building','researching','training','treatment']).round(2)
print(df_acceleration)
# Plot acceleration consumption

df_acceleration.plot(kind='bar', title='Acceleration Reduce', grid=True, legend=True)
plt.show()
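
The group averages can also be computed without one list per resource, by labelling each player with a segment and grouping on that label. The sketch below is a compact alternative on toy data. It assumes both groups use a level threshold of 10 or higher and a pay_price cut-off of 500, and it uses only two of the resource columns; all values are made up.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values); the last row is below level 10 and gets excluded
toy = pd.DataFrame({
    'bd_stronghold_level': [10, 12, 11, 15, 3],
    'pay_price': [600.0, 20.0, 0.0, 900.0, 0.0],
    'wood_reduce_value': [100.0, 40.0, 20.0, 300.0, 5.0],
    'ivory_reduce_value': [50.0, 0.0, 10.0, 150.0, 0.0],
})

# Keep level >= 10 only, then label each row by the pay_price >= 500 rule
eligible = toy[toy['bd_stronghold_level'] >= 10].copy()
eligible['segment'] = np.where(eligible['pay_price'] >= 500,
                               'high_value_player', 'normal_player')

# Mean consumption per segment; transpose so resources are rows, as in df_props
resource_cols = ['wood_reduce_value', 'ivory_reduce_value']
seg_means = eligible.groupby('segment')[resource_cols].mean().T

print(seg_means.round(2))
```

Adding another resource or acceleration column then only means extending `resource_cols`.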

Conclusion

1. The game has a large user base; new registrations are strongly influenced by events and version updates.

2. An ARPU of 8.55 indicates strong revenue per active user.

3. Users reaching level 10 show a sharp increase in payment propensity, approaching 100% at level 13, but most users stay below level 10, making level‑up strategies critical.

4. High‑value players consume significantly more ivory and general acceleration items than normal players.
