How to Identify Top Bilibili Creators Using the IFL Model: A Data‑Driven Guide
This article walks through a complete data-analysis workflow: it scrapes Bilibili video metrics from January 2019 to March 2020, cleans and preprocesses 50,130 records, and extends the classic RFM model into an IFL framework (interaction rate, publishing frequency, and like rate) to score and rank up-creators across multiple categories. Code and datasets are provided for replication.
Project Overview
This project analyzes Bilibili video data from January 2019 to March 2020 to discover high-quality up-creators. The analysis is based on an adjusted RFM model, renamed the IFL model, which better fits Bilibili's characteristics.
Analysis Purpose
Identify videos with high quality and up‑creators worth following by examining metrics such as view count, coins, danmu, favorites, likes, shares, and comments.
Data Source
The dataset is collected from publicly available Bilibili information. It contains 50,130 rows, one per video, from the technology category, limited to videos that received more than 50,000 views between 2019-01 and 2020-03. The fields are: partition name, author name, author ID, publish date, view count, coin count, danmu count, favorite count, like count, share count, and comment count.
Data Cleaning
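Remove rows with missing values, then remove duplicate rows.
# Drop rows that contain missing values, then inspect the remaining row count
df = df.dropna()
df.info()
# Drop exact duplicate rows, then inspect the row count again
df = df.drop_duplicates()
df.info()
Dropping missing values removed 19 rows (50,111 remaining), and dropping duplicates removed a further 1,312 rows (48,799 remaining).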
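The row counts reported above can be tracked directly with `len(df)` rather than read off `df.info()`. The following is a minimal sketch on made-up data; the toy DataFrame and the variable names `rows_start`, `rows_after_na`, and `rows_after_dup` are assumptions for illustration, not part of the original project.

```python
import pandas as pd

# Hypothetical toy data standing in for the scraped video table
df = pd.DataFrame({
    "author": ["a", "a", "b", None, "c"],
    "view":   [100, 100, 250, 300, 400],
    "likes":  [10, 10, 20, 30, None],
})

rows_start = len(df)

# Step 1: drop rows with any missing value (missing author, missing likes)
df = df.dropna()
rows_after_na = len(df)

# Step 2: drop rows that are exact duplicates of an earlier row
df = df.drop_duplicates()
rows_after_dup = len(df)

print(f"rows removed for missing values: {rows_start - rows_after_na}")
print(f"rows removed as duplicates: {rows_after_na - rows_after_dup}")
print(f"rows remaining: {rows_after_dup}")
```

Recording the count after each step makes it easy to confirm that the two removals add up to the total reduction.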
Extract Required Columns
# Keep only the columns used in the analysis ('分区' is the partition column)
df = df[[
    '分区', 'author', 'date', 'coins', 'danmu',
    'favorite', 'likes', 'replay', 'share', 'view'
]]
df.head()
Model Construction
The classic RFM model evaluates customer value using Recency, Frequency, and Monetary. Since RFM cannot assess video quality, an IFL model is introduced:
I (Interaction_rate): average interaction per video (comments + danmu) relative to views.
F (Frequency): average publishing interval in days; shorter intervals indicate more active creators.
L (Like_rate): average approval relative to views, reflecting stable video quality. The calculation used here counts likes plus coins (weight 2) and favorites (weight 3), not likes alone.
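The three definitions can be sketched in pandas with a single `groupby` per creator. This is one plausible reading, not necessarily the author's exact implementation: it assumes the metrics are built from per-creator totals, that `date` has already been parsed with `pd.to_datetime`, and that `replay` holds the comment count. The toy data and the names `videos`, `totals`, `span_days`, and `ifl` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data for one partition; column names follow the article
videos = pd.DataFrame({
    "author":   ["a", "a", "a", "b", "b"],
    "date":     pd.to_datetime(["2019-01-01", "2019-01-11", "2019-01-31",
                                "2019-02-01", "2019-03-03"]),
    "view":     [1000, 2000, 1000, 500, 500],
    "danmu":    [10, 20, 10, 5, 5],
    "replay":   [10, 20, 10, 5, 5],   # assumed to be the comment count
    "likes":    [100, 200, 100, 20, 20],
    "coins":    [50, 100, 50, 10, 10],
    "favorite": [20, 40, 20, 5, 5],
})

g = videos.groupby("author")
totals = g[["view", "danmu", "replay", "likes", "coins", "favorite"]].sum()
count = g["date"].count()                                   # videos per creator
span_days = (g["date"].max() - g["date"].min()).dt.days     # first to last upload

ifl = pd.DataFrame({
    # I: (danmu + comments) per view, averaged over videos, in percent
    "I": ((totals["danmu"] + totals["replay"]) / totals["view"] / count * 100).round(2),
    # F: average number of days per upload
    "F": (span_days / count).round(0),
    # L: weighted approval per view in percent (coins x2, favorites x3)
    "L": (totals["likes"] + 2 * totals["coins"] + 3 * totals["favorite"])
         / totals["view"] * 100,
})
print(ifl)
```

Working from per-creator totals keeps one very popular video from being averaged equally with a minor one. A per-video average of the ratios would be a different, equally defensible choice.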
Calculate I, F, L
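For each partition, compute the three metrics per creator. In the formulas below, danmu, replay (the comment count), view, likes, coins, and favorite appear to be per-creator totals, count is the creator's number of videos, and last and early are the dates of the creator's newest and oldest videos.
# I: (danmu + comments) per view, averaged over the creator's videos, in percent
I = round(((danmu + replay) / view / count) * 100, 2)
# F: average number of days per upload between the first and last video
F = round((last - early).dt.days / count, 0)
# L: weighted approval per view, in percent (coins weighted 2, favorites weighted 3)
L = (likes + 2*coins + 3*favorite) / view * 100
Scoring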
Each metric is binned and assigned a score:
# I score (higher interaction rate gets a higher score)
IFL['I_SCORE'] = pd.cut(IFL['I'], bins=[0,0.03,0.06,0.11,1000], labels=[1,2,3,4], right=False).astype(float)
# F score (shorter publishing interval, i.e. higher frequency, gets a higher score)
IFL['F_SCORE'] = pd.cut(IFL['F'], bins=[0,7,15,30,90,1000], labels=[5,4,3,2,1], right=False).astype(float)
# L score (higher like rate gets a higher score)
IFL['L_SCORE'] = pd.cut(IFL['L'], bins=[0,5.39,9.07,15.58,1000], labels=[1,2,3,4], right=False).astype(float)
Customer Segmentation
For each creator, flag whether each of the three scores exceeds its mean (1 if it does, 0 otherwise), then combine the flags into a three-digit code: I flag × 100 + F flag × 10 + L flag. This code maps to creator types such as "High-value up-creator", "Potential up-creator", etc.
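To make the encoding concrete, here is a minimal sketch that goes from made-up I, F, and L values to the three-digit code, reusing the article's bins. The English column names (`I_FLAG`, `segment_code`, `segment`) and the function `label_segment` are hypothetical stand-ins. The article's own `transform_label` is not shown, so only the all-ones code is given a named label here and every other code is reported as is.

```python
import pandas as pd

# Hypothetical I, F, L values for four creators
IFL = pd.DataFrame(
    {"I": [0.20, 0.02, 0.08, 0.05],
     "F": [60, 3, 10, 20],
     "L": [4.0, 8.0, 20.0, 12.0]},
    index=["a", "b", "c", "d"],
)

# Score each metric with the article's half-open bins [left, right)
IFL["I_SCORE"] = pd.cut(IFL["I"], bins=[0, 0.03, 0.06, 0.11, 1000],
                        labels=[1, 2, 3, 4], right=False).astype(float)
IFL["F_SCORE"] = pd.cut(IFL["F"], bins=[0, 7, 15, 30, 90, 1000],
                        labels=[5, 4, 3, 2, 1], right=False).astype(float)
IFL["L_SCORE"] = pd.cut(IFL["L"], bins=[0, 5.39, 9.07, 15.58, 1000],
                        labels=[1, 2, 3, 4], right=False).astype(float)

# Flag whether each score is above its mean (1) or not (0)
for m in ["I", "F", "L"]:
    score = IFL[f"{m}_SCORE"]
    IFL[f"{m}_FLAG"] = (score > score.mean()).astype(int)

# Combine the flags into a three-digit code: I*100 + F*10 + L
IFL["segment_code"] = IFL["I_FLAG"] * 100 + IFL["F_FLAG"] * 10 + IFL["L_FLAG"]


def label_segment(code: int) -> str:
    """Illustrative mapping only; the article's full mapping is not shown."""
    if code == 111:
        return "High-value up-creator"
    return f"other (code {code:03d})"


IFL["segment"] = IFL["segment_code"].apply(label_segment)
print(IFL[["I_SCORE", "F_SCORE", "L_SCORE", "segment_code", "segment"]])
```

Because the flags are compared against the mean of the scores, a creator's segment depends on who else is in the same table. Scoring each partition separately therefore ranks creators against their own partition's peers.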
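# '人群数值' is the segment code; 'X是否大于平均值' flags whether score X is above its mean
IFL['人群数值'] = (
    IFL['I是否大于平均值']*100 +
    IFL['F是否大于平均值']*10 +
    IFL['L是否大于平均值']
)
# '人群类型' is the segment type, looked up from the code by transform_label
IFL['人群类型'] = IFL['人群数值'].apply(transform_label)
Results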
The analysis produces the distribution of creator types and a top-15 ranking for each partition. Ranking tables are generated for the "Science & Popular Science", "Social Science & Humanities", "Mechanical", "Wild Technology Association", "Star Sea", and "Automobile" partitions.
