
How to Identify Top Bilibili Creators Using the IFL Model: A Data‑Driven Guide

This article presents a complete data‑analysis workflow. It scrapes Bilibili video metrics from January 2019 to March 2020, cleans and preprocesses 50,130 records, and extends the classic RFM model into an IFL framework that calculates interaction, frequency, and like rates to score and rank up‑creators across multiple categories, with code and datasets provided for replication.

Python Crawling & Data Mining

Jul 2, 2020


Project Overview

This project analyzes Bilibili video data (January 2019 to March 2020) to discover high‑quality up‑creators (Bilibili uploaders, known as "UP主"). The analysis uses an adapted RFM model, renamed the IFL model, that is better suited to Bilibili's characteristics.

Analysis Purpose

Identify high‑quality videos and up‑creators worth following by examining metrics such as view count, coins, danmu (bullet comments overlaid on the video), favorites, likes, shares, and comments.

Data Source

The dataset was collected from publicly available Bilibili information. It contains 50,130 rows, one per technology‑category video that received more than 50,000 views between January 2019 and March 2020. The fields are: partition name, author name, author ID, publish date, view count, coin count, danmu count, favorite count, like count, share count, and comment count.
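The scraping code itself is not reproduced here. As a minimal sketch of the selection criteria only, the helper below keeps videos published from 2019‑01‑01 through 2020‑03‑31 with more than 50,000 views. The function name select_study_window and the exact date boundaries are assumptions, as are the column names 'date' and 'view' (borrowed from the column list used later in the article).

```python
import pandas as pd


def select_study_window(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep videos inside the study window with > 50,000 views.

    Assumes a 'date' column of parseable publish dates and an integer 'view' column.
    """
    dates = pd.to_datetime(raw["date"])
    # Assumed window: 2019-01-01 up to and including 2020-03-31
    in_window = (dates >= pd.Timestamp("2019-01-01")) & (dates < pd.Timestamp("2020-04-01"))
    # "More than 50 k views" is read as a strict inequality
    popular = raw["view"] > 50_000
    return raw[in_window & popular]
```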

Data Cleaning

Remove missing values and duplicate rows.

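import pandas as pd

# df: the raw scraped data (50,130 rows), loaded beforehand
df = df.dropna()           # drop rows that have any missing value
df.info()                  # inspect the remaining row count
df = df.drop_duplicates()  # drop exact duplicate rows
df.info()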

Dropping missing values removed 19 rows (50,111 remaining), and dropping duplicates removed a further 1,312 rows (48,799 remaining).
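A small helper that performs the same two steps and reports how many rows each one removes makes it easy to check figures like 19 and 1,312. This is a sketch; the name clean_with_counts is hypothetical, and df is assumed to be an ordinary pandas DataFrame.

```python
from typing import Tuple

import pandas as pd


def clean_with_counts(df: pd.DataFrame) -> Tuple[pd.DataFrame, int, int]:
    """Drop missing values, then exact duplicates.

    Returns the cleaned frame, the rows removed by dropna, and the rows removed by
    drop_duplicates (in that order, since deduplication runs on the NaN-free frame).
    """
    n_raw = len(df)
    no_missing = df.dropna()
    n_no_missing = len(no_missing)
    deduped = no_missing.drop_duplicates()
    return deduped, n_raw - n_no_missing, n_no_missing - len(deduped)
```

Because duplicates are counted after missing values are dropped, swapping the two steps could change the individual counts even when the final row count stays the same.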

Extract Required Columns

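# Keep only the columns used in the analysis.
# '分区' is the partition (sub-category) name; 'replay' holds the comment count.
df = df[[
    '分区', 'author', 'date', 'coins', 'danmu',
    'favorite', 'likes', 'replay', 'share', 'view'
]]
df.head()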
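The F metric later calls .dt.days on date differences, which only works when the date column has a datetime dtype. The source does not state how dates are stored, so the sketch below assumes they may arrive as strings. It combines the column selection with a conversion step. The names COLUMNS and prepare_columns are hypothetical.

```python
import pandas as pd

# Columns kept by the article; '分区' = partition name, 'replay' = comment count
COLUMNS = ["分区", "author", "date", "coins", "danmu",
           "favorite", "likes", "replay", "share", "view"]


def prepare_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Select the analysis columns and make 'date' a datetime column."""
    out = df[COLUMNS].copy()  # copy so the conversion does not touch the caller's frame
    out["date"] = pd.to_datetime(out["date"])  # required for .dt.days in the F metric
    return out
```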

Model Construction

The classic RFM model evaluates customer value using Recency, Frequency, and Monetary value. Because RFM cannot assess video quality, the IFL model is introduced instead:

I (Interaction_rate): interaction (danmu + comments) relative to views, normalized by the number of videos.

F (Frequency): average publishing interval in days; shorter intervals indicate more active creators.

L (Like_rate): weighted approval per view (likes + 2 × coins + 3 × favorites), reflecting stable video quality.
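Written out as formulas, and assuming the variables in the calculation code are per‑creator totals over the set P_a of creator a's videos in one partition (with n_a = |P_a| videos), the three metrics read roughly as follows. As computed, L weights coins and favorites more heavily than likes.

```latex
I_a = \operatorname{round}\!\left( \frac{\sum_{v \in P_a} (\mathrm{danmu}_v + \mathrm{replay}_v)}{n_a \sum_{v \in P_a} \mathrm{view}_v} \times 100,\ 2 \right)

F_a = \operatorname{round}\!\left( \frac{t^{\mathrm{last}}_a - t^{\mathrm{early}}_a}{n_a},\ 0 \right) \quad \text{(days)}

L_a = \frac{\sum_{v \in P_a} (\mathrm{likes}_v + 2\,\mathrm{coins}_v + 3\,\mathrm{favorite}_v)}{\sum_{v \in P_a} \mathrm{view}_v} \times 100
```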

Calculate I, F, L

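For each partition, compute the metrics per up‑creator. In the formulas below, danmu, replay (comments), view, likes, coins and favorite are the creator's totals within the partition, count is the number of videos the creator published there, and early and last are the publish dates of the first and last video.

# I calculation: (danmu + comments) per view, divided by the number of videos, as a percentage
I = round(((danmu + replay) / view / count) * 100, 2)
# F calculation: days between the first and last video, divided by the number of videos
F = round((last - early).dt.days / count, 0)
# L calculation: weighted likes per view (coins count double, favorites triple), as a percentage
L = (likes + 2*coins + 3*favorite) / view * 100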
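The calculation above leaves the per‑creator aggregation implicit. The following runnable sketch uses pandas named aggregation. It assumes that danmu, replay, view, likes, coins and favorite stand for per‑creator sums, that count is the number of videos, and that early and last are the first and last publish dates. compute_ifl is a hypothetical helper that expects one partition's rows with 'date' already converted to datetime.

```python
import pandas as pd


def compute_ifl(part: pd.DataFrame) -> pd.DataFrame:
    """Return one row per author with columns I, F, L for a single partition."""
    agg = part.groupby("author").agg(
        danmu=("danmu", "sum"), replay=("replay", "sum"), view=("view", "sum"),
        likes=("likes", "sum"), coins=("coins", "sum"), favorite=("favorite", "sum"),
        n_videos=("date", "size"),   # number of videos by this author
        early=("date", "min"),       # first publish date
        last=("date", "max"),        # last publish date
    )
    ifl = pd.DataFrame(index=agg.index)
    # I: (danmu + comments) per view, divided by the video count, in percent
    ifl["I"] = ((agg["danmu"] + agg["replay"]) / agg["view"] / agg["n_videos"] * 100).round(2)
    # F: span in days between first and last video, divided by the video count
    ifl["F"] = ((agg["last"] - agg["early"]).dt.days / agg["n_videos"]).round(0)
    # L: likes + 2*coins + 3*favorites per view, in percent
    ifl["L"] = (agg["likes"] + 2 * agg["coins"] + 3 * agg["favorite"]) / agg["view"] * 100
    return ifl
```

To cover every partition, the helper can be applied group by group, for example `{name: compute_ifl(g) for name, g in df.groupby("分区")}`. One consequence of the F formula: a creator with a single video gets F = 0, which falls into the best F bin.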

Scoring

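Each metric is binned with pd.cut and mapped to a score:

import pandas as pd

# IFL: one row per up-creator, with columns I, F, L
# I score: a higher interaction rate earns a higher score
IFL['I_SCORE'] = pd.cut(IFL['I'], bins=[0,0.03,0.06,0.11,1000], labels=[1,2,3,4], right=False).astype(float)
# F score: F is an interval in days, so more frequent publishing (shorter interval) earns a higher score
IFL['F_SCORE'] = pd.cut(IFL['F'], bins=[0,7,15,30,90,1000], labels=[5,4,3,2,1], right=False).astype(float)
# L score: a higher weighted like rate earns a higher score
IFL['L_SCORE'] = pd.cut(IFL['L'], bins=[0,5.39,9.07,15.58,1000], labels=[1,2,3,4], right=False).astype(float)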
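With right=False, pd.cut builds half‑open bins [a, b). A value sitting exactly on an edge therefore moves up into the next bin, and values at or beyond the last edge (1000) become NaN. The toy values below are hypothetical and were chosen to sit on or near the I and F bin edges. Because F is an interval, its labels run from 5 down to 1, so the shortest intervals score highest.

```python
import pandas as pd

# Hypothetical metric values placed on or near the bin edges
i = pd.Series([0.02, 0.03, 0.11, 5.0])
f = pd.Series([0, 6.5, 7, 45, 1200])

# Same bins and labels as the scoring code; right=False -> bins are [a, b)
i_score = pd.cut(i, bins=[0, 0.03, 0.06, 0.11, 1000],
                 labels=[1, 2, 3, 4], right=False).astype(float)
f_score = pd.cut(f, bins=[0, 7, 15, 30, 90, 1000],
                 labels=[5, 4, 3, 2, 1], right=False).astype(float)

print(i_score.tolist())  # 0.03 lands in the second bin, not the first
print(f_score.tolist())  # an interval of 1200 days falls outside every bin (NaN)
```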

Customer Segmentation

For each metric, a binary flag records whether the creator's score exceeds the mean score. The three flags are combined into a three‑digit code (I flag × 100 + F flag × 10 + L flag), which maps to up‑creator types such as "High‑value up‑creator" and "Potential up‑creator".

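# X是否大于平均值 ("is X above average"): 1 if the creator's X score exceeds the mean, else 0
# 人群数值 ("segment code"): the three flags combined into one number
IFL['人群数值'] = (
    IFL['I是否大于平均值']*100 +
    IFL['F是否大于平均值']*10 +
    IFL['L是否大于平均值']
)
# 人群类型 ("segment type"): transform_label, defined elsewhere, maps each code to a type name
IFL['人群类型'] = IFL['人群数值'].apply(transform_label)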
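The source does not give transform_label or the full list of types. The sketch below rebuilds the flag columns and the segment code from the *_SCORE columns. The label dictionary is hypothetical: mapping 111 to "High‑value up‑creator" is an assumption, and unmapped codes fall back to a generic zero‑padded name. Zero‑padding matters because the code is stored as an integer. A creator with I below the mean but F and L above it gets 11, not "011".

```python
import pandas as pd

# HYPOTHETICAL mapping; the article names types but not the codes behind them
HYPOTHETICAL_LABELS = {111: "High-value up-creator"}


def transform_label(code) -> str:
    """Stand-in for the article's transform_label; unknown codes get a generic name."""
    code = int(code)
    return HYPOTHETICAL_LABELS.get(code, f"Segment {code:03d}")


def segment(ifl: pd.DataFrame) -> pd.DataFrame:
    """Add above-mean flags, the segment code, and the segment type."""
    out = ifl.copy()
    for m in ["I", "F", "L"]:
        # 1 if the creator's score is above the mean score of all creators, else 0
        out[f"{m}是否大于平均值"] = (out[f"{m}_SCORE"] > out[f"{m}_SCORE"].mean()).astype(int)
    out["人群数值"] = (out["I是否大于平均值"] * 100
                       + out["F是否大于平均值"] * 10
                       + out["L是否大于平均值"])
    out["人群类型"] = out["人群数值"].apply(transform_label)
    return out
```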

Results

For each partition, the analysis produces the distribution of up‑creator types and a top‑15 ranking, starting with the "Science & Popular Science" partition.

Similar ranking tables are produced for "Social Science & Humanities", "Mechanical", "Wild Technology Association", "Star Sea", and "Automobile" partitions.
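The source does not say how the top‑15 lists are ordered. One plausible rule, assumed here, is to sort by the sum of the three scores. top_creators is a hypothetical helper that does this for one partition's scored table.

```python
import pandas as pd


def top_creators(scored: pd.DataFrame, n: int = 15) -> pd.DataFrame:
    """Return the n creators with the highest I_SCORE + F_SCORE + L_SCORE (assumed rule)."""
    # sum() skips NaN scores by default, so an out-of-range metric contributes 0
    total = scored[["I_SCORE", "F_SCORE", "L_SCORE"]].sum(axis=1)
    # mergesort is stable, so tied creators keep their original order
    return (scored.assign(TOTAL=total)
                  .sort_values("TOTAL", ascending=False, kind="mergesort")
                  .head(n))
```

Running it once per partition (for example, on each table returned by a per‑partition IFL computation) would yield one ranking table per partition.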



Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
