Build a Million‑Follower Bilibili Nickname Generator with Python Scraping
This article demonstrates how to crawl Bilibili creator data, analyze fan counts, categories, gender and video statistics with Python and pandas, and then create a nickname generator for aspiring million‑follower up‑hosts using both Python and JavaScript.
Source: CSDN – Author: 小小明 – Original article
Bilibili up information crawling
Scraping the Bilibili homepage directly is inconvenient, so two third‑party data sites are used instead: 火烧云数据 and 小小数据. After logging in, the API requests are copied as curl commands. Install the conversion tool:
pip install filestools -U
Then run it to turn the copied curl command into ready‑to‑use Python crawling code:
curl2py
The generated code can be pasted into an editor. If you prefer not to use the command line, replace xxx in the script below with the copied curl command.
from curl2py.curlParseTool import curlCmdGenPyScript
curl_cmd = """xxx"""
output = curlCmdGenPyScript(curl_cmd)
print(output)
Data analysis
Data reading and preprocessing
import pandas as pd
names = ["名称","性别","签名","视频数量","粉丝数","播放数","点赞数","总充电人数","月充电人数","生日","category1","category2","tags"]
df = pd.read_csv("b站up主粉丝量top10万.csv", usecols=[2,3,5]+list(range(9,16))+[22,23,24], header=0, names=names, low_memory=False)
df.drop_duplicates(inplace=True)
df.sort_values("粉丝数", ascending=False, inplace=True)
df
The dataset contains 100 000 rows; after official Bilibili accounts are removed, 99 955 entries remain.
Category distribution
Counting the category1 field shows that Life and Game are the most common categories among creators with over 10 000 fans.
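The source does not show the counting code. A minimal sketch of how such a count could be done with pandas follows, assuming fan counts are plain integers in 粉丝数. The toy rows and category labels are invented for illustration and are not taken from the real dataset.

```python
import pandas as pd

# Toy stand-in for the real dataset; every row here is invented.
toy = pd.DataFrame({
    "粉丝数": [520000, 48000, 15000, 9000, 230000, 12000],
    "category1": ["生活", "游戏", "生活", "游戏", "知识", "游戏"],
})

# Keep creators above the fan threshold, then count each top-level category.
popular = toy[toy["粉丝数"] > 10_000]
category_counts = popular["category1"].value_counts()
print(category_counts)
```

On the real DataFrame the same two lines would apply to df, with value_counts() returning the categories sorted from most to least common.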
Gender differences
Overall gender counts: 65 900 undisclosed (保密), 20 452 male, 13 648 female. Among creators who disclose a gender, there are about 50 % more men than women, and the male‑to‑female ratio increases at higher fan tiers.
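The "about 50 %" figure can be checked directly from the three counts. The snippet below is only the arithmetic, using the numbers quoted above.

```python
# Gender counts as reported in the text.
counts = {"undisclosed": 65_900, "male": 20_452, "female": 13_648}
total = sum(counts.values())

# Share of each group among all creators in the dataset.
shares = {group: n / total for group, n in counts.items()}

# How many more male creators than female, relative to the female count.
male_excess = counts["male"] / counts["female"] - 1

print(total)
print({group: f"{share:.1%}" for group, share in shares.items()})
print(f"{male_excess:.1%}")
```

The three counts sum to 100 000, and the excess works out to roughly 49.9 %, which matches the rounded claim.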
Video count distribution
The average number of videos per creator is 258, with a maximum of 180 033; many creators have zero videos yet still have fans.
df.视频数量.describe()
# count 59167.0
# mean 258.36
# std 1379.66
# min 0.0
# 25% 38.0
# 50% 89.0
# 75% 213.0
# max 180033.0
Birthday distribution
Birth month analysis reveals a large spike in January birthdays, possibly due to the default selection in Bilibili’s profile settings.
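One way to test the default‑date hypothesis is to count exact dates as well as months. The sketch below assumes birthdays are stored as "MM-DD" strings, which the source does not confirm, and the sample values are invented.

```python
from collections import Counter

# Invented sample values; the "MM-DD" format is an assumption about the data.
birthdays = ["01-01", "01-01", "03-15", "01-01", "07-22", None, "12-31", "01-09"]

# Count birthdays per month, skipping missing values.
month_counts = Counter(int(b.split("-")[0]) for b in birthdays if b)

# Count exact dates to see whether a single date dominates its month.
date_counts = Counter(b for b in birthdays if b)

print(month_counts.most_common(3))
print(date_counts.most_common(1))
```

If most of the January spike falls on a single date such as 01-01, that would be consistent with an untouched default rather than real birthdays.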
Million‑follower nickname generator
Filter creators with at least one million fans (100万), which leaves 658 entries. Among these, 4‑character names are the most common length.
name_size = df.名称.apply(len)
name_size.value_counts().head(10)
# 4 158
# 5 131
# 3 88
# 7 70
# 6 70
Extract words longer than one character using jieba, yielding 1 068 unique tokens.
import jieba
names = df.名称.apply(jieba.lcut).explode()
names = names[names.apply(len) > 1].unique()
print(names.shape[0], names)
# 1068 ['罗翔' '刑法' '番茄' ...]
Generate a short nickname by randomly picking two tokens (≈4 characters).
import numpy as np
"".join(np.random.choice(names, 2))
# Example output: 张逗麦克
JavaScript version for users without Python:
var items = ['罗翔','刑法','番茄','敬汉卿',/* many more */];
items[Math.floor(Math.random()*items.length)] + items[Math.floor(Math.random()*items.length)];
Running the script in a browser console produces random, catchy nicknames such as “张逗麦克”.
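For repeated use, the same idea can be wrapped in a small function. This is a sketch using only the standard library, not the author's code: make_nickname is a hypothetical helper, and it picks distinct tokens with random.sample, whereas the snippets above can pick the same token twice.

```python
import random

# A few tokens from the article's output; the full list has 1 068 entries.
TOKENS = ["罗翔", "刑法", "番茄", "敬汉卿"]

def make_nickname(tokens, parts=2, rng=random):
    """Join `parts` distinct tokens chosen at random from `tokens`."""
    return "".join(rng.sample(tokens, parts))

rng = random.Random(42)  # fixed seed so repeated runs give the same result
nickname = make_nickname(TOKENS, rng=rng)
print(nickname)
```

Passing a seeded random.Random instance keeps results reproducible; omitting rng falls back to the module‑level generator.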
Conclusion
The analysis gives a picture of Bilibili creator demographics and walks through a complete pipeline, from crawling the data to generating nicknames, that may be useful to anyone aiming to become a high‑fan‑count creator.
