What Bilibili Documentary Data Reveals: A Deep Dive into Trends, Genres, and Viewer Engagement
This article walks through scraping Bilibili documentary data with Python, processing it using pandas, and visualizing regional distributions, genre comparisons, yearly trends, episode lengths, popularity metrics, and comment timing to uncover insights about Chinese, UK, and US documentary patterns.
Data Collection and Basic Processing
We first obtain the Bilibili documentary query API and the detailed season API, then use Python requests to fetch data, parse JSON, and store basic fields (title, year, season_id, media_id) into a pandas DataFrame.
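The fetch-and-store step could be sketched roughly as follows. The URL template is the query API quoted in this article; everything else is an assumption made for illustration: the helper names (`build_search_url`, `parse_search_page`, `collect_pages`), the JSON layout (`result` then `data`), and the idea of taking `year` from the query parameter. The real response may be shaped differently.

```python
from typing import Callable, Dict, List

# Query API template as quoted in this article.
SEARCH_URL = (
    "https://bangumi.bilibili.com/media/web_api/search/result"
    "?style_id={0}&producer_id={1}&year={2}&order=2&st=3&sort=0"
    "&page={3}&season_type=3&pagesize=20"
)


def build_search_url(style_id, producer_id, year, page) -> str:
    """Fill the query template for one result page."""
    return SEARCH_URL.format(style_id, producer_id, year, page)


def parse_search_page(payload: dict) -> List[Dict]:
    """Pull the basic fields out of one decoded JSON page.

    ASSUMPTION: entries sit under payload["result"]["data"] and carry
    "title", "season_id" and "media_id" keys. Adjust to the real layout.
    """
    entries = (payload.get("result") or {}).get("data") or []
    return [
        {
            "title": entry.get("title"),
            "season_id": entry.get("season_id"),
            "media_id": entry.get("media_id"),
        }
        for entry in entries
    ]


def collect_pages(fetch_json: Callable[[str], dict], style_id, producer_id,
                  year, max_pages: int = 50) -> List[Dict]:
    """Walk result pages until an empty page (or max_pages) is reached."""
    rows: List[Dict] = []
    for page in range(1, max_pages + 1):
        payload = fetch_json(build_search_url(style_id, producer_id, year, page))
        page_rows = parse_search_page(payload)
        if not page_rows:
            break
        for row in page_rows:
            row["year"] = year  # assumption: year comes from the query itself
        rows.extend(page_rows)
    return rows
```

In real use, `fetch_json` might be `lambda url: requests.get(url, timeout=10).json()`, and the returned list can be handed to `pandas.DataFrame(rows)`. Passing the fetcher in as a parameter keeps the parsing logic testable without network access.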
The documentary query API is:
'https://bangumi.bilibili.com/media/web_api/search/result?style_id={0}&producer_id={1}&year={2}&order=2&st=3&sort=0&page={3}&season_type=3&pagesize=20'.format(style_id, producer_id, year, page)
Similarly, the season detail API is:
'https://bangumi.bilibili.com/view/web_api/season?season_id={0}'.format(season_id)
After retrieving all pages, we create a DataFrame with columns name, years, season_id, media_id and save it to CSV.
data_all.to_csv('documentary_data_allinfo.csv', index=False)
Data Visualization
Regional Distribution
Using pandas groupby we count documentaries per region and plot a pie chart.
The top regions are Mainland China, the United Kingdom and the United States, accounting for about 75% of the total.
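A minimal sketch of the per-region count, assuming the DataFrame has a `region` column (the column name and the sample rows below are invented for illustration, not taken from the scraped data):

```python
import pandas as pd

# Hypothetical sample rows; the real data comes from the scraped CSV.
df = pd.DataFrame({
    "title": list("abcdefghij"),
    "region": ["Mainland China"] * 4 + ["UK"] * 2 + ["US"] * 2
              + ["Japan", "France"],
})

# Count documentaries per region, largest first.
region_counts = df.groupby("region").size().sort_values(ascending=False)

# Convert counts to shares of the total and sum the three largest.
region_share = region_counts / region_counts.sum()
top3_share = region_share.head(3).sum()
```

With matplotlib installed, `region_counts.plot.pie()` would draw the pie chart from the same series.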
Genre Comparison
We extract genre information, count single and paired genres, and build a DataFrame for Mainland China, UK and US. A stacked bar chart shows the distribution of genres such as History, Society, Culture, Technology, Nature, etc.
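One possible reading of "single and paired genres" is counting how often each genre appears and how often each pair of genres appears together on the same documentary. The sketch below follows that reading; the function name `count_genres` and the input shape (one list of genre strings per documentary) are assumptions.

```python
from collections import Counter
from itertools import combinations


def count_genres(genre_lists):
    """Return (single_counts, pair_counts) for lists of genre labels."""
    single = Counter()
    paired = Counter()
    for genres in genre_lists:
        # De-duplicate and sort so each pair has one canonical order.
        unique = sorted(set(genres))
        single.update(unique)
        paired.update(combinations(unique, 2))
    return single, paired


# Hypothetical input: one list of genres per documentary.
sample = [["History", "Culture"], ["History"], ["Nature", "History", "Culture"]]
single_counts, pair_counts = count_genres(sample)
```

Running this once per region gives the per-region counts that a stacked bar chart would need.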
Trend Over Years
A pivot table of documentary counts by year and region is plotted, showing that Chinese documentaries surged after 2015 while US and UK numbers grew steadily.
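The year-by-region pivot might look like the sketch below. The column names `year`, `region` and `title` and the sample rows are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample rows standing in for the scraped data.
df = pd.DataFrame({
    "title": ["a", "b", "c", "d", "e", "f"],
    "year": [2014, 2015, 2015, 2016, 2016, 2016],
    "region": ["UK", "Mainland China", "US",
               "Mainland China", "Mainland China", "UK"],
})

# Rows are years, columns are regions, cells are documentary counts.
counts_by_year = df.pivot_table(
    index="year",
    columns="region",
    values="title",
    aggfunc="count",
    fill_value=0,  # years with no documentaries from a region show 0
)
```

With matplotlib installed, `counts_by_year.plot()` would draw one line per region over the years.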
Episode Length Distribution
Histograms compare per‑episode duration for the three regions.
Season Total Length and Episode Count
Violin plots show the distribution of total season length and number of episodes across regions.
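Total season length and episode count can be derived from per-episode durations with a grouped aggregation. This is a sketch under assumed column names (`season_id`, `duration_min`) and invented sample values:

```python
import pandas as pd

# Hypothetical per-episode table: one row per episode.
episodes = pd.DataFrame({
    "season_id": [1, 1, 1, 2, 2],
    "duration_min": [50, 52, 48, 25, 30],
})

# One row per season: summed duration and number of episodes.
season_stats = episodes.groupby("season_id")["duration_min"].agg(
    total_minutes="sum",
    episode_count="count",
)
```

Joining `season_stats` with a region column would give the table that per-region violin plots are drawn from.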
Popularity Analysis
We compute a weighted popularity score, (coins × 10 + favorites × 5 + danmakus) / views, and plot a bubble chart in which bubble size reflects view count and color reflects the score.
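The score formula translates directly into a small function. The name `popularity_score` and the choice to return 0.0 when there are no views are assumptions; the article does not say how zero-view entries are handled.

```python
def popularity_score(coins, favorites, danmakus, views):
    """Weighted engagement per view: (coins*10 + favorites*5 + danmakus) / views."""
    if views <= 0:
        # Assumption: treat entries without views as score 0 to avoid
        # dividing by zero.
        return 0.0
    return (coins * 10 + favorites * 5 + danmakus) / views


# Hypothetical numbers: (100*10 + 200*5 + 500) / 10000
example_score = popularity_score(coins=100, favorites=200, danmakus=500, views=10000)
```

Dividing by views makes the score a per-view engagement rate, so a small series with a devoted audience can outrank a widely viewed one.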
Comment Timing
Using the comment API, we extract the first comment timestamp for each episode and draw time‑series plots for selected series such as “国家宝藏” (National Treasure) and “人生一串” (The Story of Chuaner).
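Finding the first comment per episode reduces to taking the minimum timestamp within each episode. The sketch below assumes comments have already been fetched and flattened into `(episode, unix_seconds)` pairs; that shape, the Unix-seconds format, and the function name `first_comment_times` are assumptions, since the article does not show the comment API response.

```python
from datetime import datetime, timezone


def first_comment_times(comments):
    """Map each episode to the UTC datetime of its earliest comment.

    comments: iterable of (episode, unix_seconds) pairs (assumed shape).
    """
    earliest = {}
    for episode, ts in comments:
        if episode not in earliest or ts < earliest[episode]:
            earliest[episode] = ts
    # Sort by episode so the result reads in broadcast order.
    return {
        episode: datetime.fromtimestamp(ts, tz=timezone.utc)
        for episode, ts in sorted(earliest.items())
    }


# Hypothetical timestamps for two episodes.
sample_comments = [(1, 1512300000), (1, 1512200000), (2, 1512900000)]
first_times = first_comment_times(sample_comments)
```

Plotting the values of `first_times` against episode number gives an approximate release timeline for a series.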
These analyses reveal differences in release schedules, episode counts and viewer engagement between Chinese and Western documentaries.
MaGe Linux Operations
