
What Bilibili Documentary Data Reveals: A Deep Dive into Trends, Genres, and Viewer Engagement

This article walks through scraping Bilibili documentary data with Python, processing it using pandas, and visualizing regional distributions, genre comparisons, yearly trends, episode lengths, popularity metrics, and comment timing to uncover insights about Chinese, UK, and US documentary patterns.

MaGe Linux Operations

Data Collection and Basic Processing

We first identify two Bilibili endpoints, the documentary query API and the season detail API. We then fetch the data with Python's requests library, parse the JSON responses, and store the basic fields (title, year, season_id, media_id) in a pandas DataFrame. The query API is:

'https://bangumi.bilibili.com/media/web_api/search/result?style_id={0}&producer_id={1}&year={2}&order=2&st=3&sort=0&page={3}&season_type=3&pagesize=20'.format(style_id,producer_id,year,page)

Similarly, the season detail API is:

'https://bangumi.bilibili.com/view/web_api/season?season_id={0}'.format(season_id)
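
As a minimal sketch of the two calls above, the snippet below fills the URL templates and extracts the basic fields from one decoded search response. The helper names and the response layout (items under result.data carrying title, season_id and media_id) are assumptions for illustration, not confirmed details of the API, and a hard-coded sample stands in for a live request.

```python
import json

SEARCH_URL = (
    "https://bangumi.bilibili.com/media/web_api/search/result"
    "?style_id={0}&producer_id={1}&year={2}&order=2&st=3&sort=0"
    "&page={3}&season_type=3&pagesize=20"
)
SEASON_URL = "https://bangumi.bilibili.com/view/web_api/season?season_id={0}"


def search_url(style_id, producer_id, year, page):
    """Fill the query-API template quoted in the article."""
    return SEARCH_URL.format(style_id, producer_id, year, page)


def season_url(season_id):
    """Fill the season-detail template quoted in the article."""
    return SEASON_URL.format(season_id)


def parse_search_page(payload, year):
    """Extract the basic fields from one decoded search response.

    ASSUMPTION: items live under payload["result"]["data"] and carry
    "title", "season_id" and "media_id"; the real field names may differ.
    """
    items = payload.get("result", {}).get("data") or []
    return [
        {
            "name": item.get("title"),
            "years": year,
            "season_id": item.get("season_id"),
            "media_id": item.get("media_id"),
        }
        for item in items
    ]


# Hypothetical response body, used here instead of a live request.
sample = json.loads(
    '{"result": {"data": [{"title": "Demo", "season_id": 1, "media_id": 2}]}}'
)
rows = parse_search_page(sample, 2018)
print(search_url(-1, -1, 2018, 1))
print(rows)
```

With the requests library installed, requests.get(url, timeout=10).json() would supply the payload for parse_search_page, looping over page numbers until a page comes back empty.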

After retrieving all pages we create a DataFrame with columns name, years, season_id, media_id and save it to CSV.

data_all.to_csv('documentary_data_allinfo.csv', index=False)
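
The following sketch shows one way the collected rows might become that DataFrame and survive a CSV round trip. The sample records are invented, and dropping repeated season_id values is an added assumption rather than a step the article describes. An in-memory buffer replaces the file so the example leaves nothing on disk.

```python
import io

import pandas as pd

# Hypothetical rows standing in for the scraped results.
records = [
    {"name": "Series A", "years": 2017, "season_id": 101, "media_id": 201},
    {"name": "Series B", "years": 2018, "season_id": 102, "media_id": 202},
    # The same season returned again by an overlapping query.
    {"name": "Series A", "years": 2017, "season_id": 101, "media_id": 201},
]

data_all = pd.DataFrame(records, columns=["name", "years", "season_id", "media_id"])
# ASSUMPTION: one row per season is wanted, so repeats are dropped.
data_all = data_all.drop_duplicates(subset="season_id").reset_index(drop=True)

# Write without the index column, then read the CSV text back.
buffer = io.StringIO()
data_all.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored)
```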

Data Visualization

Regional Distribution

Using pandas groupby we count documentaries per region and plot a pie chart.
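
A minimal sketch of that count, assuming each row carries a region column (the column name and the sample rows are hypothetical):

```python
import pandas as pd

# Hypothetical sample; "region" is an assumed column name.
df = pd.DataFrame({
    "name": list("ABCDEFGHIJ"),
    "region": [
        "Mainland China", "Mainland China", "Mainland China", "Mainland China",
        "United Kingdom", "United Kingdom",
        "United States", "United States",
        "Japan", "France",
    ],
})

# Number of titles per region, largest first.
region_counts = df.groupby("region")["name"].count().sort_values(ascending=False)
# Share of the total in percent, which is what the pie chart displays.
region_share = (region_counts / region_counts.sum() * 100).round(1)
print(region_share)
```

If matplotlib is available, region_counts.plot.pie(autopct="%.1f%%") would draw the chart from the same Series.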

[Figure: Regional distribution]

The top regions are Mainland China, the United Kingdom and the United States, accounting for about 75% of the total.

Genre Comparison

We extract genre information, count single and paired genres, and build a DataFrame for Mainland China, UK and US. A stacked bar chart shows the distribution of genres such as History, Society, Culture, Technology, Nature, etc.
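
Counting single and paired genres could look like the sketch below. The titles and tags are invented, and sorting each tag list before pairing is a design choice made here so that (Culture, History) and (History, Culture) count as the same pair.

```python
from collections import Counter
from itertools import combinations

# Hypothetical genre tags per title.
genres_by_title = {
    "Series A": ["History", "Culture"],
    "Series B": ["Nature"],
    "Series C": ["Culture", "History", "Society"],
}

single_counts = Counter()
pair_counts = Counter()
for tags in genres_by_title.values():
    unique = sorted(set(tags))  # stable order so each pair has one spelling
    single_counts.update(unique)
    pair_counts.update(combinations(unique, 2))

print(single_counts.most_common())
print(pair_counts.most_common())
```

Running the same loop once per region and placing the resulting counters side by side gives the table behind a stacked bar chart.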

[Figure: Genre distribution]

Trend Over Years

A pivot table of documentary counts by year and region is plotted, showing that Chinese documentaries surged after 2015 while US and UK numbers grew steadily.
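
The pivot step can be sketched as follows, with invented rows and assumed column names (years, region):

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "name": list("ABCDEFG"),
    "years": [2014, 2016, 2016, 2015, 2016, 2015, 2016],
    "region": [
        "Mainland China", "Mainland China", "Mainland China",
        "United Kingdom", "United Kingdom",
        "United States", "United States",
    ],
})

# Rows are years, columns are regions, cells are title counts.
yearly = df.pivot_table(
    index="years",
    columns="region",
    values="name",
    aggfunc="count",
    fill_value=0,  # years with no titles for a region show 0 instead of NaN
)
print(yearly)
```

Calling yearly.plot() would then draw one line per region.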

[Figure: Yearly trend]

Episode Length Distribution

Histograms compare per‑episode duration for the three regions.
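
As an illustration of the binning behind such histograms, the sketch below counts episodes per duration bucket for each region. The durations, the duration_min column and the bin edges are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical per-episode durations in minutes.
episodes = pd.DataFrame({
    "region": [
        "Mainland China", "Mainland China", "Mainland China", "Mainland China",
        "United Kingdom", "United Kingdom", "United Kingdom",
        "United States", "United States", "United States",
    ],
    "duration_min": [25, 50, 52, 48, 58, 59, 29, 44, 43, 88],
})

# Shared bin edges so the regions are comparable (an arbitrary choice).
bins = [0, 15, 30, 45, 60, 90]

# One array of bucket counts per region.
counts = {
    region: np.histogram(group["duration_min"], bins=bins)[0]
    for region, group in episodes.groupby("region")
}
for region, values in counts.items():
    print(region, values.tolist())
```

Passing the same bins to matplotlib's hist function for each region would produce the overlaid histograms.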

[Figure: Episode length]

Season Total Length and Episode Count

Violin plots show the distribution of total season length and number of episodes across regions.
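
The per-season figures feeding those plots can be derived from episode-level rows, as in this sketch (sample data and column names are hypothetical):

```python
import pandas as pd

# Hypothetical episode-level rows.
episodes = pd.DataFrame({
    "season_id": [1, 1, 1, 2, 2, 3],
    "region": [
        "Mainland China", "Mainland China", "Mainland China",
        "United Kingdom", "United Kingdom",
        "United States",
    ],
    "duration_min": [50, 52, 48, 58, 59, 44],
})

# Collapse episodes into one row per season.
seasons = episodes.groupby(["season_id", "region"], as_index=False).agg(
    total_min=("duration_min", "sum"),        # total season length
    episode_count=("duration_min", "count"),  # number of episodes
)
print(seasons)
```

One option for the plot itself is seaborn's violinplot, for example sns.violinplot(data=seasons, x="region", y="total_min"); the article does not say which plotting library was used.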

[Figure: Season length]

Popularity Analysis

We compute a weighted popularity score, (coins × 10 + favorites × 5 + danmakus) / views, and plot a bubble chart in which bubble size reflects view count and color reflects the score.
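
A small worked example of the score, using invented numbers. The column names are assumptions, and masking titles with zero views is a safeguard added here, not something the article mentions.

```python
import pandas as pd

# Hypothetical engagement statistics.
stats = pd.DataFrame({
    "name": ["Series A", "Series B", "Series C"],
    "views": [100000, 20000, 0],
    "coins": [1000, 400, 0],
    "favorites": [2000, 100, 0],
    "danmakus": [5000, 1500, 0],
})

# Numerator of the score: coins x 10 + favorites x 5 + danmakus.
weighted = stats["coins"] * 10 + stats["favorites"] * 5 + stats["danmakus"]
# Zero views would divide by zero, so those rows become NaN instead.
stats["score"] = weighted / stats["views"].where(stats["views"] > 0)
print(stats[["name", "score"]])
```

Series A scores (10000 + 10000 + 5000) / 100000 = 0.25 and Series B scores (4000 + 500 + 1500) / 20000 = 0.3. In a matplotlib scatter plot, the s argument can carry a scaled view count and the c argument the score.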

[Figure: Popularity bubble chart]

Comment Timing

Using the comment API we extract the first comment timestamp for each episode and draw time‑series plots for selected series such as “国家宝藏” and “人生一串”.
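
Assuming the comments have already been flattened into rows with a series name, an episode number and a Unix timestamp (the layout, the ctime column name and the values are all hypothetical), the first comment per episode reduces to a grouped minimum:

```python
import pandas as pd

# Hypothetical flattened comment rows; ctime is a Unix timestamp in seconds.
comments = pd.DataFrame({
    "series": ["Series A", "Series A", "Series A", "Series A", "Series B"],
    "episode": [1, 1, 2, 2, 1],
    "ctime": [1512262800, 1512259200, 1512867600, 1512864000, 1529539200],
})

# Earliest timestamp within each (series, episode) group.
first = comments.groupby(["series", "episode"], as_index=False)["ctime"].min()
# Convert to datetimes so the values can sit on a time axis.
first["first_comment_at"] = pd.to_datetime(first["ctime"], unit="s")
print(first)
```

Plotting first_comment_at against episode for one series gives a timeline of the kind described above.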

[Figure: Comment timeline]

These analyses reveal differences in release schedules, episode counts and viewer engagement between Chinese and Western documentaries.
