
Data Analysis and Visualization of Bilibili Documentary Metadata

This article demonstrates how to collect, process, and visualize Bilibili documentary metadata using Python APIs, pandas, and various plotting libraries, revealing insights into regional distribution, genre trends, episode lengths, popularity metrics, and comment dynamics across Chinese, British, and American documentary collections.

Python Programming Learning Circle

We first load the Bilibili documentary index page, inspect its network requests, and identify two endpoints. The search API returns the paginated index:

https://bangumi.bilibili.com/media/web_api/search/result?...&page={3}&pagesize=20

The season detail API returns the detailed fields for a single season:

https://bangumi.bilibili.com/view/web_api/season?season_id={0}
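The `{0}` and `{3}` markers in these URLs are Python `str.format` slots. The sketch below fills the season detail URL only, because the search URL's query string is elided in this summary. `season_url` and `fetch_json` are hypothetical helper names, and the User-Agent header is an assumption rather than a documented requirement of the endpoint.

```python
import json
from urllib.request import Request, urlopen

# Season detail endpoint as quoted in the article; "{0}" is a str.format slot.
SEASON_API = "https://bangumi.bilibili.com/view/web_api/season?season_id={0}"


def season_url(season_id):
    """Fill the season_id slot of the detail endpoint."""
    return SEASON_API.format(season_id)


def fetch_json(url, timeout=10):
    """Download a URL and decode its JSON body (hypothetical helper)."""
    # Assumption: a browser-like User-Agent avoids rejection of bare clients.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_json(season_url(some_id))` would then perform the actual request. The layout of the returned JSON is not described here, so it should be inspected before any field is read from it.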

Using these APIs we crawl basic information (title, year, season_id, media_id) into a list, convert it to a pandas DataFrame, and then request detailed fields such as danmakus, favorites, views, coins, area, episodes_duration, style, and episodes_aid for each season.
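A minimal sketch of this two-stage collection, assuming the detail lookup is passed in as a function so that a stub can stand in for the network call. The sample rows, the `stub_detail` values, and the helper name `add_details` are all hypothetical. Only the column names come from the article.

```python
import pandas as pd

# Detail fields named in the article.
DETAIL_FIELDS = ["danmakus", "favorites", "views", "coins", "area",
                 "episodes_duration", "style", "episodes_aid"]

# Hypothetical rows standing in for search-API results (not real data).
basic_rows = [
    {"title": "Example A", "year": 2018, "season_id": 101, "media_id": 9001},
    {"title": "Example B", "year": 2019, "season_id": 102, "media_id": 9002},
]


def stub_detail(season_id):
    """Stand-in for a real detail request; returns made-up values."""
    return {
        "danmakus": 10, "favorites": 20, "views": 1000, "coins": 5,
        "area": "China", "episodes_duration": [1500, 1520],
        "style": ["history"], "episodes_aid": [1, 2],
    }


def add_details(basic_df, fetch_detail):
    """Call fetch_detail(season_id) per row and join the results on season_id."""
    records = []
    for season_id in basic_df["season_id"]:
        detail = fetch_detail(season_id)
        record = {field: detail.get(field) for field in DETAIL_FIELDS}
        record["season_id"] = season_id
        records.append(record)
    return basic_df.merge(pd.DataFrame(records), on="season_id", how="left")


basic_df = pd.DataFrame(basic_rows)
full_df = add_details(basic_df, stub_detail)
```

Passing the fetcher as an argument keeps the merge logic testable offline. A real run would supply a function that requests the season detail endpoint and extracts these fields from its response.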

The collected data are saved to documentary_data_allinfo.csv and later re‑loaded for analysis.
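One detail worth checking on reload: list-valued columns such as style are written to CSV as their string form. The sketch below round-trips a hypothetical one-row frame through a temporary directory and parses the lists back with `ast.literal_eval`. The sample values and the `utf-8-sig` encoding are assumptions, not the author's settings.

```python
import ast
import os
import tempfile

import pandas as pd

# Hypothetical one-row frame; "style" holds a list of genre tags.
df = pd.DataFrame({
    "title": ["Example A"],
    "season_id": [101],
    "style": [["history", "society"]],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "documentary_data_allinfo.csv")
    # utf-8-sig is an assumed choice that keeps Chinese titles readable in Excel.
    df.to_csv(csv_path, index=False, encoding="utf-8-sig")
    reloaded = pd.read_csv(csv_path, encoding="utf-8-sig")

# Lists come back as strings like "['history', 'society']"; parse them again.
reloaded["style"] = reloaded["style"].apply(ast.literal_eval)
```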

Visualization steps include:

Region distribution via a pie chart of documentary counts per area.

Genre comparison for China, the UK, and the US using stacked bar charts and a co‑occurrence network graph.

Yearly documentary count trends across the three regions.

Episode length and season total duration distributions using histograms, violin plots, and 2‑D kernel density estimates.

Popularity analysis with a bubble chart that combines views, danmakus, coins, and favorites into a weighted score.

Comment‑time line plots for selected series, showing weekly comment dynamics.
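The tables behind several of these charts can be sketched with pandas alone. The example below uses hypothetical sample rows and covers the per-area counts, the genre-by-area table, a weighted popularity score, and weekly comment counts. The weights, the max-normalization step, and the `ctime` column name are assumptions, since this summary does not give the article's actual formula or comment schema.

```python
import pandas as pd

# Hypothetical sample rows; real values would come from the crawled CSV.
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "area": ["China", "UK", "US", "China"],
    "style": [["history", "society"], ["nature"],
              ["technology", "nature"], ["history"]],
    "views": [1000, 4000, 2500, 500],
    "danmakus": [100, 300, 200, 50],
    "coins": [20, 80, 40, 10],
    "favorites": [50, 120, 90, 30],
})

# Pie-chart input: number of documentaries per area.
area_counts = df["area"].value_counts()

# Stacked-bar input: one row per (documentary, genre), then a genre-by-area table.
exploded = df.explode("style")
genre_by_area = pd.crosstab(exploded["style"], exploded["area"])

# Bubble-chart input: scale each metric by its maximum, then take a weighted sum.
WEIGHTS = {"views": 0.4, "danmakus": 0.2, "coins": 0.2, "favorites": 0.2}  # assumed
metrics = df[list(WEIGHTS)]
df["score"] = ((metrics / metrics.max()) * pd.Series(WEIGHTS)).sum(axis=1)

# Line-plot input: comments per week from hypothetical timestamps.
comments = pd.DataFrame({
    "ctime": pd.to_datetime(["2019-01-01", "2019-01-03", "2019-01-09"]),
})
weekly_comments = comments.set_index("ctime").resample("W").size()


def plot_area_pie(counts):
    """Draw the pie chart; matplotlib is imported here so the tables work without it."""
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    ax.pie(counts.values, labels=counts.index, autopct="%1.1f%%")
    ax.set_title("Documentaries per area")
    return fig
```

Scaling by the column maximum keeps views from swamping the smaller counts. Other normalizations, such as a log transform, would be equally reasonable.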

Overall, the analysis reveals that Chinese documentaries are dominated by history, society, and humanities topics and have shorter episodes, while British and American productions feature more technology and nature content and run longer per episode. Popularity correlates strongly with the coin and danmaku metrics.

