Scrape and Analyze Bilibili’s “马保国” Videos with Python – A Complete Guide

This tutorial shows how to use Python to fetch data from Bilibili’s “马保国” channel via its public API, extract video metadata, clean and visualize 14,000 records, and generate insights such as top‑viewed videos and a comment word cloud.

Python Crawling & Data Mining

Dec 19, 2020

The article presents a Python web‑scraping project targeting the Bilibili "马保国" channel, which hosts many meme videos.

The API endpoint used is:

https://api.bilibili.com/x/web-interface/web/channel/multiple/list?channel_id=3503796&sort_type=hot&page_size=30

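Pagination is cursor-based: each JSON response carries an offset value under data.offset, and that value is sent as the offset parameter of the next request to fetch the following page.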
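The cursor handling can be sketched as follows. This is only an illustration: build_page_url is a hypothetical helper, not part of the original project, and it assumes the cursor is sent back as a query parameter literally named offset, which the article implies but does not show.

```python
from urllib.parse import urlencode

# Endpoint taken from the article; everything else here is illustrative.
BASE_URL = "https://api.bilibili.com/x/web-interface/web/channel/multiple/list"


def build_page_url(offset=None, channel_id=3503796, sort_type="hot", page_size=30):
    """Build the request URL for one page (hypothetical helper).

    Assumption: the cursor from the previous response is passed back
    as a query parameter named ``offset``.
    """
    params = {
        "channel_id": channel_id,
        "sort_type": sort_type,
        "page_size": page_size,
    }
    if offset:
        # Only the first request is sent without a cursor.
        params["offset"] = offset
    return f"{BASE_URL}?{urlencode(params)}"
```

Called without arguments, the helper reproduces the first-page URL shown above; every later call would pass in the offset read from the previous response.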

A concise scraping function is provided:

import json
import pandas as pd
import requests

COLUMNS = ['id', 'name', 'view_count', 'like_count', 'duration', 'author_name', 'author_id', 'bvid']

def get_data(url, headers):
    """Fetch one page of channel videos; return (next offset, DataFrame of that page)."""
    html = requests.get(url, headers=headers).content
    data = json.loads(html.decode('utf-8'))
    offset = data['data']['offset']
    print(offset)
    # Iterate over the returned list rather than range(30): the last page may be shorter.
    # DataFrame.append was removed in pandas 2.0, so collect the rows and build the frame once.
    rows = [{col: item[col] for col in COLUMNS} for item in data['data']['list']]
    data_m = pd.DataFrame(rows, columns=COLUMNS)
    return offset, data_m
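Since get_data returns a single page, collecting the whole channel means calling it repeatedly. The loop below is a minimal sketch of that idea, not the author's code: crawl_channel, fetch_page and build_url are hypothetical names, and the stop conditions (an empty page, or an empty or repeated offset) are assumptions about how the API signals the last page.

```python
import time


def crawl_channel(fetch_page, build_url, max_pages=500, delay=0.5):
    """Follow the offset cursor page by page (hypothetical helper).

    fetch_page(url) must return (next_offset, page), as get_data does.
    build_url(offset) must return the URL for that cursor; offset is
    None on the first request.
    """
    pages = []
    offset = None
    for _ in range(max_pages):
        next_offset, page = fetch_page(build_url(offset))
        if len(page) == 0:
            break  # assumed end-of-data signal: an empty page
        pages.append(page)
        if not next_offset or next_offset == offset:
            break  # assumed end-of-data signal: no new cursor
        offset = next_offset
        time.sleep(delay)  # pause between requests to stay polite
    return pages
```

With the function from the article, fetch_page could be lambda url: get_data(url, headers), and the collected pages could then be merged with pd.concat(pages, ignore_index=True). The max_pages cap keeps the loop from running forever if the end-of-data assumptions turn out to be wrong.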

Calling this function in a loop, feeding each returned offset into the next request, retrieves about 14,000 video records. After basic cleaning (for example, converting view counts expressed in units of ten thousand, 万, into plain numbers), the data is visualized with a scatter plot of view count versus like count.
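The article does not show the cleaning code, so the following is only a sketch of the count conversion it mentions. parse_count is a hypothetical helper, and it assumes the raw values are either plain numbers or strings ending in 万 (ten thousand), such as "12.3万".

```python
def parse_count(value):
    """Convert a scraped count into an integer (hypothetical helper).

    Assumption: values are plain numbers or strings that use the
    suffix 万 to mean "times ten thousand".
    """
    if isinstance(value, (int, float)):
        return int(value)
    text = str(value).strip()
    if text.endswith("万"):
        # round() guards against floating-point noise such as 122999.99
        return int(round(float(text[:-1]) * 10_000))
    return int(float(text))
```

On a DataFrame, the conversion could be applied with df['view_count'].map(parse_count) and likewise for like_count, after which df.plot.scatter(x='view_count', y='like_count') would draw the scatter plot. Other placeholder values, if the API returns any, would need their own handling.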

The analysis reveals that the most viewed videos have several million views, while the most liked video is a meme clip titled "武林高手" from a popular Bilibili creator.
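Rankings like these can be read off the cleaned table with two pandas calls. The snippet below is a sketch under stated assumptions: the three rows are invented placeholders for illustration, not results from the article, and the column names simply mirror those used in get_data.

```python
import pandas as pd

# Invented sample rows for illustration only; not real scraped data.
df = pd.DataFrame({
    "name": ["clip A", "clip B", "clip C"],
    "view_count": [1200, 560000, 34000],
    "like_count": [90, 21000, 45000],
})

# The n rows with the highest view counts, sorted in descending order.
top_viewed = df.nlargest(2, "view_count")

# The single row with the highest like count.
most_liked = df.loc[df["like_count"].idxmax()]

print(top_viewed[["name", "view_count"]])
print(most_liked["name"])
```

Both calls require numeric columns, so they would only work after the count cleaning step has been applied.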

To explore audience comments, a word‑cloud is generated using the stylecloud library:

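import stylecloud
from IPython.display import Image

# text1 is assumed to be a list of words segmented from the scraped comments
# Generate word cloud
stylecloud.gen_stylecloud(
    text=' '.join(text1),
    collocations=False,  # count single words only, not two-word pairs
    font_path=r'C:\Windows\Fonts\msyh.ttc',  # Windows font with Chinese glyphs
    icon_name='fas fa-play-circle',
    size=653,
    output_name='马保国词云图.png'
)
Image(filename='马保国词云图.png')  # display the image inline in a Jupyter notebook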

The resulting word cloud highlights frequent phrases such as "耗子尾汁" and other meme terms.
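A plain frequency count is a quick way to cross-check what the word cloud shows. The sketch below assumes the comments have already been split into a list of tokens; the article does not show that step. top_terms is a hypothetical helper, and the sample tokens are made up for illustration.

```python
from collections import Counter


def top_terms(tokens, n=10, min_len=2):
    """Return the n most frequent tokens (hypothetical helper).

    Tokens shorter than min_len characters are skipped, which drops
    most single-character particles and punctuation.
    """
    cleaned = (t.strip() for t in tokens)
    counts = Counter(t for t in cleaned if len(t) >= min_len)
    return counts.most_common(n)


# Made-up sample tokens, for illustration only.
sample = ["耗子尾汁", "的", "耗子尾汁", "年轻人", "年轻人", "耗子尾汁", "不讲武德"]
print(top_terms(sample, n=2))
```

The same token list could then be joined with spaces and passed to the word cloud generator.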

Overall, the tutorial demonstrates how to obtain, process, and visualize large‑scale video data from Bilibili using Python, providing a practical example of data mining and visualization techniques.
