Scrape and Analyze Bilibili’s “马保国” Videos with Python – A Complete Guide
This tutorial shows how to use Python to fetch data from Bilibili’s “马保国” channel via its public API, extract video metadata, clean and visualize 14,000 records, and generate insights such as top‑viewed videos and a comment word cloud.
The target is the Bilibili "马保国" channel, which hosts many meme videos.
The API endpoint used is:
https://api.bilibili.com/x/web-interface/web/channel/multiple/list?channel_id=3503796&sort_type=hot&page_size=30
The offset parameter is obtained from the JSON response of the previous request.
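The offset-based pagination described above can be sketched roughly as follows. This is a minimal illustration, not the article's code: the helper names `build_url` and `crawl_all`, the `max_pages` cap, the stop rule on an empty offset, and the assumption that the cursor is sent as a query parameter named `offset` are all hypothetical. The `fetch_page` argument stands in for any callable that takes a URL and returns `(offset, rows)`.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.bilibili.com/x/web-interface/web/channel/multiple/list"


def build_url(offset=None, channel_id=3503796, sort_type="hot", page_size=30):
    """Build the listing URL; the name of the offset query parameter is an assumption."""
    params = {"channel_id": channel_id, "sort_type": sort_type, "page_size": page_size}
    if offset:
        params["offset"] = offset
    return BASE_URL + "?" + urlencode(params)


def crawl_all(fetch_page, max_pages=1000):
    """Follow the offset cursor page by page until it runs out (assumed stop rule)."""
    all_rows = []
    offset = None
    for _ in range(max_pages):
        # Each call returns the cursor for the next page plus this page's rows.
        offset, rows = fetch_page(build_url(offset))
        all_rows.extend(rows)
        if not offset or not rows:
            break
    return all_rows
```

Passing the fetcher in as an argument keeps the loop independent of the network, so it can be exercised with a stub before pointing it at the live endpoint.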
A concise scraping function is provided:
import json

import pandas as pd
import requests


def get_data(url, headers):
    # Fetch one page of the channel listing and decode the JSON body.
    html = requests.get(url, headers=headers).content
    data = json.loads(html.decode('utf-8'))
    # The offset returned here is the cursor for the next request.
    offset = data['data']['offset']
    print(offset)
    columns = ['id', 'name', 'view_count', 'like_count', 'duration',
               'author_name', 'author_id', 'bvid']
    # Build one row per item actually returned (a page may hold fewer than 30).
    # DataFrame.append was removed in pandas 2.0, so the rows are collected first.
    rows = [{col: item[col] for col in columns} for item in data['data']['list']]
    data_m = pd.DataFrame(rows, columns=columns)
    return offset, data_m

Calling this function repeatedly, passing each returned offset into the next request, quickly retrieves about 14,000 video records. After basic cleaning (e.g., converting view counts expressed in units of ten thousand), the data is visualized with a scatter plot of view count versus like count.
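As an illustration of the cleaning step, the sketch below converts counts written with the 万 (ten thousand) suffix into plain integers. The function name `parse_count` and the sample strings are hypothetical. The article does not show its own cleaning code, and the exact raw format returned by the API is assumed here.

```python
def parse_count(raw):
    """Convert a count such as '1.2万' or '3456' to an integer.

    Assumes the only suffix in use is 万 (x10,000); other formats are not handled.
    """
    text = str(raw).strip()
    if text.endswith("万"):
        # Strip the suffix, scale by ten thousand, and round away float noise.
        return int(round(float(text[:-1]) * 10_000))
    return int(float(text))


# Made-up sample values, not taken from the dataset.
samples = ["1.2万", "356.8万", "9876", 42]
cleaned = [parse_count(s) for s in samples]
print(cleaned)
```

Once both the view and like columns are numeric, they can be passed directly to a plotting library for the scatter plot.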
The analysis reveals that the most viewed videos have several million views, while the most liked video is a meme clip titled "武 林 高 手" from a popular Bilibili creator.
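Rankings like these can be read off the cleaned records with a simple sort. The sketch below uses a few made-up rows (the titles and numbers are placeholders, not figures from the dataset) with the same field names as the scraping function. `top_n` is a hypothetical helper.

```python
from operator import itemgetter


def top_n(records, key, n=3):
    """Return the n records with the largest value for `key`."""
    return sorted(records, key=itemgetter(key), reverse=True)[:n]


# Placeholder rows, not real data from the channel.
records = [
    {"name": "video A", "view_count": 1_200_000, "like_count": 50_000},
    {"name": "video B", "view_count": 3_400_000, "like_count": 20_000},
    {"name": "video C", "view_count": 800_000, "like_count": 90_000},
]

most_viewed = top_n(records, "view_count", n=1)[0]
most_liked = top_n(records, "like_count", n=1)[0]
print(most_viewed["name"], most_liked["name"])
```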
To explore audience comments, a word cloud is generated using the stylecloud library:
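import stylecloud
from IPython.display import Image

# Generate the word cloud. text1 is assumed to hold the tokenized comment
# text; its construction is not shown here.
stylecloud.gen_stylecloud(
    text=' '.join(text1),
    collocations=False,
    font_path=r'C:\Windows\Fonts\msyh.ttc',  # a font with Chinese glyphs (Windows path)
    icon_name='fas fa-play-circle',
    size=653,
    output_name='马保国词云图.png'
)
# Display the saved image inline in a Jupyter notebook.
Image(filename='马保国词云图.png')

The resulting word cloud highlights frequent phrases such as "耗子尾汁" and other meme terms.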
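To check which phrases dominate before rendering the image, the token list can be counted directly. This is a sketch under assumptions: the sample tokens and the stopword set are invented for illustration, and the article does not show how its `text1` list is built (a Chinese tokenizer such as jieba is one common choice for that step).

```python
from collections import Counter

# Hypothetical tokens standing in for the tokenized comments (text1 in the article).
tokens = ["耗子尾汁", "哈哈", "耗子尾汁", "的", "哈哈", "耗子尾汁", "了"]

# Illustrative stopword set; a real run would use a fuller list.
stopwords = {"的", "了"}

# Count every token that is not a stopword and keep the two most frequent.
counts = Counter(t for t in tokens if t not in stopwords)
top_terms = counts.most_common(2)
print(top_terms)
```

Filtering stopwords before counting keeps particles from crowding out the meme phrases. The filtered list can then be joined with spaces and passed as `text` to the word cloud generator.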
Overall, the tutorial demonstrates how to obtain, process, and visualize large‑scale video data from Bilibili using Python, providing a practical example of data mining and visualization techniques.
Python Crawling & Data Mining