
How to Scrape Bilibili Comments and Analyze Them with ChatGPT

This article walks through discovering Bilibili's comment API, programmatically fetching paginated JSON data, converting it into Java POJOs, storing and sorting the comments, and finally feeding the top entries to ChatGPT for automated sentiment and content analysis.

Sohu Tech Products

Aug 23, 2023

Finding the API

Scrolling through the comment section of a Bilibili video while watching the browser's network requests reveals an endpoint that returns JSON. The request path stays the same from call to call while the response changes, which indicates that the endpoint is paginated.

https://api.bilibili.com/x/v2/reply/main?jsonp=jsonp&type=1&oid=956733745&mode=3&plat=1&next=1

In this URL, oid is the old numeric video ID; next controls pagination.
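As a minimal sketch of how the two parameters fit into the request, the helper below rebuilds the URL shown above from an oid and a next value. The class and method names (CommentUrlBuilder, pageUri) are hypothetical, and the remaining query parameters are simply copied from the captured URL without any claim about what they mean.

```java
import java.net.URI;

final class CommentUrlBuilder {
    // Endpoint path taken from the URL captured in the browser's network panel.
    private static final String BASE = "https://api.bilibili.com/x/v2/reply/main";

    /** Builds the comment-page URL for a given video (oid) and page cursor (next). */
    static URI pageUri(long oid, int next) {
        if (oid <= 0 || next < 0) {
            throw new IllegalArgumentException("oid must be positive and next non-negative");
        }
        // jsonp, type, mode and plat are copied verbatim from the captured request.
        String query = "jsonp=jsonp&type=1&oid=" + oid + "&mode=3&plat=1&next=" + next;
        return URI.create(BASE + "?" + query);
    }

    public static void main(String[] args) {
        System.out.println(pageUri(956733745L, 1));
    }
}
```

Only oid and next vary between calls, so keeping the rest of the query string fixed keeps the helper small.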

Fetching and Converting JSON

A loop increments next to retrieve all comment pages. The JSON is deeply nested. To map it to Java objects, the online tool https://www.bejson.com/json2javapojo/new/ was used, generating many POJO classes; only three (Replies, Member, Content) are needed for further analysis.
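The pagination loop might look roughly like the sketch below. It is not the article's code: the CommentPager class, the hasReplies predicate, and the maxPages safety cap are assumptions introduced for illustration, and the rule for detecting the last page is left to the caller because the article does not describe it. The loop starts at next=1 only because that is the value in the captured URL.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;
import java.util.function.Predicate;

final class CommentPager {

    /**
     * Calls fetchPage with next = 1, 2, 3, ... and collects the raw JSON bodies
     * until a page has no replies (as judged by hasReplies) or maxPages is hit.
     */
    static List<String> fetchAllPages(IntFunction<String> fetchPage,
                                      Predicate<String> hasReplies,
                                      int maxPages) {
        List<String> pages = new ArrayList<>();
        for (int next = 1; next <= maxPages; next++) {
            String body = fetchPage.apply(next);
            if (body == null || !hasReplies.test(body)) {
                break; // assumed end-of-data signal supplied by the caller
            }
            pages.add(body);
        }
        return pages;
    }

    /** A page fetcher backed by java.net.http.HttpClient (Java 11+). */
    static IntFunction<String> overHttp(HttpClient client, long oid) {
        return next -> {
            URI uri = URI.create("https://api.bilibili.com/x/v2/reply/main"
                    + "?jsonp=jsonp&type=1&oid=" + oid + "&mode=3&plat=1&next=" + next);
            HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
            try {
                HttpResponse<String> response =
                        client.send(request, HttpResponse.BodyHandlers.ofString());
                if (response.statusCode() != 200) {
                    throw new IllegalStateException("HTTP " + response.statusCode());
                }
                return response.body();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while fetching page " + next, e);
            }
        };
    }
}
```

Passing the fetcher in as a function keeps the loop testable without network access; in real use it would be called as fetchAllPages(overHttp(HttpClient.newHttpClient(), 956733745L), somePredicate, 500), where the predicate and the cap are whatever the caller decides.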

Storing and Sorting Comments

Deserialized comment objects are persisted to a local database. Records are automatically sorted by the like count, making it easy to extract the most popular comments.
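To illustrate the sorting step, here is a small sketch that uses an in-memory sort as a stand-in for the database described above. The three type names follow the article (Replies, Member, Content), but their fields (rpid, like, uname, message) are assumptions about the response shape, trimmed to the minimum needed, and the records require Java 16 or later.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Field names below are assumed for illustration; generated POJOs carry many more fields.
record Member(String uname) {}
record Content(String message) {}
record Replies(long rpid, int like, Member member, Content content) {}

final class TopComments {

    /** Returns up to limit comments, ordered from most liked to least liked. */
    static List<Replies> topByLikes(List<Replies> all, int limit) {
        return all.stream()
                .sorted(Comparator.comparingInt(Replies::like).reversed())
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```

With a real database the same ordering would normally come from the query itself, for example an ORDER BY on the like column with a LIMIT.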

Feeding Top Comments to ChatGPT

The top 100 comments (sorted by likes) are sent to ChatGPT in batches. A system prompt such as the following is used to steer the model:

System: You are a superconductivity expert. Analyze the following comments and provide concise insights.

ChatGPT returns coherent summaries and highlights key themes such as scientific rigor, accuracy, and transparency.

Overall, as a superconductivity expert, I would emphasize scientific rigor, accuracy, and transparency...
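The batching step can be sketched as follows. The article does not state the batch size or how comments were laid out in each message, so the numbered-list format and the CommentBatcher name are assumptions; only the system prompt text comes from the article. Sending the messages to the model is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

final class CommentBatcher {
    // System prompt quoted from the article.
    static final String SYSTEM_PROMPT =
            "You are a superconductivity expert. Analyze the following comments and provide concise insights.";

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }

    /** Formats one batch as a numbered list to use as the user message (assumed layout). */
    static String userMessage(List<String> comments) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < comments.size(); i++) {
            sb.append(i + 1).append(". ").append(comments.get(i)).append('\n');
        }
        return sb.toString();
    }
}
```

Each batch would then be sent with SYSTEM_PROMPT as the system message and the output of userMessage as the user message; a smaller batch size keeps each request comfortably within the model's context limit.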

Example Analyses

Individual highly liked comments can also be submitted on their own, and ChatGPT interprets them, for example by explaining a metaphor that compares superconductors to fragile butterfly wings.

Reusable Pipeline

The workflow consists of the following steps:

1. Discover the undocumented Bilibili comment API.

2. Automate pagination by varying the next parameter.

3. Download JSON responses for each page.

4. Convert the JSON to typed Java objects (using a POJO generator).

5. Persist the objects to a database and sort them by like count.

6. Feed selected comments to a large language model for summarization or domain‑specific analysis.

Changing the videoId (or oid) allows the same pipeline to be applied to any Bilibili video, turning raw comment streams into structured, actionable insights.
