How to Scrape Bilibili Comments and Analyze Them with ChatGPT
This article walks through discovering Bilibili's comment API, programmatically fetching paginated JSON data, converting it into Java POJOs, storing and sorting the comments, and finally feeding the top entries to ChatGPT for automated sentiment and content analysis.
Finding the API
Scrolling through the comment section of a Bilibili video while watching the browser's network requests reveals an endpoint that returns JSON. The request path stays the same while the response changes from one request to the next, which indicates pagination.
https://api.bilibili.com/x/v2/reply/main?jsonp=jsonp&type=1&oid=956733745&mode=3&plat=1&next=1
oid is the old numeric video ID; next controls pagination.
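As a minimal sketch, the request URL can be assembled from the two parameters that vary. The class and method names here are illustrative and not from the original article; the other query parameters are copied unchanged from the URL shown above, on the assumption that they can stay fixed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: builds the comment API URL for a given video and page.
final class CommentUrl {
    static final String ENDPOINT = "https://api.bilibili.com/x/v2/reply/main";

    static String build(long oid, int next) {
        // LinkedHashMap keeps the parameters in the same order as the observed URL.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("jsonp", "jsonp");
        params.put("type", "1");
        params.put("oid", Long.toString(oid));    // numeric video ID
        params.put("mode", "3");
        params.put("plat", "1");
        params.put("next", Integer.toString(next)); // page cursor
        String query = params.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("&"));
        return ENDPOINT + "?" + query;
    }
}
```

Calling CommentUrl.build(956733745L, 1) reproduces the URL captured from the browser.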
Fetching and Converting JSON
A loop increments next to retrieve all comment pages. The JSON is deeply nested. To map it to Java objects, the online tool https://www.bejson.com/json2javapojo/new/ was used, generating many POJO classes; only three (Replies, Member, Content) are needed for further analysis.
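The pagination loop might look roughly like the sketch below. The article does not say how the end of the comment stream is detected, so the stop condition is left to a caller-supplied predicate, and a page cap guards against an endless loop; both are assumptions. The names CommentPager, fetchPages, and httpGet are hypothetical. The fetcher is passed in as a function so the loop can be exercised without touching the network.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.function.Predicate;

final class CommentPager {
    static final String URL_TEMPLATE =
            "https://api.bilibili.com/x/v2/reply/main?jsonp=jsonp&type=1&oid=%d&mode=3&plat=1&next=%d";

    // Requests next=1, 2, 3, ... and collects raw JSON bodies until a page
    // fails the hasComments check (assumed stop condition) or maxPages is hit.
    static List<String> fetchPages(long oid, int maxPages,
                                   Function<String, String> fetcher,
                                   Predicate<String> hasComments) {
        List<String> pages = new ArrayList<>();
        for (int next = 1; next <= maxPages; next++) {
            String url = String.format(Locale.ROOT, URL_TEMPLATE, oid, next);
            String body = fetcher.apply(url);
            if (body == null || !hasComments.test(body)) {
                break;
            }
            pages.add(body);
        }
        return pages;
    }

    // One possible fetcher, using the JDK 11+ HTTP client.
    static String httpGet(String url) {
        try {
            HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
            return HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString())
                    .body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

A real run would pass CommentPager::httpGet as the fetcher and a predicate that inspects the parsed JSON for a non-empty reply list. A short pause between requests is probably wise, though the article does not discuss rate limits.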
Storing and Sorting Comments
Deserialized comment objects are persisted to a local database. Records are automatically sorted by the like count, making it easy to extract the most popular comments.
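The article relies on the database for ordering and does not name the database or its schema. As an in-memory stand-in for that step, the sketch below sorts by like count in descending order and keeps the top N. The Comment class is a hypothetical flattened view; the POJOs generated from the real JSON (Replies, Member, Content) will have different fields.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class TopComments {

    // Hypothetical simplified comment, not the generated POJO.
    static final class Comment {
        private final String userName;
        private final String message;
        private final int likes;

        Comment(String userName, String message, int likes) {
            this.userName = userName;
            this.message = message;
            this.likes = likes;
        }

        String getUserName() { return userName; }
        String getMessage() { return message; }
        int getLikes() { return likes; }
    }

    // Returns at most `limit` comments, most-liked first.
    static List<Comment> topByLikes(List<Comment> comments, int limit) {
        return comments.stream()
                .sorted(Comparator.comparingInt(Comment::getLikes).reversed())
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```

With a relational database, the equivalent would be an ORDER BY on the like column in descending order, combined with a row limit.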
Feeding Top Comments to ChatGPT
The top 100 comments (sorted by likes) are sent to ChatGPT in batches. A system prompt such as the following is used to steer the model:
System: You are a superconductivity expert. Analyze the following comments and provide concise insights.
ChatGPT returns coherent summaries and highlights key themes such as scientific rigor, accuracy, and transparency.
Overall, as a superconductivity expert, I would emphasize scientific rigor, accuracy, and transparency...
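The batching step could be sketched as follows. The batch size and the numbered-list layout of the user message are assumptions, since the article gives neither; the system prompt is the one quoted above. The sketch stops at building the message text and deliberately leaves out the call to the ChatGPT API, which the article does not show.

```java
import java.util.ArrayList;
import java.util.List;

final class PromptBatcher {
    static final String SYSTEM_PROMPT =
            "You are a superconductivity expert. Analyze the following comments and provide concise insights.";

    // Splits the comments into consecutive batches of at most batchSize items.
    static List<List<String>> partition(List<String> comments, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < comments.size(); i += batchSize) {
            int end = Math.min(i + batchSize, comments.size());
            batches.add(new ArrayList<>(comments.subList(i, end)));
        }
        return batches;
    }

    // Formats one batch as a numbered list to send as the user message
    // (assumed layout), alongside SYSTEM_PROMPT as the system message.
    static String userMessage(List<String> batch) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < batch.size(); i++) {
            sb.append(i + 1).append(". ").append(batch.get(i)).append('\n');
        }
        return sb.toString();
    }
}
```

Smaller batches keep each request well within the model's context limit, at the cost of more requests and summaries that then have to be merged.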
Example Analyses
Individual high‑like comments are queried, and ChatGPT provides interpretations, e.g., explaining metaphors that compare superconductors to fragile butterfly wings.
Reusable Pipeline
The workflow consists of the following steps:
Discover the undocumented Bilibili comment API.
Automate pagination by varying the next parameter.
Download JSON responses for each page.
Convert the JSON to typed Java objects (using a POJO generator).
Persist the objects to a database and sort them by like count.
Feed selected comments to a large language model for summarization or domain‑specific analysis.
Changing the videoId (or oid) allows the same pipeline to be applied to any Bilibili video, turning raw comment streams into structured, actionable insights.
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.
