How to Scrape QQ Music: Get Song IDs, Lyrics, and Top Comments with Python
This guide walks you through using Python's requests library to scrape QQ Music, showing how to retrieve a song’s ID, fetch its lyrics, and extract the top comments, complete with code snippets and step‑by‑step explanations.
Project Goal
We implement a Python script that obtains a QQ Music song's ID, then uses that ID to fetch the song's lyrics and its hot comments (the sample request asks for up to 25 per page).
Required Libraries
The main libraries are requests (third-party), plus the standard-library json and html modules.
Implementation Steps
1. Inspect the XHR requests for a sample song (e.g., "泡沫") to locate the endpoints that return comments and lyrics.
2. Sort the XHR entries by size; inspecting the larger responses reveals one request that returns the comments and another that returns the lyrics.
3. Examine the query-string parameters of each request to identify the key whose value varies between songs.
4. The differing parameter is topid, which represents the song’s unique ID.
5. Confirm that the musicid in the lyrics request equals the topid in the comments request; this shared value is the song ID needed for all subsequent requests.
6. Workflow: input a song name → generate url_1 to obtain the song ID → use the ID to request lyrics ( url_2) and comments ( url_3).
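The params dictionaries in the code below are sent as URL query strings; requests encodes them automatically when you pass params=. As a quick standard-library illustration of what that encoding produces (the parameter values here are just examples, not the full set the endpoint expects):

```python
from urllib.parse import urlencode

# requests performs this encoding for you when you pass params=...;
# shown here only to illustrate how the final query string is built
params = {'w': '泡沫', 'format': 'json', 'p': '1', 'n': '10'}
query = urlencode(params)
print('https://c.y.qq.com/soso/fcgi-bin/client_search_cp?' + query)
```

Note that non-ASCII values such as the song name are percent-encoded as UTF-8 bytes in the query string.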
Code: Retrieve Song ID
import requests, html, json
url_1 = 'https://c.y.qq.com/soso/fcgi-bin/client_search_cp'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
search = input('Enter the name of the song to look up: ')
params = {
'ct': '24', 'qqmusic_ver': '1298', 'new_json': '1', 'remoteplace': 'txt.yqq.song',
'searchid': '71600317520820180', 't': '0', 'aggr': '1', 'cr': '1', 'catZhida': '1',
'lossless': '0', 'flag_qc': '0', 'p': '1', 'n': '10', 'w': search, 'g_tk': '5381',
'loginUin': '0', 'hostUin': '0', 'format': 'json', 'inCharset': 'utf8',
'outCharset': 'utf-8', 'notice': '0', 'platform': 'yqq.json', 'needNewCode': '0'
}
res_music = requests.get(url_1, headers=headers, params=params)
json_music = res_music.json()
id = json_music['data']['song']['list'][0]['id']
print(id)
Code: Retrieve Lyrics
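With nobase64=1, the lyric field in the response below typically comes back as plain LRC text: each line is prefixed with a [mm:ss.xx] timestamp, and HTML entities are embedded (the script handles those with html.unescape). If you want clean prose rather than LRC, a regex can strip the bracketed tags as an optional post-processing step; this helper is not part of the original script:

```python
import re

def strip_lrc_tags(lrc: str) -> str:
    # crude cleanup: drop anything in square brackets, which removes
    # [mm:ss.xx] timestamps and [ti:...]-style metadata lines alike
    return re.sub(r'\[[^\]]*\]', '', lrc).strip()

sample = '[ti:泡沫]\n[00:12.34]阳光下的泡沫 是彩色的'
print(strip_lrc_tags(sample))  # prints: 阳光下的泡沫 是彩色的
```

Because the pattern removes any bracketed span, apply it only after html.unescape, and only if the lyrics themselves do not contain literal brackets.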
url_2 = 'https://c.y.qq.com/lyric/fcgi-bin/fcg_query_lyric_yqq.fcg'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
params = {
'nobase64': '1', 'musicid': id, '-': 'jsonp1', 'g_tk': '5381',
'loginUin': '0', 'hostUin': '0', 'format': 'json', 'inCharset': 'utf8',
'outCharset': 'utf-8', 'notice': '0', 'platform': 'yqq.json', 'needNewCode': '0'
}
res_music = requests.get(url_2, headers=headers, params=params)
js = res_music.json()
lyric = js['lyric']
lyric_html = html.unescape(lyric)
with open(search + '_lyrics.txt', 'a', encoding='utf-8') as f:
    f.write(lyric_html)
input('Download complete. Press Enter to exit.')
Code: Retrieve Comments
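The comments endpoint below pages through results with pagenum and pagesize (set to 25 here). To collect more comments, as the conclusion suggests, you can vary pagenum across requests. A minimal sketch of just the paging-related keys; the song ID value is a placeholder, and these would be merged into the full params dict before sending:

```python
def comment_page_params(topid: int, pagenum: int, pagesize: int = 25) -> dict:
    # only the paging-related keys are shown; merge this into the full
    # params dict from the request below before sending it
    return {'topid': str(topid), 'pagenum': str(pagenum), 'pagesize': str(pagesize)}

# e.g. parameter sets for the first three pages of a hypothetical song ID
pages = [comment_page_params(123456, n) for n in range(3)]
print(pages[2]['pagenum'])  # prints: 2
```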
url_3 = 'https://c.y.qq.com/base/fcgi-bin/fcg_global_comment_h5.fcg'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
params = {
'g_tk_new_20200303': '5381', 'g_tk': '5381', 'loginUin': '0', 'hostUin': '0',
'format': 'json', 'inCharset': 'utf8', 'outCharset': 'GB2312', 'notice': '0',
'platform': 'yqq.json', 'needNewCode': '0', 'cid': '205360772', 'reqtype': '2',
'biztype': '1', 'topid': id, 'cmd': '8', 'needmusiccrit': '0', 'pagenum': '0',
'pagesize': '25', 'lasthotcommentid': '', 'domain': 'qq.com', 'ct': '24', 'cv': '10101010'
}
res_music = requests.get(url_3, headers=headers, params=params)
js = res_music.json()
comments = js['hot_comment']['commentlist']
with open(search + '_comments.txt', 'a', encoding='utf-8') as f:
    for c in comments:
        f.write(c['rootcommentcontent'] + '\n' + '—' * 34 + '\n')
input('Download complete. Press Enter to exit.')
Conclusion
The second project adds an extra step of obtaining the song ID before fetching lyrics and comments. It demonstrates typical XHR‑based web scraping: sending HTTP requests, parsing JSON responses, handling HTML entities, and saving results to text files. Future work will extend the script to collect more comments and generate a word‑cloud visualization.
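As a first step toward the word cloud mentioned above, the saved comments can be reduced to word frequencies with the standard library. For Chinese text a proper segmenter such as jieba would give better tokens, so this regex tokenization is only a rough sketch:

```python
from collections import Counter
import re

def top_words(comments, k=10):
    # crude tokenization: runs of word characters; Python's \w matches
    # CJK characters too, but does not segment Chinese words properly
    words = re.findall(r'\w+', ' '.join(comments))
    return Counter(words).most_common(k)

print(top_words(['great song', 'great lyrics']))
# prints: [('great', 2), ('song', 1), ('lyrics', 1)]
```

The resulting (word, count) pairs can be fed directly to a word-cloud library's frequency API.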