How to Scrape QQ Music: Get Song IDs, Lyrics, and Top Comments with Python

This guide walks you through using Python's requests library to scrape QQ Music, showing how to retrieve a song’s ID, fetch its lyrics, and extract the top comments, complete with code snippets and step‑by‑step explanations.


Project Goal

We implement a Python script that obtains a QQ Music song's ID, then uses that ID to fetch the song's lyrics and its top hot comments.

Required Libraries

The main libraries used are requests, json, and html. Only requests is third-party; json and html ship with the Python standard library.
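The html module matters here because the lyric endpoint (called with nobase64=1) returns lyric text with characters encoded as HTML numeric entities, and html.unescape decodes them. A quick illustration with a made-up fragment in that style:

```python
import html

# Fabricated sample in the style of what the lyric endpoint returns:
# HTML numeric entities stand in for brackets, punctuation, and CJK characters.
raw = "&#91;00&#58;01&#46;00&#93;&#37240;&#29980;"

decoded = html.unescape(raw)
print(decoded)  # [00:01.00]酸甜
```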

Implementation Steps

1. Inspect the XHR requests for a sample song (e.g., "泡沫") to locate the endpoints that return comments and lyrics.

2. Sort the XHR entries by size; among the largest responses, one returns the comments and another the lyrics.

3. Examine the query-string parameters in the request headers to identify the key that varies between songs.

4. The differing parameter is topid, which represents the song’s unique ID.

5. Confirm that musicid = topid is the identifier needed for subsequent requests.

6. Workflow: input a song name → generate url_1 to obtain the song ID → use the ID to request lyrics (url_2) and comments (url_3).
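The three-step workflow above can be sketched as a set of small parameter builders. The URLs and parameter names come from the captured XHR requests; the function names and defaults are just an illustrative structure, not part of the original script:

```python
# Sketch of the three-step workflow: search -> lyrics -> comments.
# URLs and parameter keys are taken from the captured XHR requests;
# the helper functions themselves are an assumed, illustrative layout.

SEARCH_URL  = 'https://c.y.qq.com/soso/fcgi-bin/client_search_cp'
LYRIC_URL   = 'https://c.y.qq.com/lyric/fcgi-bin/fcg_query_lyric_yqq.fcg'
COMMENT_URL = 'https://c.y.qq.com/base/fcgi-bin/fcg_global_comment_h5.fcg'

def search_params(song_name, n=10):
    # 'w' carries the search keyword; 'n' caps the number of results.
    return {'w': song_name, 'p': '1', 'n': str(n), 'format': 'json'}

def lyric_params(song_id):
    # 'musicid' is the same identifier as the topid seen in the XHR capture.
    return {'musicid': song_id, 'nobase64': '1', 'format': 'json'}

def comment_params(song_id, page_size=15):
    # 'topid' is the song ID; 'pagesize' controls how many hot comments come back.
    return {'topid': song_id, 'reqtype': '2', 'biztype': '1',
            'pagenum': '0', 'pagesize': str(page_size), 'format': 'json'}
```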

Code: Retrieve Song ID

import requests, html, json

# Step 1: search for the song to obtain its ID.
url_1 = 'https://c.y.qq.com/soso/fcgi-bin/client_search_cp'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
search = input('Enter the name of the song to look up: ')
params = {
    'ct': '24', 'qqmusic_ver': '1298', 'new_json': '1', 'remoteplace': 'txt.yqq.song',
    'searchid': '71600317520820180', 't': '0', 'aggr': '1', 'cr': '1', 'catZhida': '1',
    'lossless': '0', 'flag_qc': '0', 'p': '1', 'n': '10', 'w': search, 'g_tk': '5381',
    'loginUin': '0', 'hostUin': '0', 'format': 'json', 'inCharset': 'utf8',
    'outCharset': 'utf-8', 'notice': '0', 'platform': 'yqq.json', 'needNewCode': '0'
}
res_music = requests.get(url_1, headers=headers, params=params)
json_music = res_music.json()
# Take the first search result; its 'id' is the musicid/topid used in the
# lyric and comment requests below. (Note: this name shadows the built-in id().)
id = json_music['data']['song']['list'][0]['id']
print(id)
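Indexing [0] raises an IndexError when the search comes back empty. A defensive variant of the extraction, with a fabricated response dict that mirrors the real JSON shape (the helper name is mine, not from the original script):

```python
def first_song_id(json_music):
    """Return the first search result's id, or None if nothing was found."""
    songs = json_music.get('data', {}).get('song', {}).get('list', [])
    return songs[0]['id'] if songs else None

# Fabricated responses in the same shape as the real search JSON:
sample = {'data': {'song': {'list': [{'id': 4830342, 'name': '泡沫'}]}}}
empty  = {'data': {'song': {'list': []}}}

print(first_song_id(sample))  # 4830342
print(first_song_id(empty))   # None
```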

Code: Retrieve Lyrics

url_2 = 'https://c.y.qq.com/lyric/fcgi-bin/fcg_query_lyric_yqq.fcg'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
params = {
    # nobase64=1 makes the endpoint return lyrics as HTML-entity text
    # instead of Base64, so html.unescape() can decode them directly.
    'nobase64': '1', 'musicid': id, '-': 'jsonp1', 'g_tk': '5381',
    'loginUin': '0', 'hostUin': '0', 'format': 'json', 'inCharset': 'utf8',
    'outCharset': 'utf-8', 'notice': '0', 'platform': 'yqq.json', 'needNewCode': '0'
}
res_music = requests.get(url_2, headers=headers, params=params)
js = res_music.json()
lyric = js['lyric']
lyric_html = html.unescape(lyric)  # decode HTML entities into readable text
with open(search + ' lyrics.txt', 'a', encoding='utf-8') as f:
    f.write(lyric_html)
input('Download complete. Press Enter to exit.')
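The decoded lyric is in LRC format, with [mm:ss.xx] timestamps and metadata tags such as [ti:...]. If you want plain text only, a regex can strip every bracketed tag. This is an optional post-processing step I am adding, not part of the original script, and the sample string is made up:

```python
import re

def strip_lrc_tags(lrc_text):
    # Remove every [...] tag (timestamps, ti/ar/al metadata), then drop blank lines.
    lines = (re.sub(r'\[[^\]]*\]', '', line).strip() for line in lrc_text.splitlines())
    return '\n'.join(line for line in lines if line)

sample_lrc = "[ti:泡沫]\n[00:01.00]阳光下的泡沫\n[00:05.00]是彩色的"
print(strip_lrc_tags(sample_lrc))
# 阳光下的泡沫
# 是彩色的
```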

Code: Retrieve Comments

url_3 = 'https://c.y.qq.com/base/fcgi-bin/fcg_global_comment_h5.fcg'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
params = {
    'g_tk_new_20200303': '5381', 'g_tk': '5381', 'loginUin': '0', 'hostUin': '0',
    'format': 'json', 'inCharset': 'utf8', 'outCharset': 'GB2312', 'notice': '0',
    'platform': 'yqq.json', 'needNewCode': '0', 'cid': '205360772', 'reqtype': '2',
    'biztype': '1', 'topid': id, 'cmd': '8', 'needmusiccrit': '0', 'pagenum': '0',
    'pagesize': '25', 'lasthotcommentid': '', 'domain': 'qq.com', 'ct': '24', 'cv': '10101010'
}
res_music = requests.get(url_3, headers=headers, params=params)
js = res_music.json()
comments = js['hot_comment']['commentlist']  # list of hot-comment objects
with open(search + ' comments.txt', 'a', encoding='utf-8') as f:
    for c in comments:
        # Write each root comment's text, followed by a separator line.
        f.write(c['rootcommentcontent'] + '\n' + '—' * 34 + '\n')
input('Download complete. Press Enter to exit.')
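The write loop can also be expressed in one pass with str.join, which avoids a trailing separator after the last comment. The helper name and the sample data below are fabricated for illustration:

```python
SEP = '\n' + '—' * 34 + '\n'

def format_comments(comments):
    # Join each comment's text with a separator line, mirroring
    # what the script's loop writes entry by entry.
    return SEP.join(c['rootcommentcontent'] for c in comments)

sample = [{'rootcommentcontent': 'first comment'},
          {'rootcommentcontent': 'second comment'}]
print(format_comments(sample))
```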

Conclusion

Compared with a single-request scraper, this project adds an extra step of obtaining the song ID before fetching lyrics and comments. It demonstrates typical XHR-based web scraping: sending HTTP requests, parsing JSON responses, handling HTML entities, and saving results to text files. Future work will extend the script to collect more comments and generate a word-cloud visualization.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Python Crawling & Data Mining

Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!
