Build a Python QQ Music Scraper: From Song Lists to Word Clouds
This tutorial walks you through creating a Python-powered QQ Music scraper that retrieves song details, lyrics, and comments, generates a word‑cloud visualization, and packages the functionality into a menu‑driven command‑line tool, complete with code snippets and troubleshooting tips.
Project Goal
The project combines three earlier scraping scripts into a single menu‑driven Python program that can fetch a specified artist's song list, retrieve lyrics and hot comments for a chosen track, and generate a word‑cloud image from the collected comments.
Required Libraries
Core dependencies are requests, openpyxl, html, json, wordcloud, and jieba (html and json ship with the Python standard library, so only the third-party packages need installing). For custom word-cloud backgrounds you also need numpy and Pillow (install via pip install pillow). To create a standalone executable, use pyinstaller -F.
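For reference, the snippets below assume the following imports sit at the top of the script. json appears in the dependency list, but the code relies on requests' built-in .json() decoding, so it is not imported directly here.

import requests                  # HTTP requests to the QQ Music endpoints
import openpyxl                  # writing the song list to an .xlsx workbook
import html                      # decoding HTML entities in the lyric text
import jieba                     # Chinese word segmentation for the word cloud
import numpy                     # converting the mask image into an array
from PIL import Image            # loading the word-cloud mask image
from wordcloud import WordCloud  # rendering the word-cloud PNG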
Implementation
A QQ class provides a menu method offering five options: get song info, get lyrics, get comments, generate a word cloud, and exit.
class QQ():
    def menu(self):
        print('Welcome to the QQ Music scraper. Here is the feature menu; please make a selection.\n')
        while True:
            try:
                print('Menu\n'
                      '1. Get song info for a given artist\n'
                      '2. Get the lyrics of a given song\n'
                      '3. Get the comments on a given song\n'
                      '4. Generate a word-cloud image\n'
                      '5. Exit\n')
                choice = int(input('Enter a number to choose a function: '))
                if choice == 1:
                    self.get_info()
                elif choice == 2:
                    self.get_id()
                    self.get_lyric()
                elif choice == 3:
                    self.get_id()
                    self.get_comment()
                elif choice == 4:
                    self.wordcloud()
                elif choice == 5:
                    print('Thanks for using the tool!')
                    break
                else:
                    print('Invalid input, please try again.\n')
            except:
                # the bare except keeps the menu alive on bad input, but it also
                # silently swallows network and parsing errors
                print('Invalid input, please try again.\n')

get_info() creates an Excel workbook, queries QQ Music's search API for the given artist name and number of pages, extracts each song's name, album, and playback link, and appends the rows to the sheet before saving the file.
    def get_info(self):
        wb = openpyxl.Workbook()
        sheet = wb.active
        sheet.title = 'song'
        sheet['A1'] = 'Song'
        sheet['B1'] = 'Album'
        sheet['C1'] = 'Link'
        url = 'https://c.y.qq.com/soso/fcgi-bin/client_search_cp'
        name = input('Enter the artist name to search for: ')
        page = int(input('Enter how many pages of songs to fetch: '))
        for x in range(page):
            # 'p' is the 1-based page number, 'n' the page size, 'w' the search keyword;
            # the rest are fixed values expected by the search API
            params = {
                'ct': '24', 'qqmusic_ver': '1298', 'new_json': '1', 'remoteplace': 'sizer.yqq.song_next',
                'searchid': '64405487069162918', 't': '0', 'aggr': '1', 'cr': '1', 'catZhida': '1', 'lossless': '0',
                'flag_qc': '0', 'p': str(x + 1), 'n': '20', 'w': name, 'g_tk': '5381', 'loginUin': '0', 'hostUin': '0',
                'format': 'json', 'inCharset': 'utf8', 'outCharset': 'utf-8', 'notice': '0', 'platform': 'yqq.json', 'needNewCode': '0'
            }
            res = requests.get(url, params=params)
            data = res.json()
            for music in data['data']['song']['list']:
                song_name = music['name']
                album = music['album']['name']
                link = 'https://y.qq.com/n/yqq/song/' + str(music['mid']) + '.html'
                sheet.append([song_name, album, link])
        wb.save(name + ' - top ' + str(page * 20) + ' songs.xlsx')
        print('Download finished!\n')
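If the workbook comes out empty, it helps to fire a single search request by hand and inspect the JSON before looping over pages. A minimal debugging sketch, assuming the same params dictionary shown in get_info() with 'w' set to the artist and 'p' set to '1':

# Debugging sketch: one standalone search request to check the JSON shape.
res = requests.get('https://c.y.qq.com/soso/fcgi-bin/client_search_cp', params=params)
data = res.json()
songs = data['data']['song']['list']
print(len(songs), 'songs on this page')
print(songs[0]['name'], '|', songs[0]['album']['name'], '|', songs[0]['mid'])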
get_id() and get_lyric() locate a song's numeric ID via the search API, then request its lyrics, decode the HTML entities in the response, and save the result to a .txt file. The lyric request headers must include origin and referer, or the API blocks the call.
    def get_id(self):
        self.i = input('Enter the song name: ')
        url_1 = 'https://c.y.qq.com/soso/fcgi-bin/client_search_cp'
        headers = {'user-agent': 'Mozilla/5.0 ...'}
        # the article omits the remaining search parameters here; they mirror
        # the ones used in get_info(), with 'w' set to the song name
        params = {..., 'w': self.i, ...}
        res_music = requests.get(url_1, headers=headers, params=params)
        json_music = res_music.json()
        self.id = json_music['data']['song']['list'][0]['id']   # id of the best match

    def get_lyric(self):
        url_2 = 'https://c.y.qq.com/lyric/fcgi-bin/fcg_query_lyric_yqq.fcg'
        # origin and referer are required, otherwise the lyric API rejects the request
        headers = {'origin': 'https://y.qq.com',
                   'referer': 'https://y.qq.com/n/yqq/song/001qvvgF38HVc4.html',
                   'user-agent': 'Mozilla/5.0 ...'}
        params = {'nobase64': '1', 'musicid': self.id, '-': 'jsonp1', 'g_tk': '5381', ...}
        res_music = requests.get(url_2, headers=headers, params=params)
        lyric = res_music.json()['lyric']
        lyric_html = html.unescape(lyric)   # decode HTML entities back to plain text
        with open(self.i + ' lyrics.txt', 'a', encoding='utf-8') as f:
            f.writelines(lyric_html)
        print('Download finished!\n')
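With nobase64 set to '1', the lyric text comes back with many characters encoded as HTML entities, which is why html.unescape() is applied before saving. A quick illustration with an arbitrary entity string (the real response is typically LRC-style text with timestamps):

# html.unescape turns numeric character references back into plain text
print(html.unescape('&#91;00&#58;01&#46;00&#93;Hello'))   # prints: [00:01.00]Hello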
get_comment() iterates over the requested number of comment pages, extracts each comment's text, and writes the comments to a .txt file, separating entries with a line of dashes.
    def get_comment(self):
        page = input('Enter how many pages of comments to download: ')
        url_3 = 'https://c.y.qq.com/base/fcgi-bin/fcg_global_comment_h5.fcg'
        headers = {'user-agent': 'Mozilla/5.0 ...'}
        with open(self.i + ' comments.txt', 'a', encoding='utf-8') as f2:
            for n in range(int(page)):
                # the article omits the remaining fixed parameters here;
                # 'topid' is the song id and 'pagenum' the zero-based page index
                params = {..., 'topid': self.id, 'pagenum': n, ...}
                js_2 = requests.get(url_3, headers=headers, params=params).json()
                for i in js_2['comment']['commentlist']:
                    # one comment per entry, followed by a separator line of dashes
                    comment = i['rootcommentcontent'] + '\n' + '-' * 34 + '\n'
                    f2.writelines(comment)
        print('Download finished!\n')
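Because the loop issues one request per comment page, it can be worth pacing the requests and checking the HTTP status instead of assuming every page succeeds. A sketch of that idea (the 1-second delay is an arbitrary choice, and params stands for the dictionary built above):

import time

for n in range(int(page)):
    res = requests.get(url_3, headers=headers, params=params)
    if res.status_code != 200:       # skip pages the API refuses
        print('page', n, 'failed with status', res.status_code)
        continue
    js_2 = res.json()
    # ... extract and write the comments as above ...
    time.sleep(1)                    # brief pause between pages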
wordcloud() reads a text file, uses jieba to segment the Chinese text into words, applies a mask image (e.g., a heart shape), and renders a word-cloud PNG.
    def wordcloud(self):
        self.name = input('Enter the name of the text file to turn into a word cloud: ')

        def cut(text):
            # jieba segments the Chinese text; WordCloud expects space-separated tokens
            return ' '.join(jieba.cut(text))

        with open(self.name + '.txt', encoding='utf-8') as file:
            text = cut(file.read())
        mask_pic = numpy.array(Image.open('心.png'))   # mask image (a heart shape in the original)
        wc = WordCloud(font_path='C:/Windows/Fonts/simfang.ttf',   # a Chinese-capable font is required
                       collocations=False, max_words=100,
                       min_font_size=10, max_font_size=500, mask=mask_pic)
        wc.generate(text)
        wc.to_file(self.name + ' wordcloud.png')
        print('Generated successfully!\n')
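The cut() helper matters because WordCloud splits its input on whitespace, and Chinese text has none; jieba supplies the word boundaries. A quick illustration with an arbitrary sentence:

import jieba
print(' '.join(jieba.cut('我和我的祖国')))   # e.g. prints: 我 和 我 的 祖国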
Finally, the script is instantiated and the menu launched:
qq = QQ()
qq.menu()

(Sample output screenshots for each menu option appeared at this point in the original article.)
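If the script is also imported from elsewhere or bundled, it is common to wrap the entry point in a main guard so the menu only starts when the file is run directly. A small optional tweak, not part of the original article:

if __name__ == '__main__':
    qq = QQ()
    qq.menu()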
When packaging the script into an executable with pyinstaller -F, the word‑cloud step may cause errors; commenting out the word‑cloud related imports and function allows successful packaging, albeit without the visualization feature.
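Instead of deleting the word-cloud code by hand before every build, one option is to guard the optional imports so the rest of the tool still works when those packages are missing or excluded. A sketch of that idea, not something from the original article:

# Optional-dependency guard: if the word-cloud stack is unavailable,
# menu option 4 can be disabled instead of crashing.
try:
    import jieba
    import numpy
    from PIL import Image
    from wordcloud import WordCloud
    WORDCLOUD_AVAILABLE = True
except ImportError:
    WORDCLOUD_AVAILABLE = False

# inside menu(), before calling self.wordcloud():
#     if not WORDCLOUD_AVAILABLE:
#         print('The word-cloud feature is disabled in this build.')
#         continue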