How to Scrape QQ Music Data and Create Word Clouds with Python
This tutorial uses Python to fetch QQ Music song rankings, lyrics, and hot comments, then pages through the comment endpoint to collect more comments and render them as a word cloud. It covers the required libraries, the code, and the key request parameters.
Project Goal
We demonstrate how to use Python to fetch QQ Music data: first obtaining song rankings, then retrieving lyrics and hot comments, and finally extracting more comments to generate a word‑cloud visualization.
Required Libraries
The project relies on requests, json, wordcloud, and jieba, plus numpy and PIL (installed as the Pillow package) when a custom background image is used as the word-cloud mask.
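Before running the scripts, it can help to confirm that these modules are importable. The following is a minimal sketch using only the standard library; the helper name missing_modules and the REQUIRED/OPTIONAL grouping are illustrative and not part of the original article.

```python
import importlib.util

# Import names used in this tutorial; PIL is the import name of Pillow.
REQUIRED = ["requests", "wordcloud", "jieba"]
OPTIONAL = ["numpy", "PIL"]


def missing_modules(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("missing required:", missing_modules(REQUIRED))
print("missing optional:", missing_modules(OPTIONAL))
```

An empty list means every module in that group was found; anything listed still needs to be installed.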
Implementation
1. Retrieve hot comments of a song.
import requests

def get_comment(i):
    # i is the song name; it prefixes the output file name ('评论' means 'comments')
    url_3 = 'https://c.y.qq.com/base/fcgi-bin/fcg_global_comment_h5.fcg'
    headers = {'user-agent':'Mozilla/5.0 ...'}
    params = {...}  # query parameters, elided in the source
    res_music = requests.get(url_3, headers=headers, params=params)
    js_2 = res_music.json()
    comments = js_2['hot_comment']['commentlist']
    # 'with' closes the file even if a write fails
    with open(i+'评论.txt','a',encoding='utf-8') as f2:
        for item in comments:  # renamed so the loop does not shadow the argument i
            comment = item['rootcommentcontent'] + '\n——————————————————————————————————\n'
            f2.write(comment)
2. Inspect the request in the browser's network panel to locate the pagination fields (pagenum, cmd, pagesize), then vary them to iterate through the comment pages.
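The pagination idea in step 2 can be tried offline before sending any requests. The sketch below builds one parameter dictionary per page and extracts comment text from each decoded JSON page. The keys pagenum, pagesize, comment, commentlist, and rootcommentcontent come from the article; the helper names, the fake_fetch stand-in, and the assumption that commentlist may be missing or null on some pages are hypothetical. A real run needs the full parameter set copied from the network panel, which the source elides.

```python
def build_page_params(base_params, page, page_size=15):
    """Return a copy of base_params with the pagination fields set."""
    params = dict(base_params)      # copy so the shared base is not mutated
    params['pagenum'] = page        # zero-based page index, as in range(20)
    params['pagesize'] = str(page_size)
    return params


def extract_comments(payload):
    """Pull comment texts out of one decoded JSON page, tolerating gaps."""
    comment_list = (payload.get('comment') or {}).get('commentlist') or []
    return [c['rootcommentcontent'] for c in comment_list
            if c.get('rootcommentcontent')]


def collect_comments(fetch_page, base_params, pages):
    """Call fetch_page(params) for each page and gather all comment texts."""
    collected = []
    for n in range(pages):
        payload = fetch_page(build_page_params(base_params, n))
        collected.extend(extract_comments(payload))
    return collected


def fake_fetch(params):
    """Stand-in for requests.get(...).json(); returns canned pages."""
    n = params['pagenum']
    if n == 2:
        return {'comment': {'commentlist': None}}   # assumed empty-page shape
    return {'comment': {'commentlist': [
        {'rootcommentcontent': f'comment {n}-a'},
        {'rootcommentcontent': ''},                 # blank entries are skipped
    ]}}


# Placeholder base: only the one field shown in the article.
base = {'g_tk_new_20200303': '5381'}
result = collect_comments(fake_fetch, base, 3)
print(result)
```

Passing the fetch function in as an argument keeps the loop testable; for a real crawl it could be replaced by a small function that wraps requests.get and returns the decoded JSON.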
3. Loop over 20 pages, storing comments to a text file.
import requests

def get_comment(i):
    url_3 = 'https://c.y.qq.com/base/fcgi-bin/fcg_global_comment_h5.fcg'
    headers = {'user-agent':'Mozilla/5.0 ...'}
    with open(i+'评论.txt','a',encoding='utf-8') as f2:
        for n in range(20):  # pagenum runs from 0 to 19
            # the remaining query fields are elided in the source
            params = {'g_tk_new_20200303':'5381', ...,'pagenum':n,'pagesize':'15', ...}
            res_music = requests.get(url_3, headers=headers, params=params)
            js_2 = res_music.json()
            comments = js_2['comment']['commentlist']
            for item in comments:  # renamed so the loop does not shadow the argument i
                comment = item['rootcommentcontent'] + '\n——————————————————————————————————\n'
                f2.write(comment)
    input('Download finished. Press Enter to exit!')
4. Generate a word cloud from the collected comments.
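from wordcloud import WordCloud
import jieba, numpy
import PIL.Image as Image

def cut(text):
    # jieba splits Chinese text into words; WordCloud expects space-separated tokens
    return " ".join(jieba.cut(text))

# '句号评论.txt' follows the naming used by get_comment, i.e. get_comment('句号')
with open("句号评论.txt",encoding="utf-8") as file:
    text = cut(file.read())
# the mask image ('心.png', a heart) sets the shape of the cloud
mask_pic = numpy.array(Image.open("心.png"))
wc = WordCloud(font_path="C:/Windows/Fonts/simfang.ttf",  # a font with Chinese glyphs
               collocations=False,
               max_words=100,
               min_font_size=10,
               max_font_size=500,
               mask=mask_pic).generate(text)
wc.to_file('云词图.png')
Images in the original article illustrate the network panel, the parameter changes, and the final word‑cloud results.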
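As an optional refinement, the segmented text can be reduced to word counts before drawing, which makes it possible to filter out stopwords and single-character tokens. The sketch below uses only the standard library; top_words, its parameters, and the sample string are hypothetical and not from the original article.

```python
from collections import Counter


def top_words(segmented_text, stopwords=(), min_length=2, limit=100):
    """Count space-separated tokens, skipping stopwords and short tokens."""
    tokens = [
        tok for tok in segmented_text.split()
        if len(tok) >= min_length and tok not in stopwords
    ]
    # most_common keeps only the `limit` most frequent tokens
    return dict(Counter(tokens).most_common(limit))


# Sample input in the same shape as the output of cut(): tokens joined by spaces.
sample = "好听 好听 的 歌 喜欢 好听 喜欢 句号"
freqs = top_words(sample, stopwords={"句号"})
print(freqs)
```

The resulting dictionary could be passed to WordCloud.generate_from_frequencies in place of generate(text), so that the filtering is applied before the cloud is drawn.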
Summary
The third project extends the previous one by discovering pagination parameters to crawl more comments, storing them, and visualizing the text with a word cloud. It also notes that the approach can be adapted to other data sources such as CSV or Excel, and suggests using Scrapy for larger‑scale crawling.
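For the CSV case mentioned above, one possible adaptation is to join a single text column into one string and hand it to the same cut and WordCloud steps. This is a minimal sketch using only the standard library; the function name text_from_csv and the column names user and comment are assumptions made for illustration.

```python
import csv
import io


def text_from_csv(file_obj, column):
    """Join the non-empty values of one CSV column into a single string."""
    reader = csv.DictReader(file_obj)
    return "\n".join(row[column] for row in reader if row.get(column))


# In-memory sample standing in for a real file.
sample = io.StringIO("user,comment\na,好听\nb,\nc,单曲循环\n")
text = text_from_csv(sample, "comment")
print(text)
```

With a real file, the same function can be given open(path, newline='', encoding='utf-8'), and the returned text can replace file.read() in the word-cloud script.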
Python Crawling & Data Mining
