How to Scrape QQ Music Data and Create Word Clouds with Python

This tutorial uses Python to fetch QQ Music song rankings, lyrics, and hot comments, pages through the comment requests to collect more data, and renders the collected text as a word cloud. It covers the required libraries, the code, and the key request parameters.

Python Crawling & Data Mining

Apr 4, 2020

Project Goal

We demonstrate how to use Python to fetch QQ Music data: first obtaining song rankings, then retrieving lyrics and hot comments, and finally extracting more comments to generate a word‑cloud visualization.

Required Libraries

The project relies on requests, json, wordcloud, and jieba, plus numpy and PIL (installed as the Pillow package) when a custom background image is used as a mask.

Implementation

1. Retrieve hot comments of a song.

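import requests

def get_comment(i):
    # i: string prefix for the output file name, e.g. the song title
    url_3 = 'https://c.y.qq.com/base/fcgi-bin/fcg_global_comment_h5.fcg'
    headers = {'user-agent':'Mozilla/5.0 ...'}
    params = {...}  # query parameters copied from the network panel (elided in the original)
    res_music = requests.get(url_3, headers=headers, params=params)
    js_2 = res_music.json()
    comments = js_2['hot_comment']['commentlist']
    # '评论.txt' means 'comments.txt'; mode 'a' appends to any existing file
    with open(i + '评论.txt', 'a', encoding='utf-8') as f2:
        for item in comments:
            # Write each comment followed by a separator line
            comment = item['rootcommentcontent'] + '\n——————————————————————————————————\n'
            f2.write(comment)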

2. Inspect the comment request in the browser's network panel to locate the pagination fields (pagenum, cmd, pagesize), then vary them to iterate through the pages.
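
As a minimal sketch of the idea in step 2, the helper below copies a base parameter dictionary and overrides only the paging fields. The name `page_params` and the one-entry base dictionary are hypothetical; only the field names pagenum and pagesize come from the request described above, and a real request would need the remaining parameters copied from the network panel.

```python
def page_params(base, page, page_size=15):
    """Return a copy of base with the paging fields set for one page."""
    params = dict(base)          # copy so the caller's dict is not mutated
    params['pagenum'] = page     # page index, counted from 0
    params['pagesize'] = str(page_size)
    return params

# Hypothetical base parameters; the real set is copied from the network panel.
base = {'g_tk_new_20200303': '5381'}
pages = [page_params(base, n) for n in range(3)]
print([p['pagenum'] for p in pages])
```

Building the parameters in one place keeps the request loop short and makes it easy to change the page size later.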

3. Loop over 20 pages, storing comments to a text file.

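import requests

def get_comment(i):
    # i: string prefix for the output file name, e.g. the song title
    url_3 = 'https://c.y.qq.com/base/fcgi-bin/fcg_global_comment_h5.fcg'
    headers = {'user-agent':'Mozilla/5.0 ...'}
    # '评论.txt' means 'comments.txt'; mode 'a' appends to any existing file
    with open(i + '评论.txt', 'a', encoding='utf-8') as f2:
        for n in range(20):  # request pages 0 through 19
            params = {
                'g_tk_new_20200303': '5381',
                # ... other parameters elided in the original
                'pagenum': n,
                'pagesize': '15',
                # ... other parameters elided in the original
            }
            res_music = requests.get(url_3, headers=headers, params=params)
            js_2 = res_music.json()
            comments = js_2['comment']['commentlist']
            for item in comments:
                # Write each comment followed by a separator line
                comment = item['rootcommentcontent'] + '\n——————————————————————————————————\n'
                f2.write(comment)
    input('Download complete. Press Enter to exit!')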
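
The loop above indexes straight into the response, so it raises an error if a page comes back without a comment list. A minimal sketch of a more defensive extraction step follows. The name `extract_comments` and the sample payloads are hypothetical; only the keys comment, hot_comment, commentlist, and rootcommentcontent are taken from the code in this article, and whether the real API ever returns an empty or missing list is an assumption.

```python
def extract_comments(payload, section='comment'):
    """Collect comment texts from one decoded JSON response."""
    # Treat a missing or null section or list as "no comments"
    comment_list = (payload.get(section) or {}).get('commentlist') or []
    texts = []
    for item in comment_list:
        text = item.get('rootcommentcontent')
        if text:  # skip entries with no text
            texts.append(text)
    return texts

# Made-up payloads shaped like the fields used above, for illustration only.
sample = {'comment': {'commentlist': [{'rootcommentcontent': 'first'},
                                      {'rootcommentcontent': ''},
                                      {'rootcommentcontent': 'second'}]}}
print(extract_comments(sample))
print(extract_comments({'comment': {'commentlist': None}}))
```

In the paging loop, an empty result could also serve as a signal to stop early instead of always requesting 20 pages.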

4. Generate a word cloud from the collected comments.

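from wordcloud import WordCloud
import jieba, numpy
import PIL.Image as Image

def cut(text):
    # Segment Chinese text with jieba and join the words with spaces
    # so that WordCloud can tell them apart
    return " ".join(jieba.cut(text))

# '句号评论.txt' is a comment file such as the one written in step 3
with open("句号评论.txt", encoding="utf-8") as file:
    text = cut(file.read())

# '心.png' ("heart") is the mask image that gives the cloud its shape
mask_pic = numpy.array(Image.open("心.png"))
wc = WordCloud(font_path="C:/Windows/Fonts/simfang.ttf",  # a font with Chinese glyphs
               collocations=False,  # do not count two-word phrases
               max_words=100,
               min_font_size=10,
               max_font_size=500,
               mask=mask_pic).generate(text)
wc.to_file('云词图.png')  # output image; the name means "word cloud image"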
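
WordCloud sizes each word by how often it occurs, so it can help to inspect the counts before rendering. The sketch below does that with the standard library only. `top_words` is a hypothetical helper, and it assumes the text has already been segmented into space-separated words, as cut() does above; the sample string stands in for real comment text.

```python
from collections import Counter

def top_words(segmented_text, n=10, min_length=2):
    """Return the n most frequent words in space-separated text."""
    # Drop tokens shorter than min_length (single characters by default)
    words = [w for w in segmented_text.split() if len(w) >= min_length]
    return Counter(words).most_common(n)

# Stand-in for segmented comment text, for illustration only.
sample = "好听 喜欢 好听 这 首 歌 喜欢 好听"
print(top_words(sample, n=2))
```

If the most frequent entries turn out to be filler words, they can be filtered out of the text before it is passed to generate().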

Images illustrate the network panel, parameter changes, and final word‑cloud results.

Summary

This third project extends the previous one by discovering the pagination parameters needed to crawl more comments, saving the comments to a text file, and visualizing the text as a word cloud. The same approach can be adapted to other data sources such as CSV or Excel files, and Scrapy is worth considering for larger-scale crawling.
