Scraping QQ Music Hot Comments with Selenium and Visualizing with Word Cloud in Python
This tutorial demonstrates how to use Python Selenium to collect hot comments from a QQ Music song page: handle infinite scrolling, extract user avatar URLs, nicknames, comment times, and comment text, store the data in a CSV file, and visualize it with a Chinese word cloud.
1. Initial Test – Verify the Selenium environment by opening the target URL, maximizing the window, and pausing briefly.
from selenium import webdriver
import time

url = 'https://y.qq.com/n/ryqq/songDetail/0006wgUu1hHP0N'
driver = webdriver.Chrome()  # requires a matching ChromeDriver on the PATH
driver.get(url)
time.sleep(1)
driver.maximize_window()

2. Page Analysis – The comment section uses a waterfall (infinite-scroll) layout that loads more items as the scrollbar moves down; the page URL never changes, so Selenium must drive the scrolling to load additional comments. Each comment corresponds to one li element.
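To make the li-to-fields mapping concrete before automating it, here is a minimal offline sketch that parses a simplified stand-in for one comment li with the same relative XPaths used later. The sample markup is an assumption for illustration; the real QQ Music page structure may differ in detail.

```python
# Parse a simplified stand-in for one comment <li> using the same
# relative XPaths the Selenium code applies later. The markup below is
# illustrative only; the live page's structure may differ.
import xml.etree.ElementTree as ET

sample_li = """
<li class="comment__list_item c_b_normal">
  <div>
    <a href="#"><img src="https://example.com/avatar.jpg"/></a>
    <h4><a href="#">SomeUser</a></h4>
    <div>2021-05-20</div>
    <p><span>Great song!
second line</span></p>
  </div>
</li>
"""

li = ET.fromstring(sample_li)
record = {
    'headPor': li.find('./div[1]/a/img').get('src'),          # avatar URL
    'name':    li.find('./div[1]/h4/a').text,                 # nickname
    'time':    li.find('./div[1]/div[1]').text,               # comment time
    'cont':    li.find('./div[1]/p/span').text.replace('\n', ''),  # text, newlines stripped
}
print(record)
```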
3. Scroll Wheel Operation – Loop until the desired number of comments is reached, scrolling to the bottom and waiting for data to load.
num = int(input('Enter target comment count: '))
keep_scrolling = True
while keep_scrolling:
    # Selenium 3 style; Selenium 4 uses driver.find_elements(By.XPATH, ...)
    items = driver.find_elements_by_xpath("//li[@class='comment__list_item c_b_normal']")
    print(len(items))
    if len(items) < num:
        # Scroll to the bottom to trigger loading of the next batch
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(2)
    else:
        keep_scrolling = False

4. Parse Page – Iterate over each li element, extract the avatar URL, nickname, time, and comment content, strip newline characters, and append each record as a dictionary to a list.
info_list = []
for index, item in enumerate(items):
    dic = {}
    try:
        head_portrait = item.find_element_by_xpath("./div[1]/a/img").get_attribute('src')
        name = item.find_element_by_xpath("./div[1]/h4/a").text
        # Avoid calling this variable `time`, which would shadow the time module
        comment_time = item.find_element_by_xpath("./div[1]/div[1]").text
        content = item.find_element_by_xpath("./div[1]/p/span").text.replace('\n', '')
        dic['headPor'] = head_portrait
        dic['name'] = name
        dic['time'] = comment_time
        dic['cont'] = content
        print(index + 1)
        print(dic)
        info_list.append(dic)
    except Exception as e:
        print(e)

5. Data Storage – Write the list of dictionaries to a CSV file using Python's csv module.
import csv

head = ['headPor', 'name', 'time', 'cont']
with open('bscxComment.csv', 'w', encoding='utf-8', newline='') as f:
    writer = csv.DictWriter(f, head)
    writer.writeheader()
    writer.writerows(info_list)
    print('写入成功')  # "written successfully"

6. Run the Program – Execute the script, watch the scrolling and data collection, then open the generated CSV to verify that the expected number of comments has been captured.
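The word-cloud step below reads a plain-text data.txt file, but the tutorial does not show how that file is produced from the CSV. A minimal bridging sketch (the helper name comments_to_text is an assumption, not part of the original script) that pulls the cont column out of the CSV written above:

```python
# Bridge the CSV output to the word-cloud input: extract the 'cont'
# (comment text) column and write one comment per line to a text file.
# The helper name is illustrative; it is not part of the original tutorial.
import csv

def comments_to_text(csv_path, txt_path):
    """Concatenate the 'cont' column of the comment CSV into a text file."""
    with open(csv_path, encoding='utf-8', newline='') as f:
        comments = [row['cont'] for row in csv.DictReader(f)]
    with open(txt_path, 'w', encoding='utf-8') as f:
        f.write('\n'.join(comments))
    return len(comments)

# Usage: comments_to_text('bscxComment.csv', 'data.txt')
```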
7. Word Cloud Visualization – Import jieba, PIL, numpy, and wordcloud; load the comment text, perform Chinese word segmentation, and generate a word cloud shaped by a mask image.
# Import libraries
import jieba
from PIL import Image
import numpy as np
from wordcloud import WordCloud

# Load the comment text
text = open("./data.txt", encoding='utf-8').read()

# Chinese word segmentation; WordCloud expects space-separated tokens
text_cut = jieba.lcut(text)
text_cut = ' '.join(text_cut)

# Prepare the mask image that shapes the word cloud
mask_pic = np.array(Image.open("./cat.png"))

word = WordCloud(font_path='C:/Windows/Fonts/simfang.ttf',  # a Chinese font is required
                 mask=mask_pic,
                 background_color='white',
                 max_font_size=150,
                 max_words=2000,
                 stopwords={'的'}).generate(text_cut)
image = word.to_image()
word.to_file('bsx.png')
image.show()

8. Summary – Using Selenium to simulate human browsing can bypass some anti-scraping measures and efficiently gather large volumes of QQ Music comments; the collected data can then be analyzed and visualized, for example with a word cloud.
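Beyond the word cloud, a simple frequency count is another quick way to analyze the collected comments. A plain-Python sketch (the token list here is a hand-made sample standing in for jieba.lcut output on real comment text):

```python
# Top-words-by-frequency analysis with collections.Counter.
# The token list is a hand-made sample; in the pipeline above it would
# come from jieba.lcut() applied to the scraped comment text.
from collections import Counter

tokens = ['好听', '好听', '青春', '回忆', '青春', '好听']
stopwords = {'的'}
top = Counter(t for t in tokens if t not in stopwords).most_common(2)
print(top)  # [('好听', 3), ('青春', 2)]
```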