Scrape NetEase Cloud Music Hot Tracks with PyQuery: A Step‑by‑Step Guide

This article walks through using Python's pyquery library to fetch the names and URLs of popular songs from NetEase Cloud Music, compares it with previous xpath and BeautifulSoup approaches, provides complete runnable code, and highlights key challenges in selector usage.

Python Crawling & Data Mining
Python Crawling & Data Mining
Python Crawling & Data Mining
Scrape NetEase Cloud Music Hot Tracks with PyQuery: A Step‑by‑Step Guide

1. Introduction

A user in a Python community asked how to retrieve the names and links of hot songs on NetEase Cloud Music. Earlier articles covered the same task using regular expressions, xpath, and BeautifulSoup. This guide demonstrates how to achieve it with the pyquery library.

2. Implementation

The following pyquery‑based script sends a request to the artist page, parses the song list, extracts each song's name and URL, and prints them. It uses a fake user‑agent header to mimic a browser.

# coding:utf-8

# @Time : 2022/5/10 11:46
# @Author: 皮皮
# @公众号: Python共享之家
# @website : http://pdcfighting.com/
# @File : 网易云音乐热门作品名字和链接(pyquery).py
# @Software: PyCharm

import requests, re
from lxml import etree
from fake_useragent import UserAgent
from pyquery import PyQuery as pq

class Wangyiyun(object):
    def __init__(self):
        self.base_url = 'https://music.163.com/discover/artist'
        self.headers = {
            'user-agent': UserAgent().random,
            'referer': 'https://music.163.com/',
            'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9'
        }

    def get_xpath(self, url):
        html = pq(url, headers=self.headers)
        doc = pq(html)
        return doc

    def singers_parse(self, url, items):
        doc = self.get_xpath(url)
        song_dict = {}
        a_lis = doc('#song-list-pre-cache > ul li > a').items()
        for a in a_lis:
            song_name = a.text()
            print(song_name)
            song_url = 'https://music.163.com' + a.attr('href')
            print(song_url)
        items['所有歌曲:'] = song_dict

Wangyiyun().singers_parse(url='https://music.163.com/artist?id=50653542', items={})

Running the script prints each song name and its full URL, confirming that the pyquery selector correctly extracts the required data.

3. Conclusion

The pyquery method works effectively for scraping NetEase Cloud Music hot tracks, with the main difficulty being the correct construction of pyquery selectors. The article also notes that similar results can be achieved with regular expressions, xpath, and BeautifulSoup, and hints at a future tutorial using html5lib.

Pythonweb-scrapingXPathnetease-musicpyquery
Python Crawling & Data Mining
Written by

Python Crawling & Data Mining

Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.