Backend Development 5 min read

How to Scrape NetEase Cloud Music Hot Tracks with PyQuery in Python

Learn to extract the names and URLs of popular NetEase Cloud Music tracks using Python's pyquery library, with step‑by‑step code, selector tips, and comparisons to regex, xpath, and BeautifulSoup approaches, enabling reliable web‑scraping of music data.

Python Crawling & Data Mining

May 23, 2022

How to Scrape NetEase Cloud Music Hot Tracks with PyQuery in Python

Introduction

A fan asked how to fetch the names and links of hot songs from NetEase Cloud Music. Earlier articles covered regex, xpath, and BeautifulSoup solutions; this article demonstrates the same task using the pyquery library.

Implementation

The following script creates a Wangyiyun class that builds request headers, loads a page with pyquery, selects song links via CSS selectors, and prints each song name and its full URL.

# coding:utf-8

# @Time : 2022/5/10 11:46
# @Author: 皮皮
# @公众号: Python共享之家
# @website : http://pdcfighting.com/
# @File : 网易云音乐热门作品名字和链接(pyquery).py
# @Software: PyCharm

import requests, re
from lxml import etree
from fake_useragent import UserAgent
from pyquery import PyQuery as pq

class Wangyiyun(object):
    def __init__(self):
        self.base_url = 'https://music.163.com/discover/artist'
        self.headers = {
            'user-agent': UserAgent().random,
            'referer': 'https://music.163.com/',
            'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9'
        }

    def get_xpath(self, url):
        html = pq(url, headers=self.headers)
        doc = pq(html)
        return doc

    def singers_parse(self, url, items):
        doc = self.get_xpath(url)
        song_dict = {}
        a_lis = doc('#song-list-pre-cache > ul li > a').items()
        for a in a_lis:
            song_name = a.text()
            print(song_name)
            song_url = 'https://music.163.com' + a.attr('href')
            print(song_url)
        items['所有歌曲：'] = song_dict

Wangyiyun().singers_parse(url='https://music.163.com/artist?id=50653542', items={})

The script runs correctly and outputs the song titles and their corresponding URLs, as illustrated below.

The difficulty lies in mastering pyquery selectors to accurately capture the desired values.

Conclusion

Using pyquery to scrape NetEase Cloud Music hot tracks is effective; the main challenge is constructing proper selectors. The author also notes that regex, xpath, bs4, and pyquery can all achieve the task, and future articles will explore the html5lib library for further practice.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

XPath netease-music pyquery

Written by

Python Crawling & Data Mining

Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.