How to Scrape NetEase Cloud Music Hot Tracks with PyQuery in Python
Learn to extract the names and URLs of popular NetEase Cloud Music tracks using Python's pyquery library, with step‑by‑step code, selector tips, and comparisons to regex, xpath, and BeautifulSoup approaches, enabling reliable web‑scraping of music data.
Introduction
A fan asked how to fetch the names and links of hot songs from NetEase Cloud Music. Earlier articles covered regex, xpath, and BeautifulSoup solutions; this article demonstrates the same task using the pyquery library.
Implementation
The following script creates a Wangyiyun class that builds request headers, loads a page with pyquery, selects song links via CSS selectors, and prints each song name and its full URL.
# coding:utf-8
# @Time : 2022/5/10 11:46
# @Author: 皮皮
# @公众号: Python共享之家
# @website : http://pdcfighting.com/
# @File : 网易云音乐热门作品名字和链接(pyquery).py
# @Software: PyCharm
import requests, re
from lxml import etree
from fake_useragent import UserAgent
from pyquery import PyQuery as pq
class Wangyiyun(object):
def __init__(self):
self.base_url = 'https://music.163.com/discover/artist'
self.headers = {
'user-agent': UserAgent().random,
'referer': 'https://music.163.com/',
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9'
}
def get_xpath(self, url):
html = pq(url, headers=self.headers)
doc = pq(html)
return doc
def singers_parse(self, url, items):
doc = self.get_xpath(url)
song_dict = {}
a_lis = doc('#song-list-pre-cache > ul li > a').items()
for a in a_lis:
song_name = a.text()
print(song_name)
song_url = 'https://music.163.com' + a.attr('href')
print(song_url)
items['所有歌曲:'] = song_dict
Wangyiyun().singers_parse(url='https://music.163.com/artist?id=50653542', items={})The script runs correctly and outputs the song titles and their corresponding URLs, as illustrated below.
The difficulty lies in mastering pyquery selectors to accurately capture the desired values.
Conclusion
Using pyquery to scrape NetEase Cloud Music hot tracks is effective; the main challenge is constructing proper selectors. The author also notes that regex, xpath, bs4, and pyquery can all achieve the task, and future articles will explore the html5lib library for further practice.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Python Crawling & Data Mining
Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
