Scrape NetEase Cloud Music Hot Tracks with PyQuery: A Step‑by‑Step Guide
This article shows how to use Python's pyquery library to fetch the names and URLs of an artist's popular songs from NetEase Cloud Music. It follows earlier articles that solved the same task with XPath and BeautifulSoup, provides the complete script, and points out the main difficulty: writing the right selector.
1. Introduction
A user in a Python community asked how to retrieve the names and links of hot songs on NetEase Cloud Music. Earlier articles covered the same task using regular expressions, XPath, and BeautifulSoup. This guide does it with the pyquery library.
2. Implementation
The following pyquery-based script requests an artist page, selects the song list, extracts each song's name and URL, and prints them. It sends a randomly chosen User-Agent header, generated by fake_useragent, so the request resembles one from a browser.
# coding:utf-8
# @Time : 2022/5/10 11:46
# @Author: 皮皮
# @WeChat official account: Python共享之家
# @website : http://pdcfighting.com/
# @File : 网易云音乐热门作品名字和链接(pyquery).py
# @Software: PyCharm
from fake_useragent import UserAgent
from pyquery import PyQuery as pq


class Wangyiyun(object):
    def __init__(self):
        self.base_url = 'https://music.163.com/discover/artist'
        # Browser-like headers; the User-Agent is picked at random by fake_useragent.
        self.headers = {
            'user-agent': UserAgent().random,
            'referer': 'https://music.163.com/',
            'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9'
        }

    def get_xpath(self, url):
        # Despite its name, this returns a PyQuery document.
        # Passing the address as the url keyword makes pyquery fetch the page with these headers.
        return pq(url=url, headers=self.headers)

    def singers_parse(self, url, items):
        doc = self.get_xpath(url)
        song_dict = {}
        # Every <a> inside an <li> of the <ul> that sits directly under #song-list-pre-cache.
        a_lis = doc('#song-list-pre-cache > ul li > a').items()
        for a in a_lis:
            song_name = a.text()
            print(song_name)
            song_url = 'https://music.163.com' + a.attr('href')
            print(song_url)
            # Record the pair so the returned dictionary is not left empty.
            song_dict[song_name] = song_url
        items['All songs:'] = song_dict
        return items


if __name__ == '__main__':
    Wangyiyun().singers_parse(url='https://music.163.com/artist?id=50653542', items={})

Running the script prints each song's name followed by its full URL, which confirms that the pyquery selector extracts the required data.
3. Conclusion
The pyquery approach works well for scraping hot tracks from NetEase Cloud Music; the main difficulty lies in constructing the right selector. The same results can be obtained with regular expressions, XPath, or BeautifulSoup, and a future tutorial may cover html5lib.
