How to Scrape NetEase Music Hot Tracks with XPath in Python
Learn how to extract the names and URLs of popular NetEase Music tracks using Python's requests, lxml's XPath, and a simple HTML cleanup technique, with full code examples and tips for handling non‑standard HTML responses.
1. Introduction
Hello, I am PiPi. A follower from the Python Silver Group asked how to fetch the names and links of hot songs on NetEase Music. The source code is visible, but using xpath directly returns nothing because the response is not well‑formed HTML.
2. Implementation
The response contains irregular tags, so a direct xpath query fails. The solution is to replace the interfering placeholder <适合才重要> with plain text before parsing.
# coding:utf-8
# @Time : 2022/5/11 12:46
# @Author: PiPi
# @Public Account: Python共享之家
# @website : http://pdcfighting.com/
# @File : 网易云音乐热门作品名字和链接(xpath).py
# @Software: PyCharm
import requests, re
from lxml import etree
from fake_useragent import UserAgent
class Wangyiyun(object):
def __init__(self):
self.base_url = 'https://music.163.com/discover/artist'
self.headers = {
'user-agent': UserAgent().random,
'referer': 'https://music.163.com/',
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9'
}
def get_xpath(self, url):
res = requests.get(url, headers=self.headers)
html = res.text.replace('<适合才重要>', '适合才重要')
return etree.HTML(html)
def singers_parse(self, url, items):
html = self.get_xpath(url)
song_dict = {}
a_lis = html.xpath('//div[@id="song-list-pre-cache"]/ul/li/a')
for a in a_lis:
song_name = a.xpath('.//text()')[0]
print(song_name)
song_url = 'https://music.163.com' + a.xpath('./@href')[0]
print(song_url)
items['所有歌曲:'] = song_dict
Wangyiyun().singers_parse(url='https://music.163.com/artist?id=50653542', items={})The code works as expected; after running, the output shows the song names and their corresponding URLs.
The crucial step is to replace the interfering <> characters, which otherwise cause the HTML parser to treat them as tags.
3. Conclusion
This tutorial demonstrates a reliable way to scrape NetEase Music hot tracks using xpath. Future articles will show how to achieve the same goal with bs4 and pyquery, helping readers deepen their Python selector skills.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Python Crawling & Data Mining
Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
