Backend Development 6 min read

How to Scrape NetEase Music Hot Tracks with XPath in Python

Learn how to extract the names and URLs of popular NetEase Music tracks using Python's requests, lxml's XPath, and a simple HTML cleanup technique, with full code examples and tips for handling non‑standard HTML responses.

Python Crawling & Data Mining

May 20, 2022

How to Scrape NetEase Music Hot Tracks with XPath in Python

1. Introduction

Hello, I am PiPi. A follower from the Python Silver Group asked how to fetch the names and links of hot songs on NetEase Music. The source code is visible, but using xpath directly returns nothing because the response is not well‑formed HTML.

2. Implementation

The response contains irregular tags, so a direct xpath query fails. The solution is to replace the interfering placeholder <适合才重要> with plain text before parsing.

# coding:utf-8
# @Time : 2022/5/11 12:46
# @Author: PiPi
# @Public Account: Python共享之家
# @website : http://pdcfighting.com/
# @File : 网易云音乐热门作品名字和链接(xpath).py
# @Software: PyCharm

import requests, re
from lxml import etree
from fake_useragent import UserAgent

class Wangyiyun(object):
    def __init__(self):
        self.base_url = 'https://music.163.com/discover/artist'
        self.headers = {
            'user-agent': UserAgent().random,
            'referer': 'https://music.163.com/',
            'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9'
        }

    def get_xpath(self, url):
        res = requests.get(url, headers=self.headers)
        html = res.text.replace('<适合才重要>', '适合才重要')
        return etree.HTML(html)

    def singers_parse(self, url, items):
        html = self.get_xpath(url)
        song_dict = {}
        a_lis = html.xpath('//div[@id="song-list-pre-cache"]/ul/li/a')
        for a in a_lis:
            song_name = a.xpath('.//text()')[0]
            print(song_name)
            song_url = 'https://music.163.com' + a.xpath('./@href')[0]
            print(song_url)
        items['所有歌曲：'] = song_dict

Wangyiyun().singers_parse(url='https://music.163.com/artist?id=50653542', items={})

The code works as expected; after running, the output shows the song names and their corresponding URLs.

The crucial step is to replace the interfering <> characters, which otherwise cause the HTML parser to treat them as tags.

3. Conclusion

This tutorial demonstrates a reliable way to scrape NetEase Music hot tracks using xpath. Future articles will show how to achieve the same goal with bs4 and pyquery, helping readers deepen their Python selector skills.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

requests XPath lxml netease-music

Written by

Python Crawling & Data Mining

Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.