How to Scrape NetEase Cloud Music Hot Tracks with Python Regex

This article explains why XPath fails on NetEase Cloud Music pages, demonstrates a regex‑based Python solution with full source code, shows sample output, and outlines future tutorials using XPath, BeautifulSoup, and PyQuery to strengthen Python selector skills.

Python Crawling & Data Mining
Python Crawling & Data Mining
Python Crawling & Data Mining
How to Scrape NetEase Cloud Music Hot Tracks with Python Regex

Introduction

A user in a Python community asked how to retrieve the names and links of popular songs from NetEase Cloud Music. The page source is accessible, but using xpath does not return results because the response is not regular HTML.

Why XPath Fails and Regex Works

The response contains malformed HTML, so direct xpath queries cannot extract data. A regular‑expression approach can locate the desired <li> elements and capture song URLs and names.

Python Implementation

# coding:utf-8
# @Time : 2022/5/9 23:46
# @Author: 皮皮
# @公众号: Python共享之家
# @website : http://pdcfighting.com/
# @File : 网易云音乐热门作品名字和链接(正则表达式).py
# @Software: PyCharm

import requests
from lxml import etree
from fake_useragent import UserAgent
import re

class Wangyiyun(object):
    def __init__(self):
        self.base_url = 'https://music.163.com/discover/artist'
        self.headers = {
            'user-agent': UserAgent().random,
            'referer': 'https://music.163.com/',
            'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9'
        }

    def get_xpath(self, url):
        res = requests.get(url, headers=self.headers)
        html = res.text.replace('<适合才重要>', '适合才重要')
        return html

    def singers_parse(self, url, items):
        html = self.get_xpath(url)
        pattern = re.compile(r'><li><a href="(?P<song_url>.*?)">(?P<song_name>.*?)</a></li>.*?', re.S)
        items = re.finditer(pattern, html)
        for item in items:
            song_url = item.group("song_url")
            song_name = item.group("song_name")
            print(song_url, song_name)

Wangyiyun().singers_parse(url='https://music.163.com/artist?id=50653542', items={})

The script runs successfully and prints each song's URL and name, as shown in the following screenshot.

Conclusion

The regex‑based method reliably extracts hot song names and links from NetEase Cloud Music. The main challenge lies in constructing the correct regular expression. Upcoming articles will demonstrate similar tasks using xpath, bs4, and pyquery to help readers solidify their Python selector fundamentals.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

PythonRegexWeb ScrapingXPathlxmlnetease-music
Python Crawling & Data Mining
Written by

Python Crawling & Data Mining

Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.