Backend Development 6 min read

How to Scrape NetEase Cloud Music Hot Tracks with Python and BeautifulSoup

This tutorial explains how to fetch the names and URLs of hot songs from NetEase Cloud Music using Python, addressing XPath limitations by cleaning the HTML and parsing it with BeautifulSoup, and provides a complete, runnable script with example output.

Python Crawling & Data Mining

May 20, 2022

How to Scrape NetEase Cloud Music Hot Tracks with Python and BeautifulSoup

Introduction

Recently a follower asked how to fetch the names and URLs of hot tracks from NetEase Cloud Music. The original attempt using xpath failed because the response HTML is not well‑formed.

Solution Overview

The article demonstrates a working solution that parses the page with BeautifulSoup instead of xpath. The key is to replace the interfering “<>” characters that confuse the parser.

Implementation

The script uses requests to download the artist page, sets a random user‑agent, and then processes the HTML with BeautifulSoup. It locates the song list container div id="song-list-pre-cache", iterates over each li element, extracts the song name and builds the full song URL.

# coding:utf-8
import requests, re
from lxml import etree
from fake_useragent import UserAgent
from bs4 import BeautifulSoup

class Wangyiyun(object):
    def __init__(self):
        self.base_url = 'https://music.163.com/discover/artist'
        self.headers = {
            'user-agent': UserAgent().random,
            'referer': 'https://music.163.com/',
            'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'
        }

    def get_xpath(self, url):
        res = requests.get(url, headers=self.headers)
        html = res.text.replace('<适合才重要>', '适合才重要')
        return BeautifulSoup(html, 'html.parser')

    def singers_parse(self, url, items):
        html = self.get_xpath(url)
        a_lis = html.find('div', attrs={'id': 'song-list-pre-cache'}).find('ul').find_all('li')
        for a in a_lis:
            song_name = a.find('a').get_text()
            print(song_name)
            song_url = 'https://music.163.com' + a.find('a').get('href')
            print(song_url)
        items['所有歌曲：'] = {}
Wangyiyun().singers_parse(url='https://music.163.com/artist?id=50653542', items={})

The script runs successfully and prints each song name together with its full URL, as shown in the screenshot below.

In summary, using BeautifulSoup to parse the page after cleaning the stray “<>” characters reliably extracts hot track names and links. The author also notes that similar results can be achieved with regular expressions or XPath, and a future article will show how to do the same with pyquery.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

XPath beautifulsoup netease-music

Written by

Python Crawling & Data Mining

Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.