How to Scrape NetEase Cloud Music Hot Tracks with Python and BeautifulSoup
This tutorial explains how to fetch the names and URLs of hot songs from NetEase Cloud Music using Python, addressing XPath limitations by cleaning the HTML and parsing it with BeautifulSoup, and provides a complete, runnable script with example output.
Introduction
Recently a follower asked how to fetch the names and URLs of hot tracks from NetEase Cloud Music. The original attempt using xpath failed because the response HTML is not well‑formed.
Solution Overview
The article demonstrates a working solution that parses the page with BeautifulSoup instead of xpath. The key is to replace the interfering “<>” characters that confuse the parser.
Implementation
The script uses requests to download the artist page, sets a random user‑agent, and then processes the HTML with BeautifulSoup. It locates the song list container div id="song-list-pre-cache", iterates over each li element, extracts the song name and builds the full song URL.
# coding:utf-8
import requests, re
from lxml import etree
from fake_useragent import UserAgent
from bs4 import BeautifulSoup
class Wangyiyun(object):
def __init__(self):
self.base_url = 'https://music.163.com/discover/artist'
self.headers = {
'user-agent': UserAgent().random,
'referer': 'https://music.163.com/',
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'
}
def get_xpath(self, url):
res = requests.get(url, headers=self.headers)
html = res.text.replace('<适合才重要>', '适合才重要')
return BeautifulSoup(html, 'html.parser')
def singers_parse(self, url, items):
html = self.get_xpath(url)
a_lis = html.find('div', attrs={'id': 'song-list-pre-cache'}).find('ul').find_all('li')
for a in a_lis:
song_name = a.find('a').get_text()
print(song_name)
song_url = 'https://music.163.com' + a.find('a').get('href')
print(song_url)
items['所有歌曲:'] = {}
Wangyiyun().singers_parse(url='https://music.163.com/artist?id=50653542', items={})The script runs successfully and prints each song name together with its full URL, as shown in the screenshot below.
In summary, using BeautifulSoup to parse the page after cleaning the stray “<>” characters reliably extracts hot track names and links. The author also notes that similar results can be achieved with regular expressions or XPath, and a future article will show how to do the same with pyquery.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Python Crawling & Data Mining
Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
