How to Scrape NetEase Cloud Music Hot Songs with Python bs4 (Step‑by‑Step)
This tutorial explains how to fetch hot song names and links from NetEase Cloud Music using Python's requests, BeautifulSoup (bs4), and a simple HTML cleanup to overcome malformed tags, providing full code and a working example.
1. Introduction
In a recent Python community, a user asked how to fetch the names and links of hot songs from NetEase Cloud Music. The original attempts using XPath failed because the response HTML is not well‑formed.
2. Implementation
The solution uses bs4 to parse the HTML after replacing the interfering <> characters. The script builds request headers with a random user‑agent, fetches the artist page, extracts each song name and its URL, and prints them.
# coding:utf-8
# @Time : 2022/5/11 11:46
# @Author: 皮皮
# @公众号: Python共享之家
# @website : http://pdcfighting.com/
# @File : 网易云音乐热门作品名字和链接(bs4).py
# @Software: PyCharm
import requests, re
from lxml import etree
from fake_useragent import UserAgent
from bs4 import BeautifulSoup
class Wangyiyun(object):
def __init__(self):
self.base_url = 'https://music.163.com/discover/artist'
self.headers = {
'user-agent': UserAgent().random,
'referer': 'https://music.163.com/',
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9'
}
def get_xpath(self, url):
res = requests.get(url, headers=self.headers)
html = res.text.replace('<适合才重要>', '适合才重要')
return BeautifulSoup(html, 'html.parser')
def singers_parse(self, url, items):
html = self.get_xpath(url)
song_dict = {}
a_lis = html.find('div', attrs={'id': 'song-list-pre-cache'}).find('ul').find_all('li')
for a in a_lis:
song_name = a.find('a').get_text()
print(song_name)
song_url = 'https://music.163.com' + a.find('a').get('href')
print(song_url)
items['所有歌曲:'] = song_dict
Wangyiyun().singers_parse(url='https://music.163.com/artist?id=50653542', items={})The key point is to replace the problematic <> tags that confuse the parser.
3. Result
The script runs successfully and outputs the list of hot songs with their links (see screenshot).
4. Conclusion
The bs4‑based method works reliably; the difficulty lies in removing the interfering tags. Future articles will explore using pyquery to further solidify Python selector fundamentals.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Python Crawling & Data Mining
Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
