How to Scrape NetEase Cloud Music Hot Songs with Python bs4 (Step‑by‑Step)

This tutorial explains how to fetch the names and links of hot songs from a NetEase Cloud Music artist page using Python's requests and BeautifulSoup (bs4). It relies on a simple string cleanup that keeps a tag-like piece of text from breaking the parse, and it includes the full code and a working example.

Python Crawling & Data Mining

1. Introduction

Recently, a member of a Python community group asked how to fetch the names and links of an artist's hot songs from NetEase Cloud Music. Their XPath attempts failed because the returned HTML is not well-formed: it contains the literal text "<适合才重要>", which a parser can mistake for a tag.

2. Implementation

The solution parses the page with bs4 after replacing the interfering string "<适合才重要>" with its plain text "适合才重要". The script builds request headers with a random user-agent, fetches the artist page, extracts each song's name and URL from the song list, and prints them.
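To try the extraction step offline (no network, no bs4), here is a minimal sketch that uses only the standard library's html.parser. This is a swapped-in technique, not the article's bs4 code: it assumes the same layout the script relies on (song links as <a> tags inside a <ul> within <div id="song-list-pre-cache">) and uses urljoin to build absolute links. The names SongListParser, extract_songs and BASE_URL, and the sample HTML, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://music.163.com"  # site root used to absolutize relative hrefs


class SongListParser(HTMLParser):
    """Collect (name, absolute_url) pairs from <a> tags found in a <ul>
    inside <div id="song-list-pre-cache">. Hypothetical stand-in for the
    bs4 chain soup.find('div', ...).find('ul').find_all('li')."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0          # > 0 while inside the target div
        self.in_ul = False          # True while inside a <ul> in that div
        self.current_href = None    # href of the <a> being read, if any
        self.current_text = []
        self.songs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.div_depth:
                self.div_depth += 1               # nested div inside the target
            elif attrs.get("id") == "song-list-pre-cache":
                self.div_depth = 1                # entered the target div
        elif tag == "ul" and self.div_depth:
            self.in_ul = True
        elif tag == "a" and self.in_ul:
            self.current_href = attrs.get("href") or ""
            self.current_text = []

    def handle_endtag(self, tag):
        if tag == "a" and self.current_href is not None:
            name = "".join(self.current_text).strip()
            self.songs.append((name, urljoin(BASE_URL, self.current_href)))
            self.current_href = None
        elif tag == "ul":
            self.in_ul = False
        elif tag == "div" and self.div_depth:
            self.div_depth -= 1

    def handle_data(self, data):
        if self.current_href is not None:   # only keep text inside a song link
            self.current_text.append(data)


def extract_songs(html):
    """Return a list of (song_name, song_url) tuples from the page HTML."""
    parser = SongListParser()
    parser.feed(html)
    parser.close()
    return parser.songs


sample = '<div id="song-list-pre-cache"><ul><li><a href="/song?id=1">Song A</a></li></ul></div>'
print(extract_songs(sample))  # [('Song A', 'https://music.163.com/song?id=1')]
```

A saved copy of the real page could be fed to extract_songs the same way; if the page layout changes, the function simply returns an empty list.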

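```python
# coding:utf-8

# @Time : 2022/5/11 11:46
# @Author: 皮皮
# @WeChat official account: Python共享之家
# @website : http://pdcfighting.com/
# @File : 网易云音乐热门作品名字和链接(bs4).py  (NetEase Cloud Music hot songs: names and links, bs4)
# @Software: PyCharm

import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent


class Wangyiyun(object):
    def __init__(self):
        self.base_url = 'https://music.163.com'  # site root, used to build song links
        self.headers = {
            # A random browser user-agent so the request looks like a normal visit
            'user-agent': UserAgent().random,
            'referer': 'https://music.163.com/',
            'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9'
        }

    def get_soup(self, url):
        res = requests.get(url, headers=self.headers, timeout=10)
        res.raise_for_status()
        # '<适合才重要>' in the page looks like a tag to a parser; turn it into plain text
        html = res.text.replace('<适合才重要>', '适合才重要')
        return BeautifulSoup(html, 'html.parser')

    def singers_parse(self, url, items):
        soup = self.get_soup(url)
        song_dict = {}
        # Hot songs are <li><a href="/song?id=...">name</a></li> items
        # inside <div id="song-list-pre-cache"><ul>...</ul></div>
        cache = soup.find('div', attrs={'id': 'song-list-pre-cache'})
        if cache is None or cache.find('ul') is None:
            raise ValueError('song list not found; the page layout may have changed')
        for li in cache.find('ul').find_all('li'):
            link = li.find('a')
            if link is None:
                continue
            song_name = link.get_text()
            song_url = self.base_url + link.get('href')
            print(song_name)
            print(song_url)
            song_dict[song_name] = song_url  # collect results instead of discarding them
        items['all_songs'] = song_dict
        return items


if __name__ == '__main__':
    Wangyiyun().singers_parse(url='https://music.163.com/artist?id=50653542', items={})
```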

The key step is replacing the tag-like string "<适合才重要>" with plain text before parsing, so that the parser cannot mistake it for an opening tag.
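The script hard-codes a single replacement. If other song titles on a page carry the same pattern, a regular expression could clean them all in one pass. The sketch below is a hypothetical generalization, not the article's code: it assumes every such pseudo-tag begins with a non-ASCII character right after "<", which real HTML tag names never do, so ordinary tags are left untouched. PSEUDO_TAG and strip_pseudo_tags are illustrative names.

```python
import re

# Hypothetical pattern: "<" followed by a non-ASCII character, then any
# text without angle brackets, then ">". Real tag names start with ASCII
# letters, so tags like <a ...> or </li> never match.
PSEUDO_TAG = re.compile(r"<([^\x00-\x7f][^<>]*)>")


def strip_pseudo_tags(html: str) -> str:
    """Return html with non-ASCII pseudo-tags turned into plain text."""
    return PSEUDO_TAG.sub(r"\1", html)


print(strip_pseudo_tags('<a href="/song?id=2"><适合才重要></a>'))
# <a href="/song?id=2">适合才重要</a>
```

In the script, this call would take the place of the res.text.replace(...) line before the text is handed to BeautifulSoup.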

3. Result

The script runs successfully and outputs the list of hot songs with their links (see screenshot).

4. Conclusion

The bs4-based method works for this page; the only real difficulty is removing the interfering tag-like string before parsing. A future article will tackle the same task with pyquery to further practice Python selector fundamentals.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Python Crawling & Data Mining

Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!
