How to Download Audio Files with Python Web Scraping: Step-by-Step Guide

This article walks through a Python web‑scraping solution for extracting and downloading audio files from a WeChat page, explains the problem, shows the implementation steps with screenshots, and provides the complete, ready‑to‑run code.

Python Crawling & Data Mining

Hi, I'm PiPi. Recently a member asked about a Python web crawler issue, and I'm sharing the solution here.

1. Introduction

The member wanted to download every audio file linked from a WeChat page, but their script only retrieved a single file.

2. Implementation

Two contributors each offered a solution, and a final approach handles the whole job in one step. The original post showed all three as screenshots.

Below is the complete code:

import time

import requests
from lxml import etree, html

# Browser-like User-Agent sent with the page requests.
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36'
}


def get_href_links(url):
    """Return the href of every <a> tag on the page."""
    response = requests.get(url, headers=headers)
    dom_tree = html.fromstring(response.content)
    return dom_tree.xpath('//a/@href')


index_url = "https://mp.weixin.qq.com/s/BANHI5apQzlpeTTdLZAIvg"
# Drop the first link and the last five, then de-duplicate.
# The slice is specific to this article's link layout.
article_urls = set(get_href_links(index_url)[1:-5])
mp3_d_url = 'https://res.wx.qq.com/voice/getvoice?mediaid={}'

for article_url in article_urls:
    response = requests.get(article_url, headers=headers)
    selector = etree.HTML(response.text)
    file_ids = selector.xpath('//mpvoice/@voice_encode_fileid')
    names = selector.xpath('//mpvoice/@name')
    if not file_ids or not names:
        # Avoid an IndexError on pages without an <mpvoice> element.
        print(f"{article_url}: no <mpvoice> element found, skipping")
        continue
    name = names[0]
    d_url = mp3_d_url.format(file_ids[0])
    response = requests.get(d_url)
    if response.status_code == 200:
        with open(name, 'wb') as f:
            f.write(response.content)
        print(f"{name} downloaded successfully!")
    else:
        print(f"{name} download failed!")
    time.sleep(1)  # pause between requests to reduce the risk of an IP block
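
The script above picks out the audio pages with the positional slice `[1:-5]`, which depends on the exact link layout of that one article. As a possible alternative, the sketch below filters links by host and path and removes duplicates while keeping page order. The helper name `filter_article_links` and the sample links are hypothetical, and the assumption that audio pages live under `mp.weixin.qq.com/s` is inferred only from the index URL in the script, not from any WeChat documentation.

```python
from urllib.parse import urlparse


def filter_article_links(hrefs, host="mp.weixin.qq.com", path_prefix="/s"):
    """Keep links on `host` whose path starts with `path_prefix`, de-duplicated in order.

    Hypothetical helper: the host and prefix are assumptions taken from the
    index URL in the script, not a documented WeChat rule.
    """
    seen = set()
    kept = []
    for href in hrefs:
        parsed = urlparse(href)
        if parsed.netloc != host or not parsed.path.startswith(path_prefix):
            continue  # skip off-site links, javascript: links, and so on
        if href in seen:
            continue  # skip repeats but keep the first occurrence's position
        seen.add(href)
        kept.append(href)
    return kept


# Made-up sample links, for illustration only.
sample_hrefs = [
    "javascript:void(0);",
    "https://mp.weixin.qq.com/s/AAA",
    "https://mp.weixin.qq.com/s/BBB",
    "https://mp.weixin.qq.com/s/AAA",
    "https://example.com/about",
]
print(filter_article_links(sample_hrefs))
```

Unlike `set(...)`, this keeps the links in page order, so the files download in the same sequence as they appear in the article.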

This solution resolved the member's problem.
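
The download loop above reads only the first `<mpvoice>` element on each page (`[0]`) and uses its `name` attribute directly as the file name. The sketch below shows one way to collect every `<mpvoice>` element on a page and build a safer file name. To stay dependency-free it uses the standard library's `html.parser` rather than the lxml calls in the script. `MpvoiceCollector`, `safe_filename`, and the sample HTML are illustrative assumptions, and the `.mp3` extension is only inferred from the script's `mp3_d_url` variable name.

```python
import re
from html.parser import HTMLParser


class MpvoiceCollector(HTMLParser):
    """Collect (name, voice_encode_fileid) pairs from every <mpvoice> tag."""

    def __init__(self):
        super().__init__()
        self.voices = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "mpvoice":
            return
        attr_map = dict(attrs)
        file_id = attr_map.get("voice_encode_fileid")
        if file_id:
            # Fall back to the file id when the name attribute is missing.
            self.voices.append((attr_map.get("name") or file_id, file_id))


def safe_filename(name, extension=".mp3"):
    """Replace characters that are invalid on common file systems."""
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", name).strip()
    if not cleaned.lower().endswith(extension):
        cleaned += extension
    return cleaned


# Made-up page fragment, for illustration only.
sample_html = """
<div>
  <mpvoice name="Lesson 1: Intro" voice_encode_fileid="id_001"></mpvoice>
  <mpvoice name="Lesson 2" voice_encode_fileid="id_002"></mpvoice>
</div>
"""

collector = MpvoiceCollector()
collector.feed(sample_html)
for name, file_id in collector.voices:
    print(safe_filename(name), file_id)
```

Each collected `file_id` could then be passed to `mp3_d_url.format(...)` exactly as in the script, with `safe_filename(name)` used as the path given to `open`.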

3. Conclusion

This article looked at a Python web-crawling problem and walked through the analysis and code that let the member download all of the audio files.
