How to Build a Bilibili Video Downloader with Python: From Scraping to GUI
This article walks through building a Python tool that extracts Bilibili video and audio URLs via developer tools, downloads the media files, merges them using moviepy, and wraps the process in a simple PySimpleGUI interface, complete with full source code.
1. Principle Overview
The principle is simple: obtain the video resource's source URL, crawl the binary content, and write it to the local file system.
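The "write the binary content to disk" step can be sketched without any network access. The helper name `save_chunks` and the fake byte chunks below are illustrative assumptions, standing in for the chunks a real HTTP response would yield.

```python
import tempfile
from pathlib import Path
from typing import Iterable


def save_chunks(chunks: Iterable[bytes], path: Path) -> int:
    """Write an iterable of byte chunks to `path` and return the bytes written."""
    written = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            written += len(chunk)
    return written


# Stand-in for the chunks a real HTTP response would yield.
fake_chunks = [b"\x00\x01", b"\x02\x03\x04"]
target = Path(tempfile.mkdtemp()) / "demo.bin"
total = save_chunks(fake_chunks, target)
print(total, target.read_bytes())
```

Writing chunk by chunk keeps memory use flat regardless of file size, which is why the download function later in the article follows the same pattern.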
2. Webpage Analysis
Open the Bilibili video page, press F12 to open Developer Tools, go to the Network tab, select "All", sort by size, and locate the request that likely contains the video source URL.
Example video URL: https://www.bilibili.com/video/BV1BU4y1H7E3
Copy part of that request's URL, return to the Elements panel, press Ctrl+F to search for it, and find the JSON node that holds the real video file address.
Parsing the JSON reveals the video and audio URLs.
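To make the shape of that JSON concrete, here is a small sketch that walks a trimmed sample. The sample is hypothetical: the URLs are placeholders, and the `id` and `bandwidth` fields are assumptions about what each entry carries alongside `base_url`.

```python
import json

# Hypothetical, heavily trimmed sample of the JSON node; real pages carry
# many more fields, and these URLs are placeholders.
sample = json.loads("""
{
  "data": {
    "dash": {
      "video": [
        {"id": 80, "base_url": "https://example.invalid/v-1080.m4s", "bandwidth": 2000000},
        {"id": 64, "base_url": "https://example.invalid/v-720.m4s", "bandwidth": 1000000}
      ],
      "audio": [
        {"id": 30280, "base_url": "https://example.invalid/a.m4s", "bandwidth": 128000}
      ]
    }
  }
}
""")


def list_streams(playinfo: dict) -> list:
    """Return (kind, id, base_url) tuples for every stream under data.dash."""
    dash = playinfo["data"]["dash"]
    rows = []
    for kind in ("video", "audio"):
        for stream in dash.get(kind, []):
            rows.append((kind, stream["id"], stream["base_url"]))
    return rows


for row in list_streams(sample):
    print(row)
```

Video and audio are listed separately, which is why the tool has to download two files and merge them afterwards.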
Opening the extracted URL directly returns a 403 error, because the request lacks the headers (such as Referer) that the page's own player sends.
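A minimal sketch of the headers involved is shown here. The helper name `build_headers` is hypothetical, and the idea that the Referer header is the one the media servers check is an assumption based on the headers this article sends, not something confirmed by official documentation.

```python
DEFAULT_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36"
)


def build_headers(referer: str = "https://www.bilibili.com") -> dict:
    """Headers that mimic a browser request coming from a Bilibili page."""
    return {"User-Agent": DEFAULT_UA, "Referer": referer}


headers = build_headers()
# Intended use (not executed here): requests.get(media_url, headers=headers)
print(headers["Referer"])
```

HTTP header names are case-insensitive, so `referer` and `Referer` are equivalent on the wire.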
3. Video Crawling
In the page source we use a regular expression to extract the window.__playinfo__ JSON.
import requests
import re
import json

url = 'https://www.bilibili.com/video/BV1BU4y1H7E3'
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36",
    "referer": "https://www.bilibili.com"
}
resp = requests.get(url, headers=headers)
# Capture the JSON assigned to window.__playinfo__ in the page's inline script
playinfo = re.findall(r'<script>window.__playinfo__=(.*?)</script>', resp.text)[0]
playinfo_data = json.loads(playinfo)

The JSON contains video and audio sections; we extract their base_url fields.

# video and audio URLs
video_url = playinfo_data['data']['dash']['video'][0]['base_url']
audio_url = playinfo_data['data']['dash']['audio'][0]['base_url']

Multiple base_url entries correspond to different qualities. The example selects the first (4K) entry.
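Two fragile spots in this step are the `[0]` on the regex result, which raises IndexError when the script tag is missing, and the reliance on index 0 for quality. The sketch below is one hedged alternative: `extract_playinfo` and `pick_highest_bandwidth` are hypothetical helper names, the `bandwidth` field is an assumption about the stream entries, and the HTML is a made-up sample.

```python
import json
import re
from typing import Optional

PLAYINFO_RE = re.compile(r"<script>window\.__playinfo__=(.*?)</script>", re.S)


def extract_playinfo(html: str) -> Optional[dict]:
    """Return the parsed playinfo JSON, or None if the script tag is absent."""
    match = PLAYINFO_RE.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


def pick_highest_bandwidth(streams: list) -> dict:
    """Choose the stream entry with the largest 'bandwidth' value."""
    return max(streams, key=lambda s: s.get("bandwidth", 0))


# Made-up page fragment with placeholder URLs.
sample_html = (
    '<html><script>window.__playinfo__='
    '{"data": {"dash": {"video": ['
    '{"id": 64, "base_url": "https://example.invalid/720.m4s", "bandwidth": 1000000},'
    '{"id": 80, "base_url": "https://example.invalid/1080.m4s", "bandwidth": 2000000}'
    '], "audio": []}}}'
    '</script></html>'
)
info = extract_playinfo(sample_html)
best = pick_highest_bandwidth(info["data"]["dash"]["video"])
print(best["base_url"])
```

Returning None instead of raising lets the caller show a readable message when the page layout changes or the URL is not a video page.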
4. Save to Local
With the URLs and a title extracted from the page, we download the files using requests and write them in chunks.
import time

def down_file(file_url, file_type):
    # Relies on the module-level `title` extracted from the page
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36",
        "referer": "https://www.bilibili.com"
    }
    # stream=True defers the body download so iter_content can report real progress
    resp = requests.get(url=file_url, headers=headers, stream=True)
    print(resp.status_code)
    print(f'File name: {title}')
    chunk_size = 1024
    file_size = int(resp.headers['content-length'])
    file_size_mb = file_size / 1024 / 1024
    print(f'File size: {file_size_mb:.2f} MB')
    start = time.time()
    with open(title + '.' + file_type, 'wb') as f:
        done = 0
        for chunk in resp.iter_content(chunk_size=chunk_size):
            f.write(chunk)
            done += len(chunk)
            print(f'\rProgress: {done/file_size*100:.2f}%', end='')
    end = time.time()
    print(f'\nTime: {end-start:.2f} s, Speed: {file_size_mb/(end-start):.2f} MB/s')

Running the function yields output such as:
# Video download
>>> down_file(video_url, 'mp4')
200
File name: 【咒术回战】第20集五条悟帅的有些过分了
File size: 42.10 MB
Progress: 100.00%
Time: 5.72 s, Speed: 7.36 MB/s
# Audio download
>>> down_file(audio_url, 'mp3')
200
File name: 【咒术回战】第20集五条悟帅的有些过分了
File size: 5.13 MB
Progress: 100.00%
Time: 0.80 s, Speed: 6.42 MB/s
5. GUI Tool Creation
The download logic is wrapped in a small GUI built with PySimpleGUI.
import PySimpleGUI as sg

sg.theme('SystemDefaultForReal')
layout = [
    [sg.Text('Enter Bilibili video URL:'), sg.InputText(key='url')],
    [sg.Button('Start Download'), sg.Button('Exit')]
]
window = sg.Window('Bilibili Video Downloader', layout)
while True:
    event, values = window.read()
    if event in (sg.WIN_CLOSED, 'Exit'):
        break
    if event == 'Start Download':
        url = values['url']
        title, video_url, audio_url = get_file_info(url)
        down_file(title, video_url, 'mp4')
        down_file(title, audio_url, 'mp3')
        merge(title)
window.close()
6. Full Code
The complete script combines the functions above, including the merge function that uses moviepy to combine video and audio into a single file.
import json
import re
import time

import requests
import PySimpleGUI as sg
# moviepy 1.x API; moviepy 2.x drops moviepy.editor and renames set_audio to with_audio
from moviepy.editor import VideoFileClip, AudioFileClip

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36",
    "referer": "https://www.bilibili.com"
}


def get_file_info(url):
    """Return the page title plus the first video and audio stream URLs."""
    resp = requests.get(url, headers=HEADERS)
    playinfo = re.findall(r'<script>window.__playinfo__=(.*?)</script>', resp.text)[0]
    data = json.loads(playinfo)
    title = re.findall(r'<h1 title="(.*?)" class="video-title">', resp.text)[0]
    video_url = data['data']['dash']['video'][0]['base_url']
    audio_url = data['data']['dash']['audio'][0]['base_url']
    return title, video_url, audio_url


def down_file(title, file_url, file_type):
    """Download file_url to '<title>.<file_type>' in chunks, printing progress."""
    # stream=True defers the body download so iter_content can report real progress
    resp = requests.get(url=file_url, headers=HEADERS, stream=True)
    chunk_size = 1024
    file_size = int(resp.headers['content-length'])
    file_size_mb = file_size / 1024 / 1024
    start = time.time()
    with open(title + '.' + file_type, 'wb') as f:
        done = 0
        for chunk in resp.iter_content(chunk_size=chunk_size):
            f.write(chunk)
            done += len(chunk)
            print(f'\rProgress: {done/file_size*100:.2f}%', end='')
    end = time.time()
    print(f'\nTime: {end-start:.2f} s, Speed: {file_size_mb/(end-start):.2f} MB/s')


def merge(title):
    """Attach the downloaded audio track to the video and write a new file."""
    video = VideoFileClip(title + '.mp4')
    audio = AudioFileClip(title + '.mp3')
    video = video.set_audio(audio)
    video.write_videofile(f"{title}(with audio).mp4")


sg.theme('SystemDefaultForReal')
layout = [[sg.Text('Enter Bilibili video URL:'), sg.InputText(key='url')],
          [sg.Button('Start Download'), sg.Button('Exit')]]
window = sg.Window('Bilibili Video Downloader', layout)
while True:
    event, values = window.read()
    if event in (sg.WIN_CLOSED, 'Exit'):
        break
    if event == 'Start Download':
        url = values['url']
        title, video_url, audio_url = get_file_info(url)
        down_file(title, video_url, 'mp4')
        down_file(title, audio_url, 'mp3')
        merge(title)
window.close()
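The script uses the page title directly as a file name, so a title containing characters such as `/` or `?` may fail to open on some systems. One possible guard is sketched below; `safe_filename` is a hypothetical helper that is not part of the original script, and the set of replaced characters is an assumption based on what Windows rejects.

```python
import re


def safe_filename(title: str, replacement: str = "_") -> str:
    """Replace characters that are invalid in file names on common systems."""
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', replacement, title)
    # Windows also rejects names ending in a space or a dot.
    cleaned = cleaned.strip(" .")
    return cleaned or "video"


print(safe_filename("【咒术回战】第20集五条悟帅的有些过分了"))
print(safe_filename('Part 1/2: "What?"'))
```

Calling it once on the title returned by `get_file_info`, before passing the title to `down_file` and `merge`, would keep all three file names consistent.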
Python Crawling & Data Mining
