Python Tutorial: Crawling YY Live Videos via API and Saving Locally
This article shows how to use Python's requests library to call the YY live video API, parse the returned JSON for video URLs, and download each video to local storage. A complete, reusable script for repeated batch downloads is included.
In this tutorial we explore a practical Python web‑scraping example that targets the YY live streaming platform. The goal is to retrieve video information through a publicly available API and download the videos to the local file system.
The API endpoint used is https://api-tinyvideo-web.yy.com/home/tinyvideosv2. By sending a GET request with a custom User‑Agent header, the server returns a JSON payload containing video metadata, including a resurl field that holds the direct video URL.
First, we simulate the request using the requests library:
<code>import requests

url = 'https://api-tinyvideo-web.yy.com/home/tinyvideosv2'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'
}
response = requests.get(url=url, headers=headers)
data = response.json()
</code>The variable data now holds the JSON structure returned by the API. We iterate over the video list (found under data['data']['data']) to extract each video's title and URL:
<code>data_list = data['data']['data']
for d in data_list:
    video_title = str(d['yyNum']) + '.mp4'
    video_url = d['resurl']
    video_content = requests.get(url=video_url, headers=headers).content
    with open('video\\' + video_title, mode='wb') as f:
        f.write(video_content)
    print('Saved:', video_title)
</code>
The above code saves each video file using Python's built‑in open function in binary write mode ('wb'). Note that the video directory must already exist, otherwise open raises FileNotFoundError.
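As a sketched refinement of the saving step (the names target_path and save_video_stream and the chunked approach are my own, not part of the original script), streaming the download avoids holding an entire video in memory, while os.path.join and os.makedirs make the output path portable and guarantee the folder exists:

```python
import os
import requests

def target_path(yy_num, out_dir='video'):
    # Build a portable output path; avoids the hard-coded 'video\\' prefix.
    return os.path.join(out_dir, f'{yy_num}.mp4')

def save_video_stream(video_url, yy_num, headers, out_dir='video'):
    # Create the output folder if it does not exist yet.
    os.makedirs(out_dir, exist_ok=True)
    path = target_path(yy_num, out_dir)
    # stream=True downloads the body in chunks instead of all at once.
    with requests.get(video_url, headers=headers, stream=True, timeout=30) as r:
        r.raise_for_status()  # surface HTTP errors instead of saving junk
        with open(path, mode='wb') as f:
            for chunk in r.iter_content(chunk_size=1 << 16):  # 64 KiB pieces
                f.write(chunk)
    return path
```

With this helper, the loop body shrinks to a single save_video_stream(d['resurl'], d['yyNum'], headers) call per video.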
Because the API returns a different random batch of videos on each call, we can repeat the request several times to collect more videos:
<code>url = 'https://api-tinyvideo-web.yy.com/home/tinyvideosv2'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'
}
page = 10  # number of additional requests to make
for _ in range(page + 1):
    response = requests.get(url=url, headers=headers)
    data = response.json()
    data_list = data['data']['data']
    print(data_list)
</code>A complete, reusable script that combines the request loop, parsing, and saving logic is provided below:
<code>import requests


def fire(page):
    url = 'https://api-tinyvideo-web.yy.com/home/tinyvideosv2'
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'
    }
    for _ in range(page + 1):
        response = requests.get(url=url, headers=headers)
        data = response.json()
        data_list = data['data']['data']
        for d in data_list:
            video_title = str(d['yyNum']) + '.mp4'
            video_url = d['resurl']
            video_content = requests.get(url=video_url, headers=headers).content
            with open('video\\' + video_title, mode='wb') as f:
                f.write(video_content)
            print('Saved:', video_title)


if __name__ == '__main__':
    fire(10)
</code>The article concludes by noting that this approach leverages the API for quick video acquisition; future posts will cover crawling HTML pages directly to fetch videos from other hosts and categories.
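Since the endpoint hands back a random batch on each call, repeated runs of fire will inevitably hit videos that were already downloaded. A small sketch of a de‑duplication helper (filter_new is a name introduced here for illustration, not from the article) that skips entries already saved to disk:

```python
import os

def filter_new(data_list, out_dir='video'):
    # Keep only entries whose .mp4 file is not already present in out_dir.
    fresh = []
    for d in data_list:
        name = str(d['yyNum']) + '.mp4'
        if not os.path.exists(os.path.join(out_dir, name)):
            fresh.append(d)
    return fresh
```

Calling filter_new(data_list) before the download loop means each request only fetches videos not yet on disk, which saves bandwidth across repeated runs.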