Python Script for Parsing and Downloading Tencent Video via TS Files
This article presents a Python script that takes a Tencent Video URL, passes it to a third‑party VIP parsing service to obtain the list of cached TS video segments, downloads the segments concurrently, and merges them into a playable MP4 file. It covers the required environment, the workflow, and the complete source code.
Runtime Environment
IDE: PyCharm
Python version: 3.6
Operating System: Windows
Goal and Approach
Purpose: Parse a Tencent Video target URL and download the video for offline playback, working around the fact that third‑party VIP parsing services only provide streaming.
Idea: Obtain the video URL, use a third‑party VIP parsing site to get the TS file list, simulate a browser request to fetch the cached TS files, download them, and finally merge them into an MP4 file for normal playback.
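The TS file list mentioned above is an M3U8 playlist: plain text in which lines starting with `#` are tags and the remaining non-empty lines are segment URIs. The following is a minimal sketch of extracting the segment URLs, assuming an unencrypted media playlist with relative segment names. The function name `parse_m3u8`, the sample playlist, and the `example.com` URL are hypothetical and not part of the original script.

```python
from urllib.parse import urljoin


def parse_m3u8(playlist_text, playlist_url):
    """Return the absolute segment URLs listed in an M3U8 media playlist."""
    segment_urls = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Blank lines carry no data; lines starting with '#' are tags or comments.
        if not line or line.startswith('#'):
            continue
        # Segment URIs may be relative, so resolve them against the playlist URL.
        segment_urls.append(urljoin(playlist_url, line))
    return segment_urls


# Hypothetical playlist used only for illustration.
sample = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:5
#EXTINF:4.0,
seg000.ts
#EXTINF:4.0,
seg001.ts
#EXT-X-ENDLIST
"""
urls = parse_m3u8(sample, 'https://example.com/video/1000k/index.m3u8')
print(urls)
```

Resolving each entry with `urljoin` avoids slicing a fixed number of characters off the playlist URL, which only works while the playlist file name keeps the same length.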
Complete Code
```python
import os
import re
import shutil
import threading
from multiprocessing.dummy import Pool  # thread pool: downloads are I/O-bound
from urllib.request import urlretrieve

import requests
from pyquery import PyQuery as pq


class video_down():
    def __init__(self, url):
        # Build the request URL for the Quanmin Jiexi (全民解析) parsing site
        self.api = 'https://jx.618g.com'
        self.get_url = 'https://jx.618g.com/?url=' + url
        # Set a User-Agent so the request looks like it comes from a browser
        self.head = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
        # Number of download threads
        self.thread_num = 32
        # Number of files downloaded so far, guarded by a lock across threads
        self.i = 0
        self.lock = threading.Lock()
        # Fetch the parsing page
        html = self.get_page(self.get_url)
        if html:
            # Parse the page
            self.parse_page(html)

    def get_page(self, get_url):
        try:
            print('Requesting target page....', get_url)
            response = requests.get(get_url, headers=self.head)
            if response.status_code == 200:
                print('Target page received....\nPreparing to parse....')
                self.head['referer'] = get_url
                return response.text
        except Exception:
            print('Request for the target page failed; check the error and retry')
            return None

    def parse_page(self, html):
        print('Parsing target information........')
        doc = pq(html)
        self.title = doc('head title').text()
        print(self.title)
        # Drop the fixed-length prefix in front of the first playlist URL
        url = doc('#player').attr('src')[14:]
        # The tail of the first playlist points to the second playlist
        html = self.get_m3u8_1(url).strip()
        self.url = url[:-10] + html
        print(self.url)
        print('Parsing complete; fetching the cached TS file list.........')
        self.get_m3u8_2(self.url)

    def get_m3u8_1(self, url):
        try:
            response = requests.get(url, headers=self.head)
            html = response.text
            print('TS playlist fetched; preparing to extract information')
            return html[-20:]
        except Exception:
            print('Cache file request error 1; check the error')

    def get_m3u8_2(self, url):
        try:
            response = requests.get(url, headers=self.head)
            html = response.text
            print('TS playlist fetched; preparing to extract information')
            self.parse_ts_2(html)
        except Exception:
            print('Cache file request error 2; check the error')

    def parse_ts_2(self, html):
        # Capture each segment name without its ".ts" extension
        pattern = re.compile(r'(.*?)\.ts')
        self.ts_lists = re.findall(pattern, html)
        print('Information extracted......\nPreparing to download...')
        self.pool()

    def pool(self):
        print('%d files need to be downloaded' % len(self.ts_lists))
        self.ts_url = self.url[:-10]
        os.makedirs(self.title, exist_ok=True)
        print('Downloading... this may take a while, please be patient..')
        pool = Pool(self.thread_num)
        pool.map(self.save_ts, self.ts_lists)
        pool.close()
        pool.join()
        print('Download complete')
        self.ts_to_mp4()

    def ts_to_mp4(self):
        print('Merging the TS files into an MP4 file......')
        # Windows-only: "copy /b" concatenates all matching files in binary mode
        cmd = 'copy /b "{0}\\*.ts" "{0}.mp4"'.format(self.title)
        os.system(cmd)
        filename = self.title + '.mp4'
        if os.path.isfile(filename):
            print('Conversion complete. Enjoy the video!')
            shutil.rmtree(self.title)

    def save_ts(self, ts_list):
        try:
            ts_urls = self.ts_url + '{}.ts'.format(ts_list)
            urlretrieve(url=ts_urls, filename=self.title + '/{}.ts'.format(ts_list))
            with self.lock:
                self.i += 1
                print('Progress: %d/%d' % (self.i, len(self.ts_lists)))
        except Exception:
            print('Failed to save file')


if __name__ == '__main__':
    # Movie: Detective Dee: The Four Heavenly Kings (狄仁杰之四大天王)
    url = 'https://v.qq.com/x/cover/r6ri9qkcu66dna8.html'
    # Movie: Mission: Impossible - Rogue Nation (碟中谍5:神秘国度)
    url1 = 'https://v.qq.com/x/cover/5c58griiqftvq00.html'
    # TV series: Battle Through the Heavens (斗破苍穹)
    url2 = 'https://v.qq.com/x/cover/lcpwn26degwm7t3/z0027injhcq.html'
    url3 = 'https://v.qq.com/x/cover/33bfp8mmgakf0gi.html'
    video_down(url2)
```
Video Cache TS Files
Each downloaded TS file contains a few seconds of video. Once all segments have been downloaded, they are merged into a single MP4 file for normal playback. The script downloads the high‑definition stream by default.
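The merge step can also be done without the Windows `copy /b` command. The sketch below concatenates the segments in pure Python, assuming the segments are unencrypted and that their file names contain a running number. The names `natural_key` and `merge_ts` and the dummy files are hypothetical. Byte-level concatenation produces an MPEG-TS stream whatever extension the output file carries; a true MP4 container would need a separate remuxing tool.

```python
import os
import re
import shutil
import tempfile


def natural_key(name):
    """Split a name into text and integer parts so that 2 sorts before 10."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r'(\d+)', name)]


def merge_ts(segment_dir, output_path):
    """Concatenate every .ts file in segment_dir into output_path, in natural order."""
    names = sorted((n for n in os.listdir(segment_dir) if n.endswith('.ts')),
                   key=natural_key)
    with open(output_path, 'wb') as merged:
        for name in names:
            with open(os.path.join(segment_dir, name), 'rb') as segment:
                # Stream the bytes across so large segments are not loaded whole.
                shutil.copyfileobj(segment, merged)
    return names


# Demonstration with tiny dummy files standing in for real segments.
with tempfile.TemporaryDirectory() as tmp:
    for index in (1, 2, 10):
        with open(os.path.join(tmp, 'seg{}.ts'.format(index)), 'wb') as f:
            f.write(str(index).encode())
    output = os.path.join(tmp, 'merged.mp4')
    order = merge_ts(tmp, output)
    with open(output, 'rb') as f:
        merged_bytes = f.read()
print(order, merged_bytes)
```

Sorting with a natural key matters when segment names are not zero-padded: a plain alphabetical sort would place `seg10.ts` before `seg2.ts` and scramble the playback order.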
Implementation Effect
Images (omitted) illustrate the successful download and playback of the merged video.
Disclaimer: This article is compiled from online sources; copyright belongs to the original author. Contact us for removal or licensing if needed.