Automate Douyin Video Scraping with Python, mitmproxy, and Appium
This tutorial shows how to combine mitmproxy packet capture and Appium mobile automation in Python to automatically collect and download Douyin video URLs, covering environment setup, code snippets, and practical steps for a fully automated scraper.
These notes record how to crawl app data with Python, using the Douyin video app as an example.
Tools: PyCharm, mitmproxy (or its command‑line component mitmdump), Appium, Windows 10.
Approach:
Use mitmproxy to capture the app's network traffic and obtain the desired video URLs.
Use Appium to automate the app (swipe, click, etc.) so that the scraper can run without manual interaction.
Combine the two to achieve a fully automated crawling solution.
mitmproxy / mitmdump packet capture
Ensure mitmproxy is installed, the phone and PC are on the same LAN, and the mitmproxy CA certificate is installed on the phone. Because the interactive mitmproxy console is not supported on Windows in this setup, use its command‑line tool mitmdump instead, which can load a Python script to process the captured traffic.
Running mitmdump while the Douyin app is open displays all requests. The target video URLs begin with one of the following prefixes:
http://v1-dy.ixigua.com/
http://v3-dy.ixigua.com/
http://v9-dy.ixigua.com/
Use mitmdump -s scripts.py to run a Python script that saves the videos:
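import requests

path = 'D:/video/'  # output folder; it must already exist
num = 1788          # counter used to name the next file

# URL prefixes that identify Douyin video requests
target_urls = (
    'http://v1-dy.ixigua.com/',
    'http://v9-dy.ixigua.com/',
    'http://v3-dy.ixigua.com/',
)


def response(flow):
    """mitmproxy hook, called for each response that passes through the proxy."""
    global num
    # str.startswith accepts a tuple, so one call checks every prefix
    if flow.request.url.startswith(target_urls):
        filename = path + str(num) + '.mp4'
        # Fetch the video again and write it to disk in chunks
        res = requests.get(flow.request.url, stream=True)
        with open(filename, 'wb') as f:
            for chunk in res.iter_content(chunk_size=8192):
                f.write(chunk)
        print(filename + ' download complete')
        num += 1

The script is basic but functional; it saves each video to the specified folder.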
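The prefix check and file naming in the script above can be isolated into small functions that are easy to test without running mitmdump. This is only a minimal sketch: the helper names is_video_url and next_filename are hypothetical and do not come from the original script; the prefixes and the D:/video/ folder are the ones used in the article.

```python
# Prefixes taken from the article; the helper names below are hypothetical.
TARGET_PREFIXES = (
    'http://v1-dy.ixigua.com/',
    'http://v3-dy.ixigua.com/',
    'http://v9-dy.ixigua.com/',
)


def is_video_url(url):
    """Return True if the URL starts with one of the known video prefixes."""
    return url.startswith(TARGET_PREFIXES)


def next_filename(directory, counter):
    """Build the output path for a video, e.g. D:/video/1788.mp4."""
    return directory.rstrip('/') + '/' + str(counter) + '.mp4'


if __name__ == '__main__':
    print(is_video_url('http://v3-dy.ixigua.com/some/video'))
    print(next_filename('D:/video/', 1788))
```

Keeping this logic separate from the response hook means the matching rules can be changed or extended without touching the download code.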
Appium for mobile automation
Configure the Android SDK and ensure the device is connected via USB with USB debugging enabled. Start Appium (click the "Start Server" button) and set the Desired Capabilities:
{
  "platformName": "Android",
  "deviceName": "Mi_Note_3",
  "appPackage": "com.ss.android.ugc.aweme",
  "appActivity": ".main.MainActivity"
}
The appPackage and appActivity values can be obtained from adb logcat output by searching for the "Displayed" keyword.
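As a rough illustration of reading those values from the log, the sketch below extracts the package and activity from a "Displayed" line. The sample line is illustrative rather than captured output, the exact log prefix varies between Android versions, and parse_displayed is a hypothetical helper name.

```python
import re

# Matches the "Displayed <package>/<activity>" part of a logcat line, e.g.
#   ... ActivityManager: Displayed com.example.app/.MainActivity: +1s234ms
DISPLAYED_RE = re.compile(r'Displayed\s+([\w.]+)/([\w.$]+)')


def parse_displayed(line):
    """Return appPackage/appActivity from a 'Displayed' log line, or None."""
    match = DISPLAYED_RE.search(line)
    if match is None:
        return None
    return {'appPackage': match.group(1), 'appActivity': match.group(2)}


if __name__ == '__main__':
    # Illustrative sample line, not real captured output.
    sample = ('I/ActivityManager( 1234): Displayed '
              'com.ss.android.ugc.aweme/.main.MainActivity: +1s234ms')
    print(parse_displayed(sample))
```

The returned dictionary uses the same keys as the Desired Capabilities, so its entries can be copied into the configuration directly.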
After starting a session, Appium launches Douyin on the device and provides a preview window for interaction.
Python script to drive the app
from time import sleep

from appium import webdriver


class Action:
    def __init__(self):
        # Same Desired Capabilities as used for the Appium session above
        self.desired_caps = {
            "platformName": "Android",
            "deviceName": "Mi_Note_3",
            "appPackage": "com.ss.android.ugc.aweme",
            "appActivity": ".main.MainActivity"
        }
        self.server = 'http://localhost:4723/wd/hub'
        self.driver = webdriver.Remote(self.server, self.desired_caps)
        # Swipe start point and distance in pixels, chosen for this device
        self.start_x = 500
        self.start_y = 1500
        self.distance = 1300

    def comments(self):
        # Wait for the app to load, then tap the screen once
        sleep(2)
        self.driver.tap([(500, 1200)], 500)

    def scroll(self):
        # Swipe up indefinitely so that new videos keep loading
        while True:
            self.driver.swipe(self.start_x, self.start_y,
                              self.start_x, self.start_y - self.distance)
            sleep(2)

    def main(self):
        self.comments()
        self.scroll()


if __name__ == '__main__':
    action = Action()
    action.main()

Running this script opens Douyin, taps the screen to ensure the page is displayed, and then continuously scrolls to load new videos, allowing the mitmproxy script to capture and download them.
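The swipe coordinates in the script are fixed pixel values, which suits one particular screen. On a device with a different resolution, one option is to derive them from the screen size. The helper below is a sketch under that assumption: swipe_up_coords and its ratio parameters are hypothetical and not part of the original script.

```python
def swipe_up_coords(width, height, start_ratio=0.8, distance_ratio=0.6):
    """Return (start_x, start_y, end_x, end_y) for an upward swipe.

    The swipe runs along the horizontal center of the screen, starting at
    start_ratio of the height and moving up by distance_ratio of the height.
    """
    x = width // 2
    start_y = round(height * start_ratio)
    end_y = max(0, start_y - round(height * distance_ratio))
    return x, start_y, x, end_y


if __name__ == '__main__':
    # With a live session the size could come from driver.get_window_size(),
    # which returns a dict with 'width' and 'height' keys.
    print(swipe_up_coords(1080, 1920))
```

The four values are in the same order as the positional arguments passed to driver.swipe in the script.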
The crawling process may occasionally retrieve duplicate videos.
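One way to discard such duplicates is to remember a hash of every saved video and skip content that has been seen before. The sketch below assumes that duplicates are byte-identical, which the article does not confirm, and the DuplicateFilter class is hypothetical rather than part of the original scripts.

```python
import hashlib


class DuplicateFilter:
    """Remember SHA-256 digests of seen content to detect exact duplicates."""

    def __init__(self):
        self._seen = set()

    def is_new(self, data):
        """Return True the first time this byte content is seen, else False."""
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True


if __name__ == '__main__':
    seen = DuplicateFilter()
    for payload in (b'video-a', b'video-b', b'video-a'):
        print(seen.is_new(payload))
```

In the download script, a check like this could run on the downloaded bytes before the file is written, so that the file counter only advances for new videos.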
Open Source Linux
Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.