Automate Douyin Video Scraping with Python, mitmproxy, and Appium

This tutorial shows how to combine mitmproxy packet capture and Appium mobile automation in Python to automatically collect and download Douyin video URLs, covering environment setup, code snippets, and practical steps for a fully automated scraper.

Open Source Linux

This article documents how to crawl data from a mobile app with Python, using the Douyin video app as an example.

Tools: PyCharm, mitmproxy (or its command‑line component mitmdump), Appium, Windows 10.

Approach:

Use mitmproxy to capture the app's network traffic and obtain the desired video URLs.

Use Appium to automate the app (swipe, click, etc.) so that the scraper can run without manual interaction.

Combine the two to achieve a fully automated crawling solution.
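As a rough sketch, the three steps above might be wired together like this: start mitmdump as a background process, run the Appium automation, and stop the proxy when the automation ends. The helper names build_mitmdump_command and run_crawl, the drive_app callable, and the choice of port 8080 (mitmproxy's default listen port) are assumptions for illustration and are not part of the original write-up.

```python
import subprocess


def build_mitmdump_command(script_path, port=8080):
    """Return the mitmdump command line that loads a capture script.

    -s loads the script, -p sets the port the proxy listens on.
    """
    return ["mitmdump", "-s", script_path, "-p", str(port)]


def run_crawl(script_path, drive_app, port=8080):
    """Run the proxy in the background while drive_app() automates the phone.

    drive_app is a hypothetical zero-argument callable, for example the
    entry point of the Appium script.
    """
    proxy = subprocess.Popen(build_mitmdump_command(script_path, port))
    try:
        drive_app()
    finally:
        # Stop the proxy even if the automation is interrupted (Ctrl+C).
        proxy.terminate()
        proxy.wait()
```

Because the scrolling loop in an automation script may never return on its own, the finally block ensures the proxy is shut down when the run is interrupted.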

mitmproxy / mitmdump packet capture

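Make sure mitmproxy is installed, the phone and PC are on the same LAN, the phone's Wi-Fi proxy points at the PC, and the mitmproxy CA certificate is installed on the phone. Because the interactive mitmproxy console was not natively supported on Windows at the time of writing, use the command-line tool mitmdump instead; it can load a Python script to process the captured traffic.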

Running mitmdump while the Douyin app is open displays all requests. The relevant video URLs have the following prefixes:

http://v1-dy.ixigua.com/; http://v3-dy.ixigua.com/; http://v9-dy.ixigua.com/

These prefixes identify the target video URLs. Use mitmdump -s scripts.py to execute a Python script that saves the videos:

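import requests

path = 'D:/video/'  # output folder; it must already exist
num = 1788          # starting index used for the saved file names

# URL prefixes that identify Douyin video requests
target_urls = ('http://v1-dy.ixigua.com/', 'http://v3-dy.ixigua.com/', 'http://v9-dy.ixigua.com/')

def response(flow):
    """mitmproxy hook, called for every response that passes through the proxy."""
    global num
    # str.startswith accepts a tuple, so a single check covers all prefixes
    if flow.request.url.startswith(target_urls):
        filename = path + str(num) + '.mp4'
        # Request the video again and stream it to disk in chunks
        res = requests.get(flow.request.url, stream=True)
        # 'wb' overwrites a stale file instead of appending to it
        with open(filename, 'wb') as f:
            for chunk in res.iter_content(chunk_size=8192):
                f.write(chunk)
        print(filename + ' download complete')
        num += 1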

The script is basic but functional; it saves each video to the specified folder.
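A possible variation, sketched here under stated assumptions, is to write the response body that mitmproxy has already captured instead of downloading the video a second time. This assumes the app fetches each video in a single complete response (not in partial range requests), which the original article does not confirm. The helper names is_video_url and save_body are hypothetical, and matching on the host name (so the scheme is ignored) is a swapped-in alternative to the prefix check.

```python
import os
from urllib.parse import urlsplit

# Hosts taken from the URL prefixes listed in the article
VIDEO_HOSTS = {"v1-dy.ixigua.com", "v3-dy.ixigua.com", "v9-dy.ixigua.com"}
OUTPUT_DIR = "D:/video/"
_counter = 0


def is_video_url(url):
    """Return True when the URL's host is one of the known video hosts."""
    return urlsplit(url).hostname in VIDEO_HOSTS


def save_body(body, index, output_dir):
    """Write raw bytes to <output_dir>/<index>.mp4 and return the file path."""
    os.makedirs(output_dir, exist_ok=True)
    filename = os.path.join(output_dir, f"{index}.mp4")
    with open(filename, "wb") as f:
        f.write(body)
    return filename


def response(flow):
    """mitmproxy hook: save the captured body of matching video responses."""
    global _counter
    body = flow.response.content  # assumption: the full video is in one response
    if is_video_url(flow.request.url) and body:
        _counter += 1
        print(save_body(body, _counter, OUTPUT_DIR), "saved")
```

This avoids a blocking network request inside the hook, at the cost of depending on how the app requests its video data.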

Appium for mobile automation

Configure the Android SDK and ensure the device is connected via USB with USB debugging enabled. Start Appium (click the "Start Server" button) and set the Desired Capabilities:

{
  "platformName": "Android",
  "deviceName": "Mi_Note_3",
  "appPackage": "com.ss.android.ugc.aweme",
  "appActivity": ".main.MainActivity"
}

The appPackage and appActivity values can be found in the adb logcat output: launch the app on the phone and search the log for the "Displayed" keyword.
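As an illustration, a "Displayed" log line can be split into package and activity with a small regular expression. The helper name parse_displayed and the sample log line in the usage note are assumptions made for this sketch; the exact prefix of the line varies between Android versions, so only the "Displayed package/activity" part is matched.

```python
import re

# Matches e.g. "Displayed com.example.app/.MainActivity: +1s234ms"
DISPLAYED_RE = re.compile(r"Displayed\s+(?P<package>[\w.]+)/(?P<activity>[\w.$]+)")


def parse_displayed(line):
    """Return (package, activity) from a logcat 'Displayed' line, or None."""
    match = DISPLAYED_RE.search(line)
    if match is None:
        return None
    return match.group("package"), match.group("activity")
```

For a hypothetical line such as "ActivityManager: Displayed com.ss.android.ugc.aweme/.main.MainActivity: +1s234ms", the function returns the pair ("com.ss.android.ugc.aweme", ".main.MainActivity"), which matches the capabilities shown above.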

After starting a session, Appium launches Douyin on the device and provides a preview window for interaction.

Python script to drive the app

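from appium import webdriver
from time import sleep

class Action:
    def __init__(self):
        # Capabilities that tell Appium which device and app to launch
        self.desired_caps = {
            "platformName": "Android",
            "deviceName": "Mi_Note_3",
            "appPackage": "com.ss.android.ugc.aweme",
            "appActivity": ".main.MainActivity"
        }
        # Address of the local Appium server
        self.server = 'http://localhost:4723/wd/hub'
        # Note: newer Appium-Python-Client releases may expect an options
        # object here instead of a plain capabilities dict
        self.driver = webdriver.Remote(self.server, self.desired_caps)
        # Swipe start point and length in pixels, tuned for this device's screen
        self.start_x = 500
        self.start_y = 1500
        self.distance = 1300

    def comments(self):
        # Wait for the app to load, then tap the screen for 500 ms
        sleep(2)
        self.driver.tap([(500, 1200)], 500)

    def scroll(self):
        # Swipe upward forever so the app keeps loading the next video
        while True:
            self.driver.swipe(self.start_x, self.start_y, self.start_x, self.start_y - self.distance)
            sleep(2)

    def main(self):
        self.comments()
        self.scroll()

if __name__ == '__main__':
    action = Action()
    action.main()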

Running this script opens Douyin, taps the screen to ensure the page is displayed, and then continuously scrolls to load new videos, allowing the mitmproxy script to capture and download them.
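The fixed coordinates in the script suit one particular screen size. A minimal sketch of deriving the swipe from the actual screen dimensions follows; the helper name swipe_coordinates and the 80% to 20% swipe range are assumptions chosen for illustration.

```python
def swipe_coordinates(width, height, start_percent=80, end_percent=20):
    """Return (start_x, start_y, end_x, end_y) for an upward swipe.

    The swipe runs along the horizontal centre of the screen, from
    start_percent down the screen to end_percent down the screen.
    """
    x = width // 2
    start_y = height * start_percent // 100
    end_y = height * end_percent // 100
    return x, start_y, x, end_y
```

With a running session, the result could be passed on as driver.swipe(*swipe_coordinates(size["width"], size["height"])), where size = driver.get_window_size().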

The crawling process may occasionally retrieve duplicate videos.
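One way to filter out such duplicates, sketched under the assumption that a repeated video arrives as byte-identical data, is to remember a hash of every file already saved. The class name ContentDeduplicator is hypothetical and not part of the original scripts.

```python
import hashlib


class ContentDeduplicator:
    """Remember SHA-256 digests of saved videos to detect repeats."""

    def __init__(self):
        self._digests = set()

    def is_new(self, data):
        """Return True the first time this exact byte content is seen."""
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._digests:
            return False
        self._digests.add(digest)
        return True
```

In a download script, the check would sit just before the file is written, so that content already seen is skipped. Videos that are re-encoded or served with different bytes would not be caught by this approach.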
