iOS Multi-Device Remote Control Technique Using minicap and WebDriverAgent
This article describes how to capture the screen of iOS devices and control them in real time from a browser. It introduces the tools involved, walks through a preliminary scheme and an optimized scheme, and compares the two.
Introduction
Remote control of real devices from a browser relies on two functions: real-time screen capture and real-time transmission of control commands. The platform already supports remote control of Android devices; this article explores the best available solution for iOS. The implementation is written in Python.
Tool Introduction
minicap is the high‑speed screenshot component from the open‑source STF project. The iOS version (iOS‑minicap) provides a socket interface that captures screen data using AVFoundation and iOS screen‑mirroring.
GitHub: https://github.com/openstf/ios-minicap
WebDriverAgent (WDA) is Facebook’s mobile testing framework that runs a WebDriver server on iOS, allowing operations such as launching or terminating apps, clicking, scrolling, and verifying UI states.
GitHub: https://github.com/facebook/WebDriverAgent
Preliminary Scheme
The initial approach combines iOS-minicap with the native (unmodified) WDA and requires iOS 8 or later. The steps are: download the iOS-minicap source, obtain the wdapython package, connect a device, start minicap, open a socket to minicap to receive the image header and the image data that follows it, and forward the images over WebSocket to the web front end.
Key header fields (byte offset, length, type, meaning) are listed in the table below:
Byte    Length  Type                    Meaning
0       1       unsigned char           Version (currently 1)
1       1       unsigned char           Size of the header (from byte 0)
2-5     4       uint32 (little endian)  Pid of the process
6-9     4       uint32 (little endian)  Real display width in pixels
10-13   4       uint32 (little endian)  Real display height in pixels
14-17   4       uint32 (little endian)  Virtual display width in pixels
18-21   4       uint32 (little endian)  Virtual display height in pixels
22      1       unsigned char           Display orientation
23      1       unsigned char           Quirk bitflags
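The header layout above maps directly onto a fixed-size binary structure. The following is a minimal sketch of how it could be decoded in Python with the standard struct module; the names MinicapHeader and parse_header are illustrative and do not come from the original implementation.

```python
import struct
from typing import NamedTuple

# Layout taken from the header table: two unsigned chars, five little-endian
# uint32 values, then two more unsigned chars. "<" disables padding.
HEADER_FORMAT = "<BBIIIIIBB"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 24 bytes


class MinicapHeader(NamedTuple):
    version: int
    header_size: int
    pid: int
    real_width: int
    real_height: int
    virtual_width: int
    virtual_height: int
    orientation: int
    quirks: int


def parse_header(data: bytes) -> MinicapHeader:
    """Decode the header from the first bytes read off the minicap socket."""
    if len(data) < HEADER_SIZE:
        raise ValueError(f"need {HEADER_SIZE} bytes, got {len(data)}")
    return MinicapHeader(*struct.unpack_from(HEADER_FORMAT, data))
```

In practice the caller would read exactly HEADER_SIZE bytes from the socket before passing them to parse_header, and could compare header_size with HEADER_SIZE to detect a layout change.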
After receiving image data, the server pushes it to the browser over a WebSocket connection for display. To map mouse actions in the browser to the device, forward the WDA port to the host (iproxy 8100 8100) and launch WDA; control commands are then sent to WDA through the forwarded port.
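Mapping a mouse action to the device involves two small steps: scaling the click position from the browser canvas to device coordinates, and sending the result to WDA through the forwarded port. The sketch below illustrates both under stated assumptions: the helper names are hypothetical, the tap route /session/<id>/wda/tap/0 is assumed to be available in the WDA build in use, and a session is assumed to have been created already. The request is only built here, not sent.

```python
import json
from urllib import request


def browser_to_device(x, y, canvas_w, canvas_h, device_w, device_h):
    """Scale a click on the browser canvas to device coordinates.

    WDA works in points rather than pixels, so device_w and device_h
    should be the device size in points.
    """
    if canvas_w <= 0 or canvas_h <= 0:
        raise ValueError("canvas size must be positive")
    return (round(x * device_w / canvas_w), round(y * device_h / canvas_h))


def build_tap_request(base_url, session_id, x, y):
    """Build (but do not send) a tap request for WDA.

    The route used here is an assumption about the WDA build in use.
    """
    body = json.dumps({"x": x, "y": y}).encode("utf-8")
    return request.Request(
        f"{base_url}/session/{session_id}/wda/tap/0",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With iproxy 8100 8100 running, base_url would be http://localhost:8100, and the request could be sent with urllib.request.urlopen.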
Optimization Scheme
To improve responsiveness, the synchronous event‑driven mechanism in native WDA was removed. A new socket thread (port 8888) was added to deliver screenshots independently of WDA, allowing multiple devices per PC and eliminating interference between minicap and WDA processes.
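With several devices attached to one PC, each device needs its own local ports for the WDA port (8100) and the screenshot socket (8888). The sketch below shows one possible way to plan that forwarding. The local port numbering is a hypothetical scheme, it assumes the screenshot socket listens on the device and is reached through iproxy, and the positional iproxy argument form (local port, device port, UDID) is an assumption that varies between iproxy versions.

```python
from typing import Dict, List

WDA_DEVICE_PORT = 8100         # WDA HTTP port on the device (from the article)
SCREENSHOT_DEVICE_PORT = 8888  # screenshot socket port (from the article)


def plan_forwarding(udids: List[str],
                    wda_local_base: int = 8100,
                    shot_local_base: int = 9100) -> List[Dict]:
    """Assign each device its own pair of local ports (hypothetical scheme)."""
    plan = []
    for index, udid in enumerate(udids):
        wda_local = wda_local_base + index
        shot_local = shot_local_base + index
        plan.append({
            "udid": udid,
            "wda_local_port": wda_local,
            "screenshot_local_port": shot_local,
            # Argument order is an assumption; check the installed iproxy.
            "commands": [
                ["iproxy", str(wda_local), str(WDA_DEVICE_PORT), udid],
                ["iproxy", str(shot_local), str(SCREENSHOT_DEVICE_PORT), udid],
            ],
        })
    return plan
```

Each command list could be passed to subprocess.Popen, giving every device one control channel and one screenshot channel that do not share ports with any other device.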
Image packets now consist of a 4‑byte Data_length followed by the raw picture data, simplifying parsing.
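Because TCP delivers a byte stream rather than whole packets, the receiver has to buffer incoming data until a complete length-prefixed image is available. The following is a minimal sketch of that framing. The article does not state the byte order of Data_length; little endian is assumed here to match the minicap header, and the names pack_frame and FrameParser are illustrative.

```python
import struct
from typing import List

LENGTH_FORMAT = "<I"  # assumption: 4-byte little-endian unsigned length
LENGTH_SIZE = struct.calcsize(LENGTH_FORMAT)  # 4


def pack_frame(image: bytes) -> bytes:
    """Prefix raw picture data with its 4-byte length."""
    return struct.pack(LENGTH_FORMAT, len(image)) + image


class FrameParser:
    """Reassemble complete images from arbitrary chunks of a byte stream."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> List[bytes]:
        """Add received bytes and return every image completed so far."""
        self._buf.extend(chunk)
        frames = []
        while len(self._buf) >= LENGTH_SIZE:
            (length,) = struct.unpack_from(LENGTH_FORMAT, self._buf)
            end = LENGTH_SIZE + length
            if len(self._buf) < end:
                break  # wait for the rest of this image
            frames.append(bytes(self._buf[LENGTH_SIZE:end]))
            del self._buf[:end]
        return frames
```

A receiving loop would call feed with each chunk returned by socket.recv and forward every returned image to the browser over WebSocket.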
Scheme Comparison
Advantages of the optimized scheme: supports multiple devices per PC, independent screenshot and control streams, and avoids minicap‑induced WDA termination.
Disadvantages: screenshot throughput through WDA is about 30% lower than with minicap, which results in a lower frame rate.