iQIYI iOS Cloud Recording and Playback Platform: Architecture and Implementation
iQIYI's iOS Cloud Recording and Playback Platform uses a cloud device farm and a Swift-based driver to capture user actions, generate Python scripts, and replay them across shared devices. It dramatically cuts automation costs while providing fast DOM access, multiple element identification methods, CI integration, and visual reporting for reliable regression testing.
Mobile apps have short release cycles and iterate quickly, so testers must run extensive regression on existing features while verifying that new functionality works correctly. Automated regression has therefore become a crucial technique. iOS automation, however, faces three major obstacles: high deployment cost (it requires macOS machines, which are expensive and hard to share across teams); a steep learning curve (testers need Xcode and Objective-C knowledge in addition to Python or Node); and the difficulty of extending device drivers such as Appium WDA or Facebook WDA, which are written in Objective-C and have long response times.
To address these issues, iQIYI built an iOS Cloud Recording and Playback Platform on top of a cloud device farm. By optimizing the device driver, sharing devices, and providing remote rental, script management, task scheduling, and visual reporting, the platform dramatically reduces the cost of automation for business lines and lets testers focus on test case design. The platform has been integrated with multiple business lines (video playback, video editing, feed streams) and now serves as an essential quality-assurance component.
Basic Workflow
During recording, a cloud IDE captures mouse clicks and swipe events, extracts the DOM tree of the current mobile page, and maps each user action to the most appropriate UI element. If OCR or AI recognition is selected, the system extracts text or elements the AI model recognizes from screenshots. The recorded actions are then converted into automation scripts and stored.
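Mapping a click to a UI element can be pictured as a hit test against the DOM tree. The sketch below is only an illustration under stated assumptions: the Node class, its fields, and the rule "deepest node containing the point wins, later siblings first" are hypothetical and are not taken from the platform's code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical DOM node: element type, label, and on-screen frame."""
    type: str
    label: str
    x: float
    y: float
    width: float
    height: float
    children: List["Node"] = field(default_factory=list)

    def contains(self, px: float, py: float) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def hit_test(node: Node, px: float, py: float) -> Optional[Node]:
    """Return the deepest node containing (px, py), or None if outside."""
    if not node.contains(px, py):
        return None
    # Later siblings are assumed to be drawn on top, so check them first.
    for child in reversed(node.children):
        hit = hit_test(child, px, py)
        if hit is not None:
            return hit
    return node


screen = Node("Window", "", 0, 0, 375, 667, [
    Node("NavigationBar", "Home", 0, 0, 375, 64),
    Node("Button", "Play", 100, 300, 175, 44),
])
target = hit_test(screen, 150, 320)
print(target.type, target.label)  # Button Play
```

A real recorder would likely add further rules, for example preferring elements that carry a label or accessibility ID over anonymous containers.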
During playback, the system retrieves the script set, parses it, locates elements via the specified method, and executes actions such as click or swipe. After execution, a test report containing steps, logs, and screenshots is generated.
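The playback steps above (parse, locate, execute, report) can be sketched as a small loop. Everything here is an assumption made for illustration: the platform's scripts are Python, whereas this sketch uses a JSON step list to keep parsing explicit, and FakeDriver, StepResult, and replay are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass
class StepResult:
    index: int
    action: str
    ok: bool
    log: str


class FakeDriver:
    """Stand-in for a device driver; it only knows a fixed set of labels."""

    def __init__(self, visible_labels):
        self.visible = set(visible_labels)
        self.performed = []

    def find(self, by, value):
        return value if by == "label" and value in self.visible else None

    def perform(self, action, element):
        self.performed.append((action, element))


def replay(script_json, driver):
    """Run steps in order; stop at the first element that cannot be located."""
    report = []
    for i, step in enumerate(json.loads(script_json), start=1):
        element = driver.find(step["by"], step["value"])
        if element is None:
            message = "not found: {}={}".format(step["by"], step["value"])
            report.append(StepResult(i, step["action"], False, message))
            break
        driver.perform(step["action"], element)
        report.append(StepResult(i, step["action"], True, "ok"))
    return report


script = json.dumps([
    {"action": "click", "by": "label", "value": "Play"},
    {"action": "click", "by": "label", "value": "Missing"},
    {"action": "click", "by": "label", "value": "Pause"},
])
driver = FakeDriver(["Play", "Pause"])
report = replay(script, driver)
for row in report:
    print(row.index, row.action, row.ok, row.log)
```

Stopping at the first failure is a design choice of this sketch; a production runner might instead capture a screenshot and continue.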
Recording Module
The recording UI is a web‑based IDE that integrates four main functions: device selection, script management, real‑time screen view, and live script generation. It supports online editing, debugging, and persistent storage on the server.
Device list – shows available phones and allows real‑time switching.
Script management – archives scripts by business line, test suite, etc.
Script editing – automatically generates Python scripts from UI interactions, with online editing and multi‑device debugging.
Phone screen – displays live phone screen and synchronously captures user actions.
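As a rough picture of how captured interactions could become Python script lines, the sketch below turns recorded events into source text. The event dictionaries and the device.click / device.swipe calls in the generated output are hypothetical; the article does not show the platform's actual script API.

```python
def event_to_line(event):
    """Translate one recorded UI event into a line of Python source."""
    if event["kind"] == "click":
        return "device.click(label={!r})".format(event["label"])
    if event["kind"] == "swipe":
        x1, y1 = event["start"]
        x2, y2 = event["end"]
        return "device.swipe({}, {}, {}, {})".format(x1, y1, x2, y2)
    raise ValueError("unsupported event kind: {!r}".format(event["kind"]))


def generate_script(events):
    """Join the generated lines into one script, ending with a newline."""
    return "\n".join(event_to_line(e) for e in events) + "\n"


events = [
    {"kind": "click", "label": "Play"},
    {"kind": "swipe", "start": (100, 600), "end": (100, 200)},
]
source = generate_script(events)
# compile() raises SyntaxError if the generated text is not valid Python.
compile(source, "<recorded>", "exec")
print(source)
```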
Smooth Interaction Experience
Existing WDA solutions (Appium WDA, Facebook WDA) prioritize execution stability, resulting in long response times unsuitable for recording. iQIYI therefore rewrote the driver in Swift, focusing on faster DOM tree retrieval, reduced click latency, and higher frame rates (over 20 fps). Optimizations include hierarchical DOM pruning, removal of complex synchronous waits, and an improved image compression and transmission pipeline.
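The article does not describe how hierarchical DOM pruning is implemented. One plausible reading, sketched here purely as an assumption, is to drop invisible subtrees and stop descending below a depth limit so that less data is serialized per frame. The dictionary layout and the "visible" key are hypothetical.

```python
def prune(node, max_depth, depth=0):
    """Return a copy of the tree without invisible nodes or nodes below max_depth."""
    if not node.get("visible", True):
        return None
    pruned = {k: v for k, v in node.items() if k != "children"}
    if depth < max_depth:
        kept = []
        for child in node.get("children", []):
            child_pruned = prune(child, max_depth, depth + 1)
            if child_pruned is not None:
                kept.append(child_pruned)
        if kept:
            pruned["children"] = kept
    return pruned


def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node.get("children", []))


tree = {
    "type": "Window", "children": [
        {"type": "Table", "children": [
            {"type": "Cell", "label": "Episode 1", "children": [
                {"type": "StaticText", "label": "Episode 1"},
            ]},
            {"type": "Cell", "label": "Episode 2", "visible": False},
        ]},
        {"type": "Other", "visible": False, "children": [
            {"type": "Button", "label": "Hidden"},
        ]},
    ],
}
small = prune(tree, max_depth=2)
print(count_nodes(tree), count_nodes(small))  # 7 3
```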
Measurements taken before and after the optimization show significant reductions in DOM retrieval time, click response time, and frame rendering latency.
Rich Element Identification Methods
To ensure script stability across UI changes, iQIYI provides multiple locating strategies:
Native methods – predicate queries, accessibility ID, and coordinates, using element attributes such as type, name, and label.
XPath – optimized to return results within 1 second, improving multi‑device compatibility.
Image recognition – AI‑driven OCR and icon detection, enabling a single script to run on both Android and iOS.
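The article lists these strategies but does not say whether a script combines them. One way to use several of them for stability, shown here only as an assumption, is an ordered fallback chain that tries each locator until one returns an element. The page dictionary and all function names are hypothetical.

```python
def locate(strategies, page):
    """Try each (name, finder) pair in order; return the first match."""
    for name, finder in strategies:
        element = finder(page)
        if element is not None:
            return name, element
    return None, None


def by_accessibility_id(identifier):
    return lambda page: page["ids"].get(identifier)


def by_label(label):
    return lambda page: next(
        (e for e in page["elements"] if e["label"] == label), None)


def by_coordinates(x, y):
    # Coordinates always "match", so they serve as the last resort.
    return lambda page: {"type": "Point", "x": x, "y": y}


# The accessibility ID was removed in a UI change, but the label survived.
page = {"ids": {}, "elements": [{"type": "Button", "label": "Play"}]}
chain = [
    ("accessibility_id", by_accessibility_id("play_btn")),
    ("label", by_label("Play")),
    ("coordinates", by_coordinates(187, 322)),
]
used, element = locate(chain, page)
print(used, element)
```

Ordering the chain from the most specific locator to the least specific keeps coordinate taps, which break most easily across screen sizes, as a last resort.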
Playback Module
The playback service handles device selection, task triggering, execution, and problem tracing. Devices can be filtered by model, resolution, OS version, carrier, etc. Tasks can be triggered via CI/CD pipelines, scheduled, or manually, supporting parallel or sequential execution across multiple devices.
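Device filtering of the kind described above can be sketched as a simple predicate over device records. The Device fields, the filter_devices signature, and the sample farm are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    udid: str
    model: str
    os_version: str
    resolution: str
    carrier: str


def version_tuple(version):
    """Turn '16.4' into (16, 4, 0) so versions compare numerically."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def filter_devices(devices, model=None, min_os=None, resolution=None, carrier=None):
    """Keep devices matching every criterion that is not None."""
    selected = []
    for d in devices:
        if model is not None and d.model != model:
            continue
        if min_os is not None and version_tuple(d.os_version) < version_tuple(min_os):
            continue
        if resolution is not None and d.resolution != resolution:
            continue
        if carrier is not None and d.carrier != carrier:
            continue
        selected.append(d)
    return selected


farm = [
    Device("A", "iPhone 13", "15.7.1", "1170x2532", "China Mobile"),
    Device("B", "iPhone 13", "16.4", "1170x2532", "China Unicom"),
    Device("C", "iPhone SE", "16", "750x1334", "China Mobile"),
]
print([d.udid for d in filter_devices(farm, min_os="16.0")])  # ['B', 'C']
```

Comparing versions as integer tuples avoids the string-comparison trap where "9.3" sorts after "16.0".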
During execution, the platform monitors system and app pop‑ups (handling >90 % of them), captures key process information, performs crash detection, and cleans up after the task (uninstalling apps, clearing data). Failed tasks can be retried on similar devices.
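Retrying a failed task on a similar device could look like the sketch below. The similarity rule (same model and same major OS version), the retry limit, and all names are assumptions; the article does not define what counts as a similar device.

```python
def is_similar(a, b):
    """Assumed similarity rule: same model and same major OS version."""
    return (a["model"] == b["model"]
            and a["os"].split(".")[0] == b["os"].split(".")[0])


def run_with_retry(task, first, pool, max_retries=1):
    """Run task on `first`; on failure, retry on similar devices from `pool`."""
    tried = [first["udid"]]
    if task(first):
        return True, tried
    for candidate in pool:
        if len(tried) > max_retries:
            break
        if candidate["udid"] in tried or not is_similar(first, candidate):
            continue
        tried.append(candidate["udid"])
        if task(candidate):
            return True, tried
    return False, tried


pool = [
    {"udid": "A", "model": "iPhone 13", "os": "16.1"},
    {"udid": "B", "model": "iPhone 12", "os": "16.1"},
    {"udid": "C", "model": "iPhone 13", "os": "16.4"},
]


def fails_on_a(device):
    # Simulated task that fails only on device A.
    return device["udid"] != "A"


print(run_with_retry(fails_on_a, pool[0], pool))  # (True, ['A', 'C'])
```

Returning the list of tried devices alongside the outcome makes it easier for a report to show where a retry happened.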
After execution, a visual report provides task overview, device info, step‑by‑step actions, logs, and screenshots, enabling rapid root‑cause analysis.
Results and Future Outlook
The system has dramatically lowered iOS automation costs. It executes over 100 runs per day across multiple iQIYI business lines and has uncovered numerous functional bugs and crashes. Element recognition methods achieve success rates above 98%, and script maintenance typically takes less than 30 minutes per iteration.
Future work includes intelligent test‑case generation, expanded element‑recognition techniques, knowledge‑base‑driven failure analysis, and tighter integration with other iQIYI platforms (traffic recording, PingBack, data mocking) to further enhance automated testing stability.
iQIYI Technical Product Team