
How Leading Tech Companies Tackle Precise Automated Testing and Visual AI Evaluation

This article compiles the Q&A from iTech Talk 17, where experts from iQIYI, Meituan, Tencent, and ZhaiZhai answer questions on precise testing, intelligent automation testing, APP Diff automation, and unified evaluation of visual AI capabilities, covering the practical challenges and implementation details of running these approaches in large-scale production environments.

iQIYI Technical Product Team

Aug 20, 2021


iTech Talk Overview

On August 7, session 17 of the iTech Talk series, organized by iQIYI's technical product team together with TesterHome, focused on “Frontiers of Intelligent and Precise Testing.” Speakers from iQIYI, Meituan, Tencent, and ZhaiZhai presented their experience, and the Q&A below captures the core technical insights.

iQIYI Precise Testing Practice

Speaker: Su Hui (iQIYI Test Expert)

Q: How do precise test cases differ from ordinary manual test cases?

A: There is no difference.

Q: How does bytecode instrumentation use header information to determine whether a statement is hit?

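A: The traceID carried in the request header links the call chain, so every statement hit along that chain can be attributed to the request, and therefore to the test case that issued it.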
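To make the traceID answer concrete, here is a minimal Python sketch, assuming each instrumented probe can read the current request's trace ID and report a hit for it. The class `TraceCoverageRecorder`, its methods, and the probe ID format are hypothetical illustrations, not iQIYI's tool; a real bytecode agent would call something like `record_hit` from injected probes rather than from test code.

```python
from collections import defaultdict


class TraceCoverageRecorder:
    """Hypothetical in-process recorder; not the tool described in the talk."""

    def __init__(self):
        self._hits = defaultdict(set)   # trace ID -> probe IDs hit under that trace
        self._case_of = {}              # trace ID -> test case that issued the request

    def bind(self, trace_id, case_id):
        # Called by the test runner before it sends a request carrying trace_id.
        self._case_of[trace_id] = case_id

    def record_hit(self, trace_id, probe_id):
        # Called by an instrumented probe with the trace ID read from the request header.
        self._hits[trace_id].add(probe_id)

    def case_coverage(self):
        # Fold per-trace hits into a test case -> probes mapping.
        mapping = defaultdict(set)
        for trace_id, probes in self._hits.items():
            case_id = self._case_of.get(trace_id)
            if case_id is not None:     # traffic not started by a test case is ignored
                mapping[case_id] |= probes
        return dict(mapping)


rec = TraceCoverageRecorder()
rec.bind("t-1", "login_ok")
rec.record_hit("t-1", "AuthService.login#branch0")
rec.record_hit("t-1", "AuthService.login#branch2")
rec.record_hit("t-9", "Cache.get#branch1")  # no bound case, so it is dropped
coverage = rec.case_coverage()
print(coverage)
```

Keying hits by trace ID rather than by time window is what keeps concurrently running test cases from mixing their coverage.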

Q: Which business scenarios are suitable for precise testing, and what efficiency gains can be expected?

A: It suits projects that depend on full regression testing, or where insufficient regression coverage frequently leads to quality issues.

Q: How can precise testing be applied to fast-changing services?

A: Rapid iteration does not prevent the use of precise testing.

Q: What manpower is required to maintain the mapping between test cases and code, and how often is it updated?

A: The mapping is updated automatically during test execution.

Q: How does the code‑coverage tool ensure testing scope accuracy?

A: It records the mapping between test cases and code.

Q: How can precise testing be applied to SDKs implemented in C/C++?

A: C/C++ is not supported yet; support is planned as a future extension.

Q: How is the correctness of the large initial case-code mapping guaranteed?

A: By monitoring abnormal cases.

Q: Is the case-code association made at method or class granularity?

A: Neither; it is made at branch level, which is finer than both.

Q: When a diff is mapped to the cases it affects, does running those cases count as regression testing or as testing of new functionality?

A: It is regression testing.
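Once each case has branch-level coverage recorded, selecting the regression cases for a diff reduces to a set intersection. The sketch below assumes the diff has already been translated into changed branch IDs (how that translation is done is not covered in the talk); the function names and IDs are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def select_regression_cases(case_to_branches: Dict[str, Set[str]],
                            changed_branches: Iterable[str]) -> List[str]:
    """Cases whose recorded branches overlap the branches touched by the diff."""
    changed = set(changed_branches)
    return sorted(case for case, branches in case_to_branches.items()
                  if branches & changed)


def uncovered_changes(case_to_branches: Dict[str, Set[str]],
                      changed_branches: Iterable[str]) -> List[str]:
    """Changed branches that no recorded case reaches (candidates for new cases)."""
    covered = set().union(*case_to_branches.values())
    return sorted(set(changed_branches) - covered)


mapping = {
    "login_ok":   {"Auth.login#b0", "Auth.login#b2"},
    "login_fail": {"Auth.login#b1"},
    "play_video": {"Player.start#b0"},
}
diff = ["Auth.login#b1", "Auth.login#b2", "Pay.charge#b3"]
print(select_regression_cases(mapping, diff))  # ['login_fail', 'login_ok']
print(uncovered_changes(mapping, diff))        # ['Pay.charge#b3']
```

Changed branches that no case reaches are reported separately, since a selection-only view would silently skip them.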

Q: How are baseline images generated for result verification?

A: Baseline images are generated automatically for each baseline task.

Q: How are test cases linked to code in the backend?

A: The linkage is recorded automatically during execution, along with the request IP.

Tencent Intelligent Automation Testing

Speaker: Hu Ji (Senior Test Development Engineer, Tencent)

Q: How does the container‑to‑device agent work?

A: Once the phone is on the network, the container can connect to it directly.

Q: How is iOS cloud real‑device debugging performed? Is jailbreaking required?

A: It works the same way as on Android, with the device attached to a Mac; no jailbreak is required.

Q: Is the custom hardware developed in‑house?

A: Yes.

Q: Does custom hardware risk missing compatibility issues?

A: Custom devices focus on functional testing; compatibility testing still requires real devices.

APP Diff Automation Solution

Speakers: Qi Wenfang & Wei Zhenzhen (Senior Test Engineers, iQIYI)

Q: Does the deep‑link solution require app‑side support or modifications?

A: Yes, the app must expose deep‑link capabilities; page‑parameter parsing may need custom development.

Q: Is the ultimate goal of APP Diff to improve UI compatibility?

A: The goal is to raise native framework stability and achieve near‑human verification accuracy.

Q: How is the baseline image library maintained when UI changes frequently?

A: The baseline package is relatively stable. When the UI changes substantially, a new offline package is used, and the baseline library is regenerated on every run.

Q: Are the four image‑matching rules manually configured?

A: Default rules and thresholds are designed per business scenario, with optional per‑case overrides.
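The talk does not name the four matching rules, so the sketch below shows only one plausible rule, a per-pixel difference ratio, to illustrate how a per-scenario default threshold can be overridden for an individual case. Scenario names, threshold values, and the tolerance are hypothetical, and images are plain lists of grayscale values to keep the example dependency-free.

```python
from typing import Dict, List, Optional

Image = List[List[int]]  # grayscale pixel rows, values 0-255

# Hypothetical per-scenario defaults; names and numbers are illustrative only.
DEFAULT_THRESHOLDS: Dict[str, float] = {"home_feed": 0.05, "player": 0.01}


def diff_ratio(baseline: Image, actual: Image, tolerance: int = 10) -> float:
    """Fraction of pixels whose intensity differs by more than `tolerance`."""
    same_shape = len(baseline) == len(actual) and all(
        len(rb) == len(ra) for rb, ra in zip(baseline, actual))
    if not same_shape:
        return 1.0  # treat a size mismatch as a complete difference
    total = sum(len(row) for row in baseline)
    changed = sum(1 for rb, ra in zip(baseline, actual)
                  for pb, pa in zip(rb, ra) if abs(pb - pa) > tolerance)
    return changed / total if total else 0.0


def matches(baseline: Image, actual: Image, scenario: str,
            case_threshold: Optional[float] = None) -> bool:
    """Compare against the scenario default unless the case overrides it."""
    threshold = DEFAULT_THRESHOLDS[scenario] if case_threshold is None else case_threshold
    return diff_ratio(baseline, actual) <= threshold


base = [[0] * 10 for _ in range(10)]
shot = [row[:] for row in base]
shot[0][0] = shot[0][1] = 255               # 2 of 100 pixels differ -> ratio 0.02
print(matches(base, shot, "home_feed"))     # True  (0.02 <= 0.05)
print(matches(base, shot, "player"))        # False (0.02 > 0.01)
print(matches(base, shot, "player", 0.03))  # True  (per-case override)
```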

Q: Are baseline images pre‑saved manually?

A: No, they are generated automatically for each baseline task.

Q: How feasible is AI‑generated test case creation?

A: Scripts are generated from Freemarker templates: similar UI paths and deep links are abstracted into templates, which are then rendered into test scripts.

Q: What components make up the mock service?

A: Data storage, configuration management, user management, and external API services.

ZhaiZhai Precise Testing Implementation

Speaker: Tian Xixi (Test Architecture Engineer, ZhaiZhai)

Q: What is the difference between log‑id and trace‑id?

A: Each request gets a log‑id passed via parameters; trace‑id is injected by the framework for end‑to‑end call‑chain tracing.

Q: What information does the RPC context carry for call‑chain linking?

A: It carries upstream and downstream service information and the method-level call stack, stored in the RPC attachment.
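A minimal sketch of how a framework-level interceptor might propagate a trace ID and the call path through an RPC attachment, in contrast to a log-id, which business code passes explicitly through method parameters. `RpcContext`, `outgoing_context`, and the attachment keys are hypothetical; real RPC frameworks expose their own attachment APIs.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RpcContext:
    """Hypothetical context; `attachment` models the key-value side channel
    that RPC frameworks commonly send alongside the request body."""
    attachment: Dict[str, str] = field(default_factory=dict)


def outgoing_context(ctx: RpcContext, caller: str, callee: str, method: str) -> RpcContext:
    """What a framework-level interceptor might attach to a downstream call."""
    trace_id = ctx.attachment.get("trace_id") or uuid.uuid4().hex  # reuse, or start a trace
    hop = f"{caller}->{callee}.{method}"
    stack = ctx.attachment.get("call_stack")
    return RpcContext(attachment={
        "trace_id": trace_id,                     # identical on every hop
        "upstream": caller,
        "downstream": callee,
        "call_stack": f"{stack}|{hop}" if stack else hop,
    })


root = RpcContext()                               # entry request, no trace yet
c1 = outgoing_context(root, "gateway", "order", "create")
c2 = outgoing_context(c1, "order", "stock", "reserve")
print(c2.attachment["call_stack"])  # gateway->order.create|order->stock.reserve
```

Because the interceptor, not business code, forwards the trace ID, the chain stays intact even through methods that never mention it.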

Q: Does deduplication of data affect accuracy?

A: Precise instrumentation samples data without impacting the instrumented program.

Q: Does the sandbox‑based coverage tool require custom modules?

A: Yes, custom sandbox modules are required; both the sandbox-based approach and JaCoCo require development effort.

Meituan Visual AI Capability Unified Evaluation

Speaker: Yi Shunchang (Testing Expert, Meituan)

Q: What metrics are typically used for visual AI evaluation?

A: The metrics depend on the task; for classification, accuracy and recall are the most common. The key is to identify the patterns behind good and bad cases.
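For a binary classifier, the metrics mentioned above can be computed as below. Because "accuracy" is sometimes used loosely to mean precision, the sketch reports both, and it returns the indices of misclassified samples as the bad-case list to analyze. Labels are assumed to be integers with `1` as the positive class; the function name is illustrative.

```python
from typing import Dict, List


def evaluate(labels: List[int], preds: List[int], positive: int = 1) -> Dict[str, object]:
    """Binary classification metrics plus the indices of misclassified samples."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    fp = sum(1 for y, p in pairs if y != positive and p == positive)
    fn = sum(1 for y, p in pairs if y == positive and p != positive)
    correct = sum(1 for y, p in pairs if y == p)
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "bad_cases": [i for i, (y, p) in enumerate(pairs) if y != p],  # for manual review
    }


report = evaluate(labels=[1, 1, 0, 0, 1], preds=[1, 0, 0, 1, 1])
print(report)  # accuracy 0.6, precision and recall 2/3, bad cases [1, 3]
```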

Q: How is data labeling performed besides manual annotation?

A: Model predictions can be used for pre‑labeling, followed by human verification.
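The pre-labeling workflow can be sketched as a review queue. The assumptions here are that reviewers see the least confident model predictions first and that every prediction still gets a human decision; the `PreLabel` type and both functions are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PreLabel:
    image_id: str
    label: str         # label proposed by the model
    confidence: float  # model score in [0, 1]


def review_queue(prelabels: List[PreLabel]) -> List[PreLabel]:
    """Every pre-label is still checked by a person; least confident first."""
    return sorted(prelabels, key=lambda p: p.confidence)


def finalize(pre: PreLabel, human_label: Optional[str] = None) -> Tuple[str, bool]:
    """Return the final label and whether the reviewer corrected the model."""
    final = pre.label if human_label is None else human_label
    return final, final != pre.label


queue = review_queue([
    PreLabel("img_001", "cat", 0.97),
    PreLabel("img_002", "dog", 0.55),
])
print([p.image_id for p in queue])  # ['img_002', 'img_001']
print(finalize(queue[0], "cat"))    # ('cat', True)
print(finalize(queue[1]))           # ('cat', False)
```

Tracking the correction flag also gives a running estimate of how often pre-labels are wrong, which indicates how much the pre-labeling step is actually saving.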

Q: Is the unified management platform part of a micro‑service architecture?

A: No, it is a service‑management layer that improves evaluation efficiency.

Q: How is the coverage of evaluation data measured against requirements?

A: Coverage is judged by how well the evaluation set reflects the target business scenarios.
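One simple way to quantify how well an evaluation set reflects target scenarios is to tag each image with the business scenario it represents and count against a required list. This is a sketch under that assumption; the tags, scenario names, and per-scenario minimum are hypothetical, not Meituan's criteria.

```python
from collections import Counter
from typing import Dict, Iterable, List


def scenario_coverage(sample_tags: Iterable[str], required: List[str],
                      min_per_scenario: int = 1) -> Dict[str, object]:
    """Share of required business scenarios with at least `min_per_scenario` samples."""
    counts = Counter(sample_tags)
    missing = [s for s in required if counts[s] < min_per_scenario]
    covered = len(required) - len(missing)
    return {
        "coverage": covered / len(required) if required else 1.0,
        "missing": missing,
        "counts": {s: counts[s] for s in required},
    }


tags = ["menu_photo"] * 3 + ["storefront"] * 1 + ["receipt"] * 5  # hypothetical tags
cov = scenario_coverage(tags, ["menu_photo", "storefront", "night_scene"], min_per_scenario=2)
print(cov)
```

Samples tagged with scenarios outside the required list (here `receipt`) do not raise coverage, which keeps the measure tied to the target business scenarios.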

Q: How are evaluation standards (accuracy/recall thresholds) defined?

A: Standards come from business needs; when requirements are not clear, they are set according to the goals of each iteration.
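The threshold logic can be expressed as an acceptance gate. Where business thresholds exist they are used directly; otherwise this sketch assumes the iteration goal is "no regression against the previous release", which is one reading of the answer, not a stated rule. All names and numbers are hypothetical.

```python
from typing import Dict, Optional, Tuple


def acceptance_gate(metrics: Dict[str, float],
                    business_thresholds: Optional[Dict[str, float]] = None,
                    previous_release: Optional[Dict[str, float]] = None
                    ) -> Tuple[bool, Dict[str, Tuple[float, float]]]:
    """Pass if every metric meets its bar; failures map name -> (actual, required)."""
    # Assumption: with no explicit business bar, the iteration goal is taken to be
    # "no worse than the previous release".
    bars = business_thresholds or previous_release or {}
    failures = {name: (metrics.get(name, 0.0), bar)
                for name, bar in bars.items()
                if metrics.get(name, 0.0) < bar}
    return not failures, failures


ok, failures = acceptance_gate({"precision": 0.96, "recall": 0.88},
                               business_thresholds={"precision": 0.95, "recall": 0.90})
print(ok, failures)  # False {'recall': (0.88, 0.9)}
```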

Q: What is the recommended size of an evaluation image set?

A: Typically over 1,000 images, with a minimum of 200.

Q: Are synthetic images generated for testing?

A: GAN‑generated data is used for security‑related models; other services rely on real‑world data.

Q: Is there an automated approach to bad‑case analysis?

A: No, analysis remains a manual activity.
