Risk‑Driven Delivery and Quality Assessment Model for Intelligent Testing
Baidu’s risk‑driven delivery system applies a machine‑learning quality‑assessment model that automatically identifies, controls, and makes decisions on testing risk across more than 50 features. In production it enables precise test selection, has intercepted hundreds of bugs, cut test waiting time from 50 hours to 2, and paves the way toward fully automated intelligent testing.
Baidu's practice of intelligent testing emphasizes a risk‑driven delivery approach, which is a key research direction in the perception‑intelligence stage. Three observations motivate this work: (1) more than 80% of projects have no associated bugs or production issues; (2) many testing tasks fail to uncover defects, resulting in a high proportion of ineffective quality actions; (3) testers can make misjudgments, leading to missed tests.
To improve testing efficiency and recall, the goal is to identify the projects that truly need testing and assess risks accurately.
This article is the third in a three‑part series that reveals Baidu's risk‑driven delivery. It focuses on a quality‑assessment model that supports risk‑based decision making.
Background
Traditional testing decisions rely heavily on human judgment, which varies across individuals and can affect quality and efficiency. The manual decision process typically involves three steps: (1) reviewing delivery data and reports (code, impact scope, quality activities); (2) giving a decision conclusion (proceed to the next stage or request additional QA testing); (3) following up on the project to see if bugs surface, conducting case studies, and sharing experience to improve future testing.
Problems with this manual approach include high cost due to data collection across multiple platforms, inconsistent expertise among testers, knowledge loss when personnel leave, and limited testing capacity for many concurrent projects.
Proposed Solution
Introduce machine learning to automate or assist decision making. Similar AI‑driven scenarios exist in autonomous driving, self‑diagnosis systems, and face‑recognition security checks, suggesting that risk‑driven automation is feasible for testing.
Overall Solution Architecture
The quality‑assessment system consists of three core components: risk identification, risk control, and risk decision.
Risk Identification: Collect data from five dimensions (over 50 features) linked to test tickets, requirement cards, and pipeline IDs. This creates a feature lineage for each test case and supports custom feature queries.
Risk Control: Based on identified risks, recommend test activities and cases, automatically generate test inputs, and execute targeted testing.
Risk Decision: After risk control, evaluate residual risk probability and potential impact, then provide test suggestions, risk levels, and decision conclusions. Decisions can trigger automated or assisted workflow actions.
Risk identification solves the questions of what data to collect, how to collect it, and how to link it.
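The data linkage described above can be sketched as a small record type. The field names and ID formats below are illustrative assumptions for how a feature‑lineage entry might tie a test ticket, requirement card, and pipeline run together; they are not Baidu’s actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class FeatureRecord:
    """One feature-lineage entry for a test task.

    Links the identifiers that name the task to the feature values
    collected for it (schema is hypothetical, for illustration only).
    """
    test_ticket_id: str
    requirement_card_id: str
    pipeline_id: str
    features: dict = field(default_factory=dict)

def link_record(ticket: str, card: str, pipeline: str, **features) -> FeatureRecord:
    """Attach collected feature values to the IDs identifying a test task."""
    return FeatureRecord(ticket, card, pipeline, dict(features))

# Example: a task with two of the 50+ collected features.
record = link_record("T-1001", "REQ-42", "PL-7",
                     dev_duration_days=12, changed_functions=7)
print(record.features["changed_functions"])  # 7
```

A structure like this also supports the custom feature queries mentioned above: each record carries its full lineage, so features can be filtered by ticket, card, or pipeline.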
Risk decision uses a binary classification model to predict whether a test task carries risk and with what probability. Considering data effectiveness, model interpretability, and the limited data volume, logistic regression was chosen: the risk probability is p = 1 / (1 + e^−(w0 + w1·x1 + w2·x2 + … + wn·xn)), where each xi is a feature and each wi a learned weight.
The model uses features such as development duration (x1) and change function count (x2). Evaluation metrics include accuracy, precision, and recall.
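As a concrete sketch of the decision step, the standard logistic regression form can be applied to the two example features above. The weights below are invented for illustration, not fitted values; a real model would learn them from historical project data:

```python
import math

# Hypothetical weights (intercept, dev duration, changed-function count).
# These are made-up values for illustration, not trained coefficients.
W0, W1, W2 = -3.0, 0.08, 0.15

def risk_probability(dev_duration_days: float, changed_functions: float) -> float:
    """Logistic regression: p = 1 / (1 + e^-(w0 + w1*x1 + w2*x2))."""
    z = W0 + W1 * dev_duration_days + W2 * changed_functions
    return 1.0 / (1.0 + math.exp(-z))

def classify(p: float, threshold: float = 0.5) -> str:
    """Binary decision on the predicted probability."""
    return "risky" if p >= threshold else "low-risk"

# A long project with many changed functions scores high...
p_big = risk_probability(dev_duration_days=30, changed_functions=20)
# ...while a short, small change scores low.
p_small = risk_probability(dev_duration_days=3, changed_functions=2)
print(classify(p_big), classify(p_small))  # risky low-risk
```

Because the model is linear in its features, each weight directly indicates how much a feature pushes a task toward "risky", which is the interpretability property the article cites as a reason for choosing logistic regression.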
Decision outcomes are visualized on a risk matrix (probability vs. impact). High‑probability, high‑impact risks are intercepted; low‑probability or low‑impact risks may pass automatically or require QA confirmation.
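The risk-matrix routing can be sketched as a simple quadrant lookup. The thresholds and action names here are illustrative assumptions; the article does not specify Baidu’s actual cutoffs:

```python
def decide(probability: float, impact: float,
           p_high: float = 0.7, i_high: float = 0.7) -> str:
    """Map a (probability, impact) pair onto a 2x2 risk matrix.

    Thresholds and action labels are hypothetical, for illustration.
    """
    if probability >= p_high and impact >= i_high:
        return "intercept"    # high/high: block and require testing
    if probability < p_high and impact < i_high:
        return "auto-pass"    # low/low: flow through automatically
    return "qa-confirm"       # mixed quadrants: ask QA to confirm

print(decide(0.9, 0.9))  # intercept
print(decide(0.2, 0.1))  # auto-pass
print(decide(0.9, 0.1))  # qa-confirm
```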
Risk Visualization Report
The report presents risk data and decision suggestions, and provides a feedback entry for QA. This feedback drives the model training loop: QA feedback → bug analysis → feature extraction → model iteration → next deployment cycle.
Deployment Results
Quality improvements: In Q3 2022, 1,123 non‑automatable projects were identified, intercepting 318 bugs.
Efficiency gains: In Q3 2022, 4,345 automatable projects were identified, saving 2,172 person‑days; test waiting time dropped from 50 hours to 2 hours.
Future Direction
The current stage corresponds to “assisted decision” (similar to L2/L3 in autonomous driving). The roadmap aims to advance to conditional and high automation (L4/L5), ultimately achieving fully automated decision making in the intelligent delivery system.
Future workflow: (1) A demand triggers the quality‑assessment system, which recommends test activities, cases, and responsible personnel; (2) During execution, the system checks test sufficiency and may suggest early termination; (3) After testing, the system reports remaining risks, suggests additional tests or automatic flow‑through based on risk level.
Baidu Geek Talk