Quality Scoring Model: Intelligent Test Grading and Risk Assessment for Software Delivery
This article introduces a quality scoring model that leverages structured development and testing data to assess project risk objectively and to automate test grading. The goal is to enable data-driven decisions about test execution and release, improving delivery efficiency and reducing errors from manual evaluation.
In modern software development, teams often face challenges such as deciding whether simple code changes need testing, estimating the risk of released versions, and ensuring the effectiveness of automated tests without human intervention. These issues can lead to either unnecessary testing effort or missed defects.
To address these problems, Baidu's intelligent testing team began researching a quality scoring model at the end of 2019. The model aims to use data from the development process, self‑testing, and automation to make informed decisions about test follow‑up, reduce unnecessary investment, shorten testing cycles, and provide risk estimates that increase confidence in delivery.
After more than a year of research, development, and experimentation, the team built a standardized, scalable data and risk modeling platform that has been piloted across multiple Baidu business lines. The model is presented in a series of articles covering test grading, risk evaluation, large‑scale deployment, algorithm research, feature extraction, and automated risk‑based delivery.
The core idea is to treat project risk assessment as a data‑driven classification problem. By aggregating structured data from requirements, code changes, test coverage, and other dimensions, the model can predict whether a project can be safely released without additional QA involvement. Features include demand size, change frequency, developer experience, code churn, complexity, coverage metrics, and bug convergence curves.
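As a concrete illustration, the feature dimensions listed above can be flattened into one numeric record per project before modeling. The field names, units, and sample values below are hypothetical, not Baidu's actual schema:

```python
from dataclasses import dataclass, fields

# Hypothetical per-project feature record; field names and units are
# illustrative, not the platform's real schema.
@dataclass
class ProjectFeatures:
    demand_size: float           # e.g. estimated story points
    change_frequency: float      # commits per day on the change branch
    developer_experience: float  # years, or a normalized seniority score
    code_churn: float            # lines added + deleted
    cyclomatic_complexity: float
    line_coverage: float         # 0.0 - 1.0 from automated tests
    bug_convergence: float       # slope of the open-bug curve near release

    def to_vector(self):
        """Flatten into the ordered numeric vector a classifier consumes."""
        return [getattr(self, f.name) for f in fields(self)]

sample = ProjectFeatures(
    demand_size=3.0, change_frequency=1.2, developer_experience=4.0,
    code_churn=180.0, cyclomatic_complexity=12.0,
    line_coverage=0.85, bug_convergence=-0.4,
)
print(sample.to_vector())
```

Keeping the record flat and ordered makes the same data usable by any of the classifiers evaluated later, and makes missing-value handling explicit per field.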
Various modeling techniques were evaluated, including rule‑based models, logistic regression, and scorecard models. Logistic regression was chosen for its interpretability and strong performance (precision > 90%, AUC ≈ 0.94). The scorecard approach further refines feature binning and Weight‑of‑Evidence (WOE) encoding to improve robustness.
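A minimal sketch of the Weight-of-Evidence step the scorecard relies on: after binning a feature, each bin's WOE is the log ratio of its share of "good" outcomes to its share of "bad" outcomes. The bin edges, counts, and the labeling convention (good = released without escaped defects) are invented for illustration:

```python
import math

def woe_table(bins):
    """Compute Weight-of-Evidence per bin.

    `bins` maps a bin label to (good_count, bad_count), where "good"
    is assumed to mean a project released without escaped defects.
    WOE = ln(good_share / bad_share).
    """
    total_good = sum(g for g, _ in bins.values())
    total_bad = sum(b for _, b in bins.values())
    return {
        label: math.log((g / total_good) / (b / total_bad))
        for label, (g, b) in bins.items()
    }

# Illustrative binning of a "code churn" feature (counts are invented).
churn_bins = {
    "low (<100 LOC)":   (80, 5),
    "medium (100-500)": (60, 15),
    "high (>500 LOC)":  (20, 30),
}
woe = woe_table(churn_bins)
for label, value in woe.items():
    print(f"{label}: WOE = {value:+.3f}")
```

Low-churn bins come out with positive WOE (safer) and high-churn bins negative (riskier); feeding these encoded values into the logistic regression is what makes the resulting scorecard both monotone per feature and easy to explain.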
Model training used historical, labeled data from production projects, with careful preprocessing to handle missing values and outliers. Multiple classifiers (KNN, Naïve Bayes, SVM, CART) were benchmarked, confirming the superiority of the logistic regression model for this risk‑prediction task.
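The benchmarking step can be sketched end to end on synthetic data: train a plain gradient-descent logistic regression and a 1-nearest-neighbour baseline on the same split and compare holdout accuracy. The generative rule (high churn plus low coverage implies risk) and all numbers are invented; this is a sketch of the comparison protocol, not the production training pipeline:

```python
import math
import random

random.seed(7)

def make_data(n):
    """Synthetic risk data: two features, label 1 = risky project.

    Invented generative rule: high churn and low coverage -> risky.
    """
    data = []
    for _ in range(n):
        churn = random.random()      # normalized code churn
        coverage = random.random()   # line coverage
        label = 1 if churn - coverage + random.gauss(0, 0.2) > 0 else 0
        data.append(([churn, coverage], label))
    return data

def train_logreg(data, epochs=300, lr=0.5):
    """Plain stochastic-gradient logistic regression (no libraries)."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
            g = p - y                      # gradient of the log loss
            w[0] -= lr * g * x[0]
            w[1] -= lr * g * x[1]
            b -= lr * g
    return w, b

def predict_logreg(model, x):
    w, b = model
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

def predict_1nn(train, x):
    """1-nearest-neighbour baseline (squared Euclidean distance)."""
    best = min(train, key=lambda t: sum((a - c) ** 2 for a, c in zip(t[0], x)))
    return best[1]

train, test = make_data(400), make_data(200)
model = train_logreg(train)
accuracy = lambda pred: sum(pred(x) == y for x, y in test) / len(test)
acc_lr = accuracy(lambda x: predict_logreg(model, x))
acc_nn = accuracy(lambda x: predict_1nn(train, x))
print(f"logistic regression: {acc_lr:.2f}, 1-NN baseline: {acc_nn:.2f}")
```

Running every candidate through the identical split-and-score loop is what makes the "logistic regression wins" conclusion defensible, and the learned weights remain inspectable in a way KNN or SVM decisions are not.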
In practice, the model is integrated into a visual platform that aggregates project information, recommends testing tools, and interacts with a strategy service to retrieve risk scores. The workflow includes data collection from the toolchain, feature engineering, model training, evaluation, and deployment. Continuous feedback loops allow the model to be refined with new samples.
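The strategy-service interaction described above ultimately reduces to a thresholded decision: convert the model's predicted probability into a risk tier and a recommended follow-up action. The tier names and cutoffs below are illustrative assumptions; in practice they would be calibrated against historical outcomes:

```python
def risk_decision(p_defect, low=0.2, high=0.6):
    """Map a predicted defect probability to a risk tier and follow-up.

    Cutoffs and tier names are invented for illustration, not the
    platform's actual policy.
    """
    if p_defect < low:
        return "low", "release without additional QA follow-up"
    if p_defect < high:
        return "medium", "run the recommended automated suites"
    return "high", "full manual test follow-up before release"

tier, action = risk_decision(0.12)
print(tier, "->", action)
```

Keeping the thresholds as parameters rather than hard-coding them lets the feedback loop retune the policy as new labeled samples arrive, without retraining the underlying model.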
Business impact includes a rise in autonomous testing adoption from 25% to 60% on the commercial platform, the recall of more than 40 additional issues through risk-based release gating, and overall faster, higher-throughput delivery while maintaining quality.
The study demonstrates that a data‑driven quality scoring model can replace manual test grading, provide objective risk assessments, and enable intelligent, cost‑effective testing decisions across large‑scale software delivery pipelines.
Baidu Intelligent Testing