Can Recommendation Algorithms Speed Up Test Case Prioritization? A Bilibili Case Study

This article presents a detailed study on applying recommendation‑system techniques to test case prioritization for Bilibili's mobile apps, describing the problem definition, evaluation metrics, data processing, FM model selection, experimental results, practical deployment, and future research directions.

Bilibili Tech

Background

With continuous integration and agile development, mobile applications are released frequently, so QA teams must execute large numbers of regression tests on tight schedules. Traditional testing consumes significant manpower and time, and because it is hard to predict when bugs will surface, late discoveries leave developers little time to fix them. To address this, the team explored Test Case Prioritization (TCP), which ranks test cases by risk so that bugs are found sooner.

Related Technology

What is TCP?

TCP, as originally defined by Rothermel et al. [1], seeks an ordering of a test suite such that executing the tests in that order yields the greatest benefit, typically earlier bug detection.

How to Measure TCP Effectiveness

The study uses two evaluation functions: the widely adopted APFD (Average Percentage of Faults Detected) and a customized recall_p metric that measures bug recall when only a subset of the test cases is executed. A higher APFD indicates earlier bug detection; a higher recall_p indicates fewer bugs missed at the chosen cutoff.
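As a rough illustration of the two metrics, the sketch below computes APFD with its standard formula, APFD = 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2n), where n is the number of test cases, m the number of faults, and TF_i the position of the first test that reveals fault i. The article does not give the exact definition of recall_p, so `recall_at_p` is an assumption: it is taken here to be the share of bugs revealed by the first p percent of the ordered test cases. All function names and the sample data are hypothetical.

```python
from typing import Dict, List, Set


def apfd(order: List[str], faults: Dict[str, Set[str]]) -> float:
    """Standard APFD for one test ordering.

    order:  test case ids in execution order (n tests).
    faults: maps each fault id to the set of test ids that reveal it (m faults).
    """
    n, m = len(order), len(faults)
    if n == 0 or m == 0:
        raise ValueError("need at least one test case and one fault")
    position = {test: i + 1 for i, test in enumerate(order)}  # 1-based ranks
    first_hits = []
    for fault, revealing in faults.items():
        hits = [position[t] for t in revealing if t in position]
        if not hits:
            # The classic formula assumes every fault is revealed by some test.
            raise ValueError(f"fault {fault!r} is not revealed by any test")
        first_hits.append(min(hits))
    return 1 - sum(first_hits) / (n * m) + 1 / (2 * n)


def recall_at_p(order: List[str], faults: Dict[str, Set[str]], percent: int) -> float:
    """Assumed recall_p: share of faults revealed by the first `percent`% of tests."""
    cutoff = (len(order) * percent + 99) // 100  # integer ceiling
    executed = set(order[:cutoff])
    found = sum(1 for revealing in faults.values() if revealing & executed)
    return found / len(faults)


# Hypothetical data: two bugs, four test cases.
faults = {"bug1": {"t3"}, "bug2": {"t1", "t4"}}
print(apfd(["t1", "t2", "t3", "t4"], faults))             # 0.625
print(apfd(["t3", "t1", "t2", "t4"], faults))             # 0.75
print(recall_at_p(["t1", "t2", "t3", "t4"], faults, 50))  # 0.5
```

Moving the bug-revealing case t3 to the front raises APFD from 0.625 to 0.75, which is the effect a good prioritization is meant to produce.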

Recommendation Algorithms

Common recommendation techniques (collaborative filtering, content‑based, and hybrid) use user and item features to produce a ranked list of items. The authors propose treating test cases as items and requirements as user features, which turns prioritization into a recommendation problem.

Method

Problem Mapping

Rather than relying on code features, which are often limited to line counts, the authors add requirement features, an idea borrowed from recommendation systems. This reduces reliance on costly code analysis.

Data Association

Two approaches are described: (1) a Cartesian product of requirements and test cases, which guarantees that no association is missed but introduces many irrelevant pairs; (2) keyword association, where requirement and test case texts are tokenized into keyword phrases and a pair is linked only when the two share at least one keyword, giving a more precise yet still lightweight connection.
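The two association strategies can be sketched as follows. This is a minimal illustration, not the team's implementation: the tokenizer is a simple English word splitter with a small hypothetical stopword list (real requirement text would likely need a proper word segmenter), and the requirement and test case texts are invented.

```python
import re
from itertools import product
from typing import Dict, List, Set, Tuple

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "to", "of", "in", "on", "and", "for", "with"}


def keywords(text: str) -> Set[str]:
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def cartesian_pairs(reqs: Dict[str, str], cases: Dict[str, str]) -> List[Tuple[str, str]]:
    """Approach 1: pair every requirement with every test case."""
    return list(product(reqs, cases))


def keyword_pairs(reqs: Dict[str, str], cases: Dict[str, str]) -> List[Tuple[str, str]]:
    """Approach 2: keep a pair only if the two texts share at least one keyword."""
    req_kw = {rid: keywords(text) for rid, text in reqs.items()}
    case_kw = {cid: keywords(text) for cid, text in cases.items()}
    return [(r, c) for r, c in product(reqs, cases) if req_kw[r] & case_kw[c]]


# Invented sample data.
requirements = {
    "R1": "Add danmaku filter to the video player",
    "R2": "Redesign the login page",
}
test_cases = {
    "T1": "Video player shows danmaku",
    "T2": "Login with phone number",
    "T3": "Upload a new avatar",
}

print(len(cartesian_pairs(requirements, test_cases)))  # 6
print(keyword_pairs(requirements, test_cases))         # [('R1', 'T1'), ('R2', 'T2')]
```

In this toy example the Cartesian product yields six pairs, four of them irrelevant, while keyword association keeps only the two pairs that share a keyword.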

Model Selection

Because the data is sparse and the sample size is limited, several models were benchmarked: Logistic Regression, Factorization Machine (FM), DSSM, XGBoost, and Random Forest. The experiments identified FM as the best fit for the recommendation task.
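To make the model choice concrete, here is a minimal sketch of how a standard second-order FM scores one feature vector: a bias, a linear term, and pairwise interactions whose weights are dot products of learned latent vectors. The article does not specify the FM variant, the features, or the training setup, so the parameter values below are made up and the sigmoid output is an assumption for a binary "reveals a bug" target.

```python
import math
from typing import List


def fm_score(x: List[float], w0: float, w: List[float], V: List[List[float]]) -> float:
    """Second-order FM: w0 + sum_i w_i x_i + sum_{i<j} <V_i, V_j> x_i x_j.

    The pairwise term uses the usual O(k * n) identity:
    0.5 * sum_f [ (sum_i V[i][f] x_i)^2 - sum_i (V[i][f] x_i)^2 ].
    """
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    k = len(V[0])  # latent dimension
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_probability(x: List[float], w0: float, w: List[float], V: List[List[float]]) -> float:
    """Squash the FM score into (0, 1) with a sigmoid (assumed binary target)."""
    return 1.0 / (1.0 + math.exp(-fm_score(x, w0, w, V)))


# Made-up parameters: 3 sparse features, latent dimension 2.
x = [1.0, 0.0, 1.0]           # e.g. one requirement feature and one test case feature active
w0 = 0.1
w = [0.2, -0.3, 0.4]
V = [[1.0, 0.5], [0.3, 0.3], [0.5, 1.0]]

print(fm_score(x, w0, w, V))        # about 1.7 (0.7 linear + 1.0 interaction)
print(fm_probability(x, w0, w, V))  # a value between 0 and 1
```

Because each interaction weight is factorized through the latent vectors, the model can estimate an interaction between a requirement feature and a test case feature even if that exact pair rarely appears together, which is one commonly cited reason FM copes well with sparse data.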

Experimental Results

Three research questions were evaluated using 31 Bilibili app versions:

Q1: Does the proposed method outperform random test ordering?

Q2: What is the impact of using requirement data?

Q3: Does the keyword association method improve results?

Results (illustrated in the original figures) show that FM‑based prioritization surfaces bugs significantly earlier (higher APFD) and achieves roughly 90% recall when only 50% of the test cases are run. Incorporating requirement data stabilizes performance, while keyword association offers a balanced trade‑off between early detection and recall stability.

Practical Deployment

In production, the pipeline receives the current version's requirement and test case data, scores the test cases with the trained FM model, and tags the top p% as “recommended”. QA engineers run these cases first during regression testing. The process is automated with Jenkins, which handles data download, model training, prediction, and updating the TAPD platform with bug‑probability scores and recommendation flags.
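The tagging step might look roughly like the sketch below. It only shows the ranking and top p% flagging. The function name, the record fields, and the sample scores are hypothetical, and the upload to TAPD is deliberately left out because the article does not describe that interface.

```python
from typing import Dict, List


def tag_recommended(scores: Dict[str, float], percent: int) -> List[dict]:
    """Rank test cases by predicted bug probability and flag the top `percent`%.

    scores: maps test case id to the model's predicted bug probability.
    Returns one record per test case, highest score first.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    cutoff = (len(ranked) * percent + 99) // 100  # integer ceiling
    return [
        {"case_id": case_id, "bug_probability": score, "recommended": rank < cutoff}
        for rank, (case_id, score) in enumerate(ranked)
    ]


# Made-up model outputs for four test cases.
scores = {"T1": 0.82, "T2": 0.10, "T3": 0.47, "T4": 0.05}
for record in tag_recommended(scores, percent=50):
    print(record)
# T1 and T3 are flagged as recommended; T2 and T4 are not.
```

Rounding the cutoff up means at least one case is flagged whenever p is above zero, a design choice made here for illustration rather than something the article states.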

Future Work

The current approach has largely exhausted what requirement‑based features can offer; future efforts will explore richer code‑level features without adding excessive complexity, aiming to further improve the model’s predictive power.

References

[1] Gregg Rothermel et al., “Test case prioritization: An empirical study”, IEEE International Conference on Software Maintenance, 1999.

[2] Rongqi Pan et al., “Test case selection and prioritization using machine learning: a systematic literature review”, Empirical Software Engineering, 2022.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.