18 Hard Problems Every Software Tester Should Solve
This article presents a curated list of eighteen challenging problems in software testing—ranging from measuring test sufficiency and effectiveness to test case reduction, layering, data preparation, automation, concurrency, rollback, compatibility, and formal verification—along with insights and potential research directions to guide practitioners and researchers.
For software testing, what does it mean to test enough? How can we evaluate the effectiveness of tests? With so many test cases, how should we prune them? Alibaba researcher Zheng Ziying shares eighteen hard problems he has identified in testing, offering perspectives that may inspire readers.
1. Test Sufficiency
Answering “have we tested enough?” goes beyond code coverage; it requires considering all scenarios, states, state‑transition paths, event sequences, configurations, data variations, etc. Even with exhaustive metrics, absolute certainty is rarely achievable, and we can only approach sufficiency.
2. Test Effectiveness
Effectiveness measures a test suite’s ability to discover bugs. Beyond checking whether tests validate all data the system persists, mutation testing is a widely applicable technique. Current challenges include preventing the “pesticide effect” (a fixed set of mutants loses its power to reveal new weaknesses over time) and extending mutation beyond code to configurations and data.
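The core mechanic of mutation testing can be sketched in a few lines: seed an artificial defect into the code under test and check whether the suite “kills” the mutant by failing. This is a minimal illustration, not a real mutation tool; the function under test, the test, and the single mutation operator are all hypothetical examples.

```python
import ast

# Minimal mutation-testing sketch: mutate an arithmetic operator in the
# code under test, rerun the suite, and see whether the mutant is killed.

SOURCE = "def price_with_tax(price, tax):\n    return price + tax\n"

def run_tests(namespace):
    """A tiny stand-in 'suite': True if all assertions pass."""
    try:
        assert namespace["price_with_tax"](100, 7) == 107
        return True
    except AssertionError:
        return False

class AddToSub(ast.NodeTransformer):
    """Mutation operator: swap '+' for '-'."""
    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node

def kill_mutant(source):
    tree = AddToSub().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    mutant_ns = {}
    exec(compile(tree, "<mutant>", "exec"), mutant_ns)
    return not run_tests(mutant_ns)   # killed if at least one test fails

original_ns = {}
exec(SOURCE, original_ns)
assert run_tests(original_ns)         # suite passes on the original code
print("mutant killed:", kill_mutant(SOURCE))
```

A suite that passes on the original but fails on the mutant has demonstrated some bug-finding power; surviving mutants point at under-tested behavior.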
3. Test Case Pruning
Many test cases waste execution time, but identifying which are redundant is difficult. Redundancies arise from duplicated steps, equivalent class coverage, or overlapping test objectives. Systematic pruning requires reliable metrics for sufficiency and effectiveness.
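One common pruning heuristic, given a trustworthy coverage signal, is greedy set cover: keep a minimal subset of tests that preserves the suite’s combined coverage and treat the rest as redundancy candidates. The test names and coverage sets below are hypothetical, and real pruning would also weigh test objectives beyond raw coverage.

```python
# Greedy set-cover sketch for pruning: keep the smallest (greedy) subset
# of tests that still covers everything the full suite covers.

coverage = {
    "test_checkout":     {"cart", "payment", "invoice"},
    "test_payment_only": {"payment"},
    "test_invoice_only": {"invoice"},
    "test_cart_and_pay": {"cart", "payment"},
}

def prune(coverage):
    needed = set().union(*coverage.values())
    kept, remaining = [], dict(coverage)
    while needed:
        # pick the test covering the most still-uncovered items
        best = max(remaining, key=lambda t: len(remaining[t] & needed))
        kept.append(best)
        needed -= remaining.pop(best)
    return kept, sorted(remaining)   # remaining = redundancy candidates

kept, redundant = prune(coverage)
print("keep:", kept)
print("candidates for removal:", redundant)
```

Here a single test already covers every item, so the other three become candidates; in practice a candidate is only removed after confirming its objective is genuinely duplicated.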
4. Test Layering
Teams struggle with the extent of full‑stack regression. If system boundaries are well defined, it may be possible to validate only the changed component against its contracts, avoiding integration tests. However, practical evidence and a complete methodology are still lacking.
5. Reducing Analysis Omissions
Analysis omissions cause many failures, often as unknown‑unknowns. A systematic approach to uncovering hidden corner cases and converting unknown‑unknowns into known‑unknowns is needed.
6. Automatic Test Case Generation
Techniques such as fuzz testing, model‑based testing, record‑replay, and traffic bifurcation generate tests automatically. While generating test steps is mature, creating reliable test oracles remains a major challenge.
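The oracle gap described above is often bridged with property-based (metamorphic) oracles: instead of predicting the exact expected output for each generated input, the test checks invariants that must hold for any input. A small sketch, with a hypothetical function under test:

```python
import random

# Generated inputs + a property oracle: we cannot easily state the exact
# expected output for every random input, but we can check invariants.

def dedupe_sorted(xs):
    """Code under test (hypothetical): sorted unique elements."""
    return sorted(set(xs))

def oracle(inp, out):
    """Properties that must hold for any input."""
    return (out == sorted(out)              # output is ordered
            and len(out) == len(set(out))   # no duplicates
            and set(out) == set(inp))       # same elements as the input

random.seed(0)
for _ in range(200):                        # automatically generated cases
    inp = [random.randint(-5, 5) for _ in range(random.randint(0, 10))]
    assert oracle(inp, dedupe_sorted(inp)), f"oracle failed for {inp}"
print("200 generated cases passed the property oracle")
```

Generating the 200 inputs is the easy half; deciding what `oracle` should assert is exactly the hard half the section points at.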
7. Automatic Problem Diagnosis
Automated diagnosis for both online and offline issues suffers from limited generality and heavy reliance on expert‑crafted rules. Techniques like automatic call‑graph comparison can aid in pinpointing root causes.
8. Automated Defect Repair
Industrial solutions such as Alibaba’s Precfix and Facebook’s SapFix exist, but they are still early‑stage with various limitations.
9. Test Data Preparation
Each test case should be independent, yet preparing fresh data for every case is inefficient. A “data bank” that reuses data produced by previous tests and lends it to subsequent tests can reduce preparation overhead and enable smarter test ordering.
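The “data bank” idea can be sketched as a registry where tests deposit the data they produce and later tests borrow matching records instead of preparing fresh ones. The class, tag names, and records below are hypothetical illustrations of the concept:

```python
# Minimal "data bank" sketch: deposit data a test produced, let later
# tests borrow it, and fall back to fresh preparation only when needed.

class DataBank:
    def __init__(self):
        self._shelves = {}   # tag -> reusable records
        self.created = 0     # how many records had to be built fresh

    def deposit(self, tag, record):
        self._shelves.setdefault(tag, []).append(record)

    def borrow(self, tag, factory):
        """Reuse a banked record if one exists, else build (and count) one."""
        shelf = self._shelves.get(tag)
        if shelf:
            return shelf.pop()
        self.created += 1
        return factory()

bank = DataBank()

# test A creates a paid order and banks it for later tests
order = {"id": 1, "status": "paid"}
bank.deposit("paid_order", order)

# test B needs a paid order: it borrows instead of preparing one
reused = bank.borrow("paid_order", factory=lambda: {"id": 2, "status": "paid"})
assert reused is order and bank.created == 0

# with the shelf empty, the factory runs as a fallback
fresh = bank.borrow("paid_order", factory=lambda: {"id": 2, "status": "paid"})
assert fresh["id"] == 2 and bank.created == 1
print("reused one record, created one")
```

Smarter test ordering then means scheduling producers of a tag before its consumers, so the bank is rarely empty.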
10. Exception Testing
Distributed systems encounter numerous exceptions (timeouts, network glitches, resource exhaustion, etc.). Ensuring correct system behavior under all such conditions, and defining expected outcomes for each, is a massive challenge.
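One practical way to make such exceptions testable is fault injection: wrap a dependency so the test can force a failure and then assert the defined expected outcome. The client, method, and fallback value below are hypothetical:

```python
# Fault-injection sketch: a test double raises a timeout on demand so we
# can verify the caller's specified behaviour under that exception.

class InjectedTimeout(Exception):
    pass

class FlakyInventoryClient:
    """Test double that fails on demand, simulating a dependency timeout."""
    def __init__(self, fail=False):
        self.fail = fail

    def stock(self, sku):
        if self.fail:
            raise InjectedTimeout(f"inventory lookup for {sku} timed out")
        return 42

def stock_or_fallback(client, sku):
    """Code under test: the expected outcome under failure is a defined default."""
    try:
        return client.stock(sku)
    except InjectedTimeout:
        return 0   # degrade gracefully rather than crash

assert stock_or_fallback(FlakyInventoryClient(fail=False), "sku-1") == 42
assert stock_or_fallback(FlakyInventoryClient(fail=True), "sku-1") == 0
print("behaviour under injected timeout matches the specification")
```

The hard part the section names remains: enumerating which faults to inject where, and writing down the expected outcome for each combination.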
11. Concurrency Testing
Concurrency appears at the database, process, thread, and business‑logic levels. Traditional performance‑based concurrency testing is flaky; efforts such as Microsoft’s CHESS and Alibaba’s distributed model checking aim to make it reliable and repeatable.
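The flavor of CHESS-style systematic exploration, as opposed to timing-based stress tests, can be shown on a toy lost-update race: enumerate every interleaving of the atomic steps deterministically and check the invariant in each, so the race is found every run rather than by luck. This is a toy model, not how CHESS itself is implemented.

```python
import itertools

# Toy systematic exploration: two "threads" each read then write a shared
# counter; we enumerate all valid interleavings instead of racing real threads.

def run(schedule):
    """Execute a read/write schedule against a shared counter."""
    counter = 0
    local = {}
    for tid, step in schedule:
        if step == "read":
            local[tid] = counter
        else:                       # "write"
            counter = local[tid] + 1
    return counter

def interleavings():
    steps = [("A", "read"), ("A", "write"), ("B", "read"), ("B", "write")]
    for perm in set(itertools.permutations(steps)):
        # program order: each thread must read before it writes
        if perm.index(("A", "read")) < perm.index(("A", "write")) and \
           perm.index(("B", "read")) < perm.index(("B", "write")):
            yield perm

results = {run(s) for s in interleavings()}
print("observed final counters:", sorted(results))
```

A stress test might pass thousands of runs without hitting the bad schedule; exhaustive enumeration reliably observes both the correct final value 2 and the lost-update value 1.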
12. Rollback Testing
While rollbacks are supported, verifying post‑rollback correctness is difficult. Coverage of all possible rollback points and handling compatibility of data generated by newer code after a rollback are open problems.
13. Compatibility Testing
Ensuring new code works with legacy data and handling upgrades that occur mid‑workflow require exhaustive scenario coverage, which is often impractical.
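The legacy-data half of this problem is often attacked with explicit schema-tolerance tests: new reader code must accept records written before a field existed. A small sketch, with hypothetical field names and versions:

```python
# Compatibility sketch: the v2 reader must handle records written by v1
# (which lack the new 'coupon' field) as well as its own records.

LEGACY_RECORD = {"id": 7, "amount": 100}                # written by v1
CURRENT_RECORD = {"id": 8, "amount": 50, "coupon": 5}   # written by v2

def load_order(record):
    """v2 reader: the new 'coupon' field defaults to 0 for legacy data."""
    return {
        "id": record["id"],
        "amount": record["amount"],
        "coupon": record.get("coupon", 0),   # tolerate v1 records
    }

assert load_order(LEGACY_RECORD)["coupon"] == 0
assert load_order(CURRENT_RECORD)["coupon"] == 5
print("v2 reader handles both schema versions")
```

The mid-workflow upgrade case is harder: it multiplies every workflow step by every version boundary, which is why exhaustive coverage is often impractical.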
14. Mocking
Test effectiveness depends on mock fidelity. A “one‑code‑three‑modes” approach—normal, mock, and performance‑mock builds—could keep mocks in sync with production code and reduce maintenance effort.
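One way to keep the mock build from drifting away from the normal build is a fidelity check that runs with the tests: verify the mock exposes the same public methods with the same signatures as the production client. Both classes and the `charge` method below are hypothetical.

```python
import inspect

# Mock-fidelity sketch: fail fast if the mock's interface has drifted
# from the production client's interface.

class RealPaymentClient:
    def charge(self, order_id, amount, currency="CNY"):
        raise NotImplementedError("talks to the real gateway")

class MockPaymentClient:
    def charge(self, order_id, amount, currency="CNY"):
        return {"order_id": order_id, "status": "mock-ok"}

def mock_in_sync(real_cls, mock_cls):
    """True if every public method of the real client exists on the mock
    with an identical signature."""
    for name, member in inspect.getmembers(real_cls, inspect.isfunction):
        if name.startswith("_"):
            continue
        counterpart = getattr(mock_cls, name, None)
        if counterpart is None:
            return False
        if inspect.signature(member) != inspect.signature(counterpart):
            return False
    return True

assert mock_in_sync(RealPaymentClient, MockPaymentClient)
print("mock matches the production interface")
```

Signature equality only guards the interface, not the behaviour; the “one‑code‑three‑modes” idea goes further by generating all three builds from one source.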
15. Static Code Analysis
Static analysis can catch issues like forgotten ThreadLocal cleanup or potential NPEs earlier than dynamic testing, and can also identify certain concurrency bugs.
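The shape of such checks can be illustrated with a tiny syntactic analyzer: flag functions that call `open()` outside a `with` block, a shallow stand-in for resource-leak checks like forgotten ThreadLocal cleanup. Real analyzers track data flow across calls; this sketch only matches one local pattern.

```python
import ast

# Toy static check: report functions that call open() outside a 'with',
# where the file handle may never be closed.

CODE = """
def leaky(path):
    f = open(path)          # flagged: no 'with', close may be skipped
    return f.read()

def safe(path):
    with open(path) as f:
        return f.read()
"""

def bare_opens(source):
    flagged = []
    for fn in ast.walk(ast.parse(source)):
        if not isinstance(fn, ast.FunctionDef):
            continue
        # collect open() calls that appear as 'with' context expressions
        with_opens = {id(c)
                      for w in ast.walk(fn) if isinstance(w, ast.With)
                      for item in w.items
                      for c in ast.walk(item.context_expr)
                      if isinstance(c, ast.Call)}
        for call in ast.walk(fn):
            if (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
                    and call.func.id == "open" and id(call) not in with_opens):
                flagged.append(fn.name)
    return flagged

print("functions with bare open():", bare_opens(CODE))
```

Because it runs on source text alone, a check like this fires in review or CI before any test executes, which is exactly the earliness the section highlights.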
16. Formal Verification
Beyond protocols and algorithms, exploring the value of formal methods for business‑level logic remains an open research direction.
17. Mistake‑Proof Design
While not strictly testing, designing systems to prevent errors can dramatically reduce the need for testing. Summarizing principles and tooling for mistake‑proof design is a worthwhile pursuit.
18. Testability
Testability is often reduced to adding hooks, but a systematic set of design principles, anti‑patterns, and guidelines—similar to classic software design patterns—could improve testability across domains.
Note: The author also mentions additional challenges not listed, such as achieving >99% regression pass rates and enabling code‑change gates for continuous delivery.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.