18 Hard Problems Every Software Tester Should Solve
This article presents a curated list of eighteen challenging problems in software testing—ranging from measuring test sufficiency and effectiveness to test case reduction, layering, data preparation, automation, concurrency, rollback, compatibility, and formal verification—along with insights and potential research directions to guide practitioners and researchers.
For software testing, what does it mean to test enough? How can we evaluate the effectiveness of tests? With so many test cases, how should we prune them? Alibaba researcher Zheng Ziying shares eighteen hard problems he has identified in testing, offering perspectives that may inspire readers.
1. Test Sufficiency
Answering “have we tested enough?” goes beyond code coverage; it requires considering all scenarios, states, state‑transition paths, event sequences, configurations, data variations, etc. Even with exhaustive metrics, absolute certainty is rarely achievable, and we can only approach sufficiency.
2. Test Effectiveness
Effectiveness measures a test suite’s ability to discover bugs. Beyond checking whether tests validate all data the system persists, mutation testing is a widely applicable technique. Current challenges include preventing the “pesticide effect” (a fixed set of mutants loses its power to reveal new weaknesses over time) and extending mutation beyond code to configurations and data.
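The core mechanic of mutation testing can be sketched in a few lines: seed an artificial defect into the code under test and check whether the suite “kills” the mutant by failing. This is a minimal illustration, not a real mutation tool; the function under test, the test, and the single mutation operator are all hypothetical examples.

```python
import ast

# Minimal mutation-testing sketch: mutate an arithmetic operator in the
# code under test, rerun the suite, and see whether the mutant is killed.

SOURCE = "def price_with_tax(price, tax):\n    return price + tax\n"

def run_tests(namespace):
    """A tiny stand-in 'suite': True if all assertions pass."""
    try:
        assert namespace["price_with_tax"](100, 7) == 107
        return True
    except AssertionError:
        return False

class AddToSub(ast.NodeTransformer):
    """Mutation operator: swap '+' for '-'."""
    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node

def kill_mutant(source):
    tree = AddToSub().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    mutant_ns = {}
    exec(compile(tree, "<mutant>", "exec"), mutant_ns)
    return not run_tests(mutant_ns)   # killed if at least one test fails

original_ns = {}
exec(SOURCE, original_ns)
assert run_tests(original_ns)         # suite passes on the original code
print("mutant killed:", kill_mutant(SOURCE))
```

A suite that passes on the original but fails on the mutant has demonstrated some bug-finding power; surviving mutants point at under-tested behavior.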
3. Test Case Pruning
Many test cases waste execution time, but identifying which are redundant is difficult. Redundancies arise from duplicated steps, equivalent class coverage, or overlapping test objectives. Systematic pruning requires reliable metrics for sufficiency and effectiveness.
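One common pruning heuristic, given a trustworthy coverage signal, is greedy set cover: keep a minimal subset of tests that preserves the suite’s combined coverage and treat the rest as redundancy candidates. The test names and coverage sets below are hypothetical, and real pruning would also weigh test objectives beyond raw coverage.

```python
# Greedy set-cover sketch for pruning: keep the smallest (greedy) subset
# of tests that still covers everything the full suite covers.

coverage = {
    "test_checkout":     {"cart", "payment", "invoice"},
    "test_payment_only": {"payment"},
    "test_invoice_only": {"invoice"},
    "test_cart_and_pay": {"cart", "payment"},
}

def prune(coverage):
    needed = set().union(*coverage.values())
    kept, remaining = [], dict(coverage)
    while needed:
        # pick the test covering the most still-uncovered items
        best = max(remaining, key=lambda t: len(remaining[t] & needed))
        kept.append(best)
        needed -= remaining.pop(best)
    return kept, sorted(remaining)   # remaining = redundancy candidates

kept, redundant = prune(coverage)
print("keep:", kept)
print("candidates for removal:", redundant)
```

Here a single test already covers every item, so the other three become candidates; in practice a candidate is only removed after confirming its objective is genuinely duplicated.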
4. Test Layering
Teams struggle with the extent of full‑stack regression. If system boundaries are well defined, it may be possible to validate only the changed component against its contracts, avoiding integration tests. However, practical evidence and a complete methodology are still lacking.
5. Reducing Analysis Omissions
Analysis omissions cause many failures, often as unknown‑unknowns. A systematic approach to uncovering hidden corner cases and converting unknown‑unknowns into known‑unknowns is needed.
6. Automatic Test Case Generation
Techniques such as fuzz testing, model‑based testing, record‑replay, and traffic bifurcation generate tests automatically. While generating test steps is mature, creating reliable test oracles remains a major challenge.
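The oracle gap described above is often bridged with property-based (metamorphic) oracles: instead of predicting the exact expected output for each generated input, the test checks invariants that must hold for any input. A small sketch, with a hypothetical function under test:

```python
import random

# Generated inputs + a property oracle: we cannot easily state the exact
# expected output for every random input, but we can check invariants.

def dedupe_sorted(xs):
    """Code under test (hypothetical): sorted unique elements."""
    return sorted(set(xs))

def oracle(inp, out):
    """Properties that must hold for any input."""
    return (out == sorted(out)              # output is ordered
            and len(out) == len(set(out))   # no duplicates
            and set(out) == set(inp))       # same elements as the input

random.seed(0)
for _ in range(200):                        # automatically generated cases
    inp = [random.randint(-5, 5) for _ in range(random.randint(0, 10))]
    assert oracle(inp, dedupe_sorted(inp)), f"oracle failed for {inp}"
print("200 generated cases passed the property oracle")
```

Generating the 200 inputs is the easy half; deciding what `oracle` should assert is exactly the hard half the section points at.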
7. Automatic Problem Diagnosis
Automated diagnosis for both online and offline issues suffers from limited generality and heavy reliance on expert‑crafted rules. Techniques like automatic call‑graph comparison can aid in pinpointing root causes.
8. Automated Defect Repair
Industrial solutions such as Alibaba’s Precfix and Facebook’s SapFix exist, but they are still early‑stage with various limitations.
9. Test Data Preparation
Each test case should be independent, yet preparing fresh data for every case is inefficient. A “data bank” that reuses data produced by previous tests and lends it to subsequent tests can reduce preparation overhead and enable smarter test ordering.
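The “data bank” idea can be sketched as a registry where tests deposit the data they produce and later tests borrow matching records instead of preparing fresh ones. The class, tag names, and records below are hypothetical illustrations of the concept:

```python
# Minimal "data bank" sketch: deposit data a test produced, let later
# tests borrow it, and fall back to fresh preparation only when needed.

class DataBank:
    def __init__(self):
        self._shelves = {}   # tag -> reusable records
        self.created = 0     # how many records had to be built fresh

    def deposit(self, tag, record):
        self._shelves.setdefault(tag, []).append(record)

    def borrow(self, tag, factory):
        """Reuse a banked record if one exists, else build (and count) one."""
        shelf = self._shelves.get(tag)
        if shelf:
            return shelf.pop()
        self.created += 1
        return factory()

bank = DataBank()

# test A creates a paid order and banks it for later tests
order = {"id": 1, "status": "paid"}
bank.deposit("paid_order", order)

# test B needs a paid order: it borrows instead of preparing one
reused = bank.borrow("paid_order", factory=lambda: {"id": 2, "status": "paid"})
assert reused is order and bank.created == 0

# with the shelf empty, the factory runs as a fallback
fresh = bank.borrow("paid_order", factory=lambda: {"id": 2, "status": "paid"})
assert fresh["id"] == 2 and bank.created == 1
print("reused one record, created one")
```

Smarter test ordering then means scheduling producers of a tag before its consumers, so the bank is rarely empty.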
10. Exception Testing
Distributed systems encounter numerous exceptions (timeouts, network glitches, resource exhaustion, etc.). Ensuring correct system behavior under all such conditions, and defining expected outcomes for each, is a massive challenge.
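One practical way to make such exceptions testable is fault injection: wrap a dependency so the test can force a failure and then assert the defined expected outcome. The client, method, and fallback value below are hypothetical:

```python
# Fault-injection sketch: a test double raises a timeout on demand so we
# can verify the caller's specified behaviour under that exception.

class InjectedTimeout(Exception):
    pass

class FlakyInventoryClient:
    """Test double that fails on demand, simulating a dependency timeout."""
    def __init__(self, fail=False):
        self.fail = fail

    def stock(self, sku):
        if self.fail:
            raise InjectedTimeout(f"inventory lookup for {sku} timed out")
        return 42

def stock_or_fallback(client, sku):
    """Code under test: the expected outcome under failure is a defined default."""
    try:
        return client.stock(sku)
    except InjectedTimeout:
        return 0   # degrade gracefully rather than crash

assert stock_or_fallback(FlakyInventoryClient(fail=False), "sku-1") == 42
assert stock_or_fallback(FlakyInventoryClient(fail=True), "sku-1") == 0
print("behaviour under injected timeout matches the specification")
```

The hard part the section names remains: enumerating which faults to inject where, and writing down the expected outcome for each combination.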
11. Concurrency Testing
Concurrency appears at the database, process, thread, and business‑logic levels. Traditional performance‑based concurrency testing is flaky; efforts such as Microsoft’s CHESS and Alibaba’s distributed model checking aim to make it reliable and repeatable.
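The flavor of CHESS-style systematic exploration, as opposed to timing-based stress tests, can be shown on a toy lost-update race: enumerate every interleaving of the atomic steps deterministically and check the invariant in each, so the race is found every run rather than by luck. This is a toy model, not how CHESS itself is implemented.

```python
import itertools

# Toy systematic exploration: two "threads" each read then write a shared
# counter; we enumerate all valid interleavings instead of racing real threads.

def run(schedule):
    """Execute a read/write schedule against a shared counter."""
    counter = 0
    local = {}
    for tid, step in schedule:
        if step == "read":
            local[tid] = counter
        else:                       # "write"
            counter = local[tid] + 1
    return counter

def interleavings():
    steps = [("A", "read"), ("A", "write"), ("B", "read"), ("B", "write")]
    for perm in set(itertools.permutations(steps)):
        # program order: each thread must read before it writes
        if perm.index(("A", "read")) < perm.index(("A", "write")) and \
           perm.index(("B", "read")) < perm.index(("B", "write")):
            yield perm

results = {run(s) for s in interleavings()}
print("observed final counters:", sorted(results))
```

A stress test might pass thousands of runs without hitting the bad schedule; exhaustive enumeration reliably observes both the correct final value 2 and the lost-update value 1.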
12. Rollback Testing
While rollbacks are supported, verifying post‑rollback correctness is difficult. Coverage of all possible rollback points and handling compatibility of data generated by newer code after a rollback are open problems.
13. Compatibility Testing
Ensuring new code works with legacy data and handling upgrades that occur mid‑workflow require exhaustive scenario coverage, which is often impractical.
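The legacy-data half of this problem is often attacked with explicit schema-tolerance tests: new reader code must accept records written before a field existed. A small sketch, with hypothetical field names and versions:

```python
# Compatibility sketch: the v2 reader must handle records written by v1
# (which lack the new 'coupon' field) as well as its own records.

LEGACY_RECORD = {"id": 7, "amount": 100}                # written by v1
CURRENT_RECORD = {"id": 8, "amount": 50, "coupon": 5}   # written by v2

def load_order(record):
    """v2 reader: the new 'coupon' field defaults to 0 for legacy data."""
    return {
        "id": record["id"],
        "amount": record["amount"],
        "coupon": record.get("coupon", 0),   # tolerate v1 records
    }

assert load_order(LEGACY_RECORD)["coupon"] == 0
assert load_order(CURRENT_RECORD)["coupon"] == 5
print("v2 reader handles both schema versions")
```

The mid-workflow upgrade case is harder: it multiplies every workflow step by every version boundary, which is why exhaustive coverage is often impractical.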
14. Mocking
Test effectiveness depends on mock fidelity. A “one‑code‑three‑modes” approach—normal, mock, and performance‑mock builds—could keep mocks in sync with production code and reduce maintenance effort.
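One way to keep the mock build from drifting away from the normal build is a fidelity check that runs with the tests: verify the mock exposes the same public methods with the same signatures as the production client. Both classes and the `charge` method below are hypothetical.

```python
import inspect

# Mock-fidelity sketch: fail fast if the mock's interface has drifted
# from the production client's interface.

class RealPaymentClient:
    def charge(self, order_id, amount, currency="CNY"):
        raise NotImplementedError("talks to the real gateway")

class MockPaymentClient:
    def charge(self, order_id, amount, currency="CNY"):
        return {"order_id": order_id, "status": "mock-ok"}

def mock_in_sync(real_cls, mock_cls):
    """True if every public method of the real client exists on the mock
    with an identical signature."""
    for name, member in inspect.getmembers(real_cls, inspect.isfunction):
        if name.startswith("_"):
            continue
        counterpart = getattr(mock_cls, name, None)
        if counterpart is None:
            return False
        if inspect.signature(member) != inspect.signature(counterpart):
            return False
    return True

assert mock_in_sync(RealPaymentClient, MockPaymentClient)
print("mock matches the production interface")
```

Signature equality only guards the interface, not the behaviour; the “one‑code‑three‑modes” idea goes further by generating all three builds from one source.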
15. Static Code Analysis
Static analysis can catch issues like forgotten ThreadLocal cleanup or potential NPEs earlier than dynamic testing, and can also identify certain concurrency bugs.
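The shape of such checks can be illustrated with a tiny syntactic analyzer: flag functions that call `open()` outside a `with` block, a shallow stand-in for resource-leak checks like forgotten ThreadLocal cleanup. Real analyzers track data flow across calls; this sketch only matches one local pattern.

```python
import ast

# Toy static check: report functions that call open() outside a 'with',
# where the file handle may never be closed.

CODE = """
def leaky(path):
    f = open(path)          # flagged: no 'with', close may be skipped
    return f.read()

def safe(path):
    with open(path) as f:
        return f.read()
"""

def bare_opens(source):
    flagged = []
    for fn in ast.walk(ast.parse(source)):
        if not isinstance(fn, ast.FunctionDef):
            continue
        # collect open() calls that appear as 'with' context expressions
        with_opens = {id(c)
                      for w in ast.walk(fn) if isinstance(w, ast.With)
                      for item in w.items
                      for c in ast.walk(item.context_expr)
                      if isinstance(c, ast.Call)}
        for call in ast.walk(fn):
            if (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
                    and call.func.id == "open" and id(call) not in with_opens):
                flagged.append(fn.name)
    return flagged

print("functions with bare open():", bare_opens(CODE))
```

Because it runs on source text alone, a check like this fires in review or CI before any test executes, which is exactly the earliness the section highlights.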
16. Formal Verification
Beyond protocols and algorithms, exploring the value of formal methods for business‑level logic remains an open research direction.
17. Mistake‑Proof Design
While not strictly testing, designing systems to prevent errors can dramatically reduce the need for testing. Summarizing principles and tooling for mistake‑proof design is a worthwhile pursuit.
18. Testability
Testability is often reduced to adding hooks, but a systematic set of design principles, anti‑patterns, and guidelines—similar to classic software design patterns—could improve testability across domains.
Note: The author also mentions additional challenges not listed, such as achieving >99% regression pass rates and enabling code‑change gates for continuous delivery.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.