How CDFuzz’s Targeted Dictionary Boosts Grey‑Box Fuzzing Coverage by 16%
The award‑winning CDFuzz technique builds a lightweight, targeted dictionary for each seed without extra instrumentation. It raises coverage by 16.1% on average, discovers dozens of real bugs, and shows that simple optimizations can outperform complex grey‑box fuzzing strategies across diverse file formats.
Paper Award and Background
The joint research platform of Southern University of Science and Technology and Ant Group received the ACM SIGSOFT Outstanding Paper Award at ICSE 2025 for the paper "Tumbling Down the Rabbit Hole: How do Assisting Exploration Strategies Facilitate Grey‑box Fuzzing?". The work systematically reveals the effectiveness limits of auxiliary strategies in grey‑box fuzzing and introduces a customized targeted‑dictionary technique called CDFuzz that simultaneously improves coverage and vulnerability discovery.
CDFuzz Overview
CDFuzz offers three core advantages:
No extra instrumentation: It automatically extracts key constants from the program’s control‑flow graph (CFG), removing the need for additional instrumentation, symbolic execution, or gradient solving.
Custom targeted dictionary: For each seed, it dynamically generates a dictionary that targets the constant‑comparison constraints along that seed’s path, yielding a 16.1% average coverage gain.
Zero compile/runtime overhead: Dictionary generation and application are seamlessly integrated into the fuzzing workflow without extra compilation or execution costs.
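For readers unfamiliar with fuzzing dictionaries, the sketch below renders a few byte tokens in the AFL-style `name="value"` dictionary syntax, where non-printable bytes are written as `\xNN` and quotes and backslashes are escaped. The helper names (`afl_escape`, `dictionary_lines`) and the `tok_N` labels are hypothetical, and the little-endian encoding of 0xdeadbeef is an assumption. The source does not say whether CDFuzz ever writes its per-seed dictionaries out in this form.

```python
def afl_escape(token: bytes) -> str:
    """Escape a byte token for an AFL-style dictionary entry."""
    out = []
    for b in token:
        ch = chr(b)
        if ch in '\\"':
            out.append("\\" + ch)        # escape backslash and double quote
        elif 0x20 <= b < 0x7F:
            out.append(ch)               # printable ASCII passes through
        else:
            out.append(f"\\x{b:02x}")    # everything else becomes \xNN
    return "".join(out)


def dictionary_lines(tokens):
    """Render tokens as name="value" lines (the tok_N names are illustrative)."""
    return [f'tok_{i}="{afl_escape(t)}"' for i, t in enumerate(tokens)]


# Example tokens: "8BIM" from the text, 0xdeadbeef as little-endian bytes
# (byte order assumed), and a token that contains a quote.
lines = dictionary_lines([b"8BIM", b"\xef\xbe\xad\xde", b'a"b'])
print("\n".join(lines))
```

A file of such lines is what a conventional dictionary strategy loads once per campaign; the per-seed approach described here differs in choosing which tokens to offer for each seed.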
Key Findings
Large‑scale experiments on nine fuzzing tools and multiple auxiliary strategies across 21 real‑world projects uncovered three major insights:
Over 90% of constraint breakthroughs involve constant‑comparison types, indicating that deep state exploration is often blocked by input == CONSTANT constraints.
Dictionary‑based strategies outperform expectations; the traditional AFLDict sometimes exceeds symbolic execution tools like QSYM in coverage.
Complex strategies hit depth limits: when constraint depth exceeds 20, symbolic execution success drops to 15%, while dictionary approaches remain unaffected.
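The first finding can be illustrated with a toy experiment. The sketch below is not taken from the paper: `target`, `mutate_random`, `mutate_dict`, and `trials_to_hit` are hypothetical names, and the mutators are deliberately simplified. It compares overwriting four random bytes against overwriting with a dictionary token when a branch is guarded by a four-byte constant. A random overwrite at the right offset matches with probability 1 in 2^32, while a dictionary overwrite only needs to pick the right token and offset.

```python
import random

MAGIC = b"8BIM"                           # example constant from the text
TOKENS = [b"8BIM", b"\xef\xbe\xad\xde"]   # a tiny illustrative dictionary


def target(data: bytes) -> bool:
    """Toy program: the deep branch is guarded by input == CONSTANT."""
    return data[:4] == MAGIC


def mutate_random(seed: bytes, rng: random.Random) -> bytes:
    """Overwrite four bytes at a random offset with random values."""
    buf = bytearray(seed)
    pos = rng.randrange(len(buf) - 3)
    for i in range(4):
        buf[pos + i] = rng.randrange(256)
    return bytes(buf)


def mutate_dict(seed: bytes, rng: random.Random, tokens) -> bytes:
    """Overwrite a random offset with a randomly chosen dictionary token."""
    buf = bytearray(seed)
    tok = rng.choice(tokens)
    pos = rng.randrange(len(buf) - len(tok) + 1)
    buf[pos:pos + len(tok)] = tok
    return bytes(buf)


def trials_to_hit(mutator, seed: bytes, budget: int):
    """Return the first trial number that reaches the branch, or None."""
    rng = random.Random(0)  # fixed seed keeps the run reproducible
    for n in range(1, budget + 1):
        if target(mutator(seed, rng)):
            return n
    return None


seed_input = bytes(16)  # sixteen zero bytes
dict_hit = trials_to_hit(lambda s, r: mutate_dict(s, r, TOKENS), seed_input, 20000)
random_hit = trials_to_hit(mutate_random, seed_input, 20000)
print("dictionary mutation reached the branch at trial:", dict_hit)
print("random mutation reached the branch at trial:", random_hit)
```

With this setup the dictionary mutator has a 1 in 26 chance per trial (one of two tokens, one of thirteen offsets), so it reaches the branch quickly, whereas the random mutator is not expected to succeed within the budget.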
Technical Mechanism
CDFuzz implements a two‑stage process to create targeted dictionaries:
Static constant extraction: Using LLVM IR, it parses the program’s CFG to collect all constant values appearing in branch conditions (e.g., 0xdeadbeef, "8BIM").
Dynamic path feedback: Based on the execution path of a given input seed, it selects the subset of constants relevant to the current path constraints and builds a focused dictionary.
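The two stages can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the hand-written `CFG` table stands in for what a static pass over LLVM IR would collect, the block names are invented, 0xdeadbeef is encoded little-endian by assumption, and "relevant to the current path" is interpreted here as "guarding an edge from a visited block to a not-yet-visited successor".

```python
# Hypothetical stage 1 result: block -> list of (successor, guarding constant).
# A constant of None means the edge is not guarded by a constant comparison.
CFG = {
    "entry":       [("check_magic", None)],
    "check_magic": [("parse_psd", b"8BIM"), ("reject", None)],
    "parse_psd":   [("parse_layer", b"\xef\xbe\xad\xde"), ("done", None)],
    "parse_layer": [("done", None)],
    "reject":      [],
    "done":        [],
}


def all_constants(cfg):
    """Stage 1 (simulated): every constant that guards some edge."""
    return sorted({c for edges in cfg.values() for _, c in edges if c is not None})


def targeted_dictionary(cfg, path):
    """Stage 2 (assumed heuristic): keep only constants that guard an edge
    leaving the seed's executed path toward a block it has not reached."""
    visited = set(path)
    tokens = []
    for block in path:
        for succ, const in cfg.get(block, []):
            if const is not None and succ not in visited and const not in tokens:
                tokens.append(const)
    return tokens


# A seed rejected at the magic check only needs the magic constant.
shallow = targeted_dictionary(CFG, ["entry", "check_magic", "reject"])
# A seed that already passed the magic check needs the next constant instead.
deeper = targeted_dictionary(CFG, ["entry", "check_magic", "parse_psd", "done"])
print(all_constants(CFG))
print(shallow)
print(deeper)
```

The point of the per-seed selection is that each seed is mutated with a short list of tokens that can actually unlock a nearby branch, rather than with every constant found in the program.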
Experimental Results
In 24‑hour testing sessions, CDFuzz demonstrated significant advantages:
Coverage increase: 16.1% on average over the best existing strategy (AFL++Dict), with a peak improvement of 26.2% on the strip project.
37 previously unknown bugs: These include heap overflows and uninitialized‑memory issues; 9 have been officially confirmed and 7 already patched.
Cross‑format applicability: Stable performance across 10 file formats such as ELF, JPEG, and SQL.
Conclusion and Future Work
CDFuzz replaces heavyweight constraint solvers with a lightweight targeted dictionary, proving that minimal‑overhead auxiliary strategies can dramatically boost fuzzing efficiency. The authors plan to continue exploring lightweight optimizations, further enhance industrial‑grade testing tools, and integrate the approach into Ant Group’s real‑world security infrastructure.
