How COMI Achieves 32× Compression and Boosts Performance by 25 Points
The COMI framework introduces a marginal information gain metric and a coarse‑to‑fine two‑stage compression strategy that preserves both relevance and diversity, enabling 32× context reduction while lifting Exact Match on NaturalQuestions by nearly 25 points over the next‑best baseline and more than doubling inference speed.
Problem
When compressing long contexts (e.g., 32 K tokens down to 1 K), many existing methods retain clusters of highly similar tokens. The resulting redundancy creates “information internal competition”: near‑duplicate tokens compete for the model's attention, confusing it and causing a sharp performance drop.
Marginal Information Gain (MIG)
MIG quantifies the marginal value of a candidate unit (a token or a segment) as its relevance to the query minus its maximum similarity to any unit already selected:
MIG(x) = relevance(x, query) − max over already‑selected y of similarity(x, y)
The metric rewards units that are both relevant and novel, while penalizing those that duplicate information already chosen.
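For intuition, here is a minimal sketch of greedy selection under this metric, assuming units are represented as embedding vectors and similarity is cosine; the function names and the greedy loop are illustrative, not the paper's exact procedure.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity with a small epsilon to avoid division by zero.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def greedy_mig_select(units, query, budget):
    """Greedily pick `budget` units by Marginal Information Gain:
    relevance to the query minus the max similarity to anything
    already selected. `units` is a list of embedding vectors."""
    selected, remaining = [], list(range(len(units)))
    while remaining and len(selected) < budget:
        def mig(i):
            rel = cosine(units[i], query)
            red = max((cosine(units[i], units[j]) for j in selected),
                      default=0.0)
            return rel - red
        best = max(remaining, key=mig)   # highest marginal gain wins
        selected.append(best)
        remaining.remove(best)
    return selected  # indices of retained units, in selection order
```

Because the redundancy term is recomputed against the growing selected set, a near‑duplicate of an already‑chosen unit loses its gain even if it is highly query‑relevant.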
Coarse‑to‑fine Adaptive Compression (COMI)
Stage 1 – Coarse‑grained group reallocation
The document is split into equal‑length segments. Instead of applying a uniform compression rate, COMI computes a segment‑level MIG and dynamically adjusts the compression budget per segment. Segments with high information density and low redundancy receive a looser compression rate, whereas sparse or highly repetitive segments are compressed more aggressively. This ensures that the limited budget is allocated to high‑value regions.
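A sketch of how a global token budget could be reallocated from segment‑level MIG scores; the proportional weighting rule and the per‑segment floor are assumptions for illustration, not the paper's exact formula.

```python
import numpy as np

def reallocate_budget(segment_migs, total_budget, floor=0.02):
    """Split a global token budget across segments in proportion to
    segment-level MIG: dense, non-redundant segments keep more tokens,
    redundant ones are compressed harder. The proportional rule and
    the per-segment floor are illustrative choices."""
    migs = np.clip(np.asarray(segment_migs, dtype=float), 0.0, None)
    if migs.sum() == 0:                       # degenerate case: uniform split
        weights = np.full(len(migs), 1.0 / len(migs))
    else:
        weights = migs / migs.sum()
    weights = np.maximum(weights, floor)      # every segment keeps something
    weights /= weights.sum()
    budgets = np.floor(weights * total_budget).astype(int)
    budgets[np.argmax(weights)] += total_budget - budgets.sum()  # absorb rounding
    return budgets

# e.g. reallocate_budget([0.9, 0.1, 0.5], total_budget=1024) -> [615, 68, 341]
```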
Stage 2 – Fine‑grained token fusion
Within each segment, tokens are weighted by their token‑level MIG. High‑MIG tokens dominate the weighted fusion, while low‑MIG (redundant) tokens are naturally diluted. This avoids the “information dilution” problem of simple averaging and preserves diverse, critical details.
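Within a segment, the fusion step might look like the following sketch, assuming per‑token embeddings and token‑level MIG scores are available; the softmax temperature and the contiguous grouping are illustrative simplifications.

```python
import numpy as np

def fuse_tokens(token_embs, token_migs, n_out, temperature=0.5):
    """Fuse a segment's tokens into `n_out` vectors. Within each group,
    tokens are combined with softmax(MIG / temperature) weights, so
    high-MIG tokens dominate and redundant ones are diluted rather
    than averaged in at full strength."""
    token_embs = np.asarray(token_embs, dtype=float)
    migs = np.asarray(token_migs, dtype=float)
    groups = [g for g in np.array_split(np.arange(len(migs)), n_out) if len(g)]
    fused = []
    for idx in groups:
        w = np.exp((migs[idx] - migs[idx].max()) / temperature)  # stable softmax
        w /= w.sum()
        fused.append(w @ token_embs[idx])  # MIG-weighted average, not a plain mean
    return np.stack(fused)
```

The weighting is what distinguishes this from simple average pooling: a plain mean lets redundant low‑value tokens wash out critical details, while MIG weighting keeps the fused vector anchored to the informative ones.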
Empirical Results
Downstream performance
Under a 32× compression ratio, COMI with Qwen2‑7B achieves an Exact Match of 49.15 on NaturalQuestions, nearly 25 points above the next‑best baseline. On NarrativeQA (32 K‑token inputs), COMI retains the key reasoning nodes, demonstrating robustness under extreme compression.
For a 256 K‑context model (Qwen3‑4B), COMI after 32× compression reaches an F1 of 28.89 on NaturalQuestions, far above the 16.90 obtained when feeding the model the full, uncompressed context.
Efficiency
Inference speed more than doubles under 32× compression. The compression step adds only lightweight overhead (e.g., 2.76 s compression and 0.50 s generation on NarrativeQA), making the approach suitable for industrial deployment.
Conclusion
By upgrading the compression objective from “retain relevant fragments” to “retain relevant and diverse information,” the MIG metric and the coarse‑to‑fine strategy overcome the performance bottleneck of high‑compression scenarios, delivering compact representations that remain rich in information for large‑model inference.
Paper: https://arxiv.org/abs/2602.01719
Code: https://github.com/Twilightaaa/COMI