How COMI Achieves 25‑Point Performance Gains at 32× Compression Using Marginal Information Gain (ICLR 2026)
The COMI framework introduces a marginal information gain metric and a coarse‑to‑fine adaptive compression strategy that preserves relevance and diversity, enabling 32× text compression while boosting downstream QA performance by up to 25 points and doubling inference speed.
Problem
When compressing long contexts (e.g., 32K tokens) down to a short budget (e.g., 1K tokens), existing methods that select tokens solely by relevance end up picking many highly similar tokens. This redundancy creates "information crowding", which causes a sharp drop in downstream performance on tasks such as QA.
Marginal Information Gain (MIG)
MIG quantifies the value of a token (or information unit) t with respect to a query q and the set S of tokens already selected:

MIG(t) = Rel(t, q) − max_{s ∈ S} Sim(t, s)
The relevance term measures how well the token answers the query; the similarity term penalizes redundancy with previously chosen tokens. High‑MIG tokens are both relevant and diverse.
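To make this scoring rule concrete, below is a minimal Python sketch of greedy MIG-based selection, assuming cosine similarity over token embeddings; the names `mig_score` and `greedy_select` are illustrative and not taken from the paper.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def mig_score(token_emb, query_emb, selected_embs) -> float:
    """MIG = relevance to the query minus max similarity to tokens
    already selected; a high score means relevant *and* non-redundant."""
    relevance = cosine(token_emb, query_emb)
    redundancy = max((cosine(token_emb, s) for s in selected_embs), default=0.0)
    return relevance - redundancy

def greedy_select(token_embs: np.ndarray, query_emb: np.ndarray, budget: int):
    """Greedily pick the highest-MIG token until the budget is spent."""
    selected, pool = [], list(range(len(token_embs)))
    while pool and len(selected) < budget:
        best = max(pool, key=lambda i: mig_score(
            token_embs[i], query_emb, [token_embs[j] for j in selected]))
        selected.append(best)
        pool.remove(best)
    return selected
```

Because the redundancy term depends on what has already been chosen, selection is inherently sequential; the greedy loop above is the simplest such procedure.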
COMI Framework
COMI applies its coarse-to-fine adaptive compression strategy in two stages.
Coarse‑grained segment reallocation – The document is split into equal‑length segments. For each segment the average MIG is computed; segments with high information density receive a looser compression rate, while low‑information or highly redundant segments are compressed more aggressively. This allocates the limited token budget to “high‑value” regions.
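As a rough illustration of this reallocation step, the sketch below splits a global token budget across segments in proportion to their average MIG; the proportional weighting and the `min_tokens` floor are assumptions made for exposition, not the paper's exact allocation rule.

```python
import numpy as np

def allocate_budget(segment_migs: list, total_budget: int,
                    min_tokens: int = 8) -> list:
    """Give each equal-length segment a share of the token budget
    proportional to its average MIG, so information-dense segments
    get a looser compression rate than redundant ones."""
    density = np.clip([np.mean(m) for m in segment_migs], 0.0, None)
    if density.sum() > 0:
        weights = density / density.sum()
    else:  # degenerate case: no positive-gain segment, fall back to uniform
        weights = np.full(len(segment_migs), 1.0 / len(segment_migs))
    # Rounding plus the per-segment floor may drift a few tokens from
    # total_budget; a production version would renormalize.
    return np.maximum(min_tokens, np.round(weights * total_budget)).astype(int).tolist()
```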
Fine‑grained token fusion – Within each segment tokens are weighted by their token‑level MIG and fused (e.g., via weighted averaging). Tokens with high MIG dominate the fused representation, whereas low‑MIG tokens are naturally diluted, avoiding the loss of critical details that occurs with uniform pooling.
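One possible reading of the fusion step in code, assuming softmax-weighted average pooling over contiguous chunks (the chunking scheme and the temperature are illustrative choices, not details confirmed by the paper):

```python
import numpy as np

def fuse_tokens(token_embs: np.ndarray, token_migs: np.ndarray,
                n_out: int, temperature: float = 0.5) -> np.ndarray:
    """Fuse a segment's tokens (n, d) into n_out vectors. Each output
    slot pools one contiguous chunk with softmax(MIG / temperature)
    weights, so high-MIG tokens dominate the fused representation and
    low-MIG tokens are diluted. Assumes n_out <= len(token_embs)."""
    fused = []
    for idx in np.array_split(np.arange(len(token_embs)), n_out):
        m = token_migs[idx]
        w = np.exp((m - m.max()) / temperature)  # numerically stable softmax
        w /= w.sum()
        fused.append((w[:, None] * token_embs[idx]).sum(axis=0))
    return np.stack(fused)
```

Unlike uniform pooling, this weighting keeps a single high-MIG token from being averaged away by its low-value neighbors.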
Experimental Evaluation
COMI was evaluated on five long-context benchmarks (including NaturalQuestions, HotpotQA, and NarrativeQA) with a single training pass. Reported results include:
With Qwen2‑7B as the base model, COMI achieves an Exact Match (EM) of 49.15 on NaturalQuestions, roughly 25 points higher than the best baseline.
On NarrativeQA (32K-token inputs), 32× compression preserves the reasoning chain and yields robust performance.
Using Qwen3‑4B (native 256K context), COMI's 32× compressed representation reaches an F1 of 28.89, compared with 16.90 when the full context is fed directly.
Inference speed improves by more than 2×: on NarrativeQA, compression adds 2.76 s of overhead and generation takes 0.50 s, demonstrating production-ready efficiency.
Conclusion
By optimizing both relevance and diversity through MIG and by allocating the token budget adaptively, COMI breaks the performance bottleneck of extreme context compression. The coarse‑to‑fine strategy aligns global budget allocation with information density while preserving local semantic detail, enabling lightweight inference for large language models.
Paper: https://arxiv.org/abs/2602.01719
Code: https://github.com/Twilightaaa/COMI
