REFRAG: Using Tiny Models to Compress RAG for Faster, Smarter AI
Meta’s new REFRAG framework lets a lightweight encoder compress retrieved text into semantic tags, enabling large language models to answer queries with far fewer tokens, lower latency, and higher throughput, while preserving core meaning and allowing flexible placement of compressed information within prompts.
