Jun 2, 2026 · Artificial Intelligence

RTPrune: Two‑Stage Reading‑Inspired Token Pruning for Efficient DeepSeek‑OCR Inference

The paper presents RTPrune, a token-pruning technique for DeepSeek-OCR that exploits a two-stage reading behavior observed in LLM decoding. It first keeps the high-norm visual tokens, then fuses the rest via optimal-transport matching, with the pruning rate set by a dynamic strategy. The reported results are a GFLOPs reduction of up to 15% and an 18.9% speedup, while preserving over 99% OCR accuracy across multiple benchmarks.
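The keep-then-fuse idea in this summary can be illustrated with a small, dependency-free sketch. It is not the paper's implementation. The cosine cost, the entropic Sinkhorn solver, the weighted-average fusion rule, and the names `prune_and_fuse`, `sinkhorn_plan`, and `keep_ratio` are all assumptions made for illustration. The dynamic pruning-rate strategy is not modeled, so a fixed `keep_ratio` stands in for it.

```python
import math

def _norm(v):
    return math.sqrt(sum(x * x for x in v))

def _cosine_cost(a, b):
    # Assumed cost: 1 - cosine similarity; the constant guards zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (_norm(a) * _norm(b) + 1e-12)

def sinkhorn_plan(cost, eps=0.1, iters=50):
    """Entropic optimal-transport plan between uniform marginals.

    Rows index pruned tokens, columns index kept tokens.
    """
    n, m = len(cost), len(cost[0])
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        u = [(1.0 / n) / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

def prune_and_fuse(tokens, keep_ratio=0.5):
    """Stage 1: keep the highest-norm tokens. Stage 2: fuse the rest into them."""
    n = len(tokens)
    k = max(1, int(keep_ratio * n))
    by_norm = sorted(range(n), key=lambda i: _norm(tokens[i]), reverse=True)
    keep_idx = sorted(by_norm[:k])   # restore original token order
    rest_idx = sorted(by_norm[k:])
    kept = [list(tokens[i]) for i in keep_idx]
    if not rest_idx:
        return kept
    rest = [tokens[i] for i in rest_idx]
    cost = [[_cosine_cost(r, q) for q in kept] for r in rest]
    plan = sinkhorn_plan(cost)
    fused = []
    for j, q in enumerate(kept):
        # Rescale so each pruned token distributes roughly unit mass.
        w = [plan[i][j] * len(rest) for i in range(len(rest))]
        total = 1.0 + sum(w)
        fused.append([
            (q[d] + sum(w[i] * rest[i][d] for i in range(len(rest)))) / total
            for d in range(len(q))
        ])
    return fused

# Four toy 2-D "visual tokens": two strong, two weak.
tokens = [[3.0, 0.0], [0.1, 0.0], [0.0, 2.0], [0.0, 0.2]]
fused_tokens = prune_and_fuse(tokens, keep_ratio=0.5)
```

With these toy inputs, the two high-norm tokens survive and each absorbs mostly the low-norm token that points in the same direction, so the sequence shrinks from four tokens to two. A real pipeline would operate on batched tensors and would likely use a library solver rather than this hand-rolled loop.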

DeepSeek-OCR · OCR efficiency · dynamic pruning

