JD Retail Technology
Jun 2, 2026 · Artificial Intelligence
RTPrune: Two‑Stage Reading‑Inspired Token Pruning for Efficient DeepSeek‑OCR Inference
The paper presents RTPrune, a token-pruning technique for DeepSeek-OCR that exploits a two-stage reading behavior observed in LLM decoding. It first keeps the high-norm visual tokens, then fuses the remaining tokens via optimal-transport matching, with the pruning rate chosen by a dynamic strategy. The reported results are up to a 15% reduction in GFLOPs and an 18.9% speedup while preserving over 99% OCR accuracy across multiple benchmarks.
DeepSeek-OCR · OCR efficiency · dynamic pruning
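The two steps named in the summary (keep the high-norm visual tokens, then fuse the rest via optimal-transport matching) can be illustrated with a minimal sketch. This is not the paper's implementation. The function names `prune_and_fuse` and `sinkhorn_plan`, the fixed `keep_ratio` (standing in for the dynamic pruning-rate strategy, which is not reproduced here), the entropic Sinkhorn solver with uniform marginals, the squared-Euclidean cost, and the weighted-average fusion rule are all assumptions made for illustration.

```python
import math


def l2_norm(vec):
    """Euclidean norm of one token embedding."""
    return math.sqrt(sum(x * x for x in vec))


def sinkhorn_plan(cost, eps=0.1, iters=200):
    """Entropic optimal-transport plan with uniform marginals (assumed solver)."""
    n, m = len(cost), len(cost[0])
    scale = max(max(row) for row in cost) or 1.0  # normalise costs to [0, 1]
    kernel = [[math.exp(-c / (scale * eps)) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternate scaling so rows sum to 1/n and columns sum to 1/m.
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def prune_and_fuse(tokens, keep_ratio=0.5):
    """Stage 1: keep the highest-norm tokens. Stage 2: fuse the rest into them."""
    n_keep = max(1, int(len(tokens) * keep_ratio))
    by_norm = sorted(range(len(tokens)), key=lambda i: l2_norm(tokens[i]), reverse=True)
    keep_idx = sorted(by_norm[:n_keep])   # restore original token order
    rest_idx = sorted(by_norm[n_keep:])
    kept = [list(tokens[i]) for i in keep_idx]
    if not rest_idx:
        return keep_idx, kept

    rest = [tokens[i] for i in rest_idx]
    # Squared Euclidean distance from each pruned token to each kept token.
    cost = [[sum((a - b) ** 2 for a, b in zip(r, k)) for k in kept] for r in rest]
    plan = sinkhorn_plan(cost)

    n_rest = len(rest)
    fused = []
    for j, k in enumerate(kept):
        # Rescale plan mass so each pruned token contributes a total weight near 1.
        weights = [plan[i][j] * n_rest for i in range(n_rest)]
        total = 1.0 + sum(weights)  # the kept token itself has weight 1
        fused.append([
            (k[d] + sum(w * r[d] for w, r in zip(weights, rest))) / total
            for d in range(len(k))
        ])
    return keep_idx, fused


if __name__ == "__main__":
    toy_tokens = [[3.0, 0.0], [0.0, 0.1], [0.0, 4.0], [0.2, 0.0]]
    idx, merged = prune_and_fuse(toy_tokens, keep_ratio=0.5)
    print(idx, merged)
```

In this toy setup the sequence length after pruning equals the number of kept tokens, so downstream attention cost shrinks with `keep_ratio`, while the fusion step lets the discarded tokens still influence the retained ones instead of being dropped outright.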
