Tagged articles
3 articles
Data Party THU
Aug 22, 2025 · Artificial Intelligence

TwigVLM: How Tiny Branches Accelerate Large Vision‑Language Models

TwigVLM introduces a lightweight “twig” module that prunes visual tokens early and enables self‑speculative decoding, achieving up to a 154% speedup on long‑text generation while preserving 96% of the original LVLM's accuracy, in experiments with LLaVA‑1.5‑7B across multiple benchmarks.

LVLM · Multimodal AI · Token Pruning
0 likes · 14 min read
AIWalker
Jul 15, 2025 · Artificial Intelligence

Dynamic Vision Mamba: Re‑ordering Pruning and Adaptive Block Selection Cut FLOPs by 35.2%

This article presents Dynamic Vision Mamba (DyVM), a method that tackles token and block redundancy in Mamba‑based visual models through a novel re‑ordering pruning strategy and dynamic block selection. It achieves a 35.2% FLOPs reduction with only a 1.7% accuracy loss and generalizes well across tasks and architectures.

Computer Vision · Dynamic Block Selection · FLOPs Reduction
0 likes · 22 min read
JD Cloud Developers
Apr 27, 2025 · Artificial Intelligence

Overcoming the Hourglass Effect in Residual Quantization for Generative Retrieval

This paper investigates the “hourglass” phenomenon in residual‑quantized semantic identifiers for generative search and recommendation. It shows that token concentration in intermediate codebooks causes path sparsity and long‑tail distributions, and it proposes heuristic layer removal and adaptive token‑pruning strategies that markedly improve model performance.

Generative Retrieval · Token Pruning · hourglass phenomenon
0 likes · 13 min read