Tagged articles
1 article
Data Party THU
Oct 16, 2025 · Artificial Intelligence

How Tensor Product Attention Redefines Long‑Context Transformers

The article analyzes the Tensor Product Attention (TPA) method presented at NeurIPS 2025, explaining how it factorizes Q, K, V tensors to drastically reduce KV cache size and attention complexity, and demonstrates superior convergence, lower perplexity, and faster inference on long‑sequence tasks compared with existing attention variants.

KV cache, RoPE, Tensor Product Attention
11 min read