Tagged articles
8 articles
Baidu Intelligent Cloud Tech Hub
Nov 25, 2025 · Artificial Intelligence

Why DeepSeek‑V3.2‑Exp Lost Performance and How a Simple RoPE Fix Restored It

The Baidu Baige team discovered that DeepSeek‑V3.2‑Exp’s long‑context performance lagged behind the official report, traced the issue to a subtle RoPE layout mismatch in the open‑source inference demo, collaborated with DeepSeek to fix it, and verified that the model’s speed and accuracy fully recovered across multiple benchmarks.

AI Infrastructure · DeepSeek · LLM inference
0 likes · 9 min read
Data Party THU
Oct 16, 2025 · Artificial Intelligence

How Tensor Product Attention Redefines Long‑Context Transformers

The article analyzes the Tensor Product Attention (TPA) method presented at NeurIPS 2025, explaining how it factorizes Q, K, V tensors to drastically reduce KV cache size and attention complexity, and demonstrates superior convergence, lower perplexity, and faster inference on long‑sequence tasks compared with existing attention variants.

KV cache · RoPE · Tensor Product Attention
0 likes · 11 min read
Data Party THU
Jul 29, 2025 · Artificial Intelligence

Can 2‑Simplicial Attention Outperform Standard Transformers? A Deep Dive

This article reviews Meta's rotation‑invariant 2‑simplicial attention, explains its trilinear formulation and windowed implementation, analyzes its impact on scaling laws compared with standard dot‑product attention, and presents experimental results showing when the new mechanism offers advantages.

2-simplicial attention · Meta · Neural architecture
0 likes · 12 min read
NewBeeNLP
Oct 16, 2024 · Artificial Intelligence

Unlocking Long-Sequence LLMs: Position Embeddings, Scaling, and Efficient Attention

This article reviews recent advances in training and inference for long‑sequence large language models, comparing ALiBi and RoPE position embeddings, exploring RoPE scaling techniques, analyzing attention optimizations, and outlining practical data, evaluation, and system frameworks for scalable LLM deployment.

Flash Attention · LLM · RoPE
0 likes · 14 min read
Sohu Tech Products
Sep 11, 2024 · Artificial Intelligence

How RoPE and FlashAttention Empower GLM-4-Plus for Long-Text Mastery

This article explains the core mechanisms of Transformer models, details the Rotary Position Embedding (RoPE) and FlashAttention techniques for handling long sequences, introduces the GLM-4-Plus series, and presents an empirical evaluation on the THUCNews dataset showing its superior long-text performance.

FlashAttention · GLM-4-Plus · Long Text
0 likes · 13 min read
DeWu Technology
Mar 13, 2024 · Artificial Intelligence

Extending Context Length in LLaMA Models: Structures, Challenges, and Techniques

The article reviews LLaMA’s Transformer and RoPE architecture and explains why its context windows (4K‑128K tokens) are limited. It then evaluates industry‑proven extension techniques, including linear, NTK‑aware, and YaRN interpolation plus LongLoRA sparse attention, addresses memory and quadratic‑cost challenges, and presents a KubeAI workflow for fine‑tuning and deployment.

AI · LLaMA · LongLoRA
0 likes · 17 min read