Tagged articles

MoDA

2 articles · Page 1 of 1

Machine Learning Algorithms & Natural Language Processing

Apr 19, 2026 · Artificial Intelligence

FlashDepthAttention and Mixed Depth Attention: The Next Phase of Large Model Architecture

The article argues that after a decade of scaling large language models through greater width, greater depth, and more data, the real bottleneck now lies in inter‑layer communication. It presents FlashDepthAttention and MoDA as efficient retrieval‑based mechanisms that replace additive residual connections, improve depth utilization, and boost model performance.

FlashDepthAttention · MoDA · Residual Connections

15 min read

Kuaishou Tech

Jul 10, 2025 · Artificial Intelligence

How MODA’s Modular Duplex Attention Solves Multimodal Attention Imbalance and Boosts Emotion Understanding

The paper introduces MODA, a multimodal model built on modular duplex attention, to address the severe cross‑modal attention imbalance in existing large multimodal models. It proposes a new attention paradigm and masking scheme and reports significant performance gains across 21 benchmarks covering perception, cognition, and emotion tasks. The paper was selected as a Spotlight at ICML 2025.

Emotion Recognition · MoDA · attention mechanisms

13 min read
