Tagged articles
17 articles
Machine Learning Algorithms & Natural Language Processing
Mar 24, 2026 · Artificial Intelligence

A Comprehensive Guide to Major Attention Mechanisms: From MHA and GQA to MLA, Sparse and Hybrid Architectures

This article reviews and compares the most important attention variants used in modern large language models—including multi‑head attention, grouped‑query attention, multi‑head latent attention, sparse and sliding‑window attention, gated attention, and hybrid designs—detailing their motivations, memory trade‑offs, example architectures, and experimental findings.

Hybrid Architecture · LLM · MHA
29 min read
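The memory trade-off between MHA and GQA that this guide compares comes down to how many key/value heads are cached per token. A minimal sketch of that arithmetic, assuming a hypothetical 32-layer model with 32 query heads of dimension 128, an 8,192-token context, and 16-bit cache entries (none of these numbers come from the article; `kv_cache_bytes` is an illustrative helper):

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int = 1, bytes_per_elem: int = 2) -> int:
    """Size of the key/value cache: one K and one V tensor per layer."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch * bytes_per_elem


# Hypothetical model shape, for illustration only.
LAYERS, QUERY_HEADS, HEAD_DIM, SEQ = 32, 32, 128, 8192

mha = kv_cache_bytes(LAYERS, QUERY_HEADS, HEAD_DIM, SEQ)  # MHA: one KV head per query head
gqa = kv_cache_bytes(LAYERS, 8, HEAD_DIM, SEQ)            # GQA: 8 KV groups shared by 32 query heads
mqa = kv_cache_bytes(LAYERS, 1, HEAD_DIM, SEQ)            # MQA: a single KV head shared by all

for name, size in [("MHA", mha), ("GQA-8", gqa), ("MQA", mqa)]:
    print(f"{name}: {size / 2**30:.3f} GiB")
```

With these assumed shapes, MHA needs 4 GiB of cache for a single sequence, GQA with 8 KV groups needs a quarter of that, and MQA one thirty-second. MLA, also covered in the article, takes a different route: it caches a compressed latent vector per token instead of full per-head keys and values.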
AIWalker
Mar 18, 2026 · Artificial Intelligence

7× Faster Inference: Tsinghua’s Gao Huang Team Redesigns Vision‑Transformer Attention via Fourier Transforms

The AAAI 2026 paper by Gao Huang’s team at Tsinghua shows that modeling Vision‑Transformer attention as a block‑circulant matrix and computing it with the FFT reduces the quadratic complexity to O(N log N), delivering up to seven‑fold real‑world speedups without sacrificing accuracy.

AAAI 2026 · Circulant Matrices · Computer Vision
15 min read
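The O(N log N) claim rests on a classical identity: multiplying a vector by a circulant matrix is a circular convolution, which the FFT turns into an elementwise product. The sketch below checks that identity for a single circulant block in pure Python. It illustrates only that identity, not the paper's block-circulant attention, whose exact construction is not described here; all function names are hypothetical.

```python
import cmath


def fft(x: list[complex], invert: bool = False) -> list[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.
    With invert=True it computes the unnormalized inverse transform."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], invert)
    odd = fft(x[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def circulant_matvec_fft(c: list[float], x: list[float]) -> list[float]:
    """y = C @ x with C[i][j] = c[(i - j) % n], computed in O(n log n)."""
    n = len(c)
    spectrum = [a * b for a, b in zip(fft([complex(v) for v in c]),
                                      fft([complex(v) for v in x]))]
    # Inverse FFT, then divide by n to normalize.
    return [v.real / n for v in fft(spectrum, invert=True)]


def circulant_matvec_dense(c: list[float], x: list[float]) -> list[float]:
    """Reference O(n^2) product that builds each row of C explicitly."""
    n = len(c)
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]


c = [1.0, 2.0, 0.0, -1.0]
x = [3.0, 1.0, 4.0, 1.0]
print(circulant_matvec_fft(c, x))    # matches the dense result up to rounding
print(circulant_matvec_dense(c, x))
```

The dense version touches every one of the n² matrix entries, while the FFT route needs three transforms and one elementwise product. The paper applies this idea to attention maps structured as block-circulant matrices; the details of that structure are beyond this sketch.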
Machine Learning Algorithms & Natural Language Processing
Mar 14, 2026 · Artificial Intelligence

Can Large Language Models Get Stronger Without Human Language Training? A New Pre‑Pre‑Training Path

A recent study shows that pre‑training Transformers on synthetic, non‑language data generated by Neural Cellular Automata can boost language‑model performance by up to 6%, accelerate convergence by 40%, and improve downstream reasoning, even outperforming models trained on massive natural‑text corpora.

In-Context Learning · Neural Cellular Automata · Pre‑training
12 min read
Alimama Tech
Nov 11, 2025 · Artificial Intelligence

Accelerating LLM RL with Async Training, Mini‑Critics, and Attention Rewards

This article introduces the 3A collaborative framework—Async architecture, Asymmetric PPO mini‑critics, and an attention‑based reasoning rhythm—demonstrating how decoupled, fine‑grained parallel training and structure‑aware reward allocation dramatically improve efficiency, scalability, and interpretability of reinforcement learning for large language models.

asynchronous training · attention mechanisms · large language models
23 min read
AI2ML AI to Machine Learning
Oct 1, 2025 · Artificial Intelligence

2025 Large Model Engineering Breakthroughs: Cutting Costs, Boosting Performance, and Extending Context

The 2025 open‑source technical reports reveal major advances in large‑model engineering: drastic cost cuts, such as training DeepSeek‑V3 for $5.57M; performance gains, with Gemma 3 4B matching Gemma 2 27B; memory savings such as an 85% KV‑cache reduction; and a suite of new techniques, from auxiliary‑loss‑free MoE load balancing to multi‑token prediction, that together push context lengths to one million tokens and enable multimodal, aligned, and industry‑specific models.

Cost reduction · Multimodal AI · attention mechanisms
13 min read
AI Frontier Lectures
Jul 31, 2025 · Artificial Intelligence

What’s Driving the Latest LLM Architecture Trends? DeepSeek, OLMo, Gemma, and More Explained

This article examines the evolution of large language model architectures over the past seven years, comparing key design choices such as Multi‑Head Latent Attention, Grouped‑Query Attention, Mixture‑of‑Experts, sliding‑window attention, normalization placement, and optimizer variants across models like DeepSeek V3, OLMo 2, Gemma 3, Llama 4, Qwen 3, SmolLM 3, and Kimi K2.

AI research · LLM comparison · Mixture of Experts
30 min read
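Sliding-window attention, one of the design choices compared in this article, restricts each query to a fixed number of recent keys. A minimal sketch of the corresponding boolean mask, assuming the convention that the window counts the current token itself (implementations differ on this detail); the function name is hypothetical:

```python
def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query i may attend to key j: the key must not
    lie in the future (j <= i) and must fall within the last `window` positions."""
    return [[(j <= i) and (i - j < window) for j in range(seq_len)]
            for i in range(seq_len)]


mask = sliding_window_causal_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

This prints a diagonal band three keys wide (`x.....`, `xx....`, `xxx...`, `.xxx..`, `..xxx.`, `...xxx`). Because no query looks back more than `window` positions, a sliding-window layer only needs to keep that many cached keys and values; Gemma 3, for example, interleaves such local layers with global attention layers.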
Kuaishou Large Model
Jul 11, 2025 · Artificial Intelligence

How MODA’s Modular Duplex Attention Boosts Multimodal Emotion Understanding

The paper introduces MODA, a new multimodal model that tackles attention imbalance across modalities with a modular duplex attention mechanism, achieving significant performance gains on perception, cognition, and emotion tasks across 21 benchmarks and demonstrating strong potential for human‑machine interaction.

Deep Learning · MODA model · Multimodal AI
13 min read
Kuaishou Tech
Jul 10, 2025 · Artificial Intelligence

How MODA’s Modular Duplex Attention Solves Multimodal Attention Imbalance and Boosts Emotion Understanding

The paper introduces MODA, a multimodal model built on modular duplex attention that addresses the severe cross‑modal attention imbalance in existing large multimodal models. It proposes a new attention paradigm and masking scheme, demonstrates significant performance gains on 21 benchmarks spanning perception, cognition, and emotion tasks, and was selected as a Spotlight paper at ICML 2025.

Emotion Recognition · MoDA · Multimodal AI
13 min read
Baobao Algorithm Notes
Jun 6, 2025 · Artificial Intelligence

What AI Programming Agents Reveal About RL, Feedback Loops, and Long‑Context Challenges

In a podcast, core members of the Cursor team dissect the current hurdles facing AI programming agents, covering feedback‑mechanism design, reward sparsity in reinforcement learning, tool‑chain integration, long‑context handling, and the emerging attention mechanisms that shape the future of code‑centric AI.

AI programming · attention mechanisms · long context
35 min read
Xiaohongshu Tech REDtech
Dec 26, 2024 · Artificial Intelligence

Focused Large Language Models are Stable Many-Shot Learners

FocusICL mitigates the inverse scaling of many‑shot in‑context learning, where accuracy stops improving or even drops as more demonstrations are added, by masking irrelevant tokens and applying hierarchical batch attention. This cuts attention complexity and keeps the model consistently focused on the query, yielding average accuracy gains of about 5% across multiple LLMs and benchmarks.

Few‑Shot Learning · FocusICL · In-Context Learning
16 min read
Alibaba Cloud Big Data AI Platform
Jun 17, 2024 · Artificial Intelligence

How Free-Prompt-Editing Revolutionizes Text-Guided Image Editing with Stable Diffusion

The paper introduces Free-Prompt-Editing, a concise and efficient algorithm that replaces self‑attention maps during denoising to achieve high‑quality text‑guided image edits without source prompts, and demonstrates its superiority over existing methods on both synthetic and real images.

AI research · Free-Prompt-Editing · attention mechanisms
6 min read
Ximalaya Technology Team
Feb 20, 2024 · Artificial Intelligence

Optimization of Deep Learning-Based CTR Models in Advertising

This report presents recent advances in optimizing deep learning click‑through‑rate models for advertising, including improved embedding mechanisms, novel feature‑interaction and architecture designs such as attention‑based behavior sequencing, multi‑tower and Mixture‑of‑Experts networks, dynamic ID handling, hourly updates, incremental training, and outlines future multi‑modal and embedding‑importance research.

CTR model · Deep Learning · Embedding Techniques
13 min read
DaTaobao Tech
Sep 11, 2023 · Artificial Intelligence

Large Language Model Upgrade Paths and Architecture Selection

This article analyzes upgrade paths of major LLMs—ChatGLM, LLaMA, Baichuan—detailing performance, context length, and architectural changes, then examines essential capabilities, data cleaning, tokenizer and attention design, and offers practical guidance for balanced scaling and efficient model construction.

Baichuan · ChatGLM · LLM architecture
32 min read
HomeTech
Sep 20, 2022 · Artificial Intelligence

Deep Learning for Image Classification: Classic Networks, Attention Mechanisms, and Their Application to Fine‑Grained Classification and Automotive Series Recognition

This article reviews the evolution of deep‑learning image‑classification networks, surveys attention mechanisms for fine‑grained tasks, describes the CVPR 2022 FGVC9 competition solution using RegNetY and random attention cropping, and discusses its deployment in automotive series recognition along with future challenges.

CVPR · Computer Vision · Deep Learning
19 min read
TiPaiPai Technical Team
Jun 11, 2021 · Artificial Intelligence

How Transformers Revolutionize Vision: From DETR to GCNet

This article explores how Transformer architectures, originally designed for NLP, are adapted for visual tasks. It covers DETR alongside related attention modules such as SENet, CBAM, NLNet, and GCNet, and explains their structures, attention mechanisms, advantages, and experimental findings in image processing.

DETR · Self-Attention · attention mechanisms
13 min read
iQIYI Technical Product Team
Dec 28, 2018 · Artificial Intelligence

Short Video Tagging Using Neural Networks

The paper presents a gated‑attention neural network that fuses audio, visual, and title text features to automatically generate high‑quality tags for short videos, achieving state‑of‑the‑art performance on the YouTube‑8M challenge and enabling scalable tagging and recommendation services with future plans for broader tag coverage and temporal segment tagging.

AI · Neural Networks · YouTube-8M dataset
7 min read