Tagged articles

Attention Mechanism

73 articles · Page 1 of 1

Jun 7, 2026 · Artificial Intelligence

When Long Prompts Cause Forgetting: Understanding Generalization in In‑Context Continual Learning

The paper introduces a theoretical framework for In‑Context Continual Learning, showing how shared attention in large language models creates bias, variance, and a novel interference term that explains why longer prompts can lead to forgetting, and provides concrete guidelines for prompt design based on task similarity, context length, and order.

Attention Mechanism · Continual Learning · In-Context Learning

0 likes · 25 min read

CodePath

Jun 3, 2026 · Artificial Intelligence

A Deliberate Paradigm Shift: How “Attention Is All You Need” Reshaped Deep Learning

The article dissects how the 2017 "Attention Is All You Need" paper sparked a fundamental redesign of sequence modeling by replacing recurrent and convolutional approaches with self‑attention, detailing its mathematical foundations, architectural components, training tricks, limitations, and emerging alternatives such as Mamba.

Attention Mechanism · Mamba · Multi-Head Attention

0 likes · 24 min read

Lao Guo's Learning Space

Apr 30, 2026 · Artificial Intelligence

How DeepSeek V4’s CSA + HCA Break the Million‑Token Barrier

Traditional full attention cannot handle million‑token contexts because its compute and memory costs grow quadratically with sequence length. DeepSeek V4’s Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) compress, sparsely index, and precisely compute tokens, cutting KV cache to 10% and FLOPs to 27% while enabling a 1M‑token window on a single GPU.

Attention Mechanism · CSA · HCA

0 likes · 12 min read

Bighead's Algorithm Notes

Apr 27, 2026 · Artificial Intelligence

STEAM: Wavelet‑Enhanced Attention Model for Stock Price Prediction

The STEAM model combines discrete wavelet transform, a wavelet‑enhanced attention mechanism, and a market‑index prefix within a Mamba‑2 encoder to capture multi‑frequency spatial and temporal dependencies in stock data, achieving state‑of‑the‑art performance across multiple international markets as measured by IC, PnL and Sharpe ratios.

Attention Mechanism · Mamba-2 · deep learning

0 likes · 17 min read

Bighead's Algorithm Notes

Apr 14, 2026 · Artificial Intelligence

How Self‑Supervised HINTS Extracts Human Insights from Time Series to Boost Forecast Accuracy

The paper introduces HINTS, a two‑stage self‑supervised framework that leverages Friedkin‑Johnsen opinion dynamics to mine latent human‑driven factors from time‑series residuals, integrates them via attention into state‑of‑the‑art predictors, and demonstrates consistent accuracy gains and interpretability across nine benchmark and real‑world datasets.

Attention Mechanism · Friedkin-Johnsen model · benchmark evaluation

0 likes · 17 min read

Machine Heart

Apr 4, 2026 · Artificial Intelligence

Does Scale Stealthily Hijack Attention? PMDformer’s Simple Subtraction Fix for Long-Term Forecasting

The paper identifies scale differences between patches as a hidden source of attention distortion in long‑term time‑series forecasting, introduces PMDformer with Patch Mean Decoupling, Neighbor Variable Attention, and Trend Recovery Attention, and demonstrates state‑of‑the‑art accuracy and efficiency across eight benchmark datasets.

Attention Mechanism · ICLR2026 · Long-term Time Series Forecasting

0 likes · 8 min read

Data Party THU

Apr 3, 2026 · Artificial Intelligence

Can Attention Replace Residuals? Inside the New Attention Residuals Breakthrough

The article reviews the Kimi team's Attention Residuals approach, which replaces traditional ResNet additive shortcuts with learned attention‑based weighting. It explains the theoretical motivation linking depth to time, details full‑attention and block‑wise implementations, and presents experimental results showing up to 1.25× compute efficiency and improved performance on reasoning and knowledge tasks.

Attention Mechanism · Model Efficiency · Residual Networks

0 likes · 11 min read

AI Large-Model Wave and Transformation Guide

Mar 28, 2026 · Artificial Intelligence

What Large‑Model Training Actually Optimizes: Parameters, Attention, and Knowledge Explained

This article breaks down the core of large‑model training by showing that training optimizes neural‑network parameters, that attention is a mechanism realized by those parameters, and that knowledge is encoded implicitly within the weight matrices, providing a clear hierarchy for interview or presentation use.

AI interview · Attention Mechanism · deep learning

0 likes · 6 min read

Data Party THU

Mar 26, 2026 · Artificial Intelligence

How Mixture-of-Depths Attention Boosts Large Language Model Efficiency

This article examines the Mixture‑of‑Depths Attention (MoDA) mechanism, detailing its novel flash‑compatible KV layout, combined sequence‑depth attention, theoretical analysis, and extensive experiments that show significantly lower validation loss and higher downstream‑task accuracy than the OLMo2 baseline.

Attention Mechanism · Deep KV · FlashAttention

0 likes · 9 min read

Qborfy AI

Feb 21, 2026 · Artificial Intelligence

How Self-Attention Powers Modern AI: From Theory to Real-World Impact

This article explains the self‑attention mechanism behind transformers, detailing its core components, mathematical formulation, step‑by‑step example, multi‑head extension, industry use cases, and a thorough comparison with RNN and CNN approaches, all supported by concrete numbers and citations.

Attention Mechanism · Self-Attention · Transformer

0 likes · 8 min read

AI Cyberspace

Feb 13, 2026 · Artificial Intelligence

How Attention Mechanisms Revolutionized Computer Vision and Machine Translation

This article traces the evolution of attention mechanisms from their earliest applications in computer vision and machine translation to their central role in modern Transformer models, detailing the underlying RNN‑Attention designs, the breakthrough in sequence alignment, and the innovations that enabled high‑performance, parallelizable deep learning architectures.

Attention Mechanism · Machine Translation · Transformer

0 likes · 14 min read

AI Architecture Hub

Jan 7, 2026 · Artificial Intelligence

Why “Attention Is All You Need” Still Shapes AI: A Beginner’s Deep Dive

This article provides a comprehensive, beginner‑friendly walkthrough of the landmark 2017 paper “Attention Is All You Need,” covering its authors, historical context, the shortcomings of RNNs and CNNs, the birth of self‑attention, the Transformer architecture, and its transformative impact on modern AI.

AI history · Attention Mechanism · Transformer

0 likes · 9 min read

AI2ML AI to Machine Learning

Dec 19, 2025 · Artificial Intelligence

The 9 Key Ideas Behind FlashAttention

FlashAttention accelerates transformer inference by combining nine techniques—including loss‑less attention, GPU memory‑pyramid optimization, SRAM‑reusing tiling, safe softmax scaling, online buffering, tile‑size constraints, parallel multiplication, reduced KV slicing, and integrated backward‑pass caching—to achieve efficient, high‑throughput computation on modern GPUs.

Attention Mechanism · FlashAttention · GPU Optimization

0 likes · 8 min read

HyperAI Super Neural

Dec 12, 2025 · Artificial Intelligence

Weekly AI Paper Digest: Attention, Nvidia VLA, TTS, and Graph Neural Networks

This roundup presents five recent AI papers covering hierarchical sparse attention for ultra‑long context, Nvidia's Alpamayo‑R1 VLA model for autonomous driving, the non‑autoregressive F5‑TTS system, LatentMAS for latent‑space multi‑agent collaboration, and Deeper‑GXX that deepens arbitrary graph neural networks, highlighting each method's key innovations and reported performance gains.

Attention Mechanism · Graph Neural Networks · Multi-Agent Systems

0 likes · 6 min read

Tencent Cloud Developer

Nov 4, 2025 · Artificial Intelligence

From Functions to Transformers: Mastering Neural Networks Step by Step

This article walks you through the evolution from basic mathematical functions to modern large‑scale models, explaining activation functions, forward and backward propagation, loss calculation, gradient descent, regularization, dropout, word embeddings, RNNs, and the core mechanics of the Transformer architecture.

Attention Mechanism · RNN · Regularization

0 likes · 15 min read

Software Engineering 3.0 Era

Sep 28, 2025 · Artificial Intelligence

Why Large Language Models Appear So Smart: The Science of Emergence

The article explains how massive language models achieve seemingly intelligent behavior through emergence at a critical scale, hierarchical planning, attention-driven global coherence, multimodal understanding, and progressive training techniques that turn simple token prediction into sophisticated reasoning and creativity.

Attention Mechanism · Multimodal AI · Prompt Engineering

0 likes · 15 min read

AIWalker

Sep 17, 2025 · Artificial Intelligence

Cutting-Edge Attention Mechanism Innovations for 2025: Modal Fusion and Domain Adaptation

This article surveys 183 recent attention‑mechanism papers, classifies them into four innovation categories, and highlights representative works such as MILA, ARFFT, CNN‑Transformer for speech emotion, and LSTM‑attention epidemic forecasting, providing concrete methods, code links, and performance insights.

2025 · Attention Mechanism · Domain Adaptation

0 likes · 7 min read

Data Party THU

Sep 17, 2025 · Artificial Intelligence

How Matching Networks Tackle Imbalance with Cosine Similarity and Attention

This article provides a comprehensive technical review of Matching Networks, covering cosine similarity mathematics, its transformations, the bias introduced by imbalanced support sets, and a range of mitigation strategies such as adaptive weighting, global distance‑matrix normalization, prior‑based weighting, hierarchical multi‑scale matching, hybrid learning architectures, and attention‑driven dynamic sample selection.

Attention Mechanism · Cosine Similarity · Matching Networks

0 likes · 10 min read

Baobao Algorithm Notes

Jul 18, 2025 · Artificial Intelligence

30+ Expert Q&A on Large Language Model Architecture, Training, and Deployment

This article compiles more than thirty interview‑style questions and detailed answers covering large‑model fundamentals such as encoder‑decoder trade‑offs, self‑attention versus RNN, context length, tokenization, embedding strategies, FlashAttention, RoPE, prompt design, retrieval‑augmented generation, safety measures, fine‑tuning, and model distillation, providing a comprehensive technical reference for practitioners.

Attention Mechanism · retrieval-augmented generation

0 likes · 53 min read

AI Frontier Lectures

Jun 10, 2025 · Artificial Intelligence

Can One Model Master All Remote Sensing Tasks? Introducing the TSSUN Framework

This paper presents the Temporal‑Spectral‑Spatial Unified Network (TSSUN), a flexible deep‑learning architecture that simultaneously handles semantic segmentation, semantic change detection, and binary change detection across heterogeneous remote‑sensing inputs, achieving state‑of‑the‑art performance without task‑specific retraining.

Attention Mechanism · TSSUN · deep learning

0 likes · 15 min read

AIWalker

Feb 26, 2025 · Artificial Intelligence

Why Linear Attention Lags Behind Softmax and How Two Simple Tweaks Close the Gap

The paper analytically identifies injectivity and local modeling as the two key factors causing the performance gap between linear and Softmax attention, proposes the InLine attention modifications to restore these properties, and demonstrates through extensive Vision Transformer experiments that the enhanced linear attention matches or surpasses Softmax while retaining linear computational cost.

Attention Mechanism · Efficient Transformers · Linear Attention

0 likes · 24 min read

Architecture Digest

Feb 24, 2025 · Artificial Intelligence

MoBA: Mixture of Block Attention for Long‑Context Large Language Models

The article introduces MoBA, a Mixture‑of‑Block‑Attention mechanism that applies Mixture‑of‑Experts principles to transformer attention, enabling efficient long‑context processing for large language models while maintaining performance comparable to full attention through sparse, trainable block selection and seamless switching.

Attention Mechanism · LLM · Long Context

0 likes · 12 min read

AIWalker

Feb 19, 2025 · Artificial Intelligence

YOLOv12 Unveiled: Boosted Performance and Speed for Real‑Time Detection

YOLOv12 introduces an attention‑centric architecture, a lightweight regional attention module, and the R‑ELAN aggregation network, delivering consistent mAP gains and lower latency across N, S, M, L and X model scales while surpassing previous YOLO versions and other real‑time detectors.

Attention Mechanism · Real-time · YOLOv12

0 likes · 8 min read

AIWalker

Jan 13, 2025 · Artificial Intelligence

ArtCrafter: A Controllable, Diverse Style Transfer Framework from Tsinghua

ArtCrafter introduces a novel text‑image style transfer framework that leverages attention‑based style extraction, text‑image alignment enhancement, and explicit modulation to achieve controllable, diverse, and high‑fidelity visual results, outperforming existing methods in both qualitative and quantitative evaluations.

Attention Mechanism · Diffusion Models · Style Transfer

0 likes · 10 min read

DaTaobao Tech

Nov 13, 2024 · Artificial Intelligence

Understanding Neural Networks and Transformers: Principles, Implementation, and Applications

The article surveys neural networks from basic neuron operations and loss functions through deep architectures to the Transformer model, detailing embeddings, positional encoding, self‑attention, multi‑head attention, residual links, and encoder‑decoder design, and includes PyTorch code examples for linear regression, translation, and fine‑tuning Hugging Face’s MiniRBT for text classification.

AI · Attention Mechanism · NLP

0 likes · 44 min read

Baobao Algorithm Notes

Nov 7, 2024 · Artificial Intelligence

Demystifying FlashAttention: A Minimalist Derivation of the Algorithm

This article presents a concise, step‑by‑step derivation of FlashAttention, explaining the prerequisite linear‑algebra concepts, the softmax simplifications, and the parallel computation workflow—including the LSE‑enhanced version—so readers can grasp the algorithm’s elegance without heavy mathematics.

Algorithm Derivation · Attention Mechanism · FlashAttention

0 likes · 8 min read

Ops Development & AI Practice

Jun 22, 2024 · Artificial Intelligence

Why Transformers Revolutionized AI: From NLP to Vision and Speech

Transformers, introduced in 2017, have reshaped neural networks by leveraging attention mechanisms to outperform RNNs and CNNs across NLP, computer vision, and speech tasks, offering parallel processing, long‑range dependency capture, and versatile applications such as translation, text generation, image classification, and speech recognition.

Attention Mechanism · NLP · Transformer

0 likes · 6 min read

Architect's Guide

May 13, 2024 · Artificial Intelligence

Understanding the Core Principles of Transformer Architecture

This article explains how Transformer models work by detailing the encoder‑decoder structure, self‑attention, multi‑head attention, positional encoding, and feed‑forward networks, and shows their applications in machine translation, recommendation systems, and large language models.

AI · Attention Mechanism · Transformer

0 likes · 11 min read

ITPUB

Apr 20, 2024 · Artificial Intelligence

Unveiling GPT-4’s Magic: How Large Language Models Learn, Reason, and Translate – A Kid‑Friendly Story

This article uses a playful dialogue to demystify how large language models like GPT‑4 work, covering data collection, vectorization, the transformer’s attention mechanism, position encoding, training stages, multilingual translation, reasoning puzzles, and alignment, all illustrated through the tale of a curious learner named Wuming.

Attention Mechanism · Transformer · artificial-intelligence

0 likes · 50 min read

Top Architect

Apr 18, 2024 · Artificial Intelligence

Understanding Transformers: Architecture, Attention Mechanism, Training and Inference

This article provides a comprehensive overview of Transformer models, covering their attention-based architecture, encoder-decoder structure, training procedures including teacher forcing, inference workflow, advantages over RNNs, and various applications in natural language processing such as translation, summarization, and classification.

Attention Mechanism · Model Training · NLP

0 likes · 11 min read

NewBeeNLP

Apr 16, 2024 · Artificial Intelligence

Demystifying the Transformer: Step‑by‑Step PaddlePaddle Implementation

This article provides a comprehensive, code‑rich walkthrough of the Transformer architecture using PaddlePaddle, covering the encoder and decoder components, residual connections, layer normalization, feed‑forward networks, scaled dot‑product and multi‑head attention, and shows how to assemble the full model with training and inference functions.

Attention Mechanism · Encoder · PaddlePaddle

0 likes · 17 min read

Architect

Mar 26, 2024 · Artificial Intelligence

Why Transformers Outperform RNNs: A Deep Dive into Architecture and Training

This article explains the Transformer model’s core architecture, self‑attention mechanism, encoder‑decoder workflow, training with teacher forcing, inference steps, and why it surpasses RNNs and CNNs, while also outlining its major NLP applications.

Attention Mechanism · Model Training · NLP

0 likes · 14 min read

JD Tech

Nov 30, 2023 · Artificial Intelligence

Understanding ChatGPT: Mechanisms, Attention, Emergence, and the Chinese Room

This article examines the principles behind ChatGPT, detailing its continuation-based operation, the role of attention mechanisms and transformer architecture, the scaling of neural networks that leads to emergent abilities, and interprets these phenomena through the lenses of compression theory and the Chinese Room thought experiment.

Attention Mechanism · ChatGPT · compression

0 likes · 27 min read

JD Cloud Developers

Oct 10, 2023 · Artificial Intelligence

Do Large Language Models Have a Mind? Attention, Emergence & Compression Explained

This article examines whether ChatGPT and other large language models exhibit true Theory of Mind, detailing the role of attention mechanisms, neural network architecture, emergent abilities, the Chinese‑room argument, and how compression of massive textual data underlies their apparent intelligence.

Attention Mechanism · Theory of Mind · compression

0 likes · 30 min read

JD Retail Technology

Oct 9, 2023 · Artificial Intelligence

Does ChatGPT Possess Theory of Mind? An Exploration of Attention Mechanisms, Emergence, and Compression in Large Language Models

Recent research suggests GPT‑3 exhibits Theory of Mind abilities, prompting a deep dive into attention mechanisms, neural network fundamentals, emergent capabilities, and the role of compression in large language models, while examining philosophical thought experiments like the Chinese Room to question true machine intelligence.

Attention Mechanism · ChatGPT · Theory of Mind

0 likes · 26 min read

Kuaishou Tech

Aug 8, 2023 · Artificial Intelligence

TWIN: Two-stage Interest Network for Lifelong User Behavior Modeling in CTR Prediction

This paper presents TWIN, a two-stage interest network that aligns the similarity metrics of coarse‑grained and fine‑grained modules to improve lifelong user behavior modeling for CTR prediction in large‑scale online recommendation systems.

Attention Mechanism · CTR Prediction · Kuaishou

0 likes · 10 min read

Sohu Tech Products

Jul 26, 2023 · Artificial Intelligence

Attention Mechanism, Transformer Architecture, and BERT: An In-Depth Overview

This article provides a comprehensive overview of the attention mechanism, its mathematical foundations, the transformer model architecture—including encoder and decoder components—and the BERT pre‑training model, detailing their principles, implementations, and applications in natural language processing.

Attention Mechanism · BERT · Encoder-Decoder

0 likes · 13 min read

Architects' Tech Alliance

May 15, 2023 · Artificial Intelligence

How Transformer Powers ChatGPT: A Deep Dive into Attention and Architecture

This article provides a comprehensive analysis of the Transformer model behind ChatGPT, covering its origin, core mechanisms such as embedding, positional encoding, self‑attention, multi‑head attention, a step‑by‑step translation example, and the broader implications for AI research and industry.

AI Architecture · Attention Mechanism · ChatGPT

0 likes · 19 min read

DataFunSummit

Feb 19, 2023 · Artificial Intelligence

Understanding In-Context Learning in Large Language Models: Experiments, Analysis, and Theoretical Insights

This article explains the concept of in‑context learning in large language models, presents experimental evaluations such as copy‑output, date‑formatting, and label‑remapping tasks, and discusses a recent theoretical analysis that links attention layers to implicit gradient‑based fine‑tuning, highlighting why model scale and data volume matter.

Attention Mechanism · GPT-3 · In-Context Learning

0 likes · 15 min read

21CTO

Feb 6, 2023 · Artificial Intelligence

Understanding the Transformer: How Attention Powers ChatGPT and Modern AI

This article breaks down the Transformer architecture behind ChatGPT, explaining its attention mechanism, embedding, positional encoding, and multi‑head self‑attention, while highlighting the model's impact on AI research, data requirements, and future innovations.

Attention Mechanism · ChatGPT · Transformer

0 likes · 18 min read

IT Architects Alliance

Feb 6, 2023 · Artificial Intelligence

Understanding the Transformer Model: A Deep Dive into “Attention Is All You Need”

This article provides a comprehensive, plain‑language walkthrough of the 2017 “Attention Is All You Need” paper, explaining the Transformer’s architecture, core mechanisms such as embedding, positional encoding and self‑attention, and discussing its broader impact on AI research and applications.

AI · Attention Mechanism · Transformer

0 likes · 17 min read

HomeTech

Nov 16, 2022 · Artificial Intelligence

Fundamentals and Policy Gradient Algorithms in Reinforcement Learning with Applications to Scene Text Recognition

This article introduces the basic concepts of reinforcement learning, derives model‑based and model‑free policy gradient methods—including vanilla policy gradient and Actor‑Critic—explains their mathematical foundations, and demonstrates their use in scene text recognition and image captioning tasks.

AI · Attention Mechanism · actor-critic

0 likes · 22 min read

DataFunSummit

Nov 14, 2022 · Artificial Intelligence

Machine Learning Methods for Solving Combinatorial Optimization Problems

This article reviews recent advances in applying machine learning—especially attention mechanisms, graph neural networks, and reinforcement learning—to combinatorial optimization, outlines fundamental problem definitions, classic algorithms, modern ML‑based approaches, experimental results, and future research directions.

Attention Mechanism · algorithms · combinatorial optimization

0 likes · 18 min read

DataFunTalk

Oct 24, 2022 · Artificial Intelligence

Efficient Target Attention (ETA) for Long-Term User Behavior Modeling in Click‑Through Rate Prediction

Efficient Target Attention (ETA) introduces a low‑cost hash‑based attention operator that enables end‑to‑end modeling of ultra‑long user behavior sequences for CTR prediction, achieving significant online CTR, GMV, and QPS improvements in Alibaba’s Taobao feed recommendation system.

Attention Mechanism · CTR Prediction · Hashing

0 likes · 20 min read

DataFunSummit

Feb 26, 2022 · Artificial Intelligence

Graph-Based Sparse Behavior Recall Models for Content Recommendation

This article presents a comprehensive study of graph‑based recall techniques for content recommendation, detailing how knowledge‑graph‑augmented user‑behavior graphs and novel attention‑driven models such as GADM, SGGA, and SGGGA improve performance for users with sparse interaction histories.

Attention Mechanism · Graph Neural Networks · Knowledge Graph

0 likes · 11 min read

DataFunTalk

Jan 17, 2022 · Artificial Intelligence

Graph Attention Multi‑Layer Perceptron (GAMLP) and Node‑Dependent Local Smoothing (NDLS) for Scalable and Flexible Graph Neural Networks

This talk introduces the motivation, design, theoretical analysis, and extensive experimental results of Tencent Angel Graph's Graph Attention Multi‑Layer Perceptron (GAMLP) and Node‑Dependent Local Smoothing (NDLS), which address GNN scalability and flexibility by using node‑wise adaptive propagation, attention‑based feature fusion, and a lightweight training pipeline.

Attention Mechanism · GAMLP · Graph Neural Networks

0 likes · 18 min read

Code DAO

Dec 22, 2021 · Artificial Intelligence

How Context R-CNN Leverages Temporal Context to Detect Occluded Objects

The article reviews the Context R-CNN paper, which introduces short‑term and long‑term memory banks and an attention mechanism to incorporate temporal context from multiple frames captured by a fixed camera, enabling robust detection of partially occluded, low‑light, distant, or background‑cluttered objects, and shows quantitative gains over standard Faster R‑CNN.

Attention Mechanism · Context R-CNN · Faster R-CNN

0 likes · 6 min read

DataFunTalk

May 22, 2021 · Artificial Intelligence

Baidu's Video Foundation Technology Architecture and Key AI Techniques

This article presents an overview of Baidu's video foundation technology architecture, covering the video R&D platform, core AI techniques for video understanding, editing, surveillance, and general vision, and detailing innovations such as Attention‑Cluster networks, cross‑modality attention with graph convolution, GANs, super‑resolution, and adaptive encoding.

Adaptive Encoding · Attention Mechanism · GaN

0 likes · 14 min read

New Oriental Technology

Jan 25, 2021 · Artificial Intelligence

Transformer Model: Attention Mechanism in Machine Translation

The 2017 Transformer model introduced by Vaswani et al. revolutionized machine translation by relying solely on attention mechanisms, outperforming traditional RNN and CNN approaches through parallel processing and improved contextual understanding.

AI · Attention Mechanism · NLP

0 likes · 4 min read

58 Tech

Nov 11, 2020 · Artificial Intelligence

Deep Learning for Click‑Through Rate Prediction in 58.com Home‑Page Recommendation

This article details how 58.com leverages deep learning models such as DNN, Wide&Deep, DeepFM, DIN and DIEN, combined with extensive user‑behavior feature engineering, offline vectorization, and online TensorFlow‑Serving pipelines to improve home‑page recommendation click‑through rates and overall platform efficiency.

A/B testing · Attention Mechanism · CTR Prediction

0 likes · 25 min read

DataFunTalk

Oct 17, 2020 · Artificial Intelligence

DyHAN: Dynamic Heterogeneous Graph Embedding with Hierarchical Attention

This article introduces DyHAN, a dynamic heterogeneous graph embedding method that employs hierarchical attention across node, edge, and temporal dimensions to capture evolving user-item interactions, demonstrates superior performance over static and existing dynamic baselines, and reports significant online improvements in Alibaba’s recommendation system.

Alibaba · Attention Mechanism · dynamic graphs

0 likes · 9 min read

Meituan Technology Team

Oct 15, 2020 · Artificial Intelligence

Answer-Driven Visual State Estimator for Goal-Oriented Visual Dialogue

The paper introduces the Answer‑Driven Visual State Estimator (ADVSE), which uses answer‑driven focusing attention and conditional visual information fusion to dynamically incorporate answers into visual dialogue, overcoming static encoding limitations and achieving state‑of‑the‑art performance on the GuessWhat?! question‑generation and guessing tasks.

Attention Mechanism · Multimodal AI · State Estimation

0 likes · 10 min read

Kuaishou Large Model

Oct 15, 2020 · Artificial Intelligence

How Kuaishou’s Y‑Tech Advances Monocular Depth Estimation for Mobile AR

This article reviews Kuaishou Y‑Tech’s ECCV‑2020 paper on monocular depth estimation, detailing its novel GCB‑SAB network, new HC‑Depth dataset, specialized loss functions and edge‑aware training, and demonstrates superior performance on NYUv2, TUM and real‑world mobile AR applications.

Attention Mechanism · Mobile AR · computer vision

0 likes · 14 min read

JD Retail Technology

Oct 10, 2020 · Artificial Intelligence

Kalman Filtering Attention for User Behavior Modeling in CTR Prediction

This article introduces a Kalman Filtering Attention (KFAtt) framework that enhances click‑through‑rate (CTR) prediction by modeling user behavior with a Kalman‑filter‑based attention mechanism and a frequency‑capped variant, addressing new‑interest coverage and frequency bias in e‑commerce scenarios.

Attention Mechanism · CTR Prediction · Kalman filter

0 likes · 11 min read

Tencent Cloud Developer

Sep 23, 2020 · Artificial Intelligence

NLP Model Interpretability: White-box and Black-box Methods and Business Applications

The article reviews NLP interpretability techniques, contrasting white‑box approaches that probe model internals such as neuron analysis, diagnostic classifiers, and attention with black‑box strategies like rationales, adversarial testing, and local surrogates, and argues that black‑box methods are generally more practical for business deployment despite offering shallower insights.

Attention Mechanism · BERT · LIME

0 likes · 12 min read

DataFunTalk

Sep 18, 2020 · Artificial Intelligence

MiNet: Mixed Interest Network for Cross-Domain Click-Through Rate Prediction

This article reviews the MiNet model, which leverages cross‑domain information by modeling long‑term, source‑domain short‑term, and target‑domain short‑term user interests with hierarchical attention and an auxiliary task to improve CTR prediction and alleviate cold‑start issues.

Attention Mechanism · CTR Prediction · MiNet

0 likes · 12 min read

DataFunTalk

Aug 29, 2020 · Artificial Intelligence

User Modeling for Search Ranking: Practices, Model Design, and Experimental Analysis at Alibaba

This article presents Alibaba's comprehensive approach to user modeling for search CTR/CVR ranking, detailing the abstraction of user information, multi‑scale behavior processing, enhanced transformer‑based model structures, client‑side click and exposure modeling, and experimental results showing significant AUC improvements.

Alibaba · Attention Mechanism · CTR Prediction

0 likes · 18 min read

Alibaba Cloud Developer

Dec 26, 2019 · Artificial Intelligence

How Decomposed Linguistic Representations Overcome Language Priors in VQA

This article reviews an AAAI 2020 paper that introduces a language‑attention‑based Visual Question Answering model that decomposes questions into type, object, and concept expressions to mitigate language bias, explains its modular architecture, and demonstrates superior performance on VQA‑CP v2 through extensive experiments and ablations.

Attention Mechanism · Multimodal Learning · VQA-CP

0 likes · 14 min read

Alibaba Cloud Developer

Dec 3, 2019 · Artificial Intelligence

How Alibaba Detects ‘Disgusting’ Images on Taobao with AI

This article describes Alibaba's AI system for automatically filtering nauseating product images on Taobao, covering challenges such as cold‑start, class imbalance, and diverse visual features, and detailing solutions like semi‑supervised learning, active learning, OHEM‑cascade, attention mechanisms, and the resulting business impact.

Active Learning · Attention Mechanism · Semi-supervised Learning

0 likes · 15 min read

Suning Technology

Jul 24, 2019 · Artificial Intelligence

Multi‑Scale Body‑Part Masks Revolutionize Person Re‑Identification at CVPR 2019

At CVPR 2019 in Long Beach, Suning’s AI team presented a breakthrough paper on multi‑scale body‑part mask guided attention for person re‑identification, detailing the conference’s selectivity, the challenges of re‑identification, and how their deep‑learning approach achieves state‑of‑the‑art performance.

Attention Mechanism · CVPR 2019 · deep learning

0 likes · 5 min read

Ctrip Technology

May 21, 2019 · Artificial Intelligence

A Brief Overview of Machine Translation: History, Neural Models, and Practical Insights

This article surveys the evolution of machine translation from early rule‑based systems to modern neural architectures, explains how translation engines are trained, highlights recent advances such as attention and Transformers, and shares practical experience and current challenges in the field.

Attention Mechanism · Machine Translation · Transformer

0 likes · 11 min read

DataFunTalk

May 20, 2019 · Artificial Intelligence

Evolution of Alibaba's Advertising CTR Prediction Models: From Linear Methods to Deep Interest Evolution Networks

The article reviews the characteristics of e‑commerce personalized prediction, outlines Alibaba's model iteration from large‑scale linear regression to deep learning architectures such as DIN, CrossMedia, and Deep Interest Evolution, and discusses future directions like disentangled representation and white‑box modeling.

Attention Mechanism · CTR Prediction · Model Evolution

0 likes · 11 min read

Sohu Tech Products

Mar 13, 2019 · Artificial Intelligence

Attentive Group Recommendation (AGR): An Attention‑Based Deep Learning Model for Group Recommendation

This paper proposes AGR, the first group recommendation model that incorporates an attention mechanism to dynamically learn each member’s influence weight within a group, enabling flexible modeling of group decision processes and achieving superior performance over existing memory‑based, model‑based, and probabilistic baselines across four real‑world datasets.

Attention Mechanism · BPR · collaborative filtering

0 likes · 26 min read

Tencent Cloud Developer

Sep 30, 2018 · Artificial Intelligence

Smart Speaker Voice Interaction Technology: Recent Advances and Tencent's Research Progress

The article surveys Tencent’s recent advances in smart‑speaker voice interaction, detailing a full technology chain—from front‑end capture, wake‑up and enhancement, through speaker verification and short‑speech voiceprint, to TDNN/LSTM speech recognition, target speaker extraction, and end‑to‑end attention modeling for robust, personalized performance.

Attention Mechanism · TTS · microphone array

0 likes · 18 min read

iQIYI Technical Product Team

Sep 14, 2018 · Artificial Intelligence

AI RAP: End-to-End Speech Synthesis for Rap Generation Using Location‑Sensitive Attention and Inference Mask

AI RAP is an end‑to‑end AI service that lets users generate personalized rap with a single click by combining location‑sensitive attention and an inference mask to achieve perfect alignment, beat‑synchronous timing, multi‑character voice timbres, sub‑second synthesis, and a scalable architecture supporting millions of daily users.

AI · Attention Mechanism · Audio Processing

0 likes · 5 min read

Alibaba Cloud Developer

Aug 17, 2018 · Artificial Intelligence

Can Multi‑Task Learning Shorten E‑Commerce Titles Without Losing Sales?

This paper proposes a multi‑task learning approach that compresses overly long e‑commerce product titles into concise short titles using a Pointer Network, while simultaneously generating user search queries with an attention‑based encoder‑decoder, achieving higher readability, informativeness, and conversion rates than traditional methods.

Attention Mechanism · Multi-Task Learning · Sequence-to-Sequence

0 likes · 11 min read

Alibaba Cloud Developer

Aug 16, 2018 · Artificial Intelligence

How Syntax‑Sensitive Entity Representations Boost Neural Relation Extraction

This paper introduces a syntax‑aware entity representation using Tree‑GRU and attention mechanisms, demonstrating that enriching entity semantics with dependency tree information significantly improves neural relation extraction performance on the NYT dataset compared to existing distant supervision models.

Attention Mechanism · Tree-GRU · entity representation

0 likes · 7 min read

DataFunTalk

Aug 12, 2018 · Artificial Intelligence

Interpretability of Deep Learning and Low‑Frequency Event Learning in Financial Applications

The article reviews the limitations of mainstream deep‑learning models in finance, proposes hybrid tree‑based and Wide&Deep architectures combined with attention, sensitivity and variance analysis to improve interpretability and low‑frequency event detection, and validates the approach with a large‑scale insurance recommendation case study.

Attention Mechanism · Wide&Deep · finance

0 likes · 17 min read

Meitu Technology

Jul 24, 2018 · Artificial Intelligence

Interaction-aware Spatio-Temporal Pyramid Attention Networks for Action Classification

Researchers introduce an Interaction‑aware Spatio‑Temporal Pyramid Attention network that embeds a PCA‑guided loss to capture complementary multi‑scale features, enabling end‑to‑end video action classification with state‑of‑the‑art accuracy on UCF101, HMDB51, Charades and internal datasets.

Attention Mechanism · CNN · action classification

0 likes · 7 min read

Alibaba Cloud Developer

Jul 11, 2018 · Artificial Intelligence

Can Global Ranking Boost E‑Commerce GMV? A New AI Approach

Traditional e‑commerce ranking ignores interactions among displayed items, but this study introduces a novel global ranking method that models mutual influences, optimizes expected GMV using extended global features and RNN‑based sequence generation, achieving a 5% GMV lift in large‑scale A/B tests.

Attention Mechanism · GMV · RNN

0 likes · 12 min read

Alibaba Cloud Developer

Mar 15, 2018 · Artificial Intelligence

How Deep Learning Transforms Knowledge Graph Relation Extraction

This article reviews the evolution from rule‑based DeepDive methods to deep‑learning approaches such as PCNNs and attention‑enhanced models for relation extraction, presents experimental results on the NYT dataset, discusses practical challenges in large‑scale deployment, and outlines future research directions.

Attention Mechanism · Knowledge Graph · PCNN

0 likes · 14 min read

Hulu Beijing

Dec 20, 2017 · Artificial Intelligence

How Attention Mechanisms Transform Seq2Seq Models for Better Translation

This article explains why attention mechanisms were introduced into Seq2Seq models, how they address the limitations of fixed‑length encoding, the role of bidirectional RNNs, and showcases their impact on machine translation and image captioning with illustrative diagrams.

Attention Mechanism · Machine Translation · RNN

0 likes · 10 min read

Qunar Tech Salon

Nov 29, 2015 · Artificial Intelligence

From Symbolic Semantics to Vector Representations: Deep Learning for Natural Language Understanding

The article reviews symbolic knowledge bases such as WordNet, ConceptNet and FrameNet, explains how deep learning replaces them with vector‑based semantic representations, and discusses encoder‑decoder RNNs, attention mechanisms, and future directions for truly understanding language through experiential learning.

Attention Mechanism · RNN · deep learning

0 likes · 12 min read
