Tagged articles
69 articles
Lao Guo's Learning Space
Apr 30, 2026 · Artificial Intelligence

How DeepSeek V4’s CSA + HCA Break the Million‑Token Barrier

Full attention struggles with million‑token contexts because its compute grows quadratically, and its memory at least linearly, with sequence length. DeepSeek V4's Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) compress tokens, select the relevant ones through a sparse index, and compute attention precisely over that selection, cutting the KV cache to 10% and FLOPs to 27% while enabling a 1M‑token window on a single GPU.

Attention Mechanism · CSA · HCA
12 min read
Bighead's Algorithm Notes
Apr 14, 2026 · Artificial Intelligence

How Self‑Supervised HINTS Extracts Human Insights from Time Series to Boost Forecast Accuracy

The paper introduces HINTS, a two‑stage self‑supervised framework that leverages Friedkin‑Johnsen opinion dynamics to mine latent human‑driven factors from time‑series residuals, integrates them via attention into state‑of‑the‑art predictors, and demonstrates consistent accuracy gains and interpretability across nine benchmark and real‑world datasets.

Attention Mechanism · Friedkin-Johnsen model · benchmark evaluation
17 min read
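For readers new to the opinion-dynamics model HINTS builds on, the textbook Friedkin-Johnsen update can be sketched in a few lines: each agent mixes its neighbors' current opinions (weighted by a row-stochastic matrix W) with its own initial opinion, in proportions set by a diagonal susceptibility matrix Λ. This is only the standard model with toy values, and the function name is hypothetical; it is not HINTS's residual-mining pipeline or its attention-based integration.

```python
def friedkin_johnsen(W, susceptibility, x0, steps=200):
    """Iterate x(k+1) = Λ W x(k) + (I - Λ) x(0). Hypothetical helper for illustration.

    W: row-stochastic influence matrix given as a list of rows.
    susceptibility: diagonal of Λ, each entry in [0, 1]; 1 means fully open
    to influence, 0 means the agent keeps its initial opinion.
    """
    x = list(x0)
    for _ in range(steps):
        # Weighted average of the current opinions each agent listens to
        wx = [sum(w * xj for w, xj in zip(row, x)) for row in W]
        # Blend social influence with the agent's initial ("prejudice") opinion
        x = [lam * v + (1.0 - lam) * init for lam, v, init in zip(susceptibility, wx, x0)]
    return x

# Two agents who listen to each other equally, each half-anchored to its start
opinions = friedkin_johnsen([[0.5, 0.5], [0.5, 0.5]], [0.5, 0.5], [0.0, 1.0])
# Converges to about [0.25, 0.75]: the agents move toward each other but do not fully agree
```

The anchoring term (I - Λ)x(0) is what keeps opinions from collapsing to a single consensus value, unlike pure averaging models.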
Machine Heart
Apr 4, 2026 · Artificial Intelligence

Does Scale Stealthily Hijack Attention? PMDformer’s Simple Subtraction Fix for Long-Term Forecasting

The paper identifies scale differences between patches as a hidden source of attention distortion in long‑term time‑series forecasting, introduces PMDformer with Patch Mean Decoupling, Neighbor Variable Attention, and Trend Recovery Attention, and demonstrates state‑of‑the‑art accuracy and efficiency across eight benchmark datasets.

Attention Mechanism · ICLR2026 · Long-term Time Series Forecasting
8 min read
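The summary does not spell out the subtraction, so the following is only a sketch of the general idea behind Patch Mean Decoupling. It assumes non-overlapping patches and a plain per-patch mean, and the helper names are hypothetical; PMDformer's actual formulation, and how Trend Recovery Attention reuses the removed means, may differ.

```python
def patch_mean_decouple(series, patch_len):
    """Split a series into non-overlapping patches and subtract each patch's mean.

    Hypothetical helper. Returns (centered_patches, patch_means); adding each
    mean back to its centered patch recovers the original values.
    """
    if len(series) % patch_len:
        raise ValueError("series length must be a multiple of patch_len")
    patches = [series[i:i + patch_len] for i in range(0, len(series), patch_len)]
    means = [sum(p) / patch_len for p in patches]
    centered = [[x - m for x in p] for p, m in zip(patches, means)]
    return centered, means

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Two patches with the same shape but very different levels
centered, means = patch_mean_decouple([1.0, 2.0, 3.0, 101.0, 102.0, 103.0], patch_len=3)
# Raw dot products are dominated by scale: dot([1,2,3], [101,102,103]) = 614,
# far larger than dot([1,2,3], [1,2,3]) = 14, so attention would favor the high-level patch.
# After decoupling both patches become [-1.0, 0.0, 1.0], so similarity reflects shape only.
```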
Data Party THU
Apr 3, 2026 · Artificial Intelligence

Can Attention Replace Residuals? Inside the New Attention Residuals Breakthrough

The article reviews the Kimi team's Attention Residuals approach, which replaces the additive shortcuts of traditional ResNets with learned attention‑based weighting. It explains the theoretical motivation linking depth to time, details full‑attention and block‑wise implementations, and presents experimental results showing up to 1.25× compute efficiency and improved performance on reasoning and knowledge tasks.

Attention Mechanism · Deep Learning · Residual Networks
11 min read
AI Large-Model Wave and Transformation Guide
Mar 28, 2026 · Artificial Intelligence

What Large‑Model Training Actually Optimizes: Parameters, Attention, and Knowledge Explained

This article breaks down the core of large‑model training by showing that training optimizes neural‑network parameters, that attention is a mechanism realized by those parameters, and that knowledge is encoded implicitly within the weight matrices, providing a clear hierarchy for interview or presentation use.

AI Interview · Attention Mechanism · Deep Learning
6 min read
Data Party THU
Mar 26, 2026 · Artificial Intelligence

How Mixture-of-Depths Attention Boosts Large Language Model Efficiency

This article examines the Mixture‑of‑Depths Attention (MoDA) mechanism, detailing its novel flash‑compatible KV layout, combined sequence‑depth attention, theoretical analysis, and extensive experiments that show significant reductions in validation loss and accuracy gains on downstream tasks compared to the OLMo2 baseline.

Attention Mechanism · Deep KV · FlashAttention
9 min read
Qborfy AI
Feb 21, 2026 · Artificial Intelligence

How Self-Attention Powers Modern AI: From Theory to Real-World Impact

This article explains the self‑attention mechanism behind transformers, detailing its core components, mathematical formulation, step‑by‑step example, multi‑head extension, industry use cases, and a thorough comparison with RNN and CNN approaches, all supported by concrete numbers and citations.

Attention Mechanism · Deep Learning · Self-Attention
8 min read
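The formula at the center of such articles, Attention(Q, K, V) = softmax(QKᵀ/√d_k)·V, can be sketched in plain Python. This minimal single-head version assumes the token embeddings X are used directly as Q, K and V; a real layer first multiplies X by learned projection matrices, and multi-head attention runs several such heads in parallel and concatenates their outputs. The helper names are hypothetical.

```python
import math

def softmax(xs):
    # Subtract the maximum before exponentiating for numerical stability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(Q, K, V):
    """Scaled dot-product attention softmax(Q K^T / sqrt(d_k)) V (hypothetical helper).

    Q, K and V are lists of row vectors, one row per token.
    """
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output row = attention-weighted sum of the value rows
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

# Three tokens with 2-dimensional embeddings; X doubles as Q, K and V here
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = self_attention(X, X, X)
```

Each output row is a convex combination of the value rows, so every entry of Y stays within the range of the corresponding column of V.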
AI Cyberspace
Feb 13, 2026 · Artificial Intelligence

How Attention Mechanisms Revolutionized Computer Vision and Machine Translation

This article traces the evolution of attention mechanisms from their first applications in computer vision and machine translation to their central role in modern Transformer models, detailing the underlying RNN‑attention designs, the breakthrough in sequence alignment, and the innovations that enabled high‑performance, parallelizable deep learning architectures.

Attention Mechanism · Computer Vision · Deep Learning
14 min read
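The RNN-attention designs for translation are usually introduced through additive (Bahdanau-style) attention, where the decoder's previous state scores every encoder state and the resulting alignment weights build a context vector. A minimal sketch, assuming toy weight matrices W_s, W_h and vector v instead of trained parameters, with hypothetical helper names:

```python
import math

def matvec(M, x):
    # Multiply a matrix (list of rows) by a vector
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def additive_attention(s_prev, H, W_s, W_h, v):
    """Additive (Bahdanau-style) attention; hypothetical helper for illustration.

    score_j = v · tanh(W_s s_prev + W_h h_j); weights = softmax(scores);
    context = sum_j weight_j * h_j.
    """
    ws = matvec(W_s, s_prev)
    scores = []
    for h in H:
        wh = matvec(W_h, h)
        scores.append(sum(vi * math.tanh(a + b) for vi, a, b in zip(v, ws, wh)))
    m = max(scores)
    exps = [math.exp(e - m) for e in scores]
    z = sum(exps)
    alphas = [e / z for e in exps]  # alignment weights over source positions
    context = [sum(a * h[j] for a, h in zip(alphas, H)) for j in range(len(H[0]))]
    return alphas, context

# Toy example: three encoder states and identity projections
I2 = [[1.0, 0.0], [0.0, 1.0]]
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
alphas, context = additive_attention([0.5, -0.5], H, W_s=I2, W_h=I2, v=[1.0, 1.0])
```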
AI Architecture Hub
Jan 7, 2026 · Artificial Intelligence

Why “Attention Is All You Need” Still Shapes AI: A Beginner’s Deep Dive

This article provides a comprehensive, beginner‑friendly walkthrough of the landmark 2017 paper “Attention Is All You Need,” covering its authors, historical context, the shortcomings of RNNs and CNNs, the birth of self‑attention, the Transformer architecture, and its transformative impact on modern AI.

AI history · Attention Mechanism · Deep Learning
9 min read
AI2ML AI to Machine Learning
Dec 19, 2025 · Artificial Intelligence

The 9 Key Ideas Behind FlashAttention

FlashAttention accelerates transformer attention by combining nine techniques: exact (lossless) attention, optimization for the GPU memory hierarchy, tiling that reuses on‑chip SRAM, numerically safe softmax scaling, online softmax accumulation, tile‑size constraints, parallel matrix multiplication, reduced KV slicing, and integrated backward‑pass caching. Together they achieve efficient, high‑throughput computation on modern GPUs.

Attention Mechanism · FlashAttention · GPU Optimization
8 min read
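Two of the listed ideas, safe softmax scaling and tiling with an online (running) softmax, can be illustrated for a single query row. This is a minimal pure-Python sketch with hypothetical helper names and a hypothetical `block_size` parameter; it processes keys and values block by block while keeping only a running maximum, a running normalizer and a running output. It does not model SRAM, GPU parallelism or the backward pass.

```python
import math

def naive_attention_row(q, K, V):
    # Reference: materialize every score, then apply a max-subtracted softmax
    scores = [sum(a * b for a, b in zip(q, k)) for k in K]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [sum(wi * v[j] for wi, v in zip(w, V)) / z for j in range(len(V[0]))]

def tiled_attention_row(q, K, V, block_size=2):
    """Online-softmax attention for one query, visiting K/V one block at a time.

    Hypothetical helper; scores are left unscaled (no 1/sqrt(d)) for brevity.
    """
    m = float("-inf")         # running maximum of the scores seen so far
    z = 0.0                   # running softmax denominator, relative to m
    acc = [0.0] * len(V[0])   # running unnormalized output, relative to m
    for start in range(0, len(K), block_size):
        K_blk, V_blk = K[start:start + block_size], V[start:start + block_size]
        scores = [sum(a * b for a, b in zip(q, k)) for k in K_blk]
        m_new = max(m, max(scores))
        # Rescale what was accumulated under the old maximum (exp(-inf) == 0.0 on the first block)
        scale = math.exp(m - m_new)
        z *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, V_blk):
            p = math.exp(s - m_new)
            z += p
            acc = [a + p * vj for a, vj in zip(acc, v)]
        m = m_new
    return [a / z for a in acc]

q = [1.0, -0.5]
K = [[0.1, 0.2], [2.0, 1.0], [-1.0, 0.5], [0.3, -0.7], [1.5, 1.5]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0], [0.5, 0.5]]
reference = naive_attention_row(q, K, V)
tiled = tiled_attention_row(q, K, V, block_size=2)
```

Because each block rescales the running sums to the newest maximum, the result matches the naive computation for any block size while only one block of scores exists at a time.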
HyperAI Super Neural
Dec 12, 2025 · Artificial Intelligence

Weekly AI Paper Digest: Attention, Nvidia VLA, TTS, and Graph Neural Networks

This roundup presents five recent AI papers covering hierarchical sparse attention for ultra‑long context, Nvidia's Alpamayo‑R1 VLA model for autonomous driving, the non‑autoregressive F5‑TTS system, LatentMAS for latent‑space multi‑agent collaboration, and Deeper‑GXX, which deepens arbitrary graph neural networks, highlighting each method's key innovations and reported performance gains.

Attention Mechanism · autonomous driving · graph neural networks
6 min read
Tencent Cloud Developer
Nov 4, 2025 · Artificial Intelligence

From Functions to Transformers: Mastering Neural Networks Step by Step

This article walks you through the evolution from basic mathematical functions to modern large‑scale models, explaining activation functions, forward and backward propagation, loss calculation, gradient descent, regularization, dropout, word embeddings, RNNs, and the core mechanics of the Transformer architecture.

Attention Mechanism · Deep Learning · Neural Networks
15 min read
AIWalker
Sep 17, 2025 · Artificial Intelligence

Cutting-Edge Attention Mechanism Innovations for 2025: Modal Fusion and Domain Adaptation

This article surveys 183 recent attention‑mechanism papers, classifies them into four innovation categories, and highlights representative works such as MILA, ARFFT, a CNN‑Transformer model for speech emotion recognition, and an LSTM‑attention model for epidemic forecasting, providing concrete methods, code links, and performance insights.

2025 · Attention Mechanism · Deep Learning
7 min read
Data Party THU
Sep 17, 2025 · Artificial Intelligence

How Matching Networks Tackle Imbalance with Cosine Similarity and Attention

This article provides a comprehensive technical review of Matching Networks, covering cosine similarity mathematics, its transformations, the bias introduced by imbalanced support sets, and a range of mitigation strategies such as adaptive weighting, global distance‑matrix normalization, prior‑based weighting, hierarchical multi‑scale matching, hybrid learning architectures, and attention‑driven dynamic sample selection.

Attention Mechanism · Cosine Similarity · Matching Networks
10 min read
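The basic Matching Networks classifier the review builds on predicts ŷ = Σᵢ a(x̂, xᵢ)·yᵢ, where a is a softmax over cosine similarities between the query embedding and each support embedding. The sketch assumes embeddings are already computed, uses integer class labels and a hypothetical helper name, and omits the full-context embeddings of the original model. It also shows how an imbalanced support set tilts the prediction, because every support sample contributes weight to its class.

```python
import math

def cosine(u, v):
    # Cosine similarity: dot product divided by the product of the norms
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def matching_predict(query, support, labels, num_classes):
    """Matching Networks-style prediction (hypothetical helper).

    Attention weights are a softmax over cosine similarities between the query
    and each support embedding; each support sample adds its weight to its class.
    """
    sims = [cosine(query, s) for s in support]
    m = max(sims)
    exps = [math.exp(s - m) for s in sims]
    z = sum(exps)
    probs = [0.0] * num_classes
    for e, y in zip(exps, labels):
        probs[y] += e / z
    return probs

support = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
labels = [0, 0, 0, 1]
probs = matching_predict([1.0, 1.0], support, labels, num_classes=2)
# The query is equally similar to both classes, yet class 0 gets 0.75
# because it has three support samples: the imbalance bias discussed above.
```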
Baobao Algorithm Notes
Jul 18, 2025 · Artificial Intelligence

30+ Expert Q&A on Large Language Model Architecture, Training, and Deployment

This article compiles more than thirty interview‑style questions and detailed answers covering large‑model fundamentals such as encoder‑decoder trade‑offs, self‑attention versus RNN, context length, tokenization, embedding strategies, FlashAttention, RoPE, prompt design, retrieval‑augmented generation, safety measures, fine‑tuning, and model distillation, providing a comprehensive technical reference for practitioners.

Attention Mechanism · retrieval-augmented generation
53 min read
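RoPE, one of the topics in the Q&A, encodes position by rotating each pair of query/key dimensions by an angle proportional to the token position, so the query-key dot product depends only on the relative offset. The sketch assumes the interleaved-pair layout and base 10000 from the RoFormer paper, with hypothetical helper names; some libraries instead rotate the first and second halves of the vector, which yields different numbers for the same weights but the same relative-position property.

```python
import math

def rope(x, pos, base=10000.0):
    """Rotate each pair (x[2i], x[2i+1]) by angle pos * base**(-2i/d).

    Hypothetical helper using the interleaved-pair layout.
    """
    d = len(x)
    if d % 2:
        raise ValueError("RoPE needs an even dimension")
    out = []
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)
        a, b = x[2 * i], x[2 * i + 1]
        # 2-D rotation of the pair by theta
        out += [a * math.cos(theta) - b * math.sin(theta),
                a * math.sin(theta) + b * math.cos(theta)]
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

q, k = [0.2, -1.0, 0.5, 0.3], [1.0, 0.4, -0.6, 0.8]
# Positions (5, 2) and (13, 10) share the same offset of 3, so the scores agree
s1 = dot(rope(q, 5), rope(k, 2))
s2 = dot(rope(q, 13), rope(k, 10))
```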
AI Frontier Lectures
Jun 10, 2025 · Artificial Intelligence

Can One Model Master All Remote Sensing Tasks? Introducing the TSSUN Framework

This paper presents the Temporal‑Spectral‑Spatial Unified Network (TSSUN), a flexible deep‑learning architecture that simultaneously handles semantic segmentation, semantic change detection, and binary change detection across heterogeneous remote‑sensing inputs, achieving state‑of‑the‑art performance without task‑specific retraining.

Attention Mechanism · Deep Learning · TSSUN
15 min read
AIWalker
Feb 26, 2025 · Artificial Intelligence

Why Linear Attention Lags Behind Softmax and How Two Simple Tweaks Close the Gap

The paper analytically identifies injectivity and local modeling as the two key factors causing the performance gap between linear and Softmax attention, proposes the InLine attention modifications to restore these properties, and demonstrates through extensive Vision Transformer experiments that the enhanced linear attention matches or surpasses Softmax while retaining linear computational cost.

Attention Mechanism · Efficient Transformers · Linear Attention
24 min read
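The linear cost mentioned above comes from associativity: with a positive feature map φ, (φ(Q)φ(K)ᵀ)V can be computed as φ(Q)(φ(K)ᵀV) without forming the N×N score matrix. The sketch assumes the elu(x)+1 feature map used in earlier linear-attention work and non-causal attention, with hypothetical helper names; it is the plain linear-attention baseline, not the InLine modification proposed in the paper.

```python
import math

def phi(x):
    # elu(x) + 1: equals x + 1 for x > 0 and exp(x) otherwise, so features stay positive
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]

def linear_attention_quadratic(Q, K, V):
    """O(N^2) form: weights phi(q)·phi(k), normalized per query (hypothetical helper)."""
    out = []
    for q in Q:
        fq = phi(q)
        w = [sum(a * b for a, b in zip(fq, phi(k))) for k in K]
        z = sum(w)
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) / z for j in range(len(V[0]))])
    return out

def linear_attention(Q, K, V):
    """O(N) form: summarize keys and values once, then reuse the summary for every query."""
    d, dv = len(K[0]), len(V[0])
    S = [[0.0] * dv for _ in range(d)]  # S = sum_j phi(k_j) v_j^T  (d x dv)
    zsum = [0.0] * d                    # sum_j phi(k_j)
    for k, v in zip(K, V):
        fk = phi(k)
        for a in range(d):
            zsum[a] += fk[a]
            for b in range(dv):
                S[a][b] += fk[a] * v[b]
    out = []
    for q in Q:
        fq = phi(q)
        z = sum(x * y for x, y in zip(fq, zsum))
        out.append([sum(fq[a] * S[a][b] for a in range(d)) / z for b in range(dv)])
    return out

Q = [[0.5, -1.0], [1.5, 0.2]]
K = [[0.1, 0.3], [-0.7, 1.2], [2.0, -0.5]]
V = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]
slow, fast = linear_attention_quadratic(Q, K, V), linear_attention(Q, K, V)
```

Both functions return the same outputs; only the order of multiplication changes, which is what removes the quadratic term.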
Architecture Digest
Feb 24, 2025 · Artificial Intelligence

MoBA: Mixture of Block Attention for Long‑Context Large Language Models

The article introduces MoBA (Mixture of Block Attention), which applies Mixture‑of‑Experts principles to transformer attention. Through sparse, trainable block selection and seamless switching between full and sparse attention, it enables efficient long‑context processing for large language models while keeping performance comparable to full attention.

Attention Mechanism · LLM · Mixture of Experts
12 min read
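Block selection of this kind can be sketched for a single query: split the keys into blocks, score each block by the query's dot product with the block's mean-pooled key, and attend only to the top-k blocks. This follows the gating commonly described for MoBA but is a simplified assumption: it ignores causal masking (including always attending to the query's own block), and the helper name and the `block_size` and `top_k` arguments are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def moba_attention_row(q, K, V, block_size, top_k):
    """Block-sparse attention for one query (hypothetical helper, non-causal).

    1. Split the keys into contiguous blocks.
    2. Gate: score each block by q · mean(keys in the block).
    3. Keep the top_k blocks and run ordinary softmax attention over them.
    Returns (output vector, sorted list of attended positions).
    """
    n, d = len(K), len(q)
    blocks = [list(range(s, min(s + block_size, n))) for s in range(0, n, block_size)]
    gate = []
    for idx in blocks:
        mean_k = [sum(K[i][j] for i in idx) / len(idx) for j in range(d)]
        gate.append(dot(q, mean_k))
    chosen = sorted(range(len(blocks)), key=lambda b: gate[b], reverse=True)[:top_k]
    keep = sorted(i for b in chosen for i in blocks[b])
    scores = [dot(q, K[i]) / math.sqrt(d) for i in keep]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    out = [sum(wi * V[i][j] for wi, i in zip(w, keep)) / z for j in range(len(V[0]))]
    return out, keep

q = [1.0, 0.0]
K = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [-1.0, 0.0], [-1.0, 0.0]]
V = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 2.0], [5.0, 5.0], [6.0, 6.0]]
out, keep = moba_attention_row(q, K, V, block_size=2, top_k=1)
# keep == [0, 1]: only the block whose mean key aligns with q is attended
```

With `top_k` equal to the number of blocks, the sketch reduces to ordinary full attention over all positions.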
AIWalker
Feb 19, 2025 · Artificial Intelligence

YOLOv12 Unveiled: Boosted Performance and Speed for Real‑Time Detection

YOLOv12 introduces an attention‑centric architecture, a lightweight regional attention module, and the R‑ELAN aggregation network, delivering consistent mAP gains and lower latency across N, S, M, L and X model scales while surpassing previous YOLO versions and other real‑time detectors.

Attention Mechanism · Benchmark · Computer Vision
8 min read
AIWalker
Jan 13, 2025 · Artificial Intelligence

ArtCrafter: A Controllable, Diverse Style Transfer Framework from Tsinghua

ArtCrafter introduces a novel text‑image style transfer framework that leverages attention‑based style extraction, text‑image alignment enhancement, and explicit modulation to achieve controllable, diverse, and high‑fidelity visual results, outperforming existing methods in both qualitative and quantitative evaluations.

Attention Mechanism · Style Transfer · diffusion models
10 min read
DaTaobao Tech
Nov 13, 2024 · Artificial Intelligence

Understanding Neural Networks and Transformers: Principles, Implementation, and Applications

The article surveys neural networks from basic neuron operations and loss functions through deep architectures to the Transformer model, detailing embeddings, positional encoding, self‑attention, multi‑head attention, residual links, and encoder‑decoder design, and includes PyTorch code examples for linear regression, translation, and fine‑tuning the MiniRBT model from Hugging Face for text classification.

AI · Attention Mechanism · Deep Learning
44 min read
Baobao Algorithm Notes
Nov 7, 2024 · Artificial Intelligence

Demystifying FlashAttention: A Minimalist Derivation of the Algorithm

This article presents a concise, step‑by‑step derivation of FlashAttention, explaining the prerequisite linear‑algebra concepts, the softmax simplifications, and the parallel computation workflow—including the LSE‑enhanced version—so readers can grasp the algorithm’s elegance without heavy mathematics.

Algorithm Derivation · Attention Mechanism · FlashAttention
8 min read
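The log-sum-exp (LSE) idea can be shown with two chunks of keys: if each chunk returns its normalized output together with the LSE of its scores, the chunks can be merged exactly without revisiting the scores, which is what allows the work to be split across blocks or threads. A minimal sketch under those assumptions, with hypothetical helper names and unscaled scores for brevity:

```python
import math

def partial_attention(q, K, V):
    """Attention over one chunk of keys (hypothetical helper).

    Returns (normalized output, log-sum-exp of the chunk's scores).
    """
    scores = [sum(a * b for a, b in zip(q, k)) for k in K]
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    out = [sum(math.exp(s - lse) * v[j] for s, v in zip(scores, V)) for j in range(len(V[0]))]
    return out, lse

def merge(o1, lse1, o2, lse2):
    """Exactly combine two chunk results using their log-sum-exp values."""
    m = max(lse1, lse2)
    lse = m + math.log(math.exp(lse1 - m) + math.exp(lse2 - m))
    w1, w2 = math.exp(lse1 - lse), math.exp(lse2 - lse)  # each chunk's share of the total mass
    return [w1 * a + w2 * b for a, b in zip(o1, o2)], lse

q = [0.3, -0.2]
K = [[1.0, 0.0], [0.5, 2.0], [-1.0, 1.0], [2.0, -1.0], [0.0, 0.5]]
V = [[1.0, 2.0], [3.0, 0.0], [0.0, 1.0], [-2.0, 4.0], [1.0, 1.0]]
full_out, full_lse = partial_attention(q, K, V)
o1, l1 = partial_attention(q, K[:2], V[:2])
o2, l2 = partial_attention(q, K[2:], V[2:])
merged_out, merged_lse = merge(o1, l1, o2, l2)
```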
Ops Development & AI Practice
Jun 22, 2024 · Artificial Intelligence

Why Transformers Revolutionized AI: From NLP to Vision and Speech

Transformers, introduced in 2017, have reshaped neural networks by leveraging attention mechanisms to outperform RNNs and CNNs across NLP, computer vision, and speech tasks, offering parallel processing, long‑range dependency capture, and versatile applications such as translation, text generation, image classification, and speech recognition.

Attention Mechanism · Computer Vision · Deep Learning
6 min read
Architect's Guide
May 13, 2024 · Artificial Intelligence

Understanding the Core Principles of Transformer Architecture

This article explains how Transformer models work by detailing the encoder‑decoder structure, self‑attention, multi‑head attention, positional encoding, and feed‑forward networks, and shows their applications in machine translation, recommendation systems, and large language models.

AI · Attention Mechanism · Deep Learning
11 min read
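Positional encoding, one of the components listed above, is often illustrated with the fixed sinusoidal scheme from the original Transformer paper: PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). The table is added element-wise to the token embeddings. A minimal sketch with a hypothetical helper name; learned and rotary encodings used by many later models are not covered.

```python
import math

def sinusoidal_positional_encoding(num_positions, d_model):
    """PE[pos][2i] = sin(pos / 10000^(2i/d_model)), PE[pos][2i+1] = cos(same angle).

    Hypothetical helper; the loop variable i already equals 2i (the even index).
    """
    pe = []
    for pos in range(num_positions):
        row = []
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            row.append(math.sin(angle))
            if i + 1 < d_model:  # guard for odd d_model
                row.append(math.cos(angle))
        pe.append(row)
    return pe

pe = sinusoidal_positional_encoding(4, 6)
# pe[0] is [0, 1, 0, 1, 0, 1]; later rows oscillate at geometrically decreasing frequencies
```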
ITPUB
Apr 20, 2024 · Artificial Intelligence

Unveiling GPT-4’s Magic: How Large Language Models Learn, Reason, and Translate – A Kid‑Friendly Story

This article uses a playful dialogue to demystify how large language models like GPT‑4 work, covering data collection, vectorization, the transformer’s attention mechanism, positional encoding, training stages, multilingual translation, reasoning puzzles, and alignment, all illustrated through the tale of a curious learner named Wuming.

Attention Mechanism · Transformer · artificial intelligence
50 min read
Top Architect
Apr 18, 2024 · Artificial Intelligence

Understanding Transformers: Architecture, Attention Mechanism, Training and Inference

This article provides a comprehensive overview of Transformer models, covering their attention-based architecture, encoder-decoder structure, training procedures including teacher forcing, inference workflow, advantages over RNNs, and various applications in natural language processing such as translation, summarization, and classification.

Attention Mechanism · Deep Learning · Inference
11 min read
NewBeeNLP
Apr 16, 2024 · Artificial Intelligence

Demystifying the Transformer: Step‑by‑Step PaddlePaddle Implementation

This article provides a comprehensive, code‑rich walkthrough of the Transformer architecture using PaddlePaddle, covering the encoder and decoder components, residual connections, layer normalization, feed‑forward networks, scaled dot‑product and multi‑head attention, and shows how to assemble the full model with training and inference functions.

Attention Mechanism · Decoder · Deep Learning
17 min read
Architect
Mar 26, 2024 · Artificial Intelligence

Why Transformers Outperform RNNs: A Deep Dive into Architecture and Training

This article explains the Transformer model’s core architecture, self‑attention mechanism, encoder‑decoder workflow, training with teacher forcing, inference steps, and why it surpasses RNNs and CNNs, while also outlining its major NLP applications.

Attention Mechanism · Inference · Model Training
14 min read
JD Tech
Nov 30, 2023 · Artificial Intelligence

Understanding ChatGPT: Mechanisms, Attention, Emergence, and the Chinese Room

This article examines the principles behind ChatGPT: how it generates text by continuing the input token by token, the role of attention mechanisms and the transformer architecture, and how scaling up neural networks leads to emergent abilities. It then interprets these phenomena through the lenses of compression theory and the Chinese Room thought experiment.

Attention Mechanism · ChatGPT · Emergence
27 min read
JD Cloud Developers
Oct 10, 2023 · Artificial Intelligence

Do Large Language Models Have a Mind? Attention, Emergence & Compression Explained

This article examines whether ChatGPT and other large language models exhibit true Theory of Mind, detailing the role of attention mechanisms, neural network architecture, emergent abilities, the Chinese‑room argument, and how compression of massive textual data underlies their apparent intelligence.

Attention Mechanism · Emergence · Neural Networks
30 min read
JD Retail Technology
Oct 9, 2023 · Artificial Intelligence

Does ChatGPT Possess Theory of Mind? An Exploration of Attention Mechanisms, Emergence, and Compression in Large Language Models

Recent research suggests GPT‑3 exhibits Theory of Mind abilities, prompting a deep dive into attention mechanisms, neural network fundamentals, emergent capabilities, and the role of compression in large language models, while examining philosophical thought experiments like the Chinese Room to question true machine intelligence.

Attention Mechanism · ChatGPT · Emergence
26 min read
Sohu Tech Products
Jul 26, 2023 · Artificial Intelligence

Attention Mechanism, Transformer Architecture, and BERT: An In-Depth Overview

This article provides a comprehensive overview of the attention mechanism, its mathematical foundations, the transformer model architecture—including encoder and decoder components—and the BERT pre‑training model, detailing their principles, implementations, and applications in natural language processing.

Attention Mechanism · BERT · Encoder-Decoder
13 min read
Architects' Tech Alliance
May 15, 2023 · Artificial Intelligence

How Transformer Powers ChatGPT: A Deep Dive into Attention and Architecture

This article provides a comprehensive analysis of the Transformer model behind ChatGPT, covering its origin; core mechanisms such as embedding, positional encoding, self‑attention, and multi‑head attention; a step‑by‑step translation example; and the broader implications for AI research and industry.

AI Architecture · Attention Mechanism · ChatGPT
19 min read
DataFunSummit
Feb 19, 2023 · Artificial Intelligence

Understanding In-Context Learning in Large Language Models: Experiments, Analysis, and Theoretical Insights

This article explains the concept of in‑context learning in large language models, presents experimental evaluations such as copy‑output, date‑formatting, and label‑remapping tasks, and discusses a recent theoretical analysis that links attention layers to implicit gradient‑based fine‑tuning, highlighting why model scale and data volume matter.

Attention Mechanism · Few‑Shot Learning · GPT-3
15 min read
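The theoretical link between attention and implicit fine-tuning rests on a relaxation: if the softmax is dropped, attention over the demonstration tokens contributes a term ΔW·q with ΔW = Σᵢ vᵢkᵢᵀ, which has the same outer-product form as an accumulated gradient-descent weight update. The sketch below only checks that identity numerically; it assumes key and value vectors are given directly rather than projected from hidden states, uses hypothetical helper names, and does not reproduce the analysis discussed in the article.

```python
def matvec(M, x):
    # Multiply a matrix (list of rows) by a vector
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def relaxed_attention(q, keys, values):
    """Attention with the softmax removed: sum_i v_i * (k_i · q). Hypothetical helper."""
    out = [0.0] * len(values[0])
    for k, v in zip(keys, values):
        s = sum(a * b for a, b in zip(k, q))
        out = [o + s * vj for o, vj in zip(out, v)]
    return out

def outer_sum(keys, values):
    """Sum of outer products v_i k_i^T (rows index value dims, columns index key dims)."""
    return [[sum(v[a] * k[b] for k, v in zip(keys, values)) for b in range(len(keys[0]))]
            for a in range(len(values[0]))]

q = [0.5, -1.0]                                   # current query token
K_ctx, V_ctx = [[1.0, 0.0]], [[0.2, 0.3]]         # tokens of the query itself
K_demo, V_demo = [[0.0, 1.0], [1.0, 1.0]], [[1.0, -1.0], [0.5, 2.0]]  # demonstration tokens

full = relaxed_attention(q, K_ctx + K_demo, V_ctx + V_demo)
W0 = outer_sum(K_ctx, V_ctx)    # "zero-shot" part
dW = outer_sum(K_demo, V_demo)  # contribution of the demonstrations
via_update = [a + b for a, b in zip(matvec(W0, q), matvec(dW, q))]
# full equals via_update up to rounding: the demonstrations act like an update W0 -> W0 + dW
```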
21CTO
Feb 6, 2023 · Artificial Intelligence

Understanding the Transformer: How Attention Powers ChatGPT and Modern AI

This article breaks down the Transformer architecture behind ChatGPT, explaining its attention mechanism, embedding, positional encoding, and multi‑head self‑attention, while highlighting the model's impact on AI research, data requirements, and future innovations.

Attention Mechanism · ChatGPT · Transformer
18 min read
IT Architects Alliance
Feb 6, 2023 · Artificial Intelligence

Understanding the Transformer Model: A Deep Dive into “Attention Is All You Need”

This article provides a comprehensive, plain‑language walkthrough of the 2017 “Attention Is All You Need” paper, explaining the Transformer’s architecture, core mechanisms such as embedding, positional encoding and self‑attention, and discussing its broader impact on AI research and applications.

AI · Attention Mechanism · Transformer
17 min read
HomeTech
Nov 16, 2022 · Artificial Intelligence

Fundamentals and Policy Gradient Algorithms in Reinforcement Learning with Applications to Scene Text Recognition

This article introduces the basic concepts of reinforcement learning, derives model‑based and model‑free policy gradient methods—including vanilla policy gradient and Actor‑Critic—explains their mathematical foundations, and demonstrates their use in scene text recognition and image captioning tasks.

AI · Attention Mechanism · actor-critic
22 min read
DataFunSummit
Nov 14, 2022 · Artificial Intelligence

Machine Learning Methods for Solving Combinatorial Optimization Problems

This article reviews recent advances in applying machine learning—especially attention mechanisms, graph neural networks, and reinforcement learning—to combinatorial optimization, outlines fundamental problem definitions, classic algorithms, modern ML‑based approaches, experimental results, and future research directions.

Algorithms · Attention Mechanism · combinatorial optimization
18 min read
DataFunTalk
Oct 24, 2022 · Artificial Intelligence

Efficient Target Attention (ETA) for Long-Term User Behavior Modeling in Click‑Through Rate Prediction

Efficient Target Attention (ETA) introduces a low‑cost hash‑based attention operator that enables end‑to‑end modeling of ultra‑long user behavior sequences for CTR prediction, achieving significant online CTR, GMV, and QPS improvements in Alibaba’s Taobao feed recommendation system.

Attention Mechanism · CTR prediction · Hashing
20 min read
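A hash-based attention operator of this kind is commonly described as locality-sensitive hashing: SimHash codes of the target item and of each behavior are compared by Hamming distance, and only the closest behaviors enter target attention. The sketch assumes random Gaussian hyperplanes, dot-product attention and a hypothetical `eta_style_retrieve` helper; the production system's hashing details, embedding sizes and serving path are not shown.

```python
import math
import random

def simhash(vec, planes):
    """One bit per random hyperplane: 1 if the vector lies on its non-negative side."""
    return [1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0 for plane in planes]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def eta_style_retrieve(target, behaviors, num_bits=16, top_k=2, seed=0):
    """Hypothetical helper: keep the top_k behaviors whose SimHash codes are closest
    to the target's, then apply softmax target attention over that subset only."""
    rng = random.Random(seed)
    d = len(target)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(num_bits)]
    t_code = simhash(target, planes)
    codes = [simhash(b, planes) for b in behaviors]
    ranked = sorted(range(len(behaviors)), key=lambda i: hamming(t_code, codes[i]))[:top_k]
    scores = [sum(a * b for a, b in zip(target, behaviors[i])) for i in ranked]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    pooled = [sum(wi * behaviors[i][j] for wi, i in zip(w, ranked)) / z for j in range(d)]
    return ranked, pooled

# A behavior pointing the same way as the target gets Hamming distance 0 and is retrieved first
ranked, pooled = eta_style_retrieve([1.0, 0.0], [[2.0, 0.0], [-1.0, 0.0], [0.0, 1.0]], top_k=1)
```

Comparing short binary codes is far cheaper than full dot products over a long sequence, which is the motivation for hashing before attention.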
DataFunSummit
Feb 26, 2022 · Artificial Intelligence

Graph-Based Sparse Behavior Recall Models for Content Recommendation

This article presents a comprehensive study of graph‑based recall techniques for content recommendation, detailing how knowledge‑graph‑augmented user‑behavior graphs and novel attention‑driven models such as GADM, SGGA, and SGGGA improve performance for users with sparse interaction histories.

Attention Mechanism · Deep Learning · Knowledge Graph
11 min read
DataFunTalk
Jan 17, 2022 · Artificial Intelligence

Graph Attention Multi‑Layer Perceptron (GAMLP) and Node‑Dependent Local Smoothing (NDLS) for Scalable and Flexible Graph Neural Networks

This talk introduces the motivation, design, theoretical analysis, and extensive experimental results of Tencent Angel Graph's Graph Attention Multi‑Layer Perceptron (GAMLP) and Node‑Dependent Local Smoothing (NDLS), which address GNN scalability and flexibility by using node‑wise adaptive propagation, attention‑based feature fusion, and a lightweight training pipeline.

Attention Mechanism · GAMLP · NDLS
18 min read
Code DAO
Dec 22, 2021 · Artificial Intelligence

How Context R-CNN Leverages Temporal Context to Detect Occluded Objects

The article reviews the Context R-CNN paper, which introduces short‑term and long‑term memory banks and an attention mechanism to incorporate temporal context from multiple frames captured by a fixed camera, enabling robust detection of partially occluded, low‑light, distant, or background‑cluttered objects, and shows quantitative gains over standard Faster R‑CNN.

Attention Mechanism · Context R-CNN · Faster R-CNN
6 min read
DataFunTalk
May 22, 2021 · Artificial Intelligence

Baidu's Video Foundation Technology Architecture and Key AI Techniques

This article presents an overview of Baidu's video foundation technology architecture, covering the video R&D platform, core AI techniques for video understanding, editing, surveillance, and general vision, and detailing innovations such as Attention‑Cluster networks, cross‑modality attention with graph convolution, GANs, super‑resolution, and adaptive encoding.

Adaptive Encoding · Attention Mechanism · GAN
14 min read
58 Tech
Nov 11, 2020 · Artificial Intelligence

Deep Learning for Click‑Through Rate Prediction in 58.com Home‑Page Recommendation

This article details how 58.com leverages deep learning models such as DNN, Wide&Deep, DeepFM, DIN and DIEN, combined with extensive user‑behavior feature engineering, offline vectorization, and online TensorFlow‑Serving pipelines to improve home‑page recommendation click‑through rates and overall platform efficiency.

A/B testing · Attention Mechanism · CTR prediction
25 min read
DataFunTalk
Oct 17, 2020 · Artificial Intelligence

DyHAN: Dynamic Heterogeneous Graph Embedding with Hierarchical Attention

This article introduces DyHAN, a dynamic heterogeneous graph embedding method that employs hierarchical attention across node, edge, and temporal dimensions to capture evolving user-item interactions, demonstrates superior performance over static and existing dynamic baselines, and reports significant online improvements in Alibaba’s recommendation system.

Alibaba · Attention Mechanism · dynamic graphs
9 min read
Meituan Technology Team
Oct 15, 2020 · Artificial Intelligence

Answer-Driven Visual State Estimator for Goal-Oriented Visual Dialogue

The paper introduces the Answer‑Driven Visual State Estimator (ADVSE), which uses answer‑driven focusing attention and conditional visual information fusion to dynamically incorporate answers into visual dialogue, overcoming static encoding limitations and achieving state‑of‑the‑art performance on the GuessWhat?! question‑generation and guessing tasks.

Attention Mechanism · State Estimation · goal-oriented
10 min read
Kuaishou Large Model
Oct 15, 2020 · Artificial Intelligence

How Kuaishou’s Y‑Tech Advances Monocular Depth Estimation for Mobile AR

This article reviews Kuaishou Y‑Tech’s ECCV 2020 paper on monocular depth estimation, detailing its novel GCB‑SAB network, the new HC‑Depth dataset, specialized loss functions, and edge‑aware training, and demonstrates superior performance on NYUv2, TUM, and real‑world mobile AR applications.

Attention Mechanism · Computer Vision · Deep Learning
14 min read
JD Retail Technology
Oct 10, 2020 · Artificial Intelligence

Kalman Filtering Attention for User Behavior Modeling in CTR Prediction

This article introduces a Kalman Filtering Attention (KFAtt) framework that enhances click‑through‑rate (CTR) prediction by modeling user behavior with a Kalman‑filter‑based attention mechanism and a frequency‑capped variant, addressing new‑interest coverage and frequency bias in e‑commerce scenarios.

Attention Mechanism · CTR prediction · Kalman Filter
11 min read
Tencent Cloud Developer
Sep 23, 2020 · Artificial Intelligence

NLP Model Interpretability: White-box and Black-box Methods and Business Applications

The article reviews NLP interpretability techniques, contrasting white‑box approaches that probe model internals such as neuron analysis, diagnostic classifiers, and attention with black‑box strategies like rationales, adversarial testing, and local surrogates, and argues that black‑box methods are generally more practical for business deployment despite offering shallower insights.

Attention Mechanism · BERT · Deep Learning
12 min read
DataFunTalk
Sep 18, 2020 · Artificial Intelligence

MiNet: Mixed Interest Network for Cross-Domain Click-Through Rate Prediction

This article reviews the MiNet model, which leverages cross‑domain information by modeling long‑term, source‑domain short‑term, and target‑domain short‑term user interests with hierarchical attention and an auxiliary task to improve CTR prediction and alleviate cold‑start issues.

Attention Mechanism · CTR prediction · MiNet
12 min read
DataFunTalk
Aug 29, 2020 · Artificial Intelligence

User Modeling for Search Ranking: Practices, Model Design, and Experimental Analysis at Alibaba

This article presents Alibaba's comprehensive approach to user modeling for search CTR/CVR ranking, detailing the abstraction of user information, multi‑scale behavior processing, enhanced transformer‑based model structures, client‑side click and exposure modeling, and experimental results showing significant AUC improvements.

Alibaba · Attention Mechanism · CTR prediction
0 likes · 18 min read
Alibaba Cloud Developer
Dec 26, 2019 · Artificial Intelligence

How Decomposed Linguistic Representations Overcome Language Priors in VQA

This article reviews an AAAI 2020 paper that introduces a language-attention-based Visual Question Answering (VQA) model. The model decomposes each question into type, object, and concept expressions to reduce its reliance on language priors. The article explains the model's modular architecture and reports superior performance on VQA-CP v2 across extensive experiments and ablations.

Attention Mechanism · Multimodal Learning · VQA-CP
0 likes · 14 min read
Alibaba Cloud Developer
Dec 3, 2019 · Artificial Intelligence

How Alibaba Detects ‘Disgusting’ Images on Taobao with AI

This article describes Alibaba's AI system for automatically filtering nauseating product images on Taobao, covering challenges such as cold‑start, class imbalance, and diverse visual features, and detailing solutions like semi‑supervised learning, active learning, OHEM‑cascade, attention mechanisms, and the resulting business impact.

Attention Mechanism · E-commerce AI · Image Classification
0 likes · 15 min read
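Online hard example mining (OHEM), one of the techniques listed, keeps only the highest-loss examples in each batch for the backward pass. This helps when easy negatives dominate, as they do in an imbalanced image filter. The summary does not describe how the OHEM-cascade is arranged, so this sketch shows only the basic selection step. `ohem_select` and `ohem_loss` are hypothetical names.

```python
import heapq
from typing import List


def ohem_select(losses: List[float], keep: int) -> List[int]:
    """Indices of the `keep` highest-loss (hardest) examples, hardest first."""
    keep = min(keep, len(losses))
    return heapq.nlargest(keep, range(len(losses)), key=losses.__getitem__)


def ohem_loss(losses: List[float], keep: int) -> float:
    """Mean loss over the hard examples only; easy examples contribute nothing."""
    idx = ohem_select(losses, keep)
    return sum(losses[i] for i in idx) / len(idx) if idx else 0.0


batch_losses = [0.1, 2.0, 0.5, 1.5]
print(ohem_select(batch_losses, 2))  # [1, 3]
print(ohem_loss(batch_losses, 2))    # 1.75
```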
Suning Technology
Jul 24, 2019 · Artificial Intelligence

Multi‑Scale Body‑Part Masks Revolutionize Person Re‑Identification at CVPR 2019

At CVPR 2019 in Long Beach, Suning's AI team presented a paper on multi-scale body-part mask guided attention for person re-identification. The article covers the conference's selectivity, the challenges of re-identification, and how the team's deep-learning approach achieves state-of-the-art performance.

Attention Mechanism · CVPR 2019 · Deep Learning
0 likes · 5 min read
Ctrip Technology
May 21, 2019 · Artificial Intelligence

A Brief Overview of Machine Translation: History, Neural Models, and Practical Insights

This article surveys the evolution of machine translation from early rule‑based systems to modern neural architectures, explains how translation engines are trained, highlights recent advances such as attention and Transformers, and shares practical experience and current challenges in the field.

Attention Mechanism · Neural Networks · Transformer
0 likes · 11 min read
DataFunTalk
May 20, 2019 · Artificial Intelligence

Evolution of Alibaba's Advertising CTR Prediction Models: From Linear Methods to Deep Interest Evolution Networks

The article reviews the characteristics of personalized prediction in e-commerce. It outlines how Alibaba's models evolved from large-scale linear regression to deep-learning architectures such as DIN, CrossMedia, and the Deep Interest Evolution Network (DIEN). It also discusses future directions such as disentangled representations and white-box modeling.

Attention Mechanism · CTR prediction · Recommendation Systems
0 likes · 11 min read
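DIN's key idea is a local activation unit. Each historical behavior is weighted by its relevance to the candidate ad, and the weighted behaviors are summed into a user-interest vector. In the sketch below, a sigmoid of a dot product stands in for DIN's learned MLP activation unit; this is an assumption for brevity. `din_pool` is a hypothetical name.

```python
import math
from typing import List, Sequence


def din_pool(candidate: Sequence[float], behaviors: Sequence[Sequence[float]]) -> List[float]:
    """Sum of behavior embeddings weighted by their relevance to the candidate ad."""
    pooled = [0.0] * len(candidate)
    for b in behaviors:
        # Stand-in activation unit: sigmoid(candidate . behavior).
        # As in DIN, there is no softmax, so the weights need not sum to 1.
        w = 1.0 / (1.0 + math.exp(-sum(c * x for c, x in zip(candidate, b))))
        for d in range(len(candidate)):
            pooled[d] += w * b[d]
    return pooled


print(din_pool([0.0, 0.0], [[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```

Skipping softmax normalization lets the magnitude of the pooled vector reflect how strongly the user's history relates to the candidate, instead of always producing a convex combination.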
Sohu Tech Products
Mar 13, 2019 · Artificial Intelligence

Attentive Group Recommendation (AGR): An Attention‑Based Deep Learning Model for Group Recommendation

This paper proposes AGR, the first group recommendation model that incorporates an attention mechanism to dynamically learn each member’s influence weight within a group, enabling flexible modeling of group decision processes and achieving superior performance over existing memory‑based, model‑based, and probabilistic baselines across four real‑world datasets.

Attention Mechanism · BPR · collaborative filtering
0 likes · 26 min read
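AGR learns how much each member influences a group's decision. A simple form of this is softmax attention over member embeddings, conditioned on the candidate item, followed by a weighted sum. The sketch assumes a dot-product score in place of the paper's learned attention network and omits any group-level preference term. `group_embedding` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple


def group_embedding(
    members: Sequence[Sequence[float]], item: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Attention-weighted sum of member embeddings for one candidate item."""
    # Relevance of each member to the item (dot product as a stand-in score).
    scores = [sum(m * i for m, i in zip(member, item)) for member in members]
    # Numerically stable softmax: subtract the max before exponentiating.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    alphas = [e / z for e in exps]
    vector = [
        sum(a * member[d] for a, member in zip(alphas, members))
        for d in range(len(item))
    ]
    return vector, alphas


vec, weights = group_embedding([[2.0, 0.0], [0.0, 4.0]], [1.0, 0.0])
# The first member, whose embedding aligns with the item, gets the larger weight.
print(weights)
```

The member weights depend on the item, so the same group can be led by different members for different items. This is the flexibility that a fixed aggregation rule such as averaging lacks.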
Tencent Cloud Developer
Sep 30, 2018 · Artificial Intelligence

Smart Speaker Voice Interaction Technology: Recent Advances and Tencent's Research Progress

The article surveys Tencent's recent advances in smart-speaker voice interaction along the full technology chain. The chain runs through front-end capture, wake-up and speech enhancement, speaker verification and short-utterance voiceprint recognition, and TDNN/LSTM speech recognition. It ends with target-speaker extraction and end-to-end attention modeling for robust, personalized performance.

Attention Mechanism · TTS · microphone array
0 likes · 18 min read
iQIYI Technical Product Team
Sep 14, 2018 · Artificial Intelligence

AI RAP: End-to-End Speech Synthesis for Rap Generation Using Location‑Sensitive Attention and Inference Mask

AI RAP is an end-to-end AI service that lets users generate personalized rap with a single click. It combines location-sensitive attention with an inference mask to keep text and audio aligned and timed to the beat. It also supports multiple character voice timbres and synthesizes in under a second. The service runs on an architecture that scales to millions of daily users.

AI · Attention Mechanism · Audio Processing
0 likes · 5 min read
Alibaba Cloud Developer
Aug 17, 2018 · Artificial Intelligence

Can Multi‑Task Learning Shorten E‑Commerce Titles Without Losing Sales?

This paper proposes a multi‑task learning approach that compresses overly long e‑commerce product titles into concise short titles using a Pointer Network, while simultaneously generating user search queries with an attention‑based encoder‑decoder, achieving higher readability, informativeness, and conversion rates than traditional methods.

Attention Mechanism · Sequence-to-Sequence · e-commerce SEO
0 likes · 11 min read
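A Pointer Network suits title compression because its "vocabulary" is the set of input positions. At each decoding step, attention over the encoder states gives a distribution over source tokens, and the output is a token copied from the original title. The single-step sketch below uses dot-product scoring instead of the additive scoring of the original Pointer Network, which is an assumption. `pointer_distribution` and `point_to_token` are hypothetical names, and a full model would also update the decoder state between steps.

```python
import math
from typing import List, Sequence


def pointer_distribution(
    encoder_states: Sequence[Sequence[float]], decoder_state: Sequence[float]
) -> List[float]:
    """Softmax over input positions: where the decoder 'points' at this step."""
    scores = [sum(e * d for e, d in zip(enc, decoder_state)) for enc in encoder_states]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [x / z for x in exps]


def point_to_token(
    tokens: Sequence[str],
    encoder_states: Sequence[Sequence[float]],
    decoder_state: Sequence[float],
) -> str:
    """Greedy step: copy the source token at the most probable position."""
    probs = pointer_distribution(encoder_states, decoder_state)
    best = max(range(len(tokens)), key=probs.__getitem__)
    return tokens[best]


title = ["new", "cotton", "shirt"]
states = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
print(point_to_token(title, states, [0.0, 5.0]))  # cotton
```

Every output token is drawn from the input, so the compressed title cannot introduce words absent from the original.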
Alibaba Cloud Developer
Aug 16, 2018 · Artificial Intelligence

How Syntax‑Sensitive Entity Representations Boost Neural Relation Extraction

This paper introduces a syntax‑aware entity representation using Tree‑GRU and attention mechanisms, demonstrating that enriching entity semantics with dependency tree information significantly improves neural relation extraction performance on the NYT dataset compared to existing distant supervision models.

Attention Mechanism · Tree-GRU · entity representation
0 likes · 7 min read
DataFunTalk
Aug 12, 2018 · Artificial Intelligence

Interpretability of Deep Learning and Low‑Frequency Event Learning in Financial Applications

The article reviews the limitations of mainstream deep-learning models in finance. It proposes hybrid tree-based and Wide&Deep architectures, combined with attention and with sensitivity and variance analysis, to improve interpretability and the detection of low-frequency events. It validates the approach with a large-scale insurance recommendation case study.

Attention Mechanism · Wide&Deep · finance
0 likes · 17 min read
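Sensitivity analysis, one of the interpretability tools mentioned, perturbs one input feature at a time and measures how much a trained model's score changes. The sketch below uses a central finite difference. `sensitivity` is a hypothetical helper, and `score` stands for any model's scoring function. This is a generic technique, not the article's exact procedure.

```python
from typing import Callable, List, Sequence


def sensitivity(
    score: Callable[[List[float]], float], x: Sequence[float], eps: float = 1e-4
) -> List[float]:
    """Approximate d(score)/d(x_i) for each feature with a central difference."""
    grads = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grads.append((score(up) - score(down)) / (2 * eps))
    return grads


# Toy scoring function: linear in feature 0, quadratic in feature 1.
print(sensitivity(lambda v: 3.0 * v[0] + v[1] ** 2, [1.0, 2.0]))  # approximately [3.0, 4.0]
```

Features with large absolute sensitivity are the ones the model's decision hinges on near that input. Because the scores are local, they can differ from one customer to the next.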
Meitu Technology
Jul 24, 2018 · Artificial Intelligence

Interaction-aware Spatio-Temporal Pyramid Attention Networks for Action Classification

Researchers introduce an Interaction‑aware Spatio‑Temporal Pyramid Attention network that embeds a PCA‑guided loss to capture complementary multi‑scale features, enabling end‑to‑end video action classification with state‑of‑the‑art accuracy on UCF101, HMDB51, Charades and internal datasets.

Attention Mechanism · CNN · action classification
0 likes · 7 min read
Alibaba Cloud Developer
Jul 11, 2018 · Artificial Intelligence

Can Global Ranking Boost E‑Commerce GMV? A New AI Approach

Traditional e-commerce ranking scores each item independently and ignores how the displayed items influence one another. This study introduces a global ranking method that models these mutual influences and optimizes expected GMV, using extended global features and RNN-based sequence generation. It reports a 5% GMV lift in large-scale A/B tests.

Attention Mechanism · GMV · RNN
0 likes · 12 min read
Alibaba Cloud Developer
Mar 15, 2018 · Artificial Intelligence

How Deep Learning Transforms Knowledge Graph Relation Extraction

This article reviews the evolution from rule‑based DeepDive methods to deep‑learning approaches such as PCNNs and attention‑enhanced models for relation extraction, presents experimental results on the NYT dataset, discusses practical challenges in large‑scale deployment, and outlines future research directions.

Attention Mechanism · Deep Learning · Knowledge Graph
0 likes · 14 min read
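Piecewise max pooling is what distinguishes PCNNs from plain CNNs. Each convolution filter's output is split into three segments by the two entity positions, and each segment is max-pooled separately. The sentence representation therefore keeps coarse information about where features occur relative to the entities. The segment boundaries used here, with each entity closing a segment, are an assumption. `piecewise_max_pool` is a hypothetical name, and the nonlinearity applied after pooling is omitted.

```python
from typing import List, Sequence


def piecewise_max_pool(
    feature_map: Sequence[Sequence[float]], e1: int, e2: int
) -> List[float]:
    """Max-pool each filter's output over three entity-delimited segments.

    feature_map: one row per convolution filter, one value per token position.
    e1 < e2: token positions of the two entities.
    Returns 3 values per filter: [seg1, seg2, seg3, seg1, seg2, seg3, ...].
    """
    out = []
    for row in feature_map:
        segments = [row[: e1 + 1], row[e1 + 1 : e2 + 1], row[e2 + 1 :]]
        for seg in segments:
            # An empty segment (entity at the sentence edge) contributes 0.0.
            out.append(max(seg) if seg else 0.0)
    return out


print(piecewise_max_pool([[1.0, 5.0, 2.0, 7.0, 3.0, 4.0]], 1, 3))  # [5.0, 7.0, 4.0]
```

A single max over the whole row would return only `7.0`. The three pieces also preserve the strongest feature before, between, and after the entities.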
Hulu Beijing
Dec 20, 2017 · Artificial Intelligence

How Attention Mechanisms Transform Seq2Seq Models for Better Translation

This article explains why attention mechanisms were introduced into Seq2Seq models, how they address the limitations of fixed‑length encoding, the role of bidirectional RNNs, and showcases their impact on machine translation and image captioning with illustrative diagrams.

Attention Mechanism · RNN · Seq2Seq
0 likes · 10 min read
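Attention in a Seq2Seq model can be sketched for a single decoder step. The current decoder state is scored against every encoder hidden state, the scores are normalized with a softmax, and the weighted sum becomes the context vector. The decoder is then no longer limited to one fixed-length encoding of the source. This sketch assumes dot-product scoring, whereas Bahdanau-style attention uses a small feed-forward network. The encoder states would typically be concatenated forward and backward RNN states. `attention_context` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple


def attention_context(
    decoder_state: Sequence[float], encoder_states: Sequence[Sequence[float]]
) -> Tuple[List[float], List[float]]:
    """Return (context vector, attention weights) for one decoder step."""
    # Alignment score between the decoder state and each source position.
    scores = [sum(d * h for d, h in zip(decoder_state, hs)) for hs in encoder_states]
    # Numerically stable softmax over source positions.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Context vector: attention-weighted sum of encoder states.
    dim = len(encoder_states[0])
    context = [sum(w * hs[k] for w, hs in zip(weights, encoder_states)) for k in range(dim)]
    return context, weights


print(attention_context([0.0, 0.0], [[1.0, 3.0], [3.0, 5.0]]))  # ([2.0, 4.0], [0.5, 0.5])
```

A fresh context vector is computed at every decoder step, so each target word can draw on different parts of the source sentence. This is the limitation of the single fixed-length encoding that attention removes.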
Qunar Tech Salon
Nov 29, 2015 · Artificial Intelligence

From Symbolic Semantics to Vector Representations: Deep Learning for Natural Language Understanding

The article reviews symbolic knowledge bases such as WordNet, ConceptNet and FrameNet, explains how deep learning replaces them with vector‑based semantic representations, and discusses encoder‑decoder RNNs, attention mechanisms, and future directions for truly understanding language through experiential learning.

Attention Mechanism · Deep Learning · RNN
0 likes · 12 min read