Tag

attention mechanism


Architecture Digest
Feb 24, 2025 · Artificial Intelligence

MoBA: Mixture of Block Attention for Long‑Context Large Language Models

The article introduces MoBA, a Mixture‑of‑Block‑Attention mechanism that applies Mixture‑of‑Experts principles to transformer attention: sparse, trainable block selection and seamless switching between full and sparse attention enable efficient long‑context processing in large language models while maintaining performance comparable to full attention.

LLM · Mixture of Experts · MoBA
12 min read
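The block‑routing idea behind MoBA can be illustrated with a toy sketch in plain Python (the function name and shapes are hypothetical, not the paper's code): keys are split into fixed‑size blocks, each block is scored by the dot product between the query and the block's mean‑pooled key, and only the top‑k blocks are kept for attention.

```python
def select_blocks(query, keys, block_size, top_k):
    """Toy MoBA-style routing: score each block of keys by the dot product
    of the query with the block's mean-pooled key, keep the top_k blocks."""
    blocks = [keys[i:i + block_size] for i in range(0, len(keys), block_size)]
    dim = len(query)
    scores = []
    for block in blocks:
        mean_key = [sum(k[d] for k in block) / len(block) for d in range(dim)]
        scores.append(sum(q * m for q, m in zip(query, mean_key)))
    # indices of the top_k highest-scoring blocks, in sequence order
    ranked = sorted(range(len(blocks)), key=lambda i: -scores[i])
    return sorted(ranked[:top_k])

# 8 keys in 2-D, block size 2 -> 4 blocks; the query picks the 2 most relevant
keys = [[1, 0], [1, 0], [0, 1], [0, 1], [-1, 0], [-1, 0], [1, 1], [1, 1]]
selected = select_blocks([1.0, 0.0], keys, block_size=2, top_k=2)  # blocks 0 and 3
```

Each query then attends only within its selected blocks, which is where the efficiency gain over full attention comes from.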
DaTaobao Tech
Nov 13, 2024 · Artificial Intelligence

Understanding Neural Networks and Transformers: Principles, Implementation, and Applications

The article surveys neural networks from basic neuron operations and loss functions through deep architectures to the Transformer model, detailing embeddings, positional encoding, self‑attention, multi‑head attention, residual links, and encoder‑decoder design, and includes PyTorch code examples for linear regression, translation, and fine‑tuning the MiniRBT model from Hugging Face for text classification.

AI · NLP · PyTorch
44 min read
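As a taste of the kind of worked example the article covers, here is a minimal linear‑regression fit by gradient descent in plain Python (the article itself uses PyTorch; this dependency‑free sketch mirrors the same update rule):

```python
def fit_line(xs, ys, lr=0.05, steps=500):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # gradients of MSE = (1/n) * sum((w*x + b - y)^2)
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# data generated from y = 2x + 1, no noise; w and b should converge near 2 and 1
w, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

A PyTorch version replaces the hand‑written gradients with `loss.backward()` and an optimizer step, but the loop structure is the same.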
Architect's Guide
May 13, 2024 · Artificial Intelligence

Understanding the Core Principles of Transformer Architecture

This article explains how Transformer models work by detailing the encoder‑decoder structure, self‑attention, multi‑head attention, positional encoding, and feed‑forward networks, and shows their applications in machine translation, recommendation systems, and large language models.

AI · Natural Language Processing · Transformer
11 min read
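The self‑attention computation at the heart of these articles reduces to softmax(QKᵀ/√d_k)V; a dependency‑free sketch on toy shapes (not production code):

```python
import math

def attention(Q, K, V):
    """Scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, on plain lists."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        m = max(scores)                       # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[d] for w, v in zip(weights, V))
                    for d in range(len(V[0]))])
    return out

# the single query aligns with the first key, so the first value dominates
out = attention(Q=[[10.0, 0.0]],
                K=[[10.0, 0.0], [0.0, 10.0]],
                V=[[1.0, 0.0], [0.0, 1.0]])
```

Multi‑head attention simply runs several such maps in parallel on learned projections of Q, K, and V and concatenates the results.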
Top Architect
Apr 18, 2024 · Artificial Intelligence

Understanding Transformers: Architecture, Attention Mechanism, Training and Inference

This article provides a comprehensive overview of Transformer models, covering their attention-based architecture, encoder-decoder structure, training procedures including teacher forcing, inference workflow, advantages over RNNs, and various applications in natural language processing such as translation, summarization, and classification.

Inference · NLP · Transformer
11 min read
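The teacher‑forcing distinction the article covers can be shown without any real network; `toy_next_token` below is a hypothetical stand‑in for one decoder step:

```python
def toy_next_token(prefix):
    # stand-in for a decoder step; a real model would be a trained network
    return prefix[-1] * 2

def run_teacher_forcing(target):
    """Training mode: step t conditions on the ground-truth prefix target[:t],
    regardless of what the model predicted earlier."""
    return [toy_next_token(target[:t]) for t in range(1, len(target))]

def run_free_running(start, steps):
    """Inference mode: each step conditions on the model's own previous outputs."""
    seq = [start]
    for _ in range(steps):
        seq.append(toy_next_token(seq))
    return seq[1:]

target = [1, 3, 5, 7]
tf = run_teacher_forcing(target)   # conditioned on 1, 3, 5 -> predicts 2, 6, 10
fr = run_free_running(1, 3)        # conditioned on its own outputs -> 2, 4, 8
```

The two runs diverge because the free‑running decoder consumes its own (possibly wrong) outputs, which is exactly the train/inference mismatch teacher forcing trades away for faster, parallelizable training.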
JD Tech
Nov 30, 2023 · Artificial Intelligence

Understanding ChatGPT: Mechanisms, Attention, Emergence, and the Chinese Room

This article examines the principles behind ChatGPT, detailing its continuation-based operation, the role of attention mechanisms and transformer architecture, the scaling of neural networks that leads to emergent abilities, and interprets these phenomena through the lenses of compression theory and the Chinese Room thought experiment.

ChatGPT · Emergence · attention mechanism
27 min read
JD Retail Technology
Oct 9, 2023 · Artificial Intelligence

Does ChatGPT Possess Theory of Mind? An Exploration of Attention Mechanisms, Emergence, and Compression in Large Language Models

Recent research suggests GPT‑3 exhibits Theory of Mind abilities, prompting a deep dive into attention mechanisms, neural network fundamentals, emergent capabilities, and the role of compression in large language models, while examining philosophical thought experiments like the Chinese Room to question true machine intelligence.

ChatGPT · Emergence · Theory of Mind
26 min read
Kuaishou Tech
Aug 8, 2023 · Artificial Intelligence

TWIN: Two-stage Interest Network for Lifelong User Behavior Modeling in CTR Prediction

This paper presents TWIN, a two-stage interest network that aligns the similarity metrics of coarse‑grained and fine‑grained modules to improve lifelong user behavior modeling for CTR prediction in large‑scale online recommendation systems.

CTR prediction · Kuaishou · TWIN
10 min read
Sohu Tech Products
Jul 26, 2023 · Artificial Intelligence

Attention Mechanism, Transformer Architecture, and BERT: An In-Depth Overview

This article provides a comprehensive overview of the attention mechanism, its mathematical foundations, the transformer model architecture—including encoder and decoder components—and the BERT pre‑training model, detailing their principles, implementations, and applications in natural language processing.

BERT · Encoder-Decoder · NLP
13 min read
DataFunSummit
Feb 19, 2023 · Artificial Intelligence

Understanding In-Context Learning in Large Language Models: Experiments, Analysis, and Theoretical Insights

This article explains the concept of in‑context learning in large language models, presents experimental evaluations such as copy‑output, date‑formatting, and label‑remapping tasks, and discusses a recent theoretical analysis that links attention layers to implicit gradient‑based fine‑tuning, highlighting why model scale and data volume matter.

GPT-3 · In-Context Learning · attention mechanism
15 min read
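A few‑shot prompt for the label‑remapping style of experiment can be sketched as follows (the format and helper name are illustrative assumptions, not the article's exact setup):

```python
def build_icl_prompt(demos, query):
    """Few-shot in-context learning prompt: demonstrations are concatenated
    as input->label pairs, and the model is asked to continue the pattern."""
    lines = [f"Input: {x}\nLabel: {y}" for x, y in demos]
    lines.append(f"Input: {query}\nLabel:")
    return "\n\n".join(lines)

# label remapping: sentiment labels replaced by arbitrary tokens foo/bar,
# so the model must infer the mapping from the demonstrations alone
prompt = build_icl_prompt(
    [("great movie", "foo"), ("terrible plot", "bar")],
    "wonderful acting")
```

No weights change here: the "learning" happens entirely inside the forward pass, which is what the theoretical analyses linking attention to implicit gradient descent try to explain.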
IT Architects Alliance
Feb 6, 2023 · Artificial Intelligence

Understanding the Transformer Model: A Deep Dive into “Attention Is All You Need”

This article provides a comprehensive, plain‑language walkthrough of the 2017 “Attention Is All You Need” paper, explaining the Transformer’s architecture, core mechanisms such as embedding, positional encoding and self‑attention, and discussing its broader impact on AI research and applications.

AI · Natural Language Processing · Transformer
17 min read
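The paper's sinusoidal positional encoding can be reproduced in a few lines (interleaved sin/cos layout assumed; implementations vary in how they order the dimensions):

```python
import math

def positional_encoding(pos, d_model):
    """Sinusoidal positional encoding from "Attention Is All You Need":
    PE(pos, 2i)   = sin(pos / 10000^(2i/d_model))
    PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
    Assumes d_model is even."""
    pe = []
    for i in range(d_model // 2):
        angle = pos / (10000 ** (2 * i / d_model))
        pe.extend([math.sin(angle), math.cos(angle)])
    return pe

pe0 = positional_encoding(0, 4)   # position 0 -> [0, 1, 0, 1]
```

Because each dimension oscillates at a different wavelength, any fixed offset between positions corresponds to a linear transform of the encoding, which lets attention learn relative positions.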
HomeTech
Nov 16, 2022 · Artificial Intelligence

Fundamentals and Policy Gradient Algorithms in Reinforcement Learning with Applications to Scene Text Recognition

This article introduces the basic concepts of reinforcement learning, derives model‑based and model‑free policy gradient methods—including vanilla policy gradient and Actor‑Critic—explains their mathematical foundations, and demonstrates their use in scene text recognition and image captioning tasks.

AI · actor-critic · attention mechanism
22 min read
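The vanilla policy gradient the article derives can be demonstrated on a two‑armed bandit (a deliberately tiny setting; names and hyperparameters here are illustrative):

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_bandit(rewards=(0.0, 1.0), lr=0.1, episodes=2000, seed=0):
    """Vanilla policy gradient (REINFORCE) on a 2-armed bandit: update each
    logit by lr * reward * grad log pi(a), with pi a softmax over logits."""
    rng = random.Random(seed)
    logits = [0.0, 0.0]
    for _ in range(episodes):
        probs = softmax(logits)
        a = 0 if rng.random() < probs[0] else 1
        r = rewards[a]
        # grad of log softmax wrt the logits: one-hot(a) - probs
        for i in range(2):
            logits[i] += lr * r * ((1.0 if i == a else 0.0) - probs[i])
    return softmax(logits)

probs = reinforce_bandit()   # probability mass shifts onto the rewarding arm
```

Actor‑critic methods replace the raw reward `r` with an advantage estimate from a learned value function, which lowers the variance of exactly this update.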
DataFunSummit
Nov 14, 2022 · Artificial Intelligence

Machine Learning Methods for Solving Combinatorial Optimization Problems

This article reviews recent advances in applying machine learning—especially attention mechanisms, graph neural networks, and reinforcement learning—to combinatorial optimization, outlines fundamental problem definitions, classic algorithms, modern ML‑based approaches, experimental results, and future research directions.

Algorithms · Graph Neural Networks · attention mechanism
18 min read
DataFunTalk
Oct 24, 2022 · Artificial Intelligence

Efficient Target Attention (ETA) for Long-Term User Behavior Modeling in Click‑Through Rate Prediction

Efficient Target Attention (ETA) introduces a low‑cost hash‑based attention operator that enables end‑to‑end modeling of ultra‑long user behavior sequences for CTR prediction, achieving significant online CTR, GMV, and QPS improvements in Alibaba’s Taobao feed recommendation system.

CTR prediction · Hashing · Long sequence modeling
20 min read
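ETA's hashing idea can be approximated with a generic SimHash sketch (a simplified illustration, not Alibaba's implementation): similar embedding vectors fall on the same side of most random hyperplanes, so Hamming distance between binary signatures cheaply approximates similarity when retrieving top‑k behaviors from an ultra‑long sequence.

```python
import random

def simhash(vec, planes):
    """Binary signature: the sign pattern of the vector against random hyperplanes.
    Vectors with a small angle between them share most sign bits."""
    return [1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
            for plane in planes]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

rng = random.Random(0)
dim, n_bits = 8, 32
planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

target = [1.0] * 8                 # candidate item embedding
near = [1.0] * 7 + [0.5]           # behavior almost parallel to the target
far = [-1.0] * 8                   # behavior pointing the opposite way
d_near = hamming(simhash(target, planes), simhash(near, planes))
d_far = hamming(simhash(target, planes), simhash(far, planes))
```

Comparing bit signatures costs a fraction of a full dot product, which is what makes end‑to‑end retrieval over sequences of tens of thousands of behaviors feasible online.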
DataFunSummit
Feb 26, 2022 · Artificial Intelligence

Graph-Based Sparse Behavior Recall Models for Content Recommendation

This article presents a comprehensive study of graph‑based recall techniques for content recommendation, detailing how knowledge‑graph‑augmented user‑behavior graphs and novel attention‑driven models such as GADM, SGGA, and SGGGA improve performance for users with sparse interaction histories.

Graph Neural Networks · Recommendation systems · attention mechanism
11 min read
DataFunTalk
Jan 17, 2022 · Artificial Intelligence

Graph Attention Multi‑Layer Perceptron (GAMLP) and Node‑Dependent Local Smoothing (NDLS) for Scalable and Flexible Graph Neural Networks

This talk introduces the motivation, design, theoretical analysis, and extensive experimental results of Tencent Angel Graph's Graph Attention Multi‑Layer Perceptron (GAMLP) and Node‑Dependent Local Smoothing (NDLS), which address GNN scalability and flexibility by using node‑wise adaptive propagation, attention‑based feature fusion, and a lightweight training pipeline.

GAMLP · Graph Neural Networks · NDLS
18 min read
DataFunTalk
May 22, 2021 · Artificial Intelligence

Baidu's Video Foundation Technology Architecture and Key AI Techniques

This article presents an overview of Baidu's video foundation technology architecture, covering the video R&D platform, core AI techniques for video understanding, editing, surveillance, and general vision, and detailing innovations such as Attention‑Cluster networks, cross‑modality attention with graph convolution, GANs, super‑resolution, and adaptive encoding.

Adaptive Encoding · GAN · Super-Resolution
14 min read
New Oriental Technology
Jan 25, 2021 · Artificial Intelligence

Transformer Model: Attention Mechanism in Machine Translation

The 2017 Transformer model introduced by Vaswani et al. revolutionized machine translation by relying solely on attention mechanisms, outperforming traditional RNN and CNN approaches through parallel processing and improved contextual understanding.

AI · NLP · Transformer
4 min read
58 Tech
Nov 11, 2020 · Artificial Intelligence

Deep Learning for Click‑Through Rate Prediction in 58.com Home‑Page Recommendation

This article details how 58.com leverages deep learning models such as DNN, Wide&Deep, DeepFM, DIN and DIEN, combined with extensive user‑behavior feature engineering, offline vectorization, and online TensorFlow‑Serving pipelines to improve home‑page recommendation click‑through rates and overall platform efficiency.

A/B testing · CTR prediction · attention mechanism
25 min read
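The target attention used by models like DIN can be sketched as a weighted pooling of behavior embeddings (simplified: DIN itself learns the weight with a small MLP over the candidate and behavior, rather than the plain softmaxed dot product used here):

```python
import math

def target_attention(candidate, behaviors):
    """Simplified DIN-style pooling: weight each past-behavior embedding by its
    softmax-normalized dot product with the candidate item, then sum."""
    scores = [sum(c * b for c, b in zip(candidate, beh)) for beh in behaviors]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(candidate)
    return [sum(w * beh[d] for w, beh in zip(weights, behaviors))
            for d in range(dim)]

# the behavior most similar to the candidate dominates the pooled interest vector
interest = target_attention([5.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

Unlike mean pooling, the pooled vector changes with the candidate ad, so the same user history yields different interest representations for different candidates.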
DataFunTalk
Oct 17, 2020 · Artificial Intelligence

DyHAN: Dynamic Heterogeneous Graph Embedding with Hierarchical Attention

This article introduces DyHAN, a dynamic heterogeneous graph embedding method that employs hierarchical attention across node, edge, and temporal dimensions to capture evolving user-item interactions, demonstrates superior performance over static and existing dynamic baselines, and reports significant online improvements in Alibaba’s recommendation system.

Alibaba · attention mechanism · dynamic graphs
9 min read
Kuaishou Large Model
Oct 15, 2020 · Artificial Intelligence

How Kuaishou’s Y‑Tech Advances Monocular Depth Estimation for Mobile AR

This article reviews Kuaishou Y‑Tech's ECCV‑2020 paper on monocular depth estimation, detailing its novel GCB‑SAB network, the new HC‑Depth dataset, specialized loss functions, and edge‑aware training, and demonstrates superior performance on NYUv2, TUM, and real‑world mobile AR applications.

Computer Vision · attention mechanism · deep learning
14 min read