Tag

Self-Attention

Tencent Technical Engineering
Apr 16, 2025 · Artificial Intelligence

Understanding Transformer Architecture for Chinese‑English Translation: A Practical Guide

This practical guide walks through the full Transformer architecture for Chinese‑to‑English translation, detailing the encoder‑decoder structure, tokenization and embeddings, batch handling with padding and masks, positional encodings, teacher forcing for parallel training, self‑ and multi‑head attention, and the complete forward‑ and back‑propagation training steps.

Embedding · Positional Encoding · PyTorch
0 likes · 26 min read
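Among the components this guide covers, the sinusoidal positional encoding is compact enough to sketch directly. The following is a minimal NumPy illustration (not code from the article itself) of the standard sine/cosine encoding added to token embeddings:

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Standard sinusoidal positional encoding: sin on even dims, cos on odd dims."""
    positions = np.arange(seq_len)[:, np.newaxis]       # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[np.newaxis, :]      # (1, d_model / 2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions
    pe[:, 1::2] = np.cos(angles)   # odd dimensions
    return pe

pe = positional_encoding(seq_len=50, d_model=512)
print(pe.shape)  # (50, 512)
```

Each position gets a unique, bounded pattern across the model dimension, which lets the otherwise order-agnostic attention layers recover token order.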
Cognitive Technology Team
Mar 10, 2025 · Artificial Intelligence

Understanding Transformers: From NLP Challenges to Architecture and Core Mechanisms

This article explains the evolution of natural language processing, the limitations of rule‑based, statistical, and recurrent neural network models, and then introduces the Transformer architecture—covering word and position embeddings, self‑attention, multi‑head attention, Add & Norm, feed‑forward layers, and encoder‑decoder design—to help beginners grasp why Transformers solve key NLP problems.

AI · NLP · Self-Attention
0 likes · 15 min read
JD Tech Talk
Jun 25, 2024 · Artificial Intelligence

Understanding Large Language Models: From Parameters to Transformer Architecture

This article explains the fundamental concepts behind large language models, including their two-file structure, training process, neural network basics, perceptron examples, weight and threshold calculations, the TensorFlow Playground, and a detailed walkthrough of the Transformer architecture with tokenization, positional encoding, self‑attention, normalization, and feed‑forward layers.

AI · Large Language Models · Self-Attention
0 likes · 20 min read
Rare Earth Juejin Tech Community
Jun 12, 2024 · Artificial Intelligence

A Simple Introduction to the Transformer Model

This article provides a comprehensive, beginner-friendly explanation of the Transformer architecture, covering its encoder‑decoder structure, self‑attention, multi‑head attention, positional encoding, residual connections, decoding process, final linear and softmax layers, and training considerations, illustrated with numerous diagrams and code snippets.

Self-Attention · Transformer · deep learning
0 likes · 24 min read
JD Tech
Jun 7, 2024 · Artificial Intelligence

Understanding Attention Mechanisms, Self‑Attention, and Multi‑Head Attention in Transformers

This article explains the fundamentals of attention mechanisms, including biological inspiration, the evolution from early visual attention to modern self‑attention in Transformers, details the scaled dot‑product calculations, positional encoding, and multi‑head attention, illustrating how these concepts enable efficient parallel processing of sequence data.

AI · Attention · Positional Encoding
0 likes · 12 min read
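The scaled dot‑product calculation this article details has a short closed form, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. A minimal NumPy sketch (illustrative only, not taken from the article):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                          # (n_q, n_k) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))   # 4 query vectors of dimension d_k = 8
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.shape)  # (4, 8) (4, 4)
```

The 1/√d_k scaling keeps the dot products from growing with dimension and pushing the softmax into a near‑one‑hot regime with vanishing gradients.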
Sohu Tech Products
Jul 26, 2023 · Artificial Intelligence

Attention Mechanism, Transformer Architecture, and BERT: An In-Depth Overview

This article provides a comprehensive overview of the attention mechanism, its mathematical foundations, the transformer model architecture—including encoder and decoder components—and the BERT pre‑training model, detailing their principles, implementations, and applications in natural language processing.

BERT · Encoder-Decoder · NLP
0 likes · 13 min read
IT Services Circle
Mar 2, 2023 · Artificial Intelligence

Understanding GPT: Word Vectors, Transformers, and Model Architectures (GPT‑2, GPT‑3)

This article provides a concise technical overview of GPT, explaining how word vectors are constructed, how the Transformer architecture with self‑attention and feed‑forward layers processes these vectors, and how GPT‑2 and GPT‑3 extend the model with decoder‑only and large‑scale designs.

AI · GPT · Self-Attention
0 likes · 8 min read
Rare Earth Juejin Tech Community
Oct 10, 2022 · Artificial Intelligence

A Beginner’s Journey into Vision Transformers (ViT) for Computer Vision Engineers

This article introduces the fundamentals of Vision Transformers (ViT) for computer‑vision developers, starting with an overview of the transformer architecture, detailed explanation of self‑attention and multi‑head attention, and step‑by‑step PyTorch code examples that illustrate query, key, value computation and attention scoring.

PyTorch · Self-Attention · Transformer
0 likes · 12 min read
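The query/key/value computation and head splitting that this introduction steps through in PyTorch can be condensed into a small NumPy sketch (an illustration of the same idea, not the article's code; all array shapes here are example choices):

```python
import numpy as np

rng = np.random.default_rng(42)
seq_len, d_model, n_heads = 6, 64, 8
d_head = d_model // n_heads

x = rng.standard_normal((seq_len, d_model))     # e.g. a sequence of patch embeddings
W_q = rng.standard_normal((d_model, d_model))   # random stand-ins for learned weights
W_k = rng.standard_normal((d_model, d_model))
W_v = rng.standard_normal((d_model, d_model))

def split_heads(t):
    # Split the model dimension into (n_heads, d_head) and move heads to the front
    return t.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)  # (heads, seq, d_head)

Q, K, V = (split_heads(x @ W) for W in (W_q, W_k, W_v))

scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)    # (heads, seq, seq)
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)              # softmax per head, per query
out = (weights @ V).transpose(1, 0, 2).reshape(seq_len, d_model)  # concat heads
print(out.shape)  # (6, 64)
```

Each head attends over the same sequence with its own d_head‑dimensional projections; concatenating the head outputs restores the model dimension.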
vivo Internet Technology
Aug 24, 2022 · Frontend Development

Applying Self-Attention Based Machine Learning Model to Design-to-Code Layout Prediction

Vivo’s frontend team built a self‑attention‑based machine‑learning model that predicts web‑page layout types (column, row, or absolute) from node dimensions and positions, resolving parent‑child and sibling relationships for design‑to‑code conversion. Trained on more than 20,000 labeled, crawled, and generated samples, it reaches 99.4% accuracy; the article also outlines further enhancements.

D2C · Self-Attention · Vivo
0 likes · 11 min read
Baidu Geek Talk
Mar 28, 2022 · Artificial Intelligence

Robust Input Visualization Methods for Vision Transformers

The paper proposes a robust Grad‑CAM‑inspired visualization for Vision Transformers that combines attention weights and gradients to generate class‑specific saliency maps, demonstrates superior alignment with discriminative regions across ViT, Swin and Volo models, and shows a 76% false‑positive reduction in Baidu’s porn‑content risk control system.

Grad-CAM · Input Visualization · Self-Attention
0 likes · 11 min read
AntTech
Oct 29, 2021 · Artificial Intelligence

Ant Insurance Technology and CASIA Win Two Tracks at MuSe2021 Multimodal Sentiment Challenge (ACM MM 2021)

The Ant Insurance Technology team, together with the Institute of Automation of the Chinese Academy of Sciences, secured first place in both the MuSe‑Wilder and MuSe‑Sent tracks of the MuSe2021 Multimodal Sentiment Challenge held at the 29th ACM International Conference on Multimedia in Chengdu, showcasing advanced multimodal AI techniques.

BiLSTM · MuSe2021 · Self-Attention
0 likes · 4 min read
Cyber Elephant Tech Team
Apr 28, 2021 · Artificial Intelligence

Understanding BERT: From Encoder-Decoder to Transformer and Attention

This article explains the BERT model by first reviewing the Encoder-Decoder framework, then detailing the attention mechanism—including self-attention and multi-head attention—before describing the full Transformer architecture and finally outlining BERT’s encoder-only design, training stages, and fine-tuning applications.

Attention · BERT · Encoder-Decoder
0 likes · 15 min read
Sohu Tech Products
Nov 25, 2020 · Artificial Intelligence

Illustrated Guide to GPT-2: Detailed Explanation of the Decoder‑Only Transformer Model

This article provides a comprehensive, illustrated walkthrough of OpenAI's GPT‑2 language model, covering its decoder‑only Transformer architecture, self‑attention mechanisms, token processing, training data, differences from BERT, and applications beyond language modeling, enriched with visual diagrams and code snippets for deeper understanding.

AI · GPT-2 · Self-Attention
0 likes · 24 min read
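The defining trait of the decoder‑only architecture this walkthrough illustrates is masked self‑attention: each position may attend only to itself and earlier positions. A minimal NumPy sketch of that causal mask (an illustration, not the article's code):

```python
import numpy as np

seq_len = 5
# Causal (look-ahead) mask: True above the diagonal marks future positions
mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)

scores = np.zeros((seq_len, seq_len))   # stand-in attention scores, all equal
scores[mask] = -np.inf                  # block attention to future tokens
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)
print(weights[0])  # first token attends only to itself: [1. 0. 0. 0. 0.]
```

Setting masked scores to −∞ before the softmax zeroes their weights, which is what lets GPT‑2 train as a left‑to‑right language model while BERT's unmasked encoder sees both directions.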
Sohu Tech Products
Nov 11, 2020 · Artificial Intelligence

Illustrated Transformer: Comprehensive Explanation and Code Implementation

This article provides a step‑by‑step illustrated guide to the Transformer architecture, covering its macro structure, detailed self‑attention mechanisms, multi‑head attention, positional encoding, residual connections, decoder operation, training process, loss functions, and includes complete PyTorch and custom Python code examples.

Multi-Head Attention · NLP · PyTorch
0 likes · 33 min read
DataFunTalk
Oct 23, 2020 · Artificial Intelligence

Feedback‑Aware Deep Matching Model for Music Recommendation in Tmall Genie

This article presents DeepMatch, a behavior‑sequence based deep learning recall model enhanced with play‑rate and intent‑type embeddings, describes its self‑attention architecture, factorized embedding parameterization, multitask loss design, distributed TensorFlow training tricks, and demonstrates significant offline and online improvements in music recommendation performance.

Self-Attention · TensorFlow · deep learning
0 likes · 15 min read
Qunar Tech Salon
Sep 12, 2019 · Artificial Intelligence

A Comprehensive Overview of Attention Mechanisms in Deep Learning

This article systematically reviews the history, core concepts, variants, and practical implementations of attention mechanisms—from early additive and multiplicative forms to self‑attention, multi‑head attention, and recent transformer‑based models—highlighting why attention has become fundamental in modern AI research.

Attention · NLP · Self-Attention
0 likes · 16 min read
Sohu Tech Products
Jan 9, 2019 · Artificial Intelligence

Understanding the Transformer Model: Attention, Self‑Attention, and Multi‑Head Mechanisms

This article provides a comprehensive, step‑by‑step explanation of the Transformer architecture, covering its encoder‑decoder structure, self‑attention, multi‑head attention, positional encoding, residual connections, and training processes, illustrated with diagrams and code snippets to aid readers new to neural machine translation.

Multi-Head Attention · Positional Encoding · Self-Attention
0 likes · 16 min read