Tagged articles
17 articles
AI Cyberspace
Feb 14, 2026 · Artificial Intelligence

Unpacking the Transformer: From Embeddings to Multi‑Head Attention

This article provides a comprehensive, step‑by‑step walkthrough of the Transformer architecture, covering input embedding, positional encoding, the mechanics of Q‑K‑V attention, scaled dot‑product formulas, multi‑head and masked attention, feed‑forward networks, residual connections, layer normalization, decoder generation, and recent attention‑optimization techniques.

Deep Learning · Feed-Forward Network · Positional Encoding
39 min read
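The scaled dot-product attention this article walks through, softmax(QKᵀ/√d_k)V, can be sketched as a single head in plain Python. The sketch includes an optional causal mask of the kind masked decoder attention uses. This is an illustration only, not the article's code, and the function names are hypothetical. Multi-head attention runs several such heads on separately projected slices of the model dimension and concatenates their outputs.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats; -inf entries get weight 0."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V, causal=False):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V.

    Q and K are lists of vectors of length d_k; V is a list of value vectors.
    With causal=True, query i may only attend to keys 0..i (decoder masking).
    Returns (outputs, weights).
    """
    d_k = len(K[0])
    outputs, weights = [], []
    for i, q in enumerate(Q):
        scores = []
        for j, k in enumerate(K):
            if causal and j > i:
                scores.append(float("-inf"))  # future position is masked out
            else:
                scores.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k))
        w = softmax(scores)
        weights.append(w)
        # Output for query i is the attention-weighted sum of the value vectors
        outputs.append([sum(w[j] * V[j][c] for j in range(len(V)))
                        for c in range(len(V[0]))])
    return outputs, weights

Q = K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out, w = scaled_dot_product_attention(Q, K, V, causal=True)
print(w[0], out[0])  # [1.0, 0.0] [1.0, 2.0]
```

With the causal mask, the first query can see only the first key, so its output is exactly the first value vector.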
AI Architecture Hub
Jan 19, 2026 · Artificial Intelligence

Demystifying the Transformer: From Input Embedding to Multi‑Head Attention

This article breaks down the core components of the Transformer architecture—including input embedding, positional encoding, multi‑head self‑attention, residual connections with layer normalization, position‑wise feed‑forward networks, and the rationale behind stacking multiple encoder layers—using clear explanations and illustrative diagrams.

Add&Norm · Deep Learning · Feed Forward
12 min read
Data Party THU
Jan 18, 2026 · Artificial Intelligence

Unlocking 3D Scene Synthesis: A Deep Dive into Neural Radiance Fields (NeRF)

This article explains the core principles of Neural Radiance Fields, detailing how a fully‑connected network maps 5‑D coordinates to color and density, the role of positional encoding and hierarchical sampling, and provides a complete PyTorch implementation with training and rendering examples.

3D Scene Representation · Hierarchical Sampling · NeRF
18 min read
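NeRF applies a positional encoding to each input coordinate before the MLP: γ(p) = (sin(2⁰πp), cos(2⁰πp), ..., sin(2^(L−1)πp), cos(2^(L−1)πp)). A minimal plain-Python sketch follows. The function name is hypothetical, and the example uses the original paper's settings of L = 10 for position and L = 4 for viewing direction. Some implementations also concatenate the raw coordinates to the encoding; this sketch does not.

```python
import math

def nerf_positional_encoding(p, num_freqs):
    """Map each coordinate x of p to sin(2^k*pi*x), cos(2^k*pi*x) for k = 0..num_freqs-1.

    The output length is len(p) * 2 * num_freqs.
    """
    out = []
    for x in p:
        for k in range(num_freqs):
            angle = (2.0 ** k) * math.pi * x
            out.append(math.sin(angle))
            out.append(math.cos(angle))
    return out

xyz = [0.1, -0.4, 0.25]        # a 3-D sample point
direction = [0.0, 0.0, 1.0]    # a unit viewing direction
print(len(nerf_positional_encoding(xyz, 10)))        # 60
print(len(nerf_positional_encoding(direction, 4)))   # 24
```

The higher frequencies let a plain fully connected network represent fine detail in color and density that it would otherwise smooth over.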
Data Party THU
Oct 4, 2025 · Artificial Intelligence

Unveiling Transformer Internals: From Theory to PyTorch Code

This article explores the Transformer architecture in depth by pairing principles from the original paper with PyTorch source code. It covers the encoder‑decoder design, the assumptions behind positional encoding, core parameters, residual connections, and attention mechanisms, with detailed implementation snippets to help readers understand and reproduce the model.

Deep Learning · Neural Networks · Positional Encoding
22 min read
Wu Shixiong's Large Model Academy
Sep 26, 2025 · Artificial Intelligence

Crack Large-Model Interviews: Master Positional Encoding, Residuals, LayerNorm & FFN

Preparing for a large-model interview? This guide explains why interviewers probe seemingly minor components (positional encoding, residual connections, layer normalization, and feed-forward networks), covers each technique's purpose, its variants, and how to answer confidently, and closes with practical tips and a learning roadmap to boost your chances.

FFN · Interview Tips · LayerNorm
8 min read
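Residual connections, layer normalization, and the feed-forward network combine in every Transformer sublayer. Below is a minimal plain-Python sketch of the post-LN arrangement used in the original Transformer, LayerNorm(x + FFN(x)) with FFN(x) = max(0, xW₁ + b₁)W₂ + b₂. The helper names and the toy identity weights are illustrative assumptions. Many recent LLMs instead use the pre-LN form x + FFN(LayerNorm(x)), which the same helpers can express.

```python
import math

def layer_norm(x, gamma=None, beta=None, eps=1e-5):
    """Normalize a vector to zero mean and unit variance, then scale and shift."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    gamma = gamma or [1.0] * n   # learnable scale, identity by default
    beta = beta or [0.0] * n     # learnable shift, zero by default
    return [g * (v - mean) / math.sqrt(var + eps) + b
            for v, g, b in zip(x, gamma, beta)]

def matvec(x, W, b):
    """Row vector x (length d_in) times W (d_in x d_out), plus bias b (length d_out)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]

def ffn(x, W1, b1, W2, b2):
    """Position-wise feed-forward network: max(0, x W1 + b1) W2 + b2."""
    hidden = [max(0.0, h) for h in matvec(x, W1, b1)]
    return matvec(hidden, W2, b2)

def post_ln_block(x, W1, b1, W2, b2):
    """Post-LN sublayer: add the residual, then normalize: LayerNorm(x + FFN(x))."""
    y = ffn(x, W1, b1, W2, b2)
    return layer_norm([a + c for a, c in zip(x, y)])

I4 = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
zeros = [0.0] * 4
out = post_ln_block([1.0, 2.0, 3.0, 4.0], I4, zeros, I4, zeros)
print([round(v, 4) for v in out])  # [-1.3416, -0.4472, 0.4472, 1.3416]
```

The residual path keeps an identity route for gradients, while LayerNorm keeps each position's activations on a stable scale.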
Cognitive Technology Team
Jun 29, 2025 · Artificial Intelligence

Understanding Transformers: Core Mechanics Behind Modern AI Models

This article demystifies the Transformer architecture for beginners, explaining its relationship to large models, the self‑attention and multi‑head attention mechanisms, positional encoding, and the roles of Encoder and Decoder components, using clear analogies and visual diagrams to aid comprehension.

Deep Learning · Encoder-Decoder · Positional Encoding
20 min read
Tencent Technical Engineering
Apr 16, 2025 · Artificial Intelligence

Understanding Transformer Architecture for Chinese‑English Translation: A Practical Guide

This practical guide walks through the full Transformer architecture for Chinese‑to‑English translation, detailing encoder‑decoder structure, tokenization and embeddings, batch handling with padding and masks, positional encodings, parallel teacher‑forcing, self‑ and multi‑head attention, and the complete forward and back‑propagation training steps.

Positional Encoding · PyTorch · Self-Attention
26 min read
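Batch handling with padding and masks can be sketched in plain Python. The steps are to pad each token-id sequence to the batch maximum and mark real versus padded key positions. For decoder self-attention under teacher forcing, that mask is then combined with a causal mask so each target position sees only earlier target tokens. The pad id of 0 and all function names are assumptions for illustration. In a real model, positions where the mask is False are set to −inf (or a large negative number) before the softmax.

```python
PAD_ID = 0  # assumption: the tokenizer reserves id 0 for padding

def pad_batch(seqs, pad_id=PAD_ID):
    """Right-pad variable-length token-id lists to the longest length in the batch."""
    max_len = max(len(s) for s in seqs)
    return [s + [pad_id] * (max_len - len(s)) for s in seqs]

def padding_mask(batch, pad_id=PAD_ID):
    """True where a position holds a real token, False where it is padding."""
    return [[tok != pad_id for tok in row] for row in batch]

def decoder_self_attention_mask(key_mask):
    """Combine causal and padding masks for one sequence: query i may attend
    to key j only if j <= i and key j is not padding."""
    n = len(key_mask)
    return [[(j <= i) and key_mask[j] for j in range(n)] for i in range(n)]

batch = pad_batch([[5, 6, 7], [8, 9]])
print(batch)      # [[5, 6, 7], [8, 9, 0]]
masks = padding_mask(batch)
print(masks[1])   # [True, True, False]
for row in decoder_self_attention_mask(masks[1]):
    print(row)
# [True, False, False]
# [True, True, False]
# [True, True, False]   (padded query row; its output is ignored by the loss)
```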
AntTech
Mar 4, 2025 · Artificial Intelligence

GraphCLIP and 2D‑TPE: Enhancing Transferability of Graph Models and Table Understanding for Large Language Models

This article introduces two methods. GraphCLIP is a self‑supervised graph‑summary pre‑training framework that boosts the zero‑ and few‑shot transferability of graph foundation models on text‑attributed graphs. 2D‑TPE is a two‑dimensional positional encoding method that preserves table structure and markedly improves large language model performance on table‑understanding tasks. The article also announces a live paper session at WWW 2025 featuring the authors.

Positional Encoding · Self‑Supervised Learning · Table Understanding
6 min read
JD Tech
Jun 7, 2024 · Artificial Intelligence

Understanding Attention Mechanisms, Self‑Attention, and Multi‑Head Attention in Transformers

This article explains the fundamentals of attention mechanisms, tracing them from their biological inspiration and early visual attention to modern self‑attention in Transformers. It details the scaled dot‑product calculation, positional encoding, and multi‑head attention, and illustrates how these concepts enable efficient parallel processing of sequence data.

AI · Positional Encoding · Self-Attention
12 min read
Architect
Mar 19, 2024 · Artificial Intelligence

How Transformers Power Modern NLP: A Deep Dive into Encoder‑Decoder Mechanics

This article explains the core principles of Transformer models—covering input embeddings, self‑attention, multi‑head attention, positional encoding, feed‑forward networks, and decoder strategies—using concrete examples like "The cat sat on the mat" and "The quick brown fox jumps over the lazy dog" to illustrate each step.

Encoder-Decoder · Feed-Forward Network · NLP
13 min read
DaTaobao Tech
Sep 11, 2023 · Artificial Intelligence

Large Language Model Upgrade Paths and Architecture Selection

This article analyzes upgrade paths of major LLMs—ChatGLM, LLaMA, Baichuan—detailing performance, context length, and architectural changes, then examines essential capabilities, data cleaning, tokenizer and attention design, and offers practical guidance for balanced scaling and efficient model construction.

Baichuan · ChatGLM · LLM architecture
32 min read
Nightwalker Tech
Jul 18, 2023 · Artificial Intelligence

Implementing the Input Processing Layer of a Transformer Model: Tokenization, Embedding, and Positional Encoding

This article explains how to build the input processing stage of a Transformer—including tokenization with Hugging Face tokenizers, token‑to‑embedding conversion using BERT models, custom BPE tokenizers, and positional encoding—providing complete Python code examples and test results.

BPE · Embedding · Positional Encoding
14 min read
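One common choice for the positional-encoding step is the fixed sinusoidal scheme from the original Transformer paper: PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)), added element-wise to the token embeddings. The sketch below assumes that scheme and an even d_model. The article itself may use a different variant (BERT, for example, learns its position embeddings), and the function names are hypothetical.

```python
import math

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE[pos][2i] = sin(pos / 10000^(2i/d_model)), PE[pos][2i+1] = cos(same angle).

    Assumes d_model is even.
    """
    pe = []
    for pos in range(seq_len):
        row = [0.0] * d_model
        for i in range(0, d_model, 2):  # i is the even index 2k
            angle = pos / (10000 ** (i / d_model))
            row[i] = math.sin(angle)
            row[i + 1] = math.cos(angle)
        pe.append(row)
    return pe

def add_positional_encoding(embeddings):
    """Add the encoding element-wise to a seq_len x d_model list of token embeddings."""
    pe = sinusoidal_positional_encoding(len(embeddings), len(embeddings[0]))
    return [[e + p for e, p in zip(erow, prow)] for erow, prow in zip(embeddings, pe)]

pe = sinusoidal_positional_encoding(seq_len=4, d_model=8)
print(pe[0])  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
with_pos = add_positional_encoding([[0.1] * 8 for _ in range(4)])
```

Because each dimension pair oscillates at a different wavelength, every position receives a distinct vector without any learned parameters.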
Code DAO
May 3, 2022 · Artificial Intelligence

How to Build Your Own NeRF Model in PyTorch – Step‑by‑Step Guide

This tutorial walks through the theory and implementation of Neural Radiance Fields (NeRF) in PyTorch, covering positional encoding, the MLP architecture, differentiable volume rendering, hierarchical sampling, training tricks, and references to the original research.

Hierarchical Sampling · NeRF · Neural Radiance Fields
23 min read
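NeRF's differentiable volume rendering approximates a pixel's color along a ray as C ≈ Σᵢ Tᵢ(1 − exp(−σᵢδᵢ))cᵢ. Here Tᵢ = exp(−Σ_{j<i} σⱼδⱼ) is the transmittance and δᵢ is the distance between adjacent samples. A minimal plain-Python sketch for a single ray follows. The function name is hypothetical, and giving the last sample a very large δ is a common implementation choice rather than part of the formula. The returned weights are also what hierarchical sampling uses to decide where to place the fine samples.

```python
import math

def render_ray(sigmas, colors, t_vals, far_delta=1e10):
    """Discrete volume rendering along one ray.

    sigmas: density per sample; colors: RGB triple per sample;
    t_vals: increasing sample depths. delta_i = t_{i+1} - t_i, and the last
    sample uses a very large delta. Returns (rgb, weights), where
    weights[i] = T_i * (1 - exp(-sigma_i * delta_i)).
    """
    n = len(t_vals)
    deltas = [t_vals[i + 1] - t_vals[i] for i in range(n - 1)] + [far_delta]
    transmittance = 1.0  # T_1 = 1: nothing blocks the ray before the first sample
    weights = []
    for sigma, delta in zip(sigmas, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha            # T_{i+1} = T_i * exp(-sigma_i * delta_i)
    rgb = [sum(w * c[ch] for w, c in zip(weights, colors)) for ch in range(3)]
    return rgb, weights

t_vals = [1.0, 2.0, 3.0]
colors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rgb, w = render_ray([1e6, 0.0, 0.0], colors, t_vals)  # opaque first sample
print(rgb)  # [1.0, 0.0, 0.0]
```

Every step is a smooth function of σ and c, so the rendering loss can be backpropagated into the MLP that predicts them.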
Baobao Algorithm Notes
Dec 15, 2021 · Artificial Intelligence

Why Can BERT’s Token, Segment, and Position Embeddings Be Added? A Deep Dive into Positional Encoding

This article revisits the long‑standing question of why BERT’s token, segment, and position embeddings are summed, critiques earlier explanations, and presents findings from the ICLR 2021 paper “Rethinking Positional Encoding in Language Pre‑training”, which shows that removing the token‑position cross terms speeds up convergence and improves downstream GLUE scores.

BERT · Embedding · Language Pretraining
6 min read
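The token‑position cross terms come from expanding the summed input inside the first layer's attention logit. Write x_i = w_i + p_i for the token plus position embedding; BERT's segment embedding and the LayerNorm it applies to the sum are omitted here for simplicity. With query and key projections W^Q, W^K and head dimension d (notation assumed for this sketch), the pre-softmax score between positions i and j expands as:

```latex
\alpha_{ij}
  = \frac{\big((w_i + p_i)W^Q\big)\big((w_j + p_j)W^K\big)^{\top}}{\sqrt{d}}
  = \frac{1}{\sqrt{d}}\Big[(w_i W^Q)(w_j W^K)^{\top}
  + (w_i W^Q)(p_j W^K)^{\top}
  + (p_i W^Q)(w_j W^K)^{\top}
  + (p_i W^Q)(p_j W^K)^{\top}\Big]
```

The first term is a token-token correlation and the last is a position-position correlation. The two middle terms mix a token with a position, and these are the cross terms the summary says the paper removes.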
Sohu Tech Products
Jan 9, 2019 · Artificial Intelligence

Understanding the Transformer Model: Attention, Self‑Attention, and Multi‑Head Mechanisms

This article provides a comprehensive, step‑by‑step explanation of the Transformer architecture, covering its encoder‑decoder structure, self‑attention, multi‑head attention, positional encoding, residual connections, and training processes, illustrated with diagrams and code snippets to aid readers new to neural machine translation.

Deep Learning · Neural Machine Translation · Positional Encoding
16 min read