Tagged articles

Positional Encoding

18 articles · Page 1 of 1

Jun 3, 2026 · Artificial Intelligence

A Deliberate Paradigm Shift: How “Attention Is All You Need” Reshaped Deep Learning

The article dissects how the 2017 "Attention Is All You Need" paper sparked a fundamental redesign of sequence modeling by replacing recurrent and convolutional approaches with self‑attention, detailing its mathematical foundations, architectural components, training tricks, limitations, and emerging alternatives such as Mamba.

Attention Mechanism · Mamba · Multi-Head Attention

0 likes · 24 min read

AI Cyberspace

Feb 14, 2026 · Artificial Intelligence

Unpacking the Transformer: From Embeddings to Multi‑Head Attention

This article provides a comprehensive, step‑by‑step walkthrough of the Transformer architecture, covering input embedding, positional encoding, the mechanics of Q‑K‑V attention, scaled dot‑product formulas, multi‑head and masked attention, feed‑forward networks, residual connections, layer normalization, decoder generation, and recent attention‑optimization techniques.

Feed-Forward Network · Multi-Head Attention · Positional Encoding

0 likes · 39 min read

AI Architecture Hub

Jan 19, 2026 · Artificial Intelligence

Demystifying the Transformer: From Input Embedding to Multi‑Head Attention

This article breaks down the core components of the Transformer architecture—including input embedding, positional encoding, multi‑head self‑attention, residual connections with layer normalization, position‑wise feed‑forward networks, and the rationale behind stacking multiple encoder layers—using clear explanations and illustrative diagrams.

Add&Norm · Feed Forward · Input Embedding

0 likes · 12 min read

Data Party THU

Jan 18, 2026 · Artificial Intelligence

Unlocking 3D Scene Synthesis: A Deep Dive into Neural Radiance Fields (NeRF)

This article explains the core principles of Neural Radiance Fields: how a fully‑connected network maps 5‑D coordinates to color and density, and what role positional encoding and hierarchical sampling play. It also provides a complete PyTorch implementation with training and rendering examples.

3D Scene Representation · Hierarchical Sampling · NeRF

0 likes · 18 min read

Data Party THU

Oct 4, 2025 · Artificial Intelligence

Unveiling Transformer Internals: From Theory to PyTorch Code

This article explores the Transformer architecture in depth by combining the principles of the original paper with PyTorch source code, covering encoder‑decoder design, positional encoding assumptions, core parameters, residual connections, attention mechanisms, and detailed implementation snippets to help readers understand and reproduce the model.

Positional Encoding · PyTorch · Transformer

0 likes · 22 min read

Wu Shixiong's Large Model Academy

Sep 26, 2025 · Artificial Intelligence

Crack Large-Model Interviews: Master Positional Encoding, Residuals, LayerNorm & FFN

Preparing for a large-model interview? This guide reveals why interviewers probe seemingly minor components (positional encoding, residual connections, layer normalization, and feed-forward networks), explains each technique's purpose and variants, and shows how to answer confidently. It also offers practical tips and a learning roadmap to boost your chances.

FFN · Interview Tips · LayerNorm

0 likes · 8 min read

Cognitive Technology Team

Jun 29, 2025 · Artificial Intelligence

Understanding Transformers: Core Mechanics Behind Modern AI Models

This article demystifies the Transformer architecture for beginners, explaining its relationship to large models, the self‑attention and multi‑head attention mechanisms, positional encoding, and the roles of Encoder and Decoder components, using clear analogies and visual diagrams to aid comprehension.

Encoder-Decoder · Multi-Head Attention · Positional Encoding

0 likes · 20 min read

Tencent Technical Engineering

Apr 16, 2025 · Artificial Intelligence

Understanding Transformer Architecture for Chinese‑English Translation: A Practical Guide

This practical guide walks through the full Transformer architecture for Chinese‑to‑English translation, detailing encoder‑decoder structure, tokenization and embeddings, batch handling with padding and masks, positional encodings, parallel teacher‑forcing, self‑ and multi‑head attention, and the complete forward and back‑propagation training steps.

Machine Translation · Positional Encoding · PyTorch

0 likes · 26 min read

AntTech

Mar 4, 2025 · Artificial Intelligence

GraphCLIP and 2D‑TPE: Enhancing Transferability of Graph Models and Table Understanding for Large Language Models

This article introduces GraphCLIP, a self‑supervised graph‑summary pre‑training framework that boosts zero‑ and few‑shot transferability of graph foundation models for text‑attributed graphs, and 2D‑TPE, a two‑dimensional positional encoding method that preserves table structure to markedly improve large language model performance on table‑understanding tasks, while also announcing a live paper session at WWW 2025 featuring the authors.

Graph Neural Networks · Positional Encoding · Self‑Supervised Learning

0 likes · 6 min read

NewBeeNLP

Nov 27, 2024 · Artificial Intelligence

How Can Large Language Models Extend Their Context Window? A Deep Dive into Position Encoding

This article reviews the principles of absolute and relative positional encodings, explains why window extrapolation is crucial for large language models, analyzes current extrapolation methods, evaluates their performance, and answers common questions about extending LLM context windows.

LLM · Positional Encoding · RoPE

0 likes · 14 min read

JD Tech

Jun 7, 2024 · Artificial Intelligence

Understanding Attention Mechanisms, Self‑Attention, and Multi‑Head Attention in Transformers

This article explains the fundamentals of attention mechanisms, from their biological inspiration and the evolution of early visual attention into modern self‑attention in Transformers. It details the scaled dot‑product calculations, positional encoding, and multi‑head attention, and illustrates how these concepts enable efficient parallel processing of sequence data.

AI · Positional Encoding · Self-Attention

0 likes · 12 min read

Architect

Mar 19, 2024 · Artificial Intelligence

How Transformers Power Modern NLP: A Deep Dive into Encoder‑Decoder Mechanics

This article explains the core principles of Transformer models—covering input embeddings, self‑attention, multi‑head attention, positional encoding, feed‑forward networks, and decoder strategies—using concrete examples like "The cat sat on the mat" and "The quick brown fox jumps over the lazy dog" to illustrate each step.

Encoder-Decoder · Feed-Forward Network · Multi-Head Attention

0 likes · 13 min read

DaTaobao Tech

Sep 11, 2023 · Artificial Intelligence

Large Language Model Upgrade Paths and Architecture Selection

This article analyzes upgrade paths of major LLMs—ChatGLM, LLaMA, Baichuan—detailing performance, context length, and architectural changes, then examines essential capabilities, data cleaning, tokenizer and attention design, and offers practical guidance for balanced scaling and efficient model construction.

Baichuan · ChatGLM · Data Preprocessing

0 likes · 32 min read

Nightwalker Tech

Jul 18, 2023 · Artificial Intelligence

Implementing the Input Processing Layer of a Transformer Model: Tokenization, Embedding, and Positional Encoding

This article explains how to build the input processing stage of a Transformer—including tokenization with Hugging Face tokenizers, token‑to‑embedding conversion using BERT models, custom BPE tokenizers, and positional encoding—providing complete Python code examples and test results.

BPE · Embedding · Positional Encoding

0 likes · 14 min read

Code DAO

May 3, 2022 · Artificial Intelligence

How to Build Your Own NeRF Model in PyTorch – Step‑by‑Step Guide

This tutorial walks through the theory and implementation of Neural Radiance Fields (NeRF) in PyTorch, covering positional encoding, the MLP architecture, differentiable volume rendering, hierarchical sampling, training tricks, and references to the original research.

Hierarchical Sampling · NeRF · Neural Radiance Fields

0 likes · 23 min read

Baobao Algorithm Notes

Dec 15, 2021 · Artificial Intelligence

Why Can BERT’s Token, Segment, and Position Embeddings Be Added? A Deep Dive into Positional Encoding

This article revisits the long‑standing question of why BERT’s token, segment, and position embeddings are summed, critiques earlier explanations, and presents findings from the ICLR‑2021 paper “Rethinking Positional Encoding in Language Pre‑training” that show removing the token‑position cross term speeds convergence and improves downstream GLUE scores.

BERT · Embedding · Language Pretraining

0 likes · 6 min read

TiPaiPai Technical Team

May 31, 2021 · Artificial Intelligence

Understanding Transformers: Self‑Attention, Multi‑Head Mechanisms, and Positional Encoding

This article explains the Transformer architecture—its self‑attention core, multi‑head attention, positional encoding, encoder‑decoder structure, and how it overcomes RNN limitations, providing a foundation for its use in NLP, image detection, and OCR.

Multi-Head Attention · NLP · Positional Encoding

0 likes · 7 min read

Sohu Tech Products

Jan 9, 2019 · Artificial Intelligence

Understanding the Transformer Model: Attention, Self‑Attention, and Multi‑Head Mechanisms

This article provides a comprehensive, step‑by‑step explanation of the Transformer architecture, covering its encoder‑decoder structure, self‑attention, multi‑head attention, positional encoding, residual connections, and training processes, illustrated with diagrams and code snippets to aid readers new to neural machine translation.

Multi-Head Attention · Neural Machine Translation · Positional Encoding

0 likes · 16 min read
