Tagged articles
Geek Labs
May 6, 2026 · Artificial Intelligence

Build a GPT from Scratch and Decode AI Coding Jargon with Two Top GitHub Projects

The article introduces two practical GitHub repositories—how-to-train-your-gpt, a step‑by‑step guide that builds a LLaMA‑style GPT model across 12 chapters, and dictionary-of-ai-coding, a plain‑language glossary of AI‑coding terms—showing how they together provide a complete understanding of modern LLM fundamentals and terminology.

AI, GPT, GitHub
9 min read
AI Tech Publishing
Apr 29, 2026 · Artificial Intelligence

Why Do AI Agents Forget and Hallucinate? A Complete Guide to KV‑Cache Memory Mechanisms

The article explains that AI agents’ forgetting and hallucinations stem from token‑level attention scores causing key‑value cache eviction before retrieval, then surveys KV‑cache basics, naive growth, streaming‑LLM windowing, SnapKV’s attention‑guided compression, token‑retention studies, Memory Sparse Attention, compares these methods, and discusses practical system pitfalls and design implications.

AI agents, KV cache, Memory Sparse Attention
20 min read
Geek Labs
Apr 20, 2026 · Artificial Intelligence

A Complete Open‑Source Guide to LLM Internals: From Tokenization to Inference Optimization

This open‑source tutorial breaks down large language model internals into 11 detailed topics—covering BPE tokenization, attention mathematics, backpropagation, transformer architecture, KV‑Cache, Paged and Flash Attention, and frontier techniques—each with numeric derivations and Python code, making it ideal for developers and interview preparation.

Flash Attention, Inference Optimization, KV cache
5 min read
AI Tech Publishing
Apr 9, 2026 · Artificial Intelligence

Engineering‑Focused Guide to Training and Inference of Large Language Models

This article walks engineers through the full LLM stack—from tokenization and positional encoding to transformer blocks, efficient fine‑tuning, quantization, and production‑grade inference techniques such as KV‑cache, FlashAttention, PagedAttention, continuous batching, and speculative decoding—highlighting trade‑offs, toolchains, and practical workflow steps.

Fine-tuning, Inference, LLM
13 min read
AI Tech Publishing
Apr 5, 2026 · Artificial Intelligence

Why the First Token Is Slow: A Deep Dive into KV Cache for LLM Inference

The article explains how KV cache eliminates redundant computations in autoregressive LLM generation, detailing the attention mechanism, the O(n²) waste of recomputing K and V, the cache‑based solution, its impact on time‑to‑first‑token, and the memory‑vs‑speed trade‑off.

Inference Optimization, KV cache, LLM
7 min read
SuanNi
Mar 29, 2026 · Artificial Intelligence

How an AI Agent Outperformed NVIDIA Engineers in 7‑Day GPU Kernel Optimization

This article analyzes the AVO system, an autonomous AI agent that replaces traditional evolutionary search pipelines to iteratively improve CUDA attention kernels on NVIDIA's Blackwell B200 GPU, achieving up to 10.5% higher throughput than hand‑tuned implementations after a week of nonstop optimization.

AI, CUDA, GPU Optimization
13 min read
Machine Learning Algorithms & Natural Language Processing
Mar 20, 2026 · Artificial Intelligence

Why Kimi Dropped Residual Connections: A First‑Person Deep Dive into Attention Residuals

This article explains how Attention Residuals (AttnRes) replace traditional residual shortcuts with layer‑wise attention, details the mathematical reformulation, design constraints, static‑Q trick, full and block variants, and presents experimental evidence of significant accuracy gains with modest overhead.

NLP, Neural Networks, RMSNorm
11 min read
PaperAgent
Mar 17, 2026 · Artificial Intelligence

Can Attention Replace Fixed Residuals? Inside the ‘Attention Residuals’ Breakthrough

This article analyzes the newly released Attention Residuals paper, explaining how learnable attention weighting replaces fixed residual addition to mitigate information dilution in deep LLMs, detailing the proposed Block AttnRes design, engineering trade‑offs, experimental results, and its significance for foundational model architecture.

Block Attention, Deep Learning, LLM
9 min read
Bighead's Algorithm Notes
Feb 27, 2026 · Artificial Intelligence

Paper Review: NeurIF – Feature‑Controlled Learning of Dynamic Asset‑Pricing Factors and Loadings

NeurIF introduces a neural instrumented factorization framework that leverages company features as instruments, combines spatial and temporal attention to learn time‑varying latent factors and their loadings, achieves 1‑18% RMSE improvement over transformer baselines, and produces statistically significant long‑short portfolios that explain cross‑sectional pricing anomalies.

NeurIF, asset pricing, attention
15 min read
AI Cyberspace
Feb 14, 2026 · Artificial Intelligence

Unpacking the Transformer: From Embeddings to Multi‑Head Attention

This article provides a comprehensive, step‑by‑step walkthrough of the Transformer architecture, covering input embedding, positional encoding, the mechanics of Q‑K‑V attention, scaled dot‑product formulas, multi‑head and masked attention, feed‑forward networks, residual connections, layer normalization, decoder generation, and recent attention‑optimization techniques.

Deep Learning, Feed-Forward Network, Positional Encoding
39 min read
AI Large Model Application Practice
Jan 1, 2026 · Artificial Intelligence

Why Single-Head Attention Falls Short and Multi-Head Saves the Day

This article explains the inherent limitations of single-head attention in Transformers, illustrates them with a linguistic example, and then details how multi-head attention works through independent projection matrices, splitting and concatenation, ultimately boosting model expressiveness, robustness, and interpretability.

AI, attention, multi-head
9 min read
Architect
Dec 15, 2025 · Artificial Intelligence

Demystifying LLM Architecture: From Transformers to Modern MoE Designs

This comprehensive guide explains the fundamentals of large language model (LLM) architectures, covering the original Transformer, tokenization, embeddings, positional encoding, attention mechanisms, feed‑forward networks, layer stacking, a step‑by‑step translation example, and the latest open‑source and hybrid LLM designs shaping the field.

Embedding, LLM, MoE
41 min read
Tencent Technical Engineering
Dec 3, 2025 · Artificial Intelligence

Why Transformers Power Modern LLMs: A Deep Dive into Architecture and Mechanics

This article provides a comprehensive, step‑by‑step explanation of the Transformer architecture that underpins large language models, covering tokenization, embeddings, positional encoding, attention mechanisms, feed‑forward networks, layer stacking, a detailed translation example, visualized attention weights, and a survey of recent open‑source LLM designs such as DeepSeek V3, OLMo 2, and Gemma 3.

Embedding, LLM, Neural Network
38 min read
Data Party THU
Nov 2, 2025 · Artificial Intelligence

From RNN to LLM: How Transformers Power Modern Language Models

This article explains the evolution from RNNs through Encoder‑Decoder models to Transformers, detailing self‑attention, multi‑head attention, and masked attention, and then describes what Large Language Models are, their key components, capabilities, limitations, and common applications.

AI, Deep Learning, LLM
9 min read
Data Party THU
Oct 4, 2025 · Artificial Intelligence

Unveiling Transformer Internals: From Theory to PyTorch Code

This article deeply explores the Transformer architecture by combining original paper principles with PyTorch source code, covering encoder‑decoder design, positional encoding assumptions, core parameters, residual connections, attention mechanisms, and detailed implementation snippets to help readers understand and reproduce the model.

Deep Learning, Neural Networks, Positional Encoding
22 min read
Architect
Sep 16, 2025 · Artificial Intelligence

Why Transformers Outperform RNNs: A Beginner’s Guide to Attention and Architecture

This article introduces the Transformer architecture, explaining its attention mechanism, encoder‑decoder design, training and inference processes, and why it surpasses RNN‑based models, while also covering common applications and variations in natural language processing.

Deep Learning, Model architecture, NLP
13 min read
Alibaba Cloud Developer
Aug 6, 2025 · Artificial Intelligence

How Transformers Revolutionize Sequence Modeling: From RNN Limits to Self‑Attention Mastery

This article explains why Transformer models surpass traditional RNN‑based seq2seq architectures by introducing self‑attention, multi‑head attention, and positional encoding, detailing the inner workings of encoders, decoders, and attention mechanisms, and comparing their advantages and limitations across NLP and vision tasks.

GRU, LSTM, RNN
30 min read
AI Frontier Lectures
Jul 10, 2025 · Artificial Intelligence

Can 2‑Simplicial Attention Redefine Transformer Scaling Laws?

A recent Meta paper introduces a rotation‑invariant 2‑simplicial attention mechanism, demonstrates its superior scaling‑law coefficients over standard dot‑product attention, and provides experimental evidence of improved token efficiency and model performance under constrained token budgets.

2-simplicial, Meta, Transformer
11 min read
IT Services Circle
Jul 6, 2025 · Artificial Intelligence

Why Transformers Train Like Any Neural Network: Backpropagation Explained

This article demystifies how Transformers are trained by showing that all their linear layers have learnable weights and biases, and that the attention mechanism—including softmax and dot‑product operations—is fully differentiable and updated via standard back‑propagation.

Backpropagation, Deep Learning, PyTorch
7 min read
AI Frontier Lectures
Mar 24, 2025 · Artificial Intelligence

How MambaIRv2 Boosts Image Restoration with Attentive State‑Space Design

Introducing MambaIRv2, an image restoration backbone that replaces Mamba’s causal scanning with an attentive state‑space module, achieving single‑direction scanning, reduced parameters and computation, and superior performance on lightweight and classic super‑resolution, JPEG artifact removal, and denoising tasks, as validated by CVPR‑2025 results.

Computer Vision, Image Restoration, MambaIRv2
8 min read
Cognitive Technology Team
Feb 9, 2025 · Artificial Intelligence

A Beginner’s Guide to the History and Key Concepts of Deep Learning

From the perceptron’s inception in 1958 to modern Transformer-based models like GPT, this article traces the evolution of deep learning, explaining foundational architectures such as DNNs, CNNs, RNNs, LSTMs, attention mechanisms, and recent innovations like DeepSeek’s MLA, highlighting their principles and impact.

Deep Learning, GPT, MLA
19 min read
DataFunSummit
Dec 28, 2024 · Artificial Intelligence

Memory Optimization for Large Model Inference: Virtual Tensor and LayerKV Techniques

This talk presents the Ant Group team's recent work on large‑model inference memory optimization, covering GPU memory challenges, virtual memory management (VMM), the Virtual Tensor framework, LayerKV techniques, performance comparisons with PagedAttention and FlashAttention, and extensive experimental results demonstrating reduced latency and higher QPS.

GPU, Virtual Memory, attention
25 min read
NewBeeNLP
Nov 18, 2024 · Artificial Intelligence

How to Optimize Multi-Head Attention: From MQA to FlashAttention and Beyond

This article examines various techniques for compressing and accelerating the KV cache in transformer models—including MQA, GQA, MLA, sliding‑window and linear attention, flash attention, paged and ring attention, as well as mixed‑precision training and ZeRO parallelism—providing code snippets, implementation details, and practical trade‑offs.

FlashAttention, KV cache, Model Parallelism
17 min read
Baidu Intelligent Cloud Tech Hub
Jul 25, 2024 · Artificial Intelligence

How Transformers Work: From Tensor Basics to GPU Performance Analysis

This article provides a comprehensive, engineer‑focused breakdown of transformer architecture—including tensor fundamentals, matrix multiplication, GPU theoretical compute, attention and FFN mechanics, quantitative parameter and FLOP analysis, performance metrics like MFU, parallelism strategies, variant optimizations, and practical exercise questions—offering clear insight into large‑model efficiency and scaling.

FFN, GPU performance, Transformer
33 min read
JD Tech
Jun 7, 2024 · Artificial Intelligence

Understanding Attention Mechanisms, Self‑Attention, and Multi‑Head Attention in Transformers

This article explains the fundamentals of attention mechanisms, including biological inspiration, the evolution from early visual attention to modern self‑attention in Transformers, details the scaled dot‑product calculations, positional encoding, and multi‑head attention, illustrating how these concepts enable efficient parallel processing of sequence data.

AI, Positional Encoding, Self-Attention
12 min read
Baobao Algorithm Notes
May 5, 2024 · Artificial Intelligence

Deep Dive into Transformer Mechanics: Scaling, Q/K Projections, FFNs, and More

This article provides concise technical explanations for 25 common questions about Transformer models, covering scaled dot‑product attention scaling, separate Q/K projections, feed‑forward network design, attention variants, normalization, LoRA versus full‑parameter training, KV‑cache, pre‑ and post‑norm, computational cost analysis, and advanced position‑encoding techniques.

LLM, LoRA, Transformer
25 min read
DaTaobao Tech
Mar 27, 2024 · Artificial Intelligence

Building a Simple Diffusion Model with Python

This tutorial walks through implementing a basic Denoising Diffusion Probabilistic Model in Python, explaining the forward noise schedule, reverse denoising training, and providing complete code for noise schedules, diffusion functions, residual and attention blocks, a UNet architecture, loss computation, and a training loop.

DDPM, Python, U-Net
26 min read
Ele.me Technology
Mar 21, 2024 · Artificial Intelligence

How FIN Boosts CTR in Online Food Ordering: A Spatial‑Temporal Modeling Breakthrough

The paper introduces FIN (Fragment and Integrate Network), a novel spatial‑temporal model that extracts multiple sub‑sequences from ultra‑long user behavior logs, applies simplified and multi‑head attention, and fuses them with physically meaningful set operations, achieving up to 5.7% CTR lift and 7.3% RPM improvement in real‑world food‑delivery advertising.

AI, CTR prediction, Long Sequence Modeling
23 min read
NewBeeNLP
Mar 18, 2024 · Artificial Intelligence

Mastering RAG and LLM Techniques: From Retrieval to Fine‑Tuning

This article provides a comprehensive technical guide on Retrieval‑Augmented Generation (RAG), open‑source large language models such as LLaMA, fine‑tuning methods, evaluation metrics, memory‑optimization tricks, and attention‑related optimizations for modern AI systems.

LLM, LangChain, Memory Optimization
19 min read
Rare Earth Juejin Tech Community
Nov 15, 2023 · Artificial Intelligence

Understanding the Transformer Architecture: Encoder, Decoder, and Attention Mechanisms

This article explains the Transformer model, comparing it with RNNs, detailing its encoder‑decoder structure, multi‑head and scaled dot‑product attention, embedding layers, feed‑forward networks, and the final linear‑softmax output, supplemented with diagrams and code examples.

Deep Learning, Encoder-Decoder, Neural Networks
10 min read
Rare Earth Juejin Tech Community
Nov 12, 2023 · Artificial Intelligence

A Comprehensive Introduction to RNN, LSTM, Attention Mechanisms, and Transformers for Large Language Models

This article provides a thorough overview of large language models, explaining the relationship between NLP and LLMs, the evolution from RNN to LSTM, the fundamentals of attention mechanisms, and the architecture and operation of Transformer models, all illustrated with clear examples and diagrams.

LSTM, NLP, RNN
25 min read
DataFunSummit
Sep 29, 2023 · Artificial Intelligence

Social4Rec: Enhancing Video Recommendation with Social Interest Networks

This article introduces Social4Rec, a video recommendation algorithm that tackles user cold‑start problems by extracting and integrating social interest information through coarse‑ and fine‑grained interest extractors, attention‑based fusion, and extensive offline and online experiments demonstrating significant CTR improvements.

Deep Learning, attention, cold-start
14 min read
Alibaba Cloud Developer
Sep 4, 2023 · Artificial Intelligence

Hands‑On Building a Transformer from Scratch with PyTorch

This tutorial walks you through implementing a full Transformer model in PyTorch, starting from basic linear‑regression code, adding attention mechanisms, multi‑head attention, encoder‑decoder architecture, training loops, and inference, all reinforced with practical debugging tips.

Deep Learning, NLP, PyTorch
17 min read
Nightwalker Tech
Jul 19, 2023 · Artificial Intelligence

Step‑by‑Step Implementation of Transformer Blocks, Attention, Normalization, Feed‑Forward, Encoder and Decoder in PyTorch

This article provides a comprehensive tutorial on building the core components of a Transformer model—including multi‑head attention, layer normalization, feed‑forward networks, encoder and decoder layers—and assembles them into a complete PyTorch implementation, supplemented with explanatory diagrams and runnable code.

Decoder, Deep Learning, Encoder
13 min read
Model Perspective
Jul 6, 2023 · Fundamentals

Understanding Information Processing Theory: How the Mind Works Like a Computer

The information processing theory, emerging in the 1950s‑60s, likens human cognition to computer operations, detailing how perception, attention, memory, conceptual knowledge, reasoning, and feedback mechanisms transform sensory input into mental representations and guide behavior, influencing cognitive psychology, education, and HCI.

Memory, attention, cognitive psychology
4 min read
DataFunSummit
Jun 21, 2023 · Artificial Intelligence

Graph‑Enhanced Node Representation for Cold‑Start Recommendation: Neighbour‑Enhanced YouTubeDNN

This article proposes a graph‑based node representation method that combines static attribute graphs and dynamic interaction graphs with multi‑level attention to alleviate user and item cold‑start problems in recommendation systems, achieving notable AUC improvements on sparsified MovieLens datasets.

Embedding, Graph Neural Network, MovieLens
9 min read
Network Intelligence Research Center (NIRC)
Jun 5, 2023 · Artificial Intelligence

How DETR and Its Successors Evolve: A Deep Dive into the DETR Series for Object Detection

This article reviews the original DETR model, analyzes its strengths and weaknesses, and then examines two major follow‑up works—Deformable‑DETR and DAB‑DETR—explaining how they modify attention mechanisms, introduce deformable convolutions and dynamic anchor boxes to accelerate convergence and improve small‑object detection.

DAB-DETR, DETR, Deformable-DETR
12 min read
Architect's Guide
Feb 9, 2023 · Artificial Intelligence

Why ChatGPT Is So Powerful: A Technical Overview of NLP Model Evolution

This article explains why ChatGPT performs so well by tracing the evolution of natural‑language processing from rule‑based grammars through statistical n‑gram models to neural architectures like RNNs, LSTMs, attention mechanisms, Transformers, and the massive data and training methods that power modern large language models.

ChatGPT, NLP, Transformer
14 min read
AntTech
Dec 19, 2022 · Artificial Intelligence

TransVCL: Attention‑Enhanced Video Copy Localization Network with Flexible Supervision

TransVCL introduces an end‑to‑end attention‑enhanced video copy localization network that leverages a custom Transformer, correlation‑Softmax similarity matrix, and temporal alignment module, combined with a semi‑supervised learning framework, achieving state‑of‑the‑art performance on VCSL and VCDB benchmarks.

AI, Semi-supervised Learning, Transformer
13 min read
DaTaobao Tech
Feb 22, 2022 · Artificial Intelligence

Graph-based Deep Recall Models for Sparse User Behavior in Content Recommendation

The paper proposes graph‑based deep recall models that enrich sparse user behavior sequences in video recommendation by integrating content knowledge graphs and adaptive attention mechanisms, demonstrating that variants such as GADM, SGGA, and SGGGA significantly boost click‑through rates in online experiments.

Knowledge Graph, Recommendation Systems, attention
11 min read
DataFunTalk
Feb 18, 2022 · Artificial Intelligence

Travel Intent Prediction in E-commerce: Algorithm Strategies, Multi‑source Behavior Modeling, and Model Design

This talk presents Alibaba's travel intent prediction system, detailing the unique challenges of low‑frequency, multi‑source travel behavior, the multi‑granular CNN and time‑attention model architecture, experimental comparisons with baselines, and how integrated user interest modeling improves recommendation performance.

Deep Learning, attention, machine learning
11 min read
DataFunSummit
Jan 14, 2022 · Artificial Intelligence

Graph Attention Multi‑Layer Perceptron (GAMLP) and Node‑Dependent Local Smoothing (NDLS) for Scalable and Flexible Graph Neural Networks

This presentation introduces Tencent Angel Graph's NDLS and GAMLP techniques that address GNN scalability and flexibility by adaptively selecting propagation depth per node, employing node‑wise feature and label propagation with attention mechanisms, and demonstrating superior performance on large‑scale and sparse graph benchmarks.

GAMLP, Node Adaptive, Scalability
16 min read
Alimama Tech
Dec 15, 2021 · Artificial Intelligence

Scalable Multi-View Ad Retrieval (SMAD): A Graph-Based Framework for E-commerce Advertising

SMAD is a scalable graph‑based ad retrieval framework for e‑commerce search that builds a heterogeneous Query‑Item‑Ad graph, learns multi‑view embeddings with a parallel deep neural network and attention, employs category‑aware sampling for efficient distributed training, and delivers significant gains in offline relevance and online CTR, RPM, and PVR.

Distributed Training, ad retrieval, attention
17 min read
DataFunSummit
Nov 21, 2021 · Artificial Intelligence

Sequential Recommendation Algorithms: Overview and Techniques

This article surveys sequential recommendation methods, covering standard models such as pooling, RNN, CNN, attention, and Transformer, as well as long‑short term, multi‑interest, multi‑behavior approaches, and recent advances like contrastive learning, highlighting their impact on recommendation performance.

RNN, Transformer, attention
8 min read
DeWu Technology
Nov 18, 2021 · Artificial Intelligence

Background Complexity Detection for Sneaker Images Using MobileNet, FPN, and Modified SAM

The project presents a lightweight MobileNet‑FPN architecture enhanced with a modified spatial‑attention module that evaluates corner‑based self‑similarity to classify sneaker photo backgrounds, achieving 96% test accuracy—exceeding baseline CNN performance—and meeting business targets of over 80% hint accuracy and 90% mandatory enforcement.

CNN, Computer Vision, Image Processing
12 min read
DataFunSummit
Nov 2, 2021 · Artificial Intelligence

Applying Deep Learning to Time Series Data for Financial Risk Modeling

This article explains how a financial company leverages deep learning sequence models, including embedding, attention, and transformer techniques, to automatically extract features from massive time‑series data, improve risk model performance, and build a reusable, end‑to‑end system framework.

AI, Embedding, attention
8 min read
58 Tech
Oct 12, 2021 · Artificial Intelligence

Seq2Seq Approaches for Phone Number Extraction from Two‑Speaker Voice Dialogues

This article presents a practical study of extracting phone numbers from two‑speaker voice dialogues using Seq2Seq models—including LSTM, GRU with attention and feature fusion, and Transformer—detailing data characteristics, model architectures, training strategies, experimental results, and comparative analysis showing the GRU‑Attention approach achieving the best performance.

GRU, LSTM, NLP
13 min read
DataFunTalk
Jun 4, 2021 · Artificial Intelligence

Advances in Ranking Algorithms for the "Good Goods" Recommendation Scenario

This article presents a comprehensive overview of recent advancements in ranking algorithms for the Good Goods recommendation scenario, covering long‑sequence modeling, category‑retrieval attention, multi‑objective ranking, model structure optimizations, loss functions, and LTR techniques, along with experimental results and practical insights.

LTR, Model Optimization, attention
13 min read
Cyber Elephant Tech Team
Apr 28, 2021 · Artificial Intelligence

Understanding BERT: From Encoder-Decoder to Transformer and Attention

This article explains the BERT model by first reviewing the Encoder-Decoder framework, then detailing the attention mechanism—including self-attention and multi-head attention—before describing the full Transformer architecture and finally outlining BERT’s encoder-only design, training stages, and fine-tuning applications.

BERT, Encoder-Decoder, NLP
15 min read
DataFunTalk
Apr 17, 2021 · Artificial Intelligence

Personalized Re-ranking for Recommendation (RecSys'19)

This article introduces a personalized re‑ranking model for recommendation systems, explaining the limitations of traditional point‑wise ranking, describing the PRM architecture with input, encoding, and output layers using multi‑head attention and pre‑trained personalization features, and presenting experimental results and future extensions.

CTR, Transformer, attention
7 min read
DataFunTalk
Apr 3, 2021 · Artificial Intelligence

A Survey of User Behavior Sequence Modeling for Search and Recommendation Advertising

User behavior sequence modeling, crucial for search and recommendation advertising ranking, has evolved from simple pooling to attention, RNN, capsule, and Transformer architectures, with industrial applications across e‑commerce, social, video, and music platforms, and future directions include time‑aware, multi‑dimensional, and self‑supervised approaches.

Deep Learning, Recommendation Systems, Sequence Modeling
24 min read
Sohu Tech Products
Feb 17, 2021 · Artificial Intelligence

Improving BERT Pre‑training with RealFormer: Principles, Implementation, and Empirical Evaluation

This article analyzes the RealFormer modification to the Transformer architecture, details its implementation in BERT, and presents extensive experiments showing that while RealFormer can boost performance on low‑label‑count classification tasks, its benefits diminish or disappear as the number of classes grows.

BERT · RealFormer · Residual
0 likes · 12 min read
New Oriental Technology
Feb 1, 2021 · Artificial Intelligence

Neural Machine Translation: Seq2Seq, Beam Search, BLEU, Attention Mechanisms, and GNMT Improvements

This article explains key concepts of neural machine translation, covering Seq2Seq encoder‑decoder models, beam search strategies, BLEU evaluation, various attention mechanisms, and the enhancements introduced in Google's Neural Machine Translation system to improve speed, OOV handling, and translation quality.

BLEU · Beam Search · GNMT
0 likes · 11 min read
JD Tech Talk
Jan 28, 2021 · Artificial Intelligence

Spatial‑Temporal Graph Diffusion Network for City Traffic Flow Forecasting

This article introduces a hierarchical graph neural network model that jointly captures multi‑scale temporal patterns and global spatial context for urban traffic flow prediction, demonstrates its superiority over existing methods on multiple public datasets, and validates each component through extensive ablation studies.

Deep Learning · Graph Neural Network · attention
0 likes · 8 min read
Didi Tech
May 25, 2020 · Artificial Intelligence

How Didi Harnesses Cutting‑Edge Speech Recognition: From ASR Basics to Transformer Models

This article provides a comprehensive technical overview of modern speech recognition, covering Didi’s driver‑assistant and smart‑customer‑service applications, fundamental ASR concepts, classic GMM‑HMM methods, deep‑learning breakthroughs such as DNN‑HMM, CTC, attention‑based and transformer models, practical training tricks, signal‑processing steps, and multimodal fusion techniques.

ASR · CTC · Deep Learning
0 likes · 16 min read
DataFunTalk
Apr 21, 2020 · Artificial Intelligence

Attention Mechanisms in Deep Learning Recommendation Models: A Survey

This article surveys the application of attention mechanisms in deep learning recommendation systems, reviewing models such as AFM, DIN, DIEN, DSIN, Behavior Sequence Transformer, Deep Spatio‑Temporal Networks, and ATRank, and discusses their architectures, attention types, advantages, and limitations.

CTR prediction · Deep Learning · Recommendation Systems
0 likes · 10 min read
DataFunTalk
Feb 3, 2020 · Artificial Intelligence

Advances in Speech Recognition: Concepts, Deep Learning Methods, and Didi’s Applications

This article presents a comprehensive overview of modern speech recognition technology, covering basic ASR concepts, classic acoustic and language models, deep‑learning approaches such as DNN‑HMM, CTC, attention‑based and transformer models, multimodal fusion, signal‑processing pipelines, and practical deployment considerations at Didi.

ASR · CTC · Deep Learning
0 likes · 15 min read
DataFunTalk
Dec 16, 2019 · Artificial Intelligence

A Comprehensive Overview of Sequential Recommendation Models and Techniques

This article provides an in-depth overview of sequential recommendation, defining the problem, discussing data preparation, and reviewing various neural architectures—including MLP, CNN, RNN, Temporal CNN, self‑attention, and reinforcement‑learning approaches—while offering practical guidance on model selection and implementation.

CNN · Deep Learning · RNN
0 likes · 36 min read
DataFunTalk
Nov 25, 2019 · Artificial Intelligence

Real-time Attention-based Look-alike Model for Recommender Systems

This talk presents a real-time attention-based look‑alike model (RALM) designed to address the long‑tail problem in recommendation systems by efficiently expanding seed users, leveraging user representation learning, attention mechanisms, and clustering to deliver timely, diverse content without retraining the model.

Long Tail · attention · clustering
0 likes · 24 min read
Qunar Tech Salon
Sep 12, 2019 · Artificial Intelligence

A Comprehensive Overview of Attention Mechanisms in Deep Learning

This article systematically reviews the history, core concepts, variants, and practical implementations of attention mechanisms—from early additive and multiplicative forms to self‑attention, multi‑head attention, and recent transformer‑based models—highlighting why attention has become fundamental in modern AI research.

Deep Learning · NLP · Self-Attention
0 likes · 16 min read
Alibaba Cloud Developer
Aug 9, 2019 · Artificial Intelligence

Demystifying Attention: A Beginner’s Guide to History, Types, and Applications

This article provides a comprehensive, beginner‑friendly overview of attention mechanisms—from their origins in early neural machine translation papers to modern self‑attention, multi‑head attention, and transformer variants—explaining core concepts, common variants, and why attention has become essential across NLP and vision tasks.

NLP · attention
0 likes · 18 min read
HomeTech
Aug 7, 2019 · Artificial Intelligence

Near-Duplicate Video Retrieval: Framework, Feature Extraction, Metric Learning, and Model Optimization

This article presents a comprehensive study of near‑duplicate video retrieval, covering the definition of near‑duplicate videos, motivations for deduplication, challenges, a two‑stage offline/online processing framework, keyframe and VGG16‑based feature extraction, metric‑learning loss functions, training procedures, dataset preparation, evaluation metrics, and model enhancements using LSTM and attention mechanisms.

LSTM · MAP · VGG16
0 likes · 12 min read
DataFunTalk
Mar 13, 2019 · Artificial Intelligence

A Comprehensive Overview of NLP Development and Deep Learning Models

This article reviews the history of natural language processing, explains key deep‑learning models such as NNLM, Word2vec, CNN, RNN, attention mechanisms, and Transformers, and discusses their applications, future trends, and practical considerations in NLP tasks.

NLP · Transformer · attention
0 likes · 38 min read
DataFunTalk
Feb 27, 2019 · Artificial Intelligence

Human‑Interactive Machine Translation: Research, Techniques, and Productization

This article reviews the current state of machine translation, explores the challenges of ambiguity, quality, and domain specificity, and presents human‑in‑the‑loop translation techniques—including attention‑enhanced models, transformer architectures, and online learning—while discussing practical productization and deployment considerations.

AI productization · Human-in-the-Loop · Online Learning
0 likes · 16 min read
Alibaba Cloud Developer
Oct 30, 2018 · Artificial Intelligence

How Advanced LSTM (A‑LSTM) Boosts Speech Emotion Recognition by 5.5%

This article introduces Advanced LSTM (A‑LSTM), which linearly combines multiple past hidden states to overcome traditional LSTM's one‑step dependency, and demonstrates its application in utterance‑level speech emotion recognition, achieving a 5.5% accuracy improvement through attention‑based weighted‑pooling RNNs and auxiliary speaker and gender tasks.

A-LSTM · Deep Learning · LSTM
0 likes · 8 min read
AntTech
Aug 16, 2018 · Artificial Intelligence

Deep Learning Approaches for Text Classification in Alipay Complaint Fraud Detection

This article reviews deep‑learning‑based text classification techniques—including TextCNN, BiGRU, Capsule Networks, Attention mechanisms, and the novel cw2vec embedding—applied to Alipay complaint fraud data, presents experimental comparisons, and discusses their advantages, challenges, and future directions.

Alipay · Deep Learning · attention
0 likes · 18 min read
Didi Tech
Jun 1, 2018 · Artificial Intelligence

Didi's Attention-Based End-to-End Mandarin Speech Recognition: A Detailed Review

Didi’s attention‑based end‑to‑end Mandarin speech recognizer, built on the Listen‑Attend‑Spell architecture and modeling roughly 5,000 common characters, delivers 15‑25% relative accuracy gains over its prior LSTM‑CTC system while cutting model size, latency and server requirements and simplifying training by eliminating separate acoustic, pronunciation and language components.

End-to-End · LAS · Mandarin
0 likes · 14 min read
Hulu Beijing
Dec 14, 2017 · Artificial Intelligence

Understanding Seq2Seq: Framework, Advantages, and Decoding Techniques

This article explains the Seq2Seq encoder‑decoder framework, its benefits for various sequence modeling tasks, and compares common decoding strategies such as greedy search and beam search, while also introducing attention and other enhancements for improved performance.

Beam Search · Encoder-Decoder · attention
0 likes · 9 min read
Suning Design
Apr 12, 2014 · Product Management

How Emotional Design Shapes User Relationships and Behavior

The article explains how emotional design leverages usefulness, usability, and delight to capture attention, influence emotions, and drive user actions, ultimately forming lasting relationships between users and products through the dimensions of value and arousal.

Emotional Design · User experience · attention
0 likes · 16 min read