Tag: Attention

Cognitive Technology Team
Feb 9, 2025 · Artificial Intelligence

A Beginner’s Guide to the History and Key Concepts of Deep Learning

From the perceptron’s inception in 1958 to modern Transformer‑based models like GPT, this article traces the evolution of deep learning, explaining foundational architectures (DNNs, CNNs, RNNs, LSTMs), attention mechanisms, and recent innovations such as DeepSeek’s MLA, and highlighting their principles and impact.

Attention · GPT · History
19 min read
DataFunSummit
Dec 28, 2024 · Artificial Intelligence

Memory Optimization for Large Model Inference: Virtual Tensor and LayerKV Techniques

This talk presents the Ant Group team's recent work on large‑model inference memory optimization, covering GPU memory challenges, virtual memory management (VMM), the Virtual Tensor framework, LayerKV techniques, performance comparisons with Page Attention and FlashAttention, and extensive experimental results demonstrating reduced latency and higher QPS.

Attention · GPU · Memory Optimization
25 min read
JD Tech
Jun 7, 2024 · Artificial Intelligence

Understanding Attention Mechanisms, Self‑Attention, and Multi‑Head Attention in Transformers

This article explains the fundamentals of attention mechanisms, including biological inspiration, the evolution from early visual attention to modern self‑attention in Transformers, details the scaled dot‑product calculations, positional encoding, and multi‑head attention, illustrating how these concepts enable efficient parallel processing of sequence data.

AI · Attention · Positional Encoding
12 min read
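As a quick illustration of the scaled dot‑product attention that this article details, here is a minimal NumPy sketch; the array shapes and names are illustrative, not taken from the article's own code:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V                                    # weighted sum of values

# Toy example: 3 query positions attending over 4 key/value positions, d_k = 8
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 8)
```

The division by sqrt(d_k) keeps the dot products from growing with the key dimension, which would otherwise push the softmax into near one‑hot saturation.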
DaTaobao Tech
Mar 27, 2024 · Artificial Intelligence

Building a Simple Diffusion Model with Python

This tutorial walks through implementing a basic Denoising Diffusion Probabilistic Model in Python, explaining the forward noise schedule, reverse denoising training, and providing complete code for noise schedules, diffusion functions, residual and attention blocks, a UNet architecture, loss computation, and a training loop.

Attention · DDPM · Python
26 min read
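The forward process this tutorial builds on can be sampled in closed form: q(x_t | x_0) = N(sqrt(ᾱ_t) x_0, (1 − ᾱ_t) I). A minimal sketch with an illustrative linear beta schedule follows; the constants and names are assumptions, not the tutorial's exact code:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)       # linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)          # \bar{alpha}_t = prod of alphas up to t

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) directly: sqrt(a_bar)*x0 + sqrt(1-a_bar)*eps."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps, eps

rng = np.random.default_rng(0)
x0 = rng.normal(size=(2, 32, 32))        # a toy "image" batch
xt, eps = q_sample(x0, t=500, rng=rng)
print(xt.shape)  # (2, 32, 32)
```

Training then amounts to predicting `eps` from `xt` and `t` (the UNet's job), which is why the single-step closed form matters: no 500-step loop is needed to produce a training pair.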
Rare Earth Juejin Tech Community
Nov 15, 2023 · Artificial Intelligence

Understanding the Transformer Architecture: Encoder, Decoder, and Attention Mechanisms

This article explains the Transformer model, comparing it with RNNs, detailing its encoder‑decoder structure, multi‑head and scaled dot‑product attention, embedding layers, feed‑forward networks, and the final linear‑softmax output, supplemented with diagrams and code examples.

Artificial Intelligence · Attention · Encoder-Decoder
10 min read
Rare Earth Juejin Tech Community
Nov 12, 2023 · Artificial Intelligence

A Comprehensive Introduction to RNN, LSTM, Attention Mechanisms, and Transformers for Large Language Models

This article provides a thorough overview of large language models, explaining the relationship between NLP and LLMs, the evolution from RNN to LSTM, the fundamentals of attention mechanisms, and the architecture and operation of Transformer models, all illustrated with clear examples and diagrams.

Artificial Intelligence · Attention · LSTM
25 min read
DataFunSummit
Sep 29, 2023 · Artificial Intelligence

Social4Rec: Enhancing Video Recommendation with Social Interest Networks

This article introduces Social4Rec, a video recommendation algorithm that tackles user cold‑start problems by extracting and integrating social interest information through coarse‑ and fine‑grained interest extractors, attention‑based fusion, and extensive offline and online experiments demonstrating significant CTR improvements.

Attention · Cold Start · deep learning
14 min read
Nightwalker Tech
Jul 19, 2023 · Artificial Intelligence

Step‑by‑Step Implementation of Transformer Blocks, Attention, Normalization, Feed‑Forward, Encoder and Decoder in PyTorch

This article provides a comprehensive tutorial on building the core components of a Transformer model—including multi‑head attention, layer normalization, feed‑forward networks, encoder and decoder layers—and assembles them into a complete PyTorch implementation, supplemented with explanatory diagrams and runnable code.

Attention · Decoder · Encoder
13 min read
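Of the components this tutorial assembles, layer normalization is the simplest to see in isolation. A plain NumPy re-derivation of the standard formula (not the article's PyTorch code) looks like this:

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize each token's feature vector to zero mean / unit variance,
    then apply the learned affine parameters (gamma, beta)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.random.default_rng(0).normal(size=(4, 16))   # 4 tokens, d_model = 16
y = layer_norm(x, gamma=np.ones(16), beta=np.zeros(16))
print(np.allclose(y.mean(axis=-1), 0.0, atol=1e-6))  # True
```

Unlike batch normalization, the statistics are computed per token over the feature dimension, so the operation is independent of batch size and sequence length, which is what makes it the normalizer of choice inside Transformer blocks.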
Model Perspective
Jul 6, 2023 · Fundamentals

Understanding Information Processing Theory: How the Mind Works Like a Computer

The information processing theory, emerging in the 1950s‑60s, likens human cognition to computer operations, detailing how perception, attention, memory, conceptual knowledge, reasoning, and feedback mechanisms transform sensory input into mental representations and guide behavior, influencing cognitive psychology, education, and HCI.

Attention · Cognitive Psychology · human cognition
4 min read
DataFunSummit
Jun 21, 2023 · Artificial Intelligence

Graph‑Enhanced Node Representation for Cold‑Start Recommendation: Neighbour‑Enhanced YouTubeDNN

This article proposes a graph‑based node representation method that combines static attribute graphs and dynamic interaction graphs with multi‑level attention to alleviate user and item cold‑start problems in recommendation systems, achieving notable AUC improvements on sparsified MovieLens datasets.

Attention · Cold Start · MovieLens
9 min read
Architect's Guide
Feb 9, 2023 · Artificial Intelligence

Why ChatGPT Is So Powerful: A Technical Overview of NLP Model Evolution

This article explains why ChatGPT performs so well by tracing the evolution of natural‑language processing from rule‑based grammars through statistical n‑gram models to neural architectures like RNNs, LSTMs, attention mechanisms, Transformers, and the massive data and training methods that power modern large language models.

Attention · ChatGPT · NLP
14 min read
AntTech
Dec 19, 2022 · Artificial Intelligence

TransVCL: Attention‑Enhanced Video Copy Localization Network with Flexible Supervision

TransVCL introduces an end‑to‑end attention‑enhanced video copy localization network that leverages a custom Transformer, correlation‑Softmax similarity matrix, and temporal alignment module, combined with a semi‑supervised learning framework, achieving state‑of‑the‑art performance on VCSL and VCDB benchmarks.

AI · Attention · Semi-supervised Learning
13 min read
DaTaobao Tech
Feb 22, 2022 · Artificial Intelligence

Graph-based Deep Recall Models for Sparse User Behavior in Content Recommendation

The paper proposes graph‑based deep recall models that enrich sparse user behavior sequences in video recommendation by integrating content knowledge graphs and adaptive attention mechanisms, demonstrating that variants such as GADM, SGGA, and SGGGA significantly boost click‑through rates in online experiments.

Attention · Graph Neural Networks · Recommendation systems
11 min read
DataFunTalk
Feb 18, 2022 · Artificial Intelligence

Travel Intent Prediction in E-commerce: Algorithm Strategies, Multi‑source Behavior Modeling, and Model Design

This talk presents Alibaba's travel intent prediction system, detailing the unique challenges of low‑frequency, multi‑source travel behavior, the multi‑granular CNN and time‑attention model architecture, experimental comparisons with baselines, and how integrated user interest modeling improves recommendation performance.

Attention · deep learning · machine learning
11 min read
DataFunSummit
Jan 14, 2022 · Artificial Intelligence

Graph Attention Multi‑Layer Perceptron (GAMLP) and Node‑Dependent Local Smoothing (NDLS) for Scalable and Flexible Graph Neural Networks

This presentation introduces Tencent Angel Graph's NDLS and GAMLP techniques that address GNN scalability and flexibility by adaptively selecting propagation depth per node, employing node‑wise feature and label propagation with attention mechanisms, and demonstrating superior performance on large‑scale and sparse graph benchmarks.

Attention · GAMLP · Graph Neural Networks
16 min read
Alimama Tech
Dec 15, 2021 · Artificial Intelligence

Scalable Multi-View Ad Retrieval (SMAD): A Graph-Based Framework for E-commerce Advertising

SMAD is a scalable graph‑based ad retrieval framework for e‑commerce search that builds a heterogeneous Query‑Item‑Ad graph, learns multi‑view embeddings with a parallel deep neural network and attention, employs category‑aware sampling for efficient distributed training, and delivers significant gains in offline relevance and online CTR, RPM, and PVR.

Ad Retrieval · Attention · distributed training
17 min read
DataFunSummit
Nov 21, 2021 · Artificial Intelligence

Sequential Recommendation Algorithms: Overview and Techniques

This article surveys sequential recommendation methods, covering standard models such as pooling, RNN, CNN, attention, and Transformer, as well as long‑short term, multi‑interest, multi‑behavior approaches, and recent advances like contrastive learning, highlighting their impact on recommendation performance.

Attention · RNN · Transformer
8 min read
DeWu Technology
Nov 18, 2021 · Artificial Intelligence

Background Complexity Detection for Sneaker Images Using MobileNet, FPN, and Modified SAM

The project presents a lightweight MobileNet‑FPN architecture enhanced with a modified spatial‑attention module that evaluates corner‑based self‑similarity to classify sneaker photo backgrounds, achieving 96% test accuracy—exceeding baseline CNN performance—and meeting business targets of over 80% hint accuracy and 90% mandatory enforcement.

Attention · CNN · MobileNet
12 min read
DataFunSummit
Nov 2, 2021 · Artificial Intelligence

Applying Deep Learning to Time Series Data for Financial Risk Modeling

This article explains how a financial company leverages deep learning sequence models, including embedding, attention, and transformer techniques, to automatically extract features from massive time‑series data, improve risk model performance, and build a reusable, end‑to‑end system framework.

AI · Attention · deep learning
8 min read
58 Tech
Oct 12, 2021 · Artificial Intelligence

Seq2Seq Approaches for Phone Number Extraction from Two‑Speaker Voice Dialogues

This article presents a practical study of extracting phone numbers from two‑speaker voice dialogues using Seq2Seq models—including LSTM, GRU with attention and feature fusion, and Transformer—detailing data characteristics, model architectures, training strategies, experimental results, and comparative analysis showing the GRU‑Attention approach achieving the best performance.

Attention · GRU · LSTM
13 min read