Tagged articles
4 articles
Data Party THU
Apr 3, 2026 · Artificial Intelligence

Can Attention Replace Residuals? Inside the New Attention Residuals Breakthrough

The article reviews the Kimi team's Attention Residuals approach, which replaces ResNet's traditional additive shortcuts with learned attention‑based weighting. It explains the theoretical motivation linking depth to time, details the full‑attention and block‑wise implementations, and presents experimental results showing up to 1.25× compute efficiency and improved performance on reasoning and knowledge tasks.

Attention Mechanism · Deep Learning · Residual Networks
11 min read
Python Programming Learning Circle
Jul 6, 2021 · Artificial Intelligence

Understanding ResNet and Building It from Scratch with PyTorch

This article explains the motivation behind residual networks, describes the architecture of ResNet including residual blocks and skip connections, lists available Keras implementations, and provides a step‑by‑step PyTorch tutorial with complete code to construct and test ResNet‑50/101/152 models.

CNN · Deep Learning · PyTorch
10 min read
Alibaba Cloud Developer
Aug 26, 2016 · Artificial Intelligence

ICML Tutorial Highlights: Deep Residual Nets, Stochastic Gradient, Deep RL

At the ICML pre‑conference tutorial, experts presented deep residual networks, stochastic gradient methods for large‑scale learning, and deep reinforcement learning, highlighting architectural innovations, optimization theory, noise‑reduction techniques, and practical considerations for building scalable, high‑performance AI models.

Deep Learning · Residual Networks · Stochastic Gradient
14 min read