Tagged articles
4 articles
Page 1 of 1
SuanNi
SuanNi
Mar 17, 2026 · Artificial Intelligence

How Attention Residuals Boost Transformer Efficiency and Scale

The article presents the Attention Residuals architecture, explains how it replaces uniform residual addition with learned attention‑based aggregation, details full and block variants, engineering tricks for distributed training, and shows extensive scaling‑law experiments where the new design consistently improves validation loss and training efficiency across model sizes.

Attention ResidualsModel ScalingTransformer
0 likes · 13 min read
How Attention Residuals Boost Transformer Efficiency and Scale
AIWalker
AIWalker
Mar 15, 2025 · Artificial Intelligence

How SANA 1.5 Lets Small Models Reach New Text‑to‑Image SOTA

SANA 1.5 introduces an efficient model‑growth pipeline, depth‑pruning, and inference‑time scaling that reuse a 1.6 B‑parameter foundation to train a 4.8 B model with 8× lower memory, 60 % less training time, and GenEval scores that rival or surpass much larger diffusion models.

Inference ScalingModel Scalingdiffusion
0 likes · 17 min read
How SANA 1.5 Lets Small Models Reach New Text‑to‑Image SOTA
Programmer DD
Programmer DD
Apr 14, 2023 · Artificial Intelligence

How DeepSpeed-Chat Accelerates ChatGPT‑Style Model Training by 15×

Microsoft open‑sourced DeepSpeed‑Chat, a toolkit that streamlines the end‑to‑end training and inference of ChatGPT‑like large language models using RLHF, delivering up to fifteen‑fold speedups and dramatically lower costs, even on a single GPU.

ChatGPTDeepSpeedRLHF
0 likes · 8 min read
How DeepSpeed-Chat Accelerates ChatGPT‑Style Model Training by 15×
DataFunTalk
DataFunTalk
Jul 1, 2021 · Artificial Intelligence

Pre‑Trained Models: Past, Present, and Future – A Comprehensive Survey

This article surveys the evolution of pre‑trained models, covering the origins of transfer and self‑supervised learning, the rise of transformer‑based PTMs such as BERT and GPT, efficient architecture designs, multimodal and multilingual extensions, theoretical analyses, and future research directions for scalable and robust AI systems.

AI researchMultimodalefficient training
0 likes · 27 min read
Pre‑Trained Models: Past, Present, and Future – A Comprehensive Survey