Tagged articles

Efficient Training

6 articles · Page 1 of 1

Jun 8, 2026 · Artificial Intelligence

Agent Harness Model Achieves Frontier Performance at <1% Compute Cost – Introducing Macaron‑V1‑Preview

A 30‑person lab trained a 749B‑parameter Agent model called Macaron‑V1‑Preview using fewer than 300 GPUs, achieving less than 1% of the compute cost of comparable models while matching state‑of‑the‑art performance on real‑world Agent benchmarks such as LivingBench, VitaBench, A2UI and PinchBench.

AIAgentEfficient Training

0 likes · 15 min read

Agent Harness Model Achieves Frontier Performance at <1% Compute Cost – Introducing Macaron‑V1‑Preview

Machine Learning Algorithms & Natural Language Processing

Jun 8, 2026 · Artificial Intelligence

MindLab Unveils 749B Agent-Optimized Macaron‑V1‑Preview Model

MindLab released the 749B‑parameter Macaron‑V1‑Preview, a model engineered for deep Agent‑Harness post‑training that was trained on fewer than 300 GPUs at less than 1% of the compute cost of peer models and achieves SOTA results on multiple Agent‑centric benchmarks such as LivingBench, VitaBench and PinchBench.

Agent HarnessEfficient TrainingLoRA

0 likes · 16 min read

MindLab Unveils 749B Agent-Optimized Macaron‑V1‑Preview Model

SuanNi

Mar 17, 2026 · Artificial Intelligence

How Attention Residuals Boost Transformer Efficiency and Scale

The article presents the Attention Residuals architecture, explains how it replaces uniform residual addition with learned attention‑based aggregation, details full and block variants, engineering tricks for distributed training, and shows extensive scaling‑law experiments where the new design consistently improves validation loss and training efficiency across model sizes.

Attention ResidualsDeep LearningEfficient Training

0 likes · 13 min read

How Attention Residuals Boost Transformer Efficiency and Scale

AIWalker

Mar 15, 2025 · Artificial Intelligence

How SANA 1.5 Lets Small Models Reach New Text‑to‑Image SOTA

SANA 1.5 introduces an efficient model‑growth pipeline, depth‑pruning, and inference‑time scaling that reuse a 1.6 B‑parameter foundation to train a 4.8 B model with 8× lower memory, 60 % less training time, and GenEval scores that rival or surpass much larger diffusion models.

Efficient TrainingInference ScalingPruning

0 likes · 17 min read

How SANA 1.5 Lets Small Models Reach New Text‑to‑Image SOTA

Programmer DD

Apr 14, 2023 · Artificial Intelligence

How DeepSpeed-Chat Accelerates ChatGPT‑Style Model Training by 15×

Microsoft open‑sourced DeepSpeed‑Chat, a toolkit that streamlines the end‑to‑end training and inference of ChatGPT‑like large language models using RLHF, delivering up to fifteen‑fold speedups and dramatically lower costs, even on a single GPU.

ChatGPTDeepSpeedEfficient Training

0 likes · 8 min read

How DeepSpeed-Chat Accelerates ChatGPT‑Style Model Training by 15×

DataFunTalk

Jul 1, 2021 · Artificial Intelligence

Pre‑Trained Models: Past, Present, and Future – A Comprehensive Survey

This article surveys the evolution of pre‑trained models, covering the origins of transfer and self‑supervised learning, the rise of transformer‑based PTMs such as BERT and GPT, efficient architecture designs, multimodal and multilingual extensions, theoretical analyses, and future research directions for scalable and robust AI systems.

AI researchEfficient TrainingLarge Language Models

0 likes · 27 min read

Pre‑Trained Models: Past, Present, and Future – A Comprehensive Survey