Tagged articles

Training Acceleration

10 articles · Page 1 of 1

May 14, 2026 · Artificial Intelligence

Accelerating Training and Inference of EAGLE-3 for Multi‑Round Agent Workflows

This article analyzes the latency bottlenecks of large language models in multi‑round AI Agent scenarios, introduces SpecForge‑based speculative decoding and Unified Sequence Parallelism (USP) techniques applied to the EAGLE-3 model, and presents benchmark results showing more than twofold Accept‑Len gains and 35–44% reductions in P95 token‑level latency, while enabling 128K‑context training on an 8‑GPU node.

Agent AI · EAGLE-3 · Training Acceleration

0 likes · 26 min read

AI Explorer

Apr 16, 2026 · Artificial Intelligence

How NVIDIA, HKU, and MIT’s Sol‑RL Framework Supercharges Diffusion Model Training

NVIDIA, the University of Hong Kong (HKU), and MIT introduced the Sol‑RL framework, which uses reinforcement‑learning‑guided sampling to cut diffusion model training time several‑fold without sacrificing image quality, potentially lowering entry barriers for small teams and shifting the AIGC industry toward efficiency‑driven competition.

AIGC · Diffusion Models · NVIDIA

0 likes · 6 min read

Machine Learning Algorithms & Natural Language Processing

Feb 26, 2026 · Artificial Intelligence

How MiniMax’s Forge Architecture Achieves 40× Faster Agent RL Training

The article details MiniMax’s Forge system, an asynchronous native Agent‑RL architecture that standardizes Agent‑LLM interaction, introduces engineering optimizations, novel scheduling, prefix‑tree merging and reward designs, enabling million‑sample daily throughput, stable reward growth and up to 40‑fold training acceleration for the MiniMax M2.5 model.

Mixed Scheduling · Scalable Systems · Training Acceleration

0 likes · 17 min read

Alibaba Cloud Developer

Dec 18, 2025 · Artificial Intelligence

How to Build a Real‑Time AI‑Powered Anime‑Style Video Generator for Social Apps

This technical report details the end‑to‑end workflow for integrating an AIGC video generation module into a social app, covering requirement analysis, model and hardware selection, dataset construction, LoRA and full‑parameter training, multiple acceleration techniques such as Sage Attention, TeaCache, XDiT, gradient‑checkpointing offload, tiled VAE, and quantization, followed by extensive performance evaluation and metric‑based ranking of the final models.

AI video generation · Diffusion Models · LoRA fine-tuning

0 likes · 38 min read

Shopee Tech Team

Oct 14, 2025 · Artificial Intelligence

How SPEC‑RL Boosts On‑Policy Reinforcement Learning Speed by Up to 3×

SPEC‑RL introduces speculative rollouts that reuse verified historical rollouts as prefixes, cutting rollout time by 2–3× while maintaining or improving performance across various math and reasoning benchmarks, and works seamlessly with PPO, GRPO, DAPO and other on‑policy algorithms.

AI efficiency · Training Acceleration · large language models

0 likes · 8 min read

360 Tech Engineering

Oct 15, 2024 · Artificial Intelligence

Implementation and Optimization of 360 AI Compute Center: Infrastructure, Network, Kubernetes, and Training/Inference Acceleration

The article details the design and deployment of 360's AI Compute Center, covering GPU server selection, high‑performance networking, Kubernetes‑based cluster management, advanced scheduling, training and inference acceleration techniques, and a comprehensive AI development platform with visualization and fault‑tolerance features.

AI Infrastructure · Distributed Computing · GPU Cluster

0 likes · 21 min read

58 Tech

Jun 3, 2024 · Artificial Intelligence

Parameter-Efficient Fine-Tuning (PEFT) Methods for Large Language Models: LoRA, QLoRA, AdaLoRA, SoRA, and Training Acceleration with Unsloth

This article systematically analyzes popular parameter‑efficient fine‑tuning (PEFT) techniques for large language models—including Adapter Tuning, Prefix Tuning, LoRA, QLoRA, AdaLoRA, and SoRA—detailing their principles, implementation code, experimental results on NLU tasks, and practical acceleration using the Unsloth library.

AdaLoRA · LoRA · PEFT

0 likes · 39 min read

Baidu Intelligent Cloud Tech Hub

May 15, 2024 · Artificial Intelligence

How Baidu’s AIAK‑LLM Supercharges Large‑Model Training and Inference

The article explains the scaling challenges of ever‑larger LLMs, introduces the MFU performance metric, surveys industry parallelism and memory‑saving techniques, and details Baidu’s AIAK‑LLM suite—including resource, component and acceleration layers—as well as concrete training and inference optimizations that raise MFU by 30‑60% and cut deployment costs.

AI Infrastructure · MFU · Memory optimization

0 likes · 25 min read

Tencent Cloud Developer

May 25, 2023 · Artificial Intelligence

QQGC: A Two-Stage Text-to-Image Model with Prior and Decoder Architectures for Efficient AI Painting

QQGC is Tencent’s two‑stage text‑to‑image model, which separates a CLIP‑based Prior mapping from a Stable Diffusion Decoder. It leverages T5‑enhanced text embeddings and a suite of efficiency techniques (FP16, flash attention, ZeRO, and GPU‑RDMA) to train models with more than 2B parameters on 64 GPUs, achieving state‑of‑the‑art FID and CLIP scores while supporting image variation, semantic img2img, precise CLIP‑vector edits, and unsafe‑content filtering. It now powers the company’s Magic Painting Room.

AI painting · CLIP embedding · Training Acceleration

0 likes · 12 min read

Alibaba Cloud Big Data AI Platform

Mar 21, 2023 · Artificial Intelligence

How We Tripled CTR Model Training Speed in the Alibaba‑Intel DeepRec Challenge

The MetaSpore team detailed a three‑pronged optimization (sparse model tuning, training‑pipeline acceleration, and low‑level framework tweaks) that more than tripled DeepRec CTR model training efficiency without sacrificing AUC, securing first place in the global AI competition.

AI competition · CTR · DeepRec

0 likes · 9 min read
