Tag

Training Optimization

7 articles collected around this technical topic.

Python Programming Learning Circle
Apr 3, 2025 · Artificial Intelligence

Accelerating PyTorch Model Training: Techniques, Benchmarks, and Code

This article explains how to dramatically speed up PyTorch model training through code-level optimizations, mixed-precision training, torch.compile, distributed data parallelism, and DeepSpeed, presenting benchmark results of up to 11.5× acceleration on multiple GPUs while maintaining high accuracy.
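A minimal sketch of two of the single-GPU techniques the article benchmarks, automatic mixed precision and torch.compile, on a dummy model (the multi-GPU DistributedDataParallel and DeepSpeed setups are omitted; requires a CUDA device):

```python
import torch
import torch.nn as nn

# Dummy model and data; the point is the AMP + torch.compile plumbing.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).cuda()
model = torch.compile(model)              # JIT-compiles forward/backward into fused kernels
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()      # rescales the loss to avoid fp16 gradient underflow
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    x = torch.randn(64, 512, device="cuda")
    y = torch.randint(0, 10, (64,), device="cuda")
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():       # runs matmuls in reduced precision
        loss = loss_fn(model(x), y)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```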

DeepSpeed · GPU · Mixed Precision
0 likes · 6 min read
DataFunSummit
Mar 3, 2025 · Artificial Intelligence

DeepSeek Open Source Week: Seven Core Technologies Reshaping Large‑Model Training

The DeepSeek open‑source week introduced seven breakthrough technologies—FlashMLA, DeepGEMM, DeepEP, DualPipe, EPLB, 3FS, and Smallpond—that together overhaul data flow, algorithmic complexity, hardware utilization, MoE communication, and resource balancing, dramatically improving large‑model training efficiency and lowering entry barriers for the AI industry.
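Of the seven, EPLB (expert-parallel load balancing) is the easiest to illustrate in isolation. The sketch below is not DeepSeek's code; the greedy replica heuristic and all names are assumptions for exposition: given per-expert token counts from an MoE router, hot experts receive extra replicas until per-replica load evens out.

```python
import torch

def balance_replicas(tokens_per_expert: torch.Tensor, num_slots: int) -> list[int]:
    """Greedily grant each spare slot to the expert with the highest
    load per existing replica (hypothetical EPLB-style heuristic)."""
    num_experts = tokens_per_expert.numel()
    assert num_slots >= num_experts
    replicas = [1] * num_experts                          # one replica each to start
    for _ in range(num_slots - num_experts):
        load = tokens_per_expert / torch.tensor(replicas, dtype=torch.float32)
        replicas[int(torch.argmax(load))] += 1            # relieve the hottest expert
    return replicas

counts = torch.tensor([900.0, 100.0, 100.0, 100.0])       # skewed routing is common in MoE
print(balance_replicas(counts, num_slots=8))              # -> [5, 1, 1, 1]
```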

AI Hardware · DeepSeek · Large Models
0 likes · 17 min read
DataFunSummit
Nov 22, 2024 · Artificial Intelligence

EasyRec Recommendation Algorithm Training and Inference Optimization

This article presents a comprehensive overview of EasyRec’s recommendation system architecture, detailing training and inference optimizations, embedding parallelism, CPU/GPU placement strategies, online learning pipelines, and network compression techniques that together improve scalability, latency, and cost efficiency.
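A toy sketch of the row-wise embedding parallelism described above, assuming a simple even partition of the ID space across workers (illustrative only, not EasyRec's implementation; the cross-worker combine step is reduced to a comment):

```python
import torch
import torch.nn as nn

class ShardedEmbedding(nn.Module):
    """Each worker stores one contiguous slice of the ID space."""
    def __init__(self, num_ids: int, dim: int, rank: int, world_size: int):
        super().__init__()
        rows = (num_ids + world_size - 1) // world_size   # rows held by each shard
        self.offset = rank * rows
        self.table = nn.Embedding(rows, dim)              # local shard only

    def forward(self, ids: torch.Tensor) -> torch.Tensor:  # ids: 1-D LongTensor
        local = (ids >= self.offset) & (ids < self.offset + self.table.num_embeddings)
        out = torch.zeros(ids.shape[0], self.table.embedding_dim)
        out[local] = self.table(ids[local] - self.offset)   # look up only rows we own
        # A real deployment would all-to-all/all-reduce to merge shard outputs.
        return out

shard0 = ShardedEmbedding(num_ids=1000, dim=8, rank=0, world_size=4)
print(shard0(torch.tensor([3, 600]))[1])  # zeros: ID 600 lives on another shard
```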

EasyRec · Training Optimization · Distributed Systems
0 likes · 15 min read
Sohu Tech Products
Aug 28, 2024 · Artificial Intelligence

EasyRec Recommendation Algorithm Training and Inference Optimization

EasyRec, Alibaba Cloud’s modular recommendation framework, unifies configurable data, embedding, dense, and output layers on MaxCompute, EMR, and DLC. Training is accelerated with deduplication, EmbeddingParallel sharding, lock-free hash tables, GPU embeddings, and AMX BF16, while inference benefits from operator fusion, low-precision AVX/AMX kernels, compact caches, batch merging, and network compression, enabling real-time online learning and delivering higher recommendation quality at lower cost in e-commerce.
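The deduplication step is easy to see in isolation: repeated sparse IDs within a batch are looked up once and scattered back, cutting embedding traffic. A minimal PyTorch sketch (illustrative, not EasyRec's code):

```python
import torch
import torch.nn as nn

table = nn.Embedding(10_000, 16)
ids = torch.tensor([42, 7, 42, 42, 7, 99])             # repeated IDs are common in practice
unique_ids, inverse = torch.unique(ids, return_inverse=True)
emb = table(unique_ids)[inverse]                       # 3 table lookups instead of 6
assert torch.equal(emb, table(ids))                    # identical result, less work
```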

Alibaba Cloud · EasyRec · Training Optimization
0 likes · 14 min read
DataFunTalk
Aug 26, 2024 · Artificial Intelligence

EasyRec Recommendation Algorithm Training and Inference Optimization

This article presents a comprehensive overview of EasyRec's recommendation system architecture, detailing training and inference optimizations, distributed deployment strategies, operator fusion techniques, online learning pipelines, and network-level improvements to enhance performance and scalability.
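Of the optimizations listed, operator fusion is the simplest to demonstrate. The sketch below uses torch.compile as a stand-in for EasyRec's custom fused inference operators: a chain of elementwise ops is traced and emitted as a single kernel.

```python
import torch

def gated_activation(x: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
    return torch.sigmoid(g) * torch.tanh(x) + x        # three elementwise ops, three kernels

fused = torch.compile(gated_activation)                # traced and fused into one kernel
x, g = torch.randn(1024, 256), torch.randn(1024, 256)
assert torch.allclose(fused(x, g), gated_activation(x, g), atol=1e-5)
```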

AI · Training Optimization · Distributed Systems
0 likes · 15 min read
DataFunTalk
Aug 24, 2024 · Artificial Intelligence

Improving the Mathematical Reasoning Ability of Large Language Models: Overview, Mixed Instructions, Synthetic Data, and Training Optimization

This article presents a comprehensive approach to enhancing large language models' mathematical reasoning by reviewing model architectures, introducing mixed CoT‑PoT instructions, generating and filtering synthetic data, and applying multi‑stage training optimizations such as RFT, PPO, and DPO, with detailed experimental results and Q&A.
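For context, the DPO objective mentioned above reduces to a one-line loss over the log-probabilities of a preferred and a rejected response, regularized against a frozen reference model. A minimal sketch (inputs are summed per-response log-probs; names are illustrative):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta: float = 0.1):
    # Implicit reward margin of the policy, measured relative to the reference model.
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    return -F.logsigmoid(logits).mean()

# The policy already prefers the chosen response -> small loss.
print(float(dpo_loss(torch.tensor([-10.0]), torch.tensor([-14.0]),
                     torch.tensor([-12.0]), torch.tensor([-12.0]))))
```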

AI · Large Language Models · Reward Model
0 likes · 16 min read
Alimama Tech
Dec 21, 2022 · Artificial Intelligence

GBA: Global Batch Gradients Aggregation for Search Advertising Training

GBA (Global Batch Gradients Aggregation) introduces a training mode that seamlessly switches between synchronous and asynchronous learning for search‑advertising models by keeping a constant global batch size, using token‑controlled gradient aggregation and staleness management to retain synchronous‑level accuracy while preserving asynchronous efficiency and eliminating manual hyperparameter tuning.
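A rough sketch of the aggregation idea as summarized above, assuming an exponential staleness decay (the decay rule, weights, and names are assumptions for illustration, not Alimama's implementation):

```python
import torch

def aggregate_gradients(worker_grads, staleness, decay: float = 0.5):
    """worker_grads: one gradient tensor per worker;
    staleness: steps elapsed since each gradient was computed."""
    weights = torch.tensor([decay ** s for s in staleness])
    weights = weights / weights.sum()          # normalize: constant effective global batch
    return sum(w * g for w, g in zip(weights, worker_grads))

grads = [torch.ones(4), 2 * torch.ones(4), 3 * torch.ones(4)]
print(aggregate_gradients(grads, staleness=[0, 1, 3]))  # fresh workers dominate
```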

Alibaba · GBA · Search Advertising
0 likes · 15 min read