Tagged articles
22 articles
Architects' Tech Alliance
Apr 29, 2026 · Artificial Intelligence

DeepSeek V4: Open‑Source Bombshell That Shakes Closed‑Source AI Giants

DeepSeek V4’s preview launch unveils two open‑source LLM variants—V4‑Pro with 1.6 T parameters and V4‑Flash with 284 B—both supporting a default 1 M‑token context, and introduces novel mHC residual scheduling, hybrid CSA/HCA sparse attention, and Muon optimizer tricks that together deliver top‑tier performance rivaling closed‑source models across coding, long‑text, and reasoning benchmarks.

DeepSeek · Training Optimization · architecture
0 likes · 10 min read
AI2ML AI to Machine Learning
Nov 4, 2025 · Artificial Intelligence

Common Debugging Signals for Large Language Models

This article outlines the end‑to‑end workflow for large‑model training, highlights typical debugging challenges such as memory OOM, performance bottlenecks, and gradient issues, and provides concrete strategies, tools (DeepSpeed, Megatron, Torchtitan, veScale) and best‑practice checklists to help engineers diagnose and resolve problems efficiently.

Debugging · DeepSpeed · LLM
0 likes · 12 min read
DataFunSummit
Oct 8, 2025 · Artificial Intelligence

How EasyRec Boosts Recommendation Training and Inference Performance

This article explains the EasyRec recommendation system’s training and inference architecture, detailing optimization techniques such as embedding parallelism, CPU/GPU placement, XLA and TRT fusion, online learning pipelines, network compression, and real‑world deployment results that dramatically improve throughput and latency.

AI Infrastructure · EasyRec · Inference Optimization
0 likes · 15 min read
DataFunSummit
Sep 18, 2025 · Artificial Intelligence

Boosting LLM Function Call: Data, Training, and Agent Optimization Strategies

This presentation by Yao Yitong of China Telecom AI Research Institute explains why Function Call is essential for LLM deployment, outlines data‑centric and training‑centric optimization methods, discusses common pitfalls and reward‑function design for reinforcement learning, and showcases practical Agent application patterns for real‑world tasks.

Agent · LLM · Training Optimization
0 likes · 36 min read
DataFunSummit
Jul 5, 2025 · Artificial Intelligence

Boosting Large Model Training: Optimizing Performance with the Verl Framework

Join the DataFun Summit 2025 on July 12 to hear Tencent FinTech senior researcher Gong Dihong discuss how redesigning the Verl training system, integrating Megatron and Sglang, and applying new synchronization and offloading techniques dramatically speeds up large‑model reinforcement‑learning training.

AI Performance · Megatron · Training Optimization
0 likes · 4 min read
JD Tech
May 15, 2025 · Artificial Intelligence

How JD’s Omniforce Cuts Large‑Model Training Cost by 70% and Boosts Inference Speed 30%

The paper "Omniforce" from JD Exploration Research Institute presents a cloud‑edge collaborative AutoML system that uses model distillation, data governance, Bayesian training optimization, and cloud‑edge cooperation to reduce large‑model training costs by 70% and improve inference efficiency by an average of 30%, offering a reusable technical paradigm for scalable AI deployment.

AI efficiency · JoyBuild · Large Model
0 likes · 6 min read
AI Algorithm Path
Mar 16, 2025 · Artificial Intelligence

Speed Up Your PyTorch Model Training: Practical Tips and Tricks

This article walks through concrete techniques to accelerate PyTorch training, covering mixed‑precision with torch.cuda.amp, profiling with torch.profiler, DataLoader tuning, torch.compile, distributed strategies like DataParallel and DDP, gradient accumulation, and advanced libraries such as Lightning, Apex, and DeepSpeed, plus model‑level optimizations and monitoring tips.

DataLoader · Distributed Training · Profiling
0 likes · 12 min read
Baobao Algorithm Notes
Jan 3, 2025 · Artificial Intelligence

How DeepSeek-V3 Achieves Massive Scale with FP8, MoE, and System Optimizations

The article examines DeepSeek‑V3’s architecture and training pipeline, highlighting its use of MLA and a highly granular MoE design, pioneering FP8 mixed‑precision training, fine‑grained per‑tile quantization, advanced parallelism strategies, and inference optimizations such as PD separation and NanoFlow to achieve unprecedented efficiency on limited GPU resources.

DeepSeek-V3 · FP8 · Inference Optimization
0 likes · 10 min read
DataFunSummit
Nov 22, 2024 · Artificial Intelligence

EasyRec Recommendation Algorithm Training and Inference Optimization

This article presents a comprehensive overview of EasyRec’s recommendation system architecture, detailing training and inference optimizations, embedding parallelism, CPU/GPU placement strategies, online learning pipelines, and network compression techniques that together improve scalability, latency, and cost efficiency.

Distributed Systems · EasyRec · Inference Optimization
0 likes · 15 min read
Baobao Algorithm Notes
Nov 14, 2024 · Artificial Intelligence

How I Built a 1B‑Parameter Chinese LLM on a Single A100: Lessons Learned

This article details the end‑to‑end process of pre‑training, fine‑tuning, and evaluating a 1‑billion‑parameter Chinese LLM named Steel‑LLM on limited hardware, covering data collection, pipeline design, training framework choices, architectural tweaks, performance results, and practical lessons for resource‑constrained developers.

LLM · Model architecture · Training Optimization
0 likes · 18 min read
NewBeeNLP
Sep 2, 2024 · Artificial Intelligence

Boosting Large Language Model Math Reasoning: Mixed Instructions, Synthetic Data, and Training Optimizations

This article presents a comprehensive technical walkthrough on enhancing large language model mathematical reasoning by reviewing model architectures, introducing mixed CoT‑PoT instructions, generating and filtering synthetic data, and applying multi‑stage training optimizations such as RFT, PPO, and DPO, with detailed experimental results and Q&A insights.

AI · Reward model · Training Optimization
0 likes · 17 min read
Sohu Tech Products
Aug 28, 2024 · Artificial Intelligence

EasyRec Recommendation Algorithm Training and Inference Optimization

EasyRec, Alibaba Cloud’s modular recommendation framework, unifies configurable data, embedding, dense, and output layers on MaxCompute, EMR, and DLC. It speeds training with deduplication, EmbeddingParallel sharding, lock‑free hash tables, GPU embeddings, and AMX BF16, while inference benefits from operator fusion, low‑precision AVX/AMX kernels, compact caches, batch merging, and network compression. Together these enable real‑time online learning and deliver higher recommendation quality at lower cost in e‑commerce.

Alibaba Cloud · EasyRec · Inference Optimization
0 likes · 14 min read
DataFunTalk
Aug 26, 2024 · Artificial Intelligence

EasyRec Recommendation Algorithm Training and Inference Optimization

This article presents a comprehensive overview of EasyRec's recommendation system architecture, detailing training and inference optimizations, distributed deployment strategies, operator fusion techniques, online learning pipelines, and network-level improvements to enhance performance and scalability.

AI · Inference Optimization · Training Optimization
0 likes · 15 min read
Alibaba Cloud Big Data AI Platform
Mar 26, 2024 · Artificial Intelligence

MoE LLMs: How Alibaba Cloud & NVIDIA Megatron-Core Accelerate Training

This article reviews the evolution of Mixture-of-Experts (MoE) models, details Alibaba Cloud’s collaboration with NVIDIA’s Megatron-Core to build a high-performance MoE framework, and presents extensive training optimizations, benchmark results, conversion tools, and best-practice guidelines for large-scale LLM development and deployment.

Alibaba Cloud · Megatron-Core · MoE
0 likes · 18 min read
Alimama Tech
Dec 21, 2022 · Artificial Intelligence

GBA: Global Batch Gradients Aggregation for Search Advertising Training

GBA (Global Batch Gradients Aggregation) introduces a training mode that seamlessly switches between synchronous and asynchronous learning for search‑advertising models by keeping a constant global batch size, using token‑controlled gradient aggregation and staleness management to retain synchronous‑level accuracy while preserving asynchronous efficiency and eliminating manual hyperparameter tuning.

Alibaba · GBA · Training Optimization
0 likes · 15 min read
Volcano Engine Developer Services
Jun 20, 2022 · Big Data

How ByteDance Scaled Feature Storage with Iceberg and Parquet: A Big Data Case Study

ByteDance tackled massive feature‑storage challenges by replacing row‑based HDFS files with columnar Parquet and the Iceberg table format, enabling schema evolution, selective reads, efficient backfill, and training optimizations that cut storage costs by over 40% and reduced CPU and network I/O dramatically.

Big Data · Data Lake · Iceberg
0 likes · 13 min read