Tagged articles

22 articles

Apr 29, 2026 · Artificial Intelligence

DeepSeek V4: Open‑Source Bombshell That Shakes Closed‑Source AI Giants

DeepSeek V4’s preview launch unveils two open‑source LLM variants—V4‑Pro with 1.6 T parameters and V4‑Flash with 284 B—both supporting a default 1 M‑token context, and introduces novel mHC residual scheduling, hybrid CSA/HCA sparse attention, and Muon optimizer tricks that together deliver top‑tier performance rivaling closed‑source models across coding, long‑text, and reasoning benchmarks.

DeepSeek · Training Optimization · architecture

0 likes · 10 min read

AI2ML AI to Machine Learning

Nov 4, 2025 · Artificial Intelligence

Common Debugging Signals for Large Language Models

This article outlines the end‑to‑end workflow for large‑model training, highlights typical debugging challenges such as memory OOM, performance bottlenecks, and gradient issues, and provides concrete strategies, tools (DeepSpeed, Megatron, Torchtitan, veScale) and best‑practice checklists to help engineers diagnose and resolve problems efficiently.

Debugging · DeepSpeed · LLM

0 likes · 12 min read

DataFunSummit

Oct 8, 2025 · Artificial Intelligence

How EasyRec Boosts Recommendation Training and Inference Performance

This article explains the EasyRec recommendation system’s training and inference architecture, detailing optimization techniques such as embedding parallelism, CPU/GPU placement, XLA and TRT fusion, online learning pipelines, network compression, and real‑world deployment results that dramatically improve throughput and latency.

AI Infrastructure · EasyRec · Inference Optimization

0 likes · 15 min read

DataFunSummit

Sep 18, 2025 · Artificial Intelligence

Boosting LLM Function Call: Data, Training, and Agent Optimization Strategies

This presentation by Yao Yitong of China Telecom AI Research Institute explains why Function Call is essential for LLM deployment, outlines data‑centric and training‑centric optimization methods, discusses common pitfalls and reward‑function design for reinforcement learning, and showcases practical Agent application patterns for real‑world tasks.

Agent · LLM · Training Optimization

0 likes · 36 min read

DataFunSummit

Jul 5, 2025 · Artificial Intelligence

Boosting Large Model Training: Optimizing Performance with the Verl Framework

Join the DataFun Summit 2025 on July 12 to hear Tencent FinTech senior researcher Gong Dihong discuss how redesigning the Verl training system, integrating Megatron and SGLang, and applying new synchronization and offloading techniques speeds up reinforcement‑learning training for large models.

AI Performance · Megatron · Training Optimization

0 likes · 4 min read

DataFunSummit

Jul 4, 2025 · Artificial Intelligence

How EasyRec Boosts Recommendation Performance: Training, Inference, and Online Learning Optimizations

This article explains the EasyRec recommendation system's training and inference architecture, details a series of optimizations for both CPU and GPU pipelines, and describes the online learning workflow that enables real‑time model updates across large‑scale e‑commerce scenarios.

AI · Inference Optimization · Online Learning

0 likes · 16 min read

DataFunSummit

Jul 3, 2025 · Artificial Intelligence

Boosting LLM Function Call Capabilities: From Data Construction to RLHF Optimization

On July 12, 2025, the DataFun Summit will feature a technical session where China Telecom AI Research Institute engineer Yao Yitong presents a deep dive into enhancing large language model Function Call abilities through systematic data and training optimizations, offering practical insights for AI practitioners.

AI · LLM · RLHF

0 likes · 4 min read

DataFunSummit

Jun 20, 2025 · Artificial Intelligence

EasyRec Deep Dive: Training & Inference Architecture, Optimizations, and Online Learning

This article explains EasyRec's end‑to‑end recommendation system, covering its training‑inference architecture, a series of CPU/GPU and distributed optimizations, and a real‑time online‑learning pipeline that together improve throughput, latency, and model freshness.

AI Infrastructure · Inference Optimization · Online Learning

0 likes · 15 min read

AI Frontier Lectures

Jun 3, 2025 · Artificial Intelligence

Master LLM Engineering: Model Conversion, Parallel Inference, and Channel‑Loss Techniques

This article outlines essential LLM engineering skills, including scripts for converting various model checkpoints to Llama format, customizing modeling files for advanced features, building a multi‑GPU inference class, and adding channel‑aware loss tracking to fine‑tuning pipelines.

Flash Attention · LLM · Training Optimization

0 likes · 6 min read

JD Tech

May 15, 2025 · Artificial Intelligence

How JD’s Omniforce Cuts Large‑Model Training Cost by 70% and Boosts Inference Speed 30%

The paper "Omniforce" from JD Exploration Research Institute presents a cloud‑edge collaborative AutoML system that uses model distillation, data governance, Bayesian training optimization, and cloud‑edge cooperation to reduce large‑model training costs by 70% and improve inference efficiency by an average of 30%, offering a reusable technical paradigm for scalable AI deployment.

AI efficiency · JoyBuild · Large Model

0 likes · 6 min read

Python Programming Learning Circle

Apr 3, 2025 · Artificial Intelligence

Accelerating PyTorch Model Training: Techniques, Benchmarks, and Code

This article explains how to dramatically speed up PyTorch model training using code optimizations, mixed‑precision, torch.compile, distributed data parallelism, and DeepSpeed, presenting benchmark results that show up to 11.5× acceleration on multiple GPUs while maintaining high accuracy.

Deep Learning · DeepSpeed · Distributed Training

0 likes · 6 min read

AI Algorithm Path

Mar 16, 2025 · Artificial Intelligence

Speed Up Your PyTorch Model Training: Practical Tips and Tricks

This article walks through concrete techniques to accelerate PyTorch training, covering mixed‑precision with torch.cuda.amp, profiling with torch.profiler, DataLoader tuning, torch.compile, distributed strategies like DataParallel and DDP, gradient accumulation, and advanced libraries such as Lightning, Apex, and DeepSpeed, plus model‑level optimizations and monitoring tips.

DataLoader · Distributed Training · Profiling

0 likes · 12 min read

Baobao Algorithm Notes

Jan 3, 2025 · Artificial Intelligence

How DeepSeek-V3 Achieves Massive Scale with FP8, MoE, and System Optimizations

The article examines DeepSeek‑V3’s architecture and training pipeline, highlighting its use of Multi‑head Latent Attention (MLA) and a highly granular MoE design, pioneering FP8 mixed‑precision training, fine‑grained per‑tile quantization, advanced parallelism strategies, and inference optimizations such as prefill/decode (PD) separation and NanoFlow to achieve high efficiency on limited GPU resources.

DeepSeek-V3 · FP8 · Inference Optimization

0 likes · 10 min read

DataFunSummit

Nov 22, 2024 · Artificial Intelligence

EasyRec Recommendation Algorithm Training and Inference Optimization

This article presents a comprehensive overview of EasyRec’s recommendation system architecture, detailing training and inference optimizations, embedding parallelism, CPU/GPU placement strategies, online learning pipelines, and network compression techniques that together improve scalability, latency, and cost efficiency.

Distributed Systems · EasyRec · Inference Optimization

0 likes · 15 min read

Baobao Algorithm Notes

Nov 14, 2024 · Artificial Intelligence

How I Built a 1B‑Parameter Chinese LLM on a Single A100: Lessons Learned

This article details the end‑to‑end process of pre‑training, fine‑tuning, and evaluating a 1‑billion‑parameter Chinese LLM named Steel‑LLM on limited hardware, covering data collection, pipeline design, training framework choices, architectural tweaks, performance results, and practical lessons for resource‑constrained developers.

LLM · Model architecture · Training Optimization

0 likes · 18 min read

NewBeeNLP

Sep 2, 2024 · Artificial Intelligence

Boosting Large Language Model Math Reasoning: Mixed Instructions, Synthetic Data, and Training Optimizations

This article presents a comprehensive technical walkthrough on enhancing large language model mathematical reasoning by reviewing model architectures, introducing mixed CoT‑PoT instructions, generating and filtering synthetic data, and applying multi‑stage training optimizations such as RFT, PPO, and DPO, with detailed experimental results and Q&A insights.

AI · Reward model · Training Optimization

0 likes · 17 min read

Sohu Tech Products

Aug 28, 2024 · Artificial Intelligence

EasyRec Recommendation Algorithm Training and Inference Optimization

EasyRec, Alibaba Cloud’s modular recommendation framework, unifies configurable data, embedding, dense, and output layers on MaxCompute, EMR, and DLC, and speeds training with deduplication, EmbeddingParallel sharding, lock‑free hash tables, GPU embeddings, and AMX BF16, while inference benefits from operator fusion, low‑precision AVX/AMX kernels, compact caches, batch merging, and network compression, enabling real‑time online learning and delivering higher recommendation quality at lower cost in e‑commerce.

Alibaba Cloud · EasyRec · Inference Optimization

0 likes · 14 min read

DataFunTalk

Aug 26, 2024 · Artificial Intelligence

EasyRec Recommendation Algorithm Training and Inference Optimization

This article presents a comprehensive overview of EasyRec's recommendation system architecture, detailing training and inference optimizations, distributed deployment strategies, operator fusion techniques, online learning pipelines, and network-level improvements to enhance performance and scalability.

AI · Inference Optimization · Training Optimization

0 likes · 15 min read

Alibaba Cloud Big Data AI Platform

Mar 26, 2024 · Artificial Intelligence

MoE LLMs: How Alibaba Cloud & NVIDIA Megatron-Core Accelerate Training

This article reviews the evolution of Mixture-of-Experts (MoE) models, details Alibaba Cloud’s collaboration with NVIDIA’s Megatron-Core to build a high-performance MoE framework, and presents extensive training optimizations, benchmark results, conversion tools, and best-practice guidelines for large-scale LLM development and deployment.

Alibaba Cloud · Megatron-Core · MoE

0 likes · 18 min read

Alimama Tech

Dec 21, 2022 · Artificial Intelligence

GBA: Global Batch Gradients Aggregation for Search Advertising Training

GBA (Global Batch Gradients Aggregation) introduces a training mode that seamlessly switches between synchronous and asynchronous learning for search‑advertising models by keeping a constant global batch size, using token‑controlled gradient aggregation and staleness management to retain synchronous‑level accuracy while preserving asynchronous efficiency and eliminating manual hyperparameter tuning.

Alibaba · GBA · Training Optimization

0 likes · 15 min read

Volcano Engine Developer Services

Jun 20, 2022 · Big Data

How ByteDance Scaled Feature Storage with Iceberg and Parquet: A Big Data Case Study

ByteDance tackled massive feature‑storage challenges by replacing row‑based HDFS files with columnar Parquet and the Iceberg table format, enabling schema evolution, selective reads, efficient backfill, and training optimizations that cut storage costs by over 40% and reduced CPU and network I/O dramatically.

Big Data · Data Lake · Iceberg

0 likes · 13 min read

Hulu Beijing

Jan 25, 2018 · Artificial Intelligence

How Batch Normalization Accelerates Neural Network Training and Improves Generalization

This article explains the motivation, core principles, and implementation details of Batch Normalization, including how it normalizes each mini‑batch, restores learned feature distributions, and is applied in convolutional neural networks to speed up training and boost model generalization.

Batch Normalization · CNN · Deep Learning

0 likes · 6 min read
