Tagged articles
23 articles
Qborfy AI
Mar 24, 2026 · Artificial Intelligence

Why Full Fine‑Tuning Beats LoRA: When and How to Update Every Model Parameter

This article explains full fine‑tuning, which updates every parameter of a pretrained model to maximize task performance. It compares the approach with LoRA and prompt tuning, explains when it is appropriate, and provides a step‑by‑step Hugging Face implementation along with memory‑saving tricks, common pitfalls, and practical takeaways.

Deep Learning · DeepSpeed · GPU Memory
0 likes · 9 min read
AI2ML AI to Machine Learning
Nov 4, 2025 · Artificial Intelligence

Common Debugging Signals for Large Language Models

This article outlines the end‑to‑end workflow for large‑model training, highlights typical debugging challenges such as out‑of‑memory (OOM) errors, performance bottlenecks, and gradient issues, and provides concrete strategies, tools (DeepSpeed, Megatron, Torchtitan, veScale), and best‑practice checklists to help engineers diagnose and resolve problems efficiently.

Debugging · DeepSpeed · LLM
0 likes · 12 min read
Fun with Large Models
Aug 30, 2025 · Artificial Intelligence

How to Fine‑Tune Large Models on Multiple Nodes and GPUs – A Must‑Know Interview Answer

This article explains how to fine‑tune large models across multiple machines and GPUs by covering data, model, tensor, and pipeline parallelism, hybrid 3D parallel strategies, engineering details such as NCCL, PyTorch Distributed, DeepSpeed, fault‑tolerance, checkpointing, and the ZeRO optimizer stages that dramatically reduce memory usage.

Data Parallel · DeepSpeed · Distributed Training
0 likes · 8 min read
Network Intelligence Research Center (NIRC)
Jul 13, 2025 · Artificial Intelligence

Getting Started with Hugging Face Transformers Trainer

This guide walks through the Hugging Face Transformers Trainer API, explaining core features such as configurable training loops, mixed‑precision and gradient‑accumulation support, and seamless distributed training via Accelerate and DeepSpeed. It also provides a step‑by‑step example of converting a simple PyTorch CNN model to use Trainer.

Accelerate · DeepSpeed · Distributed Training
0 likes · 7 min read
DataFunSummit
Jan 6, 2025 · Artificial Intelligence

Efficient Large‑Model Training with LLaMA‑Factory: Overview, Techniques, and Applications

This article explains how to train large language models efficiently using LLaMA‑Factory, covering low‑resource training challenges, memory‑saving optimizations for parameters, gradients and activations, framework features, quick‑start guidance, performance tuning, real‑world case studies, and a detailed Q&A.

AI · DeepSpeed · LLaMA-Factory
0 likes · 10 min read
Baobao Algorithm Notes
Sep 28, 2024 · Artificial Intelligence

Master Distributed Training for Massive AI Models on Multi‑GPU Clusters

This guide walks you through the fundamentals of distributed training for large AI models, explaining data, model, and pipeline parallelism, GPU communication primitives, and advanced techniques like Megatron 3‑D parallelism and DeepSpeed ZeRO stages, with practical examples and visual illustrations to help you design efficient multi‑GPU training pipelines.

Data Parallelism · DeepSpeed · Distributed Training
0 likes · 27 min read
DataFunTalk
Jul 8, 2024 · Artificial Intelligence

Challenges and Techniques for Distributed Training of Large Language Models

This article discusses the historical background of large language model training, major challenges such as massive compute and memory demands, and the technical ecosystem that enables efficient distributed training, including data parallelism, pipeline parallelism, and optimization strategies such as DeepSpeed and 1F1B.

AI Infrastructure · DeepSpeed · Pipeline Parallelism
0 likes · 22 min read
360 Tech Engineering
Apr 15, 2024 · Artificial Intelligence

Fine‑Tuning Large Language Models: A Practical Guide Using Qwen‑14B on the 360AI Platform

This article explains the concept, motivations, and step‑by‑step workflow for fine‑tuning large language models—specifically Qwen‑14B—covering data preparation, training commands with DeepSpeed, hyper‑parameter settings, evaluation, and deployment via FastChat, all illustrated with code snippets and configuration details.

AI · DeepSpeed · FastChat
0 likes · 10 min read
360 Smart Cloud
Apr 15, 2024 · Artificial Intelligence

Fine‑Tuning Qwen‑14B Large Language Model: A Complete Guide

This article provides a comprehensive tutorial on fine‑tuning the Qwen‑14B large language model, covering the motivation, fine‑tuning concepts, step‑by‑step workflow, required code, DeepSpeed training parameters, testing scripts, and deployment using FastChat and the 360AI platform.

AI Model Deployment · DeepSpeed · FastChat
0 likes · 9 min read
DataFunSummit
Mar 31, 2024 · Artificial Intelligence

Challenges and Techniques in Distributed Training of Large Language Models

This article reviews the rapid development of large language models since 2019, outlines the historical background, identifies key challenges such as massive compute demand, memory constraints, and system complexity, and then details distributed training technologies—including data parallelism, pipeline parallelism, and advanced optimization strategies—while also discussing future research directions and answering common questions.

AI Infrastructure · Data Parallelism · DeepSpeed
0 likes · 23 min read
OPPO Kernel Craftsman
Mar 22, 2024 · Artificial Intelligence

InternLM Model Fine-Tuning Tutorial with XTuner: Chat Format and Practical Implementation Guide

This tutorial walks through fine‑tuning Shanghai AI Lab’s open‑source InternLM models with XTuner, explaining chat‑format conventions, model loading and inference (including multimodal InternLM‑XComposer), dataset preparation, configuration sections, DeepSpeed acceleration, and memory‑efficient QLoRA details for 7B‑parameter chat models.

Chat Format · DeepSpeed · Fine-tuning
0 likes · 22 min read
Alimama Tech
Sep 12, 2023 · Artificial Intelligence

Megatron-LLaMA: High-Performance Large Language Model Training Framework

Megatron-LLaMA is an open‑source high‑performance training framework for LLaMA models, offering tensor, pipeline, and sequence parallelism, an overlapped optimizer, and near‑linear scalability, achieving up to 176% speedup on 32 GPUs and robust performance even with limited network bandwidth.

DeepSpeed · Distributed Training · GPU Optimization
0 likes · 10 min read
IT Architects Alliance
Apr 17, 2023 · Artificial Intelligence

DeepSpeed Chat: An Open‑Source Framework for Scalable RLHF Training of ChatGPT‑Style Models

DeepSpeed Chat provides a fast, affordable, and scalable system for end‑to‑end RLHF training of ChatGPT‑style large language models, offering one‑click scripts, detailed performance benchmarks across GPU configurations, support for many model families, and a flexible API for custom RLHF pipelines.

ChatGPT · DeepSpeed · GPU training
0 likes · 14 min read
Programmer DD
Apr 14, 2023 · Artificial Intelligence

How DeepSpeed-Chat Accelerates ChatGPT‑Style Model Training by 15×

Microsoft open‑sourced DeepSpeed‑Chat, a toolkit that streamlines the end‑to‑end training and inference of ChatGPT‑like large language models using RLHF, delivering up to fifteen‑fold speedups and dramatically lower costs, even on a single GPU.

ChatGPT · DeepSpeed · RLHF
0 likes · 8 min read
21CTO
Apr 13, 2023 · Artificial Intelligence

How Microsoft’s Open‑Source DeepSpeed‑Chat Accelerates LLM Training by 15×

Microsoft has open‑sourced DeepSpeed‑Chat, a DeepSpeed‑based framework that simplifies end‑to‑end training and inference of ChatGPT‑style large language models, offering RLHF support, up to 15× speed‑up, massive cost reductions, and scalable performance on Azure for models ranging from billions to hundreds of billions of parameters.

AI · DeepSpeed · LLM training
0 likes · 7 min read
Architects' Tech Alliance
Aug 31, 2022 · Artificial Intelligence

Performance Evaluation of Transformer Models on the Inspur NF5488A5 GPU Server

This article presents a detailed benchmark of four Transformer models of varying sizes trained on the high‑end Inspur NF5488A5 GPU server, compares its NVSwitch‑based interconnect with a PCIe‑based system, and analyzes the impact of model scale, tensor parallelism, and hardware bandwidth on training efficiency.

DeepSpeed · GPU server · Megatron-LM
0 likes · 12 min read
DataFunSummit
Apr 19, 2022 · Artificial Intelligence

DeepSpeed‑MoE: End‑to‑End Training and Inference Solutions for Mixture‑of‑Experts Models

This article reviews DeepSpeed‑MoE, an end‑to‑end system that introduces new MoE architectures, model‑compression techniques, and highly optimized inference pipelines, detailing its motivation, design of PR‑MoE (Pyramid‑MoE and Residual‑MoE), distributed parallel strategies, communication and kernel optimizations, and performance gains over dense baselines.

AI · DeepSpeed · Inference Optimization
0 likes · 11 min read