Tag: DeepSpeed


Python Programming Learning Circle
Apr 3, 2025 · Artificial Intelligence

Accelerating PyTorch Model Training: Techniques, Benchmarks, and Code

This article explains how to dramatically speed up PyTorch model training using code optimizations, mixed‑precision, torch.compile, distributed data parallelism, and DeepSpeed, presenting benchmark results that show up to 11.5× acceleration on multiple GPUs while maintaining high accuracy.

DeepSpeed · GPU · Mixed Precision
6 min read
DataFunSummit
Jan 6, 2025 · Artificial Intelligence

Efficient Large‑Model Training with LLaMA‑Factory: Overview, Techniques, and Applications

This article explains how to train large language models efficiently using LLaMA‑Factory, covering low‑resource training challenges, memory‑saving optimizations for parameters, gradients and activations, framework features, quick‑start guidance, performance tuning, real‑world case studies, and a detailed Q&A.

AI · DeepSpeed · LLaMA-Factory
10 min read
DataFunTalk
Jul 8, 2024 · Artificial Intelligence

Challenges and Techniques for Distributed Training of Large Language Models

This article discusses the historical background, major challenges such as massive compute and memory demands, and the technical ecosystem—including data parallelism, pipeline parallelism, and optimization strategies like DeepSpeed and 1F1B—to enable efficient distributed training of large language models.

AI infrastructure · DeepSpeed · distributed training
22 min read
360 Tech Engineering
Apr 15, 2024 · Artificial Intelligence

Fine‑Tuning Large Language Models: A Practical Guide Using Qwen‑14B on the 360AI Platform

This article explains the concept, motivations, and step‑by‑step workflow for fine‑tuning large language models—specifically Qwen‑14B—covering data preparation, training commands with DeepSpeed, hyper‑parameter settings, evaluation, and deployment via FastChat, all illustrated with code snippets and configuration details.
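The DeepSpeed training commands described are typically driven by a JSON config file. A minimal illustrative sketch using field names from DeepSpeed's public config schema — the values are placeholders, not the article's actual Qwen-14B settings:

```json
{
  "train_micro_batch_size_per_gpu": 4,
  "gradient_accumulation_steps": 8,
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
```

A run would then be launched with something like `deepspeed train.py --deepspeed_config ds_config.json` (the script name is hypothetical; `--deepspeed_config` is the flag DeepSpeed's argument helper registers).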

AI · DeepSpeed · FastChat
10 min read
360 Smart Cloud
Apr 15, 2024 · Artificial Intelligence

Fine‑Tuning Qwen‑14B Large Language Model: A Complete Guide

This article provides a comprehensive tutorial on fine‑tuning the Qwen‑14B large language model, covering the motivation, fine‑tuning concepts, step‑by‑step workflow, required code, DeepSpeed training parameters, testing scripts, and deployment using FastChat and the 360AI platform.

AI model deployment · DeepSpeed · FastChat
9 min read
DataFunSummit
Mar 31, 2024 · Artificial Intelligence

Challenges and Techniques in Distributed Training of Large Language Models

This article reviews the rapid development of large language models since 2019 and outlines the historical background, then identifies the key challenges: massive compute demand, memory constraints, and system complexity. It goes on to detail distributed training technologies, including data parallelism, pipeline parallelism, and advanced optimization strategies, and closes with future research directions and answers to common questions.

AI infrastructure · Data Parallelism · DeepSpeed
23 min read
OPPO Kernel Craftsman
Mar 22, 2024 · Artificial Intelligence

InternLM Model Fine-Tuning Tutorial with XTuner: Chat Format and Practical Implementation Guide

This tutorial walks through fine‑tuning Shanghai AI Lab’s open‑source InternLM models with XTuner, explaining chat‑format conventions, loading and inference (including the multimodal InternLM‑XComposer), dataset preparation, configuration sections, DeepSpeed acceleration, and memory‑efficient QLoRA details for 7B‑parameter chat models.

Chat Format · DeepSpeed · Fine-tuning
22 min read
Alimama Tech
Sep 12, 2023 · Artificial Intelligence

Megatron-LLaMA: High-Performance Large Language Model Training Framework

Megatron-LLaMA is an open‑source high‑performance training framework for LLaMA models, offering tensor, pipeline, and sequence parallelism, an overlapped optimizer, and near‑linear scalability, achieving up to 176% speedup on 32 GPUs and robust performance even with limited network bandwidth.

DeepSpeed · GPU optimization · Llama
10 min read
IT Architects Alliance
Apr 17, 2023 · Artificial Intelligence

DeepSpeed Chat: An Open‑Source Framework for Scalable RLHF Training of ChatGPT‑Style Models

DeepSpeed Chat provides a fast, affordable, and scalable system for end‑to‑end RLHF training of ChatGPT‑style large language models, offering one‑click scripts, detailed performance benchmarks across GPU configurations, support for many model families, and a flexible API for custom RLHF pipelines.

ChatGPT · DeepSpeed · GPU training
14 min read
Architects' Tech Alliance
Aug 31, 2022 · Artificial Intelligence

Performance Evaluation of Transformer Models on the Inspur NF5488A5 GPU Server

This article presents a detailed benchmark of four Transformer models of varying sizes trained on the high‑end Inspur NF5488A5 GPU server, compares its NVSwitch‑based interconnect with a PCIe‑based system, and analyzes the impact of model scale, tensor parallelism, and hardware bandwidth on training efficiency.

DeepSpeed · GPU server · Megatron-LM
12 min read
DataFunSummit
Apr 19, 2022 · Artificial Intelligence

DeepSpeed‑MoE: End‑to‑End Training and Inference Solutions for Mixture‑of‑Experts Models

This article reviews DeepSpeed‑MoE, an end‑to‑end system that introduces new MoE architectures, model‑compression techniques, and highly optimized inference pipelines, detailing its motivation, design of PR‑MoE (Pyramid‑MoE and Residual‑MoE), distributed parallel strategies, communication and kernel optimizations, and performance gains over dense baselines.

AI · DeepSpeed · Mixture of Experts
11 min read