How Microsoft’s Open‑Source DeepSpeed‑Chat Accelerates LLM Training by 15×
Microsoft has open‑sourced DeepSpeed‑Chat, a DeepSpeed‑based framework that simplifies end‑to‑end training and inference of ChatGPT‑style large language models. It offers RLHF support, a speed‑up of up to 15×, substantially lower training costs, and scalable performance on Azure for models ranging from billions to hundreds of billions of parameters.
Microsoft announced the open‑source release of DeepSpeed Chat, a framework built on the DeepSpeed deep‑learning optimization library that enables easy and efficient training of ChatGPT‑style large language models.
GitHub repository: https://github.com/microsoft/DeepSpeed
DeepSpeed Chat combines training, reinforcement learning from human feedback (RLHF), and inference in one framework. Microsoft reports a speed‑up of more than 15× over existing RLHF systems, along with significantly lower training costs.
Why open source? The rapid adoption of ChatGPT has created high demand for accessible LLM training tools, but existing open‑source projects lack a complete end‑to‑end RLHF system, making large‑scale model training difficult for most researchers and small enterprises.
DeepSpeed Chat Overview
The framework lets users train a custom ChatGPT‑like model with a single script that takes a pre‑trained model through supervised fine‑tuning, reward‑model fine‑tuning, and RLHF. It also provides an easy‑to‑use inference API.
It offers three core capabilities:
Simplified training and inference experience for ChatGPT‑style models using a single script.
DeepSpeed‑RLHF module that reproduces the InstructGPT pipeline, including supervised fine‑tuning (SFT), reward‑model fine‑tuning, and RLHF, with support for mixed‑data training.
DeepSpeed‑RLHF system that integrates training and inference in a unified engine, the DeepSpeed Hybrid Engine (DeepSpeed‑HE), leveraging tensor parallelism, high‑performance transformer kernels, and memory‑optimisation strategies such as ZeRO and LoRA.
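To make the memory argument for LoRA concrete, the following minimal sketch counts trainable parameters for a single weight matrix. It assumes the standard LoRA formulation (a frozen d_out × d_in weight plus a trainable low‑rank update B·A of rank r). The function names and the example dimensions are illustrative assumptions, not values or APIs taken from DeepSpeed Chat.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Parameters updated by ordinary full fine-tuning of a d_out x d_in matrix."""
    return d_in * d_out


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters when the d_out x d_in weight is frozen and only
    the low-rank factors B (d_out x rank) and A (rank x d_in) are trained."""
    if rank <= 0 or rank > min(d_in, d_out):
        raise ValueError("rank must be between 1 and min(d_in, d_out)")
    return rank * (d_in + d_out)


# Illustrative dimensions only (hypothetical, not a DeepSpeed Chat default).
d_in, d_out, rank = 4096, 4096, 8
full = full_params(d_in, d_out)
lora = lora_trainable_params(d_in, d_out, rank)

print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (rank {rank}): {lora:,} trainable parameters "
      f"({lora / full:.2%} of full)")
```

Because gradients and optimizer states are only kept for trainable parameters, shrinking that count is what reduces memory during fine‑tuning; the frozen weights still have to be stored.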
Microsoft's benchmarks show DeepSpeed‑HE training OPT‑13B in about 9 hours for under $300 and OPT‑30B in about 18 hours for under $600 on Azure. The system also scales to models with hundreds of billions of parameters: with more GPUs, a 13B model trains in 1.25 hours and a 175B model in less than a day.
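As a quick sanity check on the quoted prices, the sketch below derives the hourly ceiling implied by each "about H hours for under $X" figure. It uses only the numbers quoted above; the function name is illustrative, and because the hours are approximate the result is an upper bound rather than an actual Azure price.

```python
# Figures quoted in the text: (model, wall-clock hours, cost ceiling in USD).
quoted_runs = [
    ("OPT-13B", 9, 300),
    ("OPT-30B", 18, 600),
]


def max_hourly_rate(hours: float, cost_ceiling_usd: float) -> float:
    """Upper bound on the hourly price implied by 'under $X in about H hours'."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return cost_ceiling_usd / hours


for model, hours, ceiling in quoted_runs:
    rate = max_hourly_rate(hours, ceiling)
    print(f"{model}: at most about ${rate:.2f} per hour of cluster time")
```

Both runs imply the same hourly ceiling, which is what one would expect if the two figures were measured on the same hardware configuration.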
Even with a single GPU, DeepSpeed‑HE can train models exceeding 13 billion parameters, enabling data scientists without multi‑GPU clusters to build powerful RLHF models.
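A rough estimate shows why memory optimisation matters at this scale. The sketch below uses the per‑parameter accounting from the ZeRO paper for mixed‑precision Adam training (2 bytes of fp16 weights, 2 bytes of fp16 gradients, 12 bytes of fp32 optimizer state) and the way ZeRO stages 1 to 3 partition those states across GPUs. It covers a single model's states only and ignores activations, buffers, and the additional models used in RLHF. The function name is illustrative, and the text does not say which techniques DeepSpeed‑HE relies on in the single‑GPU case.

```python
def model_state_bytes_per_gpu(num_params: float, num_gpus: int, zero_stage: int) -> float:
    """Approximate per-GPU memory for model states under mixed-precision Adam.

    zero_stage 0: nothing partitioned (plain data parallelism)
    zero_stage 1: optimizer states partitioned across GPUs
    zero_stage 2: optimizer states and gradients partitioned
    zero_stage 3: optimizer states, gradients, and parameters partitioned
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if zero_stage not in (0, 1, 2, 3):
        raise ValueError("zero_stage must be 0, 1, 2, or 3")

    param_bytes = 2 * num_params   # fp16 weights
    grad_bytes = 2 * num_params    # fp16 gradients
    optim_bytes = 12 * num_params  # fp32 weights, momentum, variance

    if zero_stage >= 1:
        optim_bytes /= num_gpus
    if zero_stage >= 2:
        grad_bytes /= num_gpus
    if zero_stage >= 3:
        param_bytes /= num_gpus
    return param_bytes + grad_bytes + optim_bytes


params_13b = 13e9
for gpus, stage in [(1, 0), (8, 1), (8, 2), (8, 3)]:
    gb = model_state_bytes_per_gpu(params_13b, gpus, stage) / 1e9
    print(f"{gpus} GPU(s), ZeRO stage {stage}: about {gb:.2f} GB of model states per GPU")
```

With one GPU there is nothing to partition across, so the unoptimised model states of a 13B‑parameter model come to roughly 208 GB under this accounting. Fitting such a model on a single device therefore depends on shrinking or relocating those states rather than on partitioning alone.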
[Figure: end‑to‑end training flow of DeepSpeed Chat]
In summary, Microsoft’s open‑source DeepSpeed Chat removes major bottlenecks in LLM training, delivering cost‑effective, high‑performance, and scalable solutions that empower the broader AI community.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
