How to Build a Mini ChatGPT on a Single GPU with MiniMind

This article provides a comprehensive, step‑by‑step guide to training and fine‑tuning a miniature large‑language model called MiniMind, covering lightweight model design, open‑source training pipelines, required datasets, tokenizer options, and deployment via a web UI, all using PyTorch on modest hardware.

Architect

Overview

MiniMind is an open‑source project that lets anyone with a single consumer GPU (e.g., NVIDIA RTX 3090) train a miniature large language model (LLM) from scratch. The smallest model has only about 25.8M parameters, roughly 1/7000 the size of GPT‑3, yet it still performs reasonably for its size.
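The 1/7000 ratio can be checked with quick arithmetic. This is a rough sketch that assumes the 25.8 figure counts parameters (in millions) and uses the commonly cited 175 billion parameters for GPT‑3, a number not stated in this article.

```python
# Rough size comparison; both figures are assumptions, not measured values.
GPT3_PARAMS = 175e9        # commonly cited GPT-3 parameter count
MINIMIND_PARAMS = 25.8e6   # smallest MiniMind model, read as 25.8 million parameters

ratio = GPT3_PARAMS / MINIMIND_PARAMS
print(f"GPT-3 is about {ratio:,.0f}x larger")  # prints: GPT-3 is about 6,783x larger
```

The result, a little under 7000, is consistent with the "roughly 1/7000" claim.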

Key Highlights

Ultra‑lightweight design: The smallest model has about 25.8M parameters and can be trained and run on a consumer GPU.

Full training pipeline: Includes data cleaning, pre‑training, supervised fine‑tuning (SFT), LoRA fine‑tuning, Direct Preference Optimization (DPO), and model distillation, all implemented in native PyTorch.

Multimodal support: The MiniMind‑V extension adds image understanding capabilities.

Quick Start

Clone the repository

git clone https://github.com/jingyaogong/minimind.git

Test an existing model

Install dependencies:

pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

Download a pre‑trained MiniMind model from Hugging Face:

git clone https://huggingface.co/jingyaogong/MiniMind2

Run evaluation:

python eval_model.py --load 1 --model_mode 2

Here --load 1 loads the Hugging Face checkpoint and --model_mode 2 selects MiniMind2.

Launch the web UI (optional):

pip install streamlit  # if not installed
cd scripts
streamlit run web_demo.py

Training from Scratch

Environment preparation

Install dependencies as above.

Verify CUDA support:

import torch
print(torch.cuda.is_available())

Dataset preparation

Place the required datasets in ./dataset. The project provides files such as pretrain_hq.jsonl (1.6 GB), sft_mini_512.jsonl, and dpo.jsonl; download links are available on ModelScope and Hugging Face.
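Before starting a long training run, it can help to confirm that the files are where the scripts expect them. The following is a minimal sketch, not part of MiniMind itself; the helper name missing_datasets is hypothetical, and the file list only covers the three names mentioned above.

```python
from pathlib import Path

# File names taken from the list above; extend as needed for later stages.
EXPECTED_FILES = ["pretrain_hq.jsonl", "sft_mini_512.jsonl", "dpo.jsonl"]

def missing_datasets(dataset_dir="./dataset", expected=EXPECTED_FILES):
    """Return the expected file names that are not present in dataset_dir."""
    root = Path(dataset_dir)
    return [name for name in expected if not (root / name).is_file()]

missing = missing_datasets()
print("Missing dataset files:", missing if missing else "none")
```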

Pre‑training

Run the pre‑training script so the model learns basic language knowledge:

python train_pretrain.py

This outputs checkpoint files named pretrain_*.pth.

Supervised fine‑tuning (SFT)

Fine‑tune the model on dialogue data:

python train_full_sft.py

This generates full_sft_*.pth checkpoints.
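Because SFT builds on the pre‑trained weights, the two stages have to run in order. A small wrapper like the one below is one way to chain them. It is only a sketch: the function names are hypothetical, and it assumes both scripts run with their default arguments from the directory that contains them.

```python
import subprocess
import sys

# Script names come from the steps above; running them without extra
# arguments relies on each script's defaults (an assumption).
STAGES = ["train_pretrain.py", "train_full_sft.py"]

def build_commands(stages=STAGES, python=sys.executable):
    """Build one command per stage, in the order the stages must run."""
    return [[python, script] for script in stages]

def run_pipeline(commands):
    """Run each stage in order and stop at the first failure."""
    for cmd in commands:
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure

# To launch both stages, call this from the directory containing the scripts:
# run_pipeline(build_commands())
```

Using check=True means a failed pre‑training run stops the pipeline instead of fine‑tuning from a missing or partial checkpoint.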

Model evaluation

After training, place the checkpoint in ./out/ and run:

python eval_model.py --model_mode 1  # evaluate fine‑tuned model

Use --model_mode 0 to evaluate the pre‑trained model.
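To see which checkpoints are available before choosing a --model_mode, the ./out directory can be scanned by file‑name prefix. This is a minimal sketch with a hypothetical helper name; it relies only on the pretrain_*.pth and full_sft_*.pth naming patterns mentioned above.

```python
from pathlib import Path

def list_checkpoints(out_dir="./out"):
    """Group .pth files in out_dir by training stage, based on file-name prefix."""
    root = Path(out_dir)
    if not root.is_dir():
        return {"pretrain": [], "full_sft": []}
    return {
        "pretrain": sorted(p.name for p in root.glob("pretrain_*.pth")),
        "full_sft": sorted(p.name for p in root.glob("full_sft_*.pth")),
    }

print(list_checkpoints())
```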

Tokenizer

MiniMind uses a simplified tokenizer to keep the vocabulary small and reduce compute. Users can train a custom tokenizer or adopt the provided one.
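One reason vocabulary size matters for a tiny model is that the token embedding table holds one vector per vocabulary entry, so its parameter count grows linearly with the vocabulary. The sketch below illustrates this; the hidden size and vocabulary sizes are illustrative assumptions, not values taken from MiniMind.

```python
def embedding_params(vocab_size, hidden_dim):
    """Parameters in a token embedding table: one hidden_dim vector per token."""
    return vocab_size * hidden_dim

HIDDEN_DIM = 512  # illustrative hidden size (assumption)

for vocab_size in (6_400, 32_000, 128_000):  # illustrative vocabulary sizes (assumption)
    params = embedding_params(vocab_size, HIDDEN_DIM)
    print(f"vocab={vocab_size:>7,}  embedding params={params / 1e6:.1f}M")
```

Under these assumptions, a 32,000‑token vocabulary would already put about 16.4M parameters into the embedding table alone, which is a large share of a model whose total size is measured in tens of millions of parameters.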

Datasets

The project bundles several datasets for different training stages:

dpo.jsonl – RLHF preference data for DPO training.

lora_identity.jsonl – Self‑identity prompts for LoRA training.

lora_medical.jsonl – Medical Q&A data for LoRA training.

pretrain_hq.jsonl – High‑quality pre‑training corpus (1.6 GB).

r1_mix_1024.jsonl – Distilled DeepSeek‑R1 data (max length 1024).

sft_512.jsonl, sft_1024.jsonl, sft_2048.jsonl – Supervised fine‑tuning data with varying maximum sequence lengths.

sft_mini_512.jsonl – Compact SFT dataset for rapid training.
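All of these files use the JSON Lines format, with one JSON object per line, so they can be inspected without loading the whole file into memory. The helper below is a generic sketch with a hypothetical name; it makes no assumption about the fields inside each record.

```python
import json

def peek_jsonl(path, n=3):
    """Return the first n records of a JSON Lines file (one JSON object per line)."""
    records = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if len(records) >= n:
                break
            line = line.strip()
            if not line:
                continue  # skip blank lines
            records.append(json.loads(line))
    return records

# Example (path assumes the dataset was placed in ./dataset):
# for record in peek_jsonl("./dataset/sft_mini_512.jsonl"):
#     print(record)
```

Reading line by line keeps memory use flat even for the 1.6 GB pre‑training corpus.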

Model Architecture and Training Strategies

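MiniMind adopts a decoder‑only Transformer architecture similar to GPT‑3, with two notable design choices:

Rotary Position Embedding (RoPE), which encodes position by rotating query and key vectors so that attention scores depend on relative position, and which handles long contexts efficiently.

A Mixture‑of‑Experts (MoE) layer, which routes each token to a subset of expert sub‑networks to improve computational efficiency and scalability.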
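To make the RoPE idea concrete, here is a small pure‑Python sketch of the standard formulation. It is not MiniMind's code: real implementations operate on PyTorch tensors and may pair dimensions differently. The base of 10000 and the toy vectors are assumptions chosen for illustration.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate consecutive pairs of vec by angles that grow with position."""
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even dimension"
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # earlier pairs rotate faster
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [1.0, 0.5, -0.3, 2.0]   # toy query vector
k = [0.2, -1.0, 0.7, 0.1]   # toy key vector

# The same offset (3) at different absolute positions gives the same score.
score_near = dot(rope(q, 5), rope(k, 2))
score_far = dot(rope(q, 105), rope(k, 102))
print(abs(score_near - score_far) < 1e-9)
```

The two scores match up to floating‑point error because the attention score between a rotated query and a rotated key depends only on the distance between their positions.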

Relevant architecture diagrams are included in the original repository.
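The MoE routing idea mentioned above can also be illustrated in a few lines. This is a generic top‑k routing sketch, not MiniMind's implementation: the experts are toy functions, the router logits are passed in directly rather than computed by a learned layer, and top_k=2 is an assumption.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, router_logits, experts, top_k=2):
    """Send x to the top_k experts and mix their outputs by router weight."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)  # renormalise over the chosen experts
    out = [0.0] * len(x)
    for i in chosen:
        y = experts[i](x)
        w = probs[i] / norm
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, chosen

# Toy experts: each just scales its input (stand-ins for feed-forward blocks).
experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]

out, chosen = moe_forward([1.0, -1.0], router_logits=[0.1, 2.0, -1.0, 1.5], experts=experts)
print(chosen)  # prints: [1, 3]
```

Only two of the four experts run for this input, which is where the efficiency gain comes from: total parameter count can grow with the number of experts while the compute per token stays roughly constant.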

Conclusion

MiniMind dramatically lowers the barrier to LLM research and development, allowing developers, enterprises, and researchers to train a functional mini‑ChatGPT on affordable hardware and customize it for various downstream tasks.
