Survey of Large Language Model Research: From GPT‑1 to ChatGPT and Open‑Source Alternatives
This article provides a comprehensive overview of the development of large language models, reviewing classic papers from GPT‑1 through GPT‑4, discussing open‑source implementations such as LLaMA, Alpaca, GLM, and ChatGLM, and analyzing training methods, datasets, and future research directions.
Introduction
The term "ChatGPT" has dominated 2023, influencing academia, industry, and everyday life. This series aims to discuss ChatGPT‑related technologies in three parts: classic paper reviews, open‑source implementations, and the past, present, and future of natural language generation.
Classic Paper Review (OpenAI Series)
The OpenAI series includes GPT‑1, GPT‑2, and GPT‑3, along with their descendants Codex, InstructGPT, and ChatGPT (with GPT‑4 as the latest iteration). For each, the article outlines motivation, model architecture, training method, datasets, and key results.
GPT‑1
Paper: Improving Language Understanding by Generative Pre‑Training. Motivation: pre‑train on large‑scale unlabeled text, then fine‑tune on each downstream task. Model: a Transformer decoder trained with an autoregressive language‑modeling objective.
Pre‑training loss: maximize the likelihood of each next token given its preceding context.
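Concretely, the objective from the paper maximizes the log‑likelihood of each token given the preceding context window:

```latex
L_1(\mathcal{U}) = \sum_i \log P\left(u_i \mid u_{i-k}, \dots, u_{i-1}; \Theta\right)
```

where $\mathcal{U} = \{u_1, \dots, u_n\}$ is the unlabeled token corpus, $k$ is the context window size, and $\Theta$ are the Transformer parameters.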
Data: BooksCorpus (~7,000 unpublished books); the paper also compares against the 1B Word Benchmark, which is about the same size but shuffled at the sentence level, destroying the long‑range structure the model can otherwise exploit.
Fine‑tuning adds special start/end tokens and a classification head.
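As a rough illustration of this input formatting (the start/delimiter/extract token scheme follows the paper; the token spellings and helper functions here are hypothetical):

```python
# Hypothetical sketch of GPT-1-style input formatting for fine-tuning.
# Token spellings are illustrative; the paper uses randomly initialized
# start, delimiter, and extract embeddings.
START, DELIM, EXTRACT = "<s>", "<$>", "<e>"

def format_classification(text: str) -> list[str]:
    """Single-sequence task: wrap the text in start/extract tokens."""
    return [START, *text.split(), EXTRACT]

def format_entailment(premise: str, hypothesis: str) -> list[str]:
    """Sentence-pair task: join the two inputs with a delimiter token."""
    return [START, *premise.split(), DELIM, *hypothesis.split(), EXTRACT]

tokens = format_classification("the movie was great")
# The added linear classification head reads the transformer's final
# hidden state at the EXTRACT position.
```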
GPT‑2
Paper: Language Models are Unsupervised Multitask Learners. Motivation: improve zero‑shot task performance by scaling both model size and data. Model: the GPT‑1 decoder architecture with minor modifications (layer normalization moved to the input of each sub‑block), scaled up to 1.5 B parameters.
Training data: WebText, roughly 40 GB of text from about 8 million web pages, collected from outbound Reddit links with at least 3 karma as a quality filter (rather than raw Common Crawl).
Demonstrated strong zero‑shot results on many NLP benchmarks.
GPT‑3
Paper: Language Models are Few‑Shot Learners. Motivation: reduce reliance on task‑specific fine‑tuning by leveraging in‑context learning. Model: 175 B parameters, the GPT‑2 architecture with alternating dense and locally banded sparse attention, trained on filtered Common Crawl plus curated corpora (WebText2, Books1, Books2, and Wikipedia).
Shows few‑shot, one‑shot, and zero‑shot capabilities across tasks.
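A few‑shot prompt is simply solved demonstrations concatenated before the query; the model infers the task from context, with no gradient updates. A minimal sketch (the Q/A format and arithmetic examples are illustrative, not from the paper):

```python
def build_few_shot_prompt(examples, query):
    """Few-shot prompting: prepend k solved demonstrations to the query.
    The model must infer the task pattern purely from the context."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in examples]
    blocks.append(f"Q: {query}\nA:")  # leave the answer for the model
    return "\n\n".join(blocks)

prompt = build_few_shot_prompt(
    [("2 + 2", "4"), ("3 + 5", "8")],  # k = 2 demonstrations (few-shot)
    "7 + 6",
)
```

With an empty examples list this degenerates to the zero‑shot setting; with one example, one‑shot.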
Highlights smooth scaling trends: loss and downstream performance improve predictably as model size, data, and compute grow.
Codex
Paper: Evaluating Large Language Models Trained on Code. Focuses on code generation by fine‑tuning GPT‑3 on Python code from GitHub. The 12 B model solves 28.8% of HumanEval problems with a single sample (pass@1) and up to about 70% when 100 samples are drawn per problem (pass@100).
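The pass@k metric is computed with the unbiased estimator from the Codex paper, given n generated samples of which c pass the unit tests:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the Codex paper:
    1 - C(n - c, k) / C(n, k),
    i.e. one minus the probability that k samples drawn without
    replacement from the n generations are all failures."""
    if n - c < k:
        return 1.0  # fewer than k failing samples: some draw must pass
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Computing the ratio with `math.comb` directly can overflow intuition but not Python integers; the paper's reference implementation uses an equivalent numerically stable product form.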
InstructGPT
Paper: Training language models to follow instructions with human feedback. Introduces RLHF (Reinforcement Learning from Human Feedback) in three stages: supervised fine‑tuning on demonstrations, reward‑model training on human preference rankings, and PPO fine‑tuning against the learned reward.
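The reward model is trained on these rankings with a pairwise loss: every higher‑ranked response should score above every lower‑ranked one. A toy pure‑Python sketch of that loss (scalar rewards assumed, no actual model involved):

```python
from itertools import combinations
from math import exp, log

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + exp(-x))

def ranking_loss(rewards_ranked: list[float]) -> float:
    """Pairwise ranking loss over K responses to one prompt, best first:
    loss = -(1 / C(K, 2)) * sum over pairs of log sigmoid(r_w - r_l),
    where r_w is the reward of the preferred (winner) response."""
    pairs = list(combinations(rewards_ranked, 2))  # (winner, loser) pairs
    return -sum(log(sigmoid(rw - rl)) for rw, rl in pairs) / len(pairs)

# Rewards that agree with the human ranking give a lower loss
# than rewards that invert it.
good = ranking_loss([2.0, 1.0, 0.0])
bad = ranking_loss([0.0, 1.0, 2.0])
```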
- K is the number of model responses sampled per prompt; annotators rank all of them (K = 4 to 9), yielding K(K−1)/2 pairwise comparisons per prompt for reward‑model training.
- A bias term normalizes rewards to zero mean before RL, which stabilizes PPO training.
ChatGPT
Based on InstructGPT with additional dialogue data. Uses multi‑turn prompts and RLHF to improve helpfulness and harmlessness.
Anthropic Claude
Anthropic's assistant follows a similar RLHF recipe, adding iterated "online" training in which preference models and policies are refreshed with new human feedback on a roughly weekly cadence, and explicitly studies the trade‑off between helpfulness and harmlessness.
Open‑Source Alternatives
LLaMA
Paper: LLaMA: Open and Efficient Foundation Language Models. Provides models from 7 B to 65 B parameters trained on 1.0 to 1.4 trillion tokens of publicly available data only. Demonstrates strong zero‑shot and few‑shot performance: LLaMA‑13B outperforms GPT‑3 (175 B) on most benchmarks despite being more than 10× smaller.
Alpaca
Fine‑tunes LLaMA‑7B on 52 K instruction‑response pairs generated from OpenAI's text‑davinci‑003 (GPT‑3.5) via the self‑instruct method, reaching behavior close to text‑davinci‑003 with modest compute (about 3 hours on 8×A100 GPUs).
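Alpaca wraps each pair in a fixed instruction template, approximately as follows (paraphrased; the exact wording lives in the project repository, and this helper function is illustrative):

```python
# Approximate Alpaca-style prompt template (paraphrased from the project;
# consult the Stanford Alpaca repository for the exact wording).
def alpaca_prompt(instruction: str, input_text: str = "") -> str:
    header = ("Below is an instruction that describes a task. "
              "Write a response that appropriately completes the request.")
    if input_text:
        # Variant for tasks that carry an additional input field.
        return (f"{header}\n\n### Instruction:\n{instruction}"
                f"\n\n### Input:\n{input_text}\n\n### Response:\n")
    return f"{header}\n\n### Instruction:\n{instruction}\n\n### Response:\n"
```

At training time the model's target response is appended after the final `### Response:` marker; at inference time generation continues from that point.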
GLM & ChatGLM
GLM‑130B is an open bilingual (Chinese‑English) model pre‑trained with GLM's autoregressive blank‑infilling objective rather than plain left‑to‑right language modeling. ChatGLM‑6B is a 6 B parameter bilingual chat model built on the same GLM architecture with supervised instruction fine‑tuning and feedback‑based alignment; with quantization it can run on a single consumer GPU such as a 2080 Ti.
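Blank infilling can be pictured as masking a contiguous span that the model must then regenerate autoregressively. A toy sketch of the corruption step (simplified: GLM samples multiple spans and uses sentinel tokens, which this illustration collapses into one `[MASK]`):

```python
import random

def corrupt_for_blank_infilling(tokens, span=(1, 3), seed=0):
    """Toy illustration of blank-infilling pre-training: replace one
    contiguous span with [MASK]; the model's training target is to
    autoregressively regenerate the masked span given the rest."""
    rng = random.Random(seed)
    i = rng.randrange(len(tokens) - span[1])  # span start
    j = i + rng.randint(*span)                # span end (1-3 tokens here)
    source = tokens[:i] + ["[MASK]"] + tokens[j:]
    target = tokens[i:j]
    return source, target
```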
Conclusion
The rapid evolution of large language models has led to both massive proprietary systems (GPT‑4, ChatGPT) and increasingly capable open‑source alternatives (LLaMA, Alpaca, GLM, ChatGLM). Future work will focus on scaling efficiency, safety, bias mitigation, and broader multilingual capabilities.