Survey of Large Language Model Research: From GPT‑1 to ChatGPT and Open‑Source Alternatives
This article provides a comprehensive overview of the development of large language models, reviewing classic papers from GPT‑1 through GPT‑4, discussing open‑source implementations such as LLaMA, Alpaca, GLM, and ChatGLM, and analyzing training methods, datasets, and future research directions.
Introduction
The term "ChatGPT" has dominated 2023, influencing academia, industry, and everyday life. This series aims to discuss ChatGPT‑related technologies in three parts: classic paper reviews, open‑source implementations, and the past, present, and future of natural language generation.
Classic Paper Review (OpenAI Series)
The OpenAI series includes GPT‑1, GPT‑2, and GPT‑3, along with their descendants Codex, InstructGPT, and ChatGPT (with GPT‑4 as the latest iteration). For each, the article outlines motivation, model architecture, training method, datasets, and key results.
GPT‑1
Paper: Improving Language Understanding by Generative Pre‑Training. Motivation: pre‑train on large‑scale unlabeled text, then fine‑tune on each downstream task. Model: a Transformer decoder trained with an autoregressive language‑modeling objective.
Pre‑training loss: maximize the likelihood of each next token given its preceding context.
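Concretely, the objective from the paper maximizes the log‑likelihood of each token given the preceding context window:

```latex
L_1(\mathcal{U}) = \sum_i \log P\left(u_i \mid u_{i-k}, \dots, u_{i-1}; \Theta\right)
```

where $\mathcal{U} = \{u_1, \dots, u_n\}$ is the unlabeled token corpus, $k$ is the context window size, and $\Theta$ are the Transformer parameters.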
Data: BooksCorpus (~7,000 unpublished books); the paper also compares against the 1B Word Benchmark, which is about the same size but shuffled at the sentence level, destroying the long‑range structure the model can otherwise exploit.
Fine‑tuning adds special start/end tokens and a classification head.
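As a rough illustration of this input formatting (the start/delimiter/extract token scheme follows the paper; the token spellings and helper functions here are hypothetical):

```python
# Hypothetical sketch of GPT-1-style input formatting for fine-tuning.
# Token spellings are illustrative; the paper uses randomly initialized
# start, delimiter, and extract embeddings.
START, DELIM, EXTRACT = "<s>", "<$>", "<e>"

def format_classification(text: str) -> list[str]:
    """Single-sequence task: wrap the text in start/extract tokens."""
    return [START, *text.split(), EXTRACT]

def format_entailment(premise: str, hypothesis: str) -> list[str]:
    """Sentence-pair task: join the two inputs with a delimiter token."""
    return [START, *premise.split(), DELIM, *hypothesis.split(), EXTRACT]

tokens = format_classification("the movie was great")
# The added linear classification head reads the transformer's final
# hidden state at the EXTRACT position.
```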
GPT‑2
Paper: Language Models are Unsupervised Multitask Learners. Motivation: improve zero‑shot task performance by scaling both model size and data. Model: the GPT‑1 decoder architecture with minor modifications (layer normalization moved to the input of each sub‑block), scaled up to 1.5 B parameters.
Training data: WebText, roughly 40 GB of text from about 8 million web pages, collected from outbound Reddit links with at least 3 karma as a quality filter (rather than raw Common Crawl).
Demonstrated strong zero‑shot results on many NLP benchmarks.
GPT‑3
Paper: Language Models are Few‑Shot Learners. Motivation: reduce reliance on task‑specific fine‑tuning by leveraging in‑context learning. Model: 175 B parameters, the GPT‑2 architecture with alternating dense and locally banded sparse attention, trained on filtered Common Crawl plus curated corpora (WebText2, Books1, Books2, and Wikipedia).
Shows few‑shot, one‑shot, and zero‑shot capabilities across tasks.
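A few‑shot prompt is simply solved demonstrations concatenated before the query; the model infers the task from context, with no gradient updates. A minimal sketch (the Q/A format and arithmetic examples are illustrative, not from the paper):

```python
def build_few_shot_prompt(examples, query):
    """Few-shot prompting: prepend k solved demonstrations to the query.
    The model must infer the task pattern purely from the context."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in examples]
    blocks.append(f"Q: {query}\nA:")  # leave the answer for the model
    return "\n\n".join(blocks)

prompt = build_few_shot_prompt(
    [("2 + 2", "4"), ("3 + 5", "8")],  # k = 2 demonstrations (few-shot)
    "7 + 6",
)
```

With an empty examples list this degenerates to the zero‑shot setting; with one example, one‑shot.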
Highlights smooth scaling trends: loss and downstream performance improve predictably as model size, data, and compute grow.
Codex
Paper: Evaluating Large Language Models Trained on Code. Focuses on code generation by fine‑tuning GPT‑3 on Python code from GitHub. The 12 B model solves 28.8% of HumanEval problems with a single sample (pass@1) and up to about 70% when 100 samples are drawn per problem (pass@100).
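The pass@k metric is computed with the unbiased estimator from the Codex paper, given n generated samples of which c pass the unit tests:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the Codex paper:
    1 - C(n - c, k) / C(n, k),
    i.e. one minus the probability that k samples drawn without
    replacement from the n generations are all failures."""
    if n - c < k:
        return 1.0  # fewer than k failing samples: some draw must pass
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Computing the ratio with `math.comb` directly can overflow intuition but not Python integers; the paper's reference implementation uses an equivalent numerically stable product form.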
InstructGPT
Paper: Training language models to follow instructions with human feedback. Introduces RLHF (Reinforcement Learning from Human Feedback) in three stages: supervised fine‑tuning on demonstrations, reward‑model training on human preference rankings, and PPO fine‑tuning against the learned reward.
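The reward model is trained on these rankings with a pairwise loss: every higher‑ranked response should score above every lower‑ranked one. A toy pure‑Python sketch of that loss (scalar rewards assumed, no actual model involved):

```python
from itertools import combinations
from math import exp, log

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + exp(-x))

def ranking_loss(rewards_ranked: list[float]) -> float:
    """Pairwise ranking loss over K responses to one prompt, best first:
    loss = -(1 / C(K, 2)) * sum over pairs of log sigmoid(r_w - r_l),
    where r_w is the reward of the preferred (winner) response."""
    pairs = list(combinations(rewards_ranked, 2))  # (winner, loser) pairs
    return -sum(log(sigmoid(rw - rl)) for rw, rl in pairs) / len(pairs)

# Rewards that agree with the human ranking give a lower loss
# than rewards that invert it.
good = ranking_loss([2.0, 1.0, 0.0])
bad = ranking_loss([0.0, 1.0, 2.0])
```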
- K is the number of model responses sampled per prompt; annotators rank all of them (K = 4 to 9), yielding K(K−1)/2 pairwise comparisons per prompt for reward‑model training.
- A bias term normalizes rewards to zero mean before RL, which stabilizes PPO training.
ChatGPT
Based on InstructGPT with additional dialogue data. Uses multi‑turn prompts and RLHF to improve helpfulness and harmlessness.
Anthropic Claude
Anthropic's assistant follows a similar RLHF recipe, adding iterated "online" training in which preference models and policies are refreshed with new human feedback on a roughly weekly cadence, and explicitly studies the trade‑off between helpfulness and harmlessness.
Open‑Source Alternatives
LLaMA
Paper: LLaMA: Open and Efficient Foundation Language Models. Provides models from 7 B to 65 B parameters trained on 1.0 to 1.4 trillion tokens of publicly available data only. Demonstrates strong zero‑shot and few‑shot performance: LLaMA‑13B outperforms GPT‑3 (175 B) on most benchmarks despite being more than 10× smaller.
Alpaca
Fine‑tunes LLaMA‑7B on 52 K instruction‑response pairs generated from OpenAI's text‑davinci‑003 (GPT‑3.5) via the self‑instruct method, reaching behavior close to text‑davinci‑003 with modest compute (about 3 hours on 8×A100 GPUs).
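Alpaca wraps each pair in a fixed instruction template, approximately as follows (paraphrased; the exact wording lives in the project repository, and this helper function is illustrative):

```python
# Approximate Alpaca-style prompt template (paraphrased from the project;
# consult the Stanford Alpaca repository for the exact wording).
def alpaca_prompt(instruction: str, input_text: str = "") -> str:
    header = ("Below is an instruction that describes a task. "
              "Write a response that appropriately completes the request.")
    if input_text:
        # Variant for tasks that carry an additional input field.
        return (f"{header}\n\n### Instruction:\n{instruction}"
                f"\n\n### Input:\n{input_text}\n\n### Response:\n")
    return f"{header}\n\n### Instruction:\n{instruction}\n\n### Response:\n"
```

At training time the model's target response is appended after the final `### Response:` marker; at inference time generation continues from that point.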
GLM & ChatGLM
GLM‑130B is an open bilingual (Chinese‑English) model pre‑trained with GLM's autoregressive blank‑infilling objective rather than plain left‑to‑right language modeling. ChatGLM‑6B is a 6 B parameter bilingual chat model built on the same GLM architecture with supervised instruction fine‑tuning and feedback‑based alignment; with quantization it can run on a single consumer GPU such as a 2080 Ti.
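Blank infilling can be pictured as masking a contiguous span that the model must then regenerate autoregressively. A toy sketch of the corruption step (simplified: GLM samples multiple spans and uses sentinel tokens, which this illustration collapses into one `[MASK]`):

```python
import random

def corrupt_for_blank_infilling(tokens, span=(1, 3), seed=0):
    """Toy illustration of blank-infilling pre-training: replace one
    contiguous span with [MASK]; the model's training target is to
    autoregressively regenerate the masked span given the rest."""
    rng = random.Random(seed)
    i = rng.randrange(len(tokens) - span[1])  # span start
    j = i + rng.randint(*span)                # span end (1-3 tokens here)
    source = tokens[:i] + ["[MASK]"] + tokens[j:]
    target = tokens[i:j]
    return source, target
```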
Conclusion
The rapid evolution of large language models has led to both massive proprietary systems (GPT‑4, ChatGPT) and increasingly capable open‑source alternatives (LLaMA, Alpaca, GLM, ChatGLM). Future work will focus on scaling efficiency, safety, bias mitigation, and broader multilingual capabilities.