Build a GPT from Scratch and Decode AI Coding Jargon with Two Top GitHub Projects

This article introduces two practical GitHub repositories: how-to-train-your-gpt, a 12‑chapter step‑by‑step guide to building a LLaMA‑style GPT model, and dictionary-of-ai-coding, a plain‑language glossary of AI‑coding terms. Together they provide a solid grounding in modern LLM fundamentals and terminology.

Geek Labs

Many developers use AI to write code daily, yet few understand the underlying mechanisms of tools like ChatGPT or Claude. This article highlights two GitHub projects that together offer a thorough, hands‑on education in large‑language‑model fundamentals and the jargon surrounding AI‑assisted programming.

Project 1: Build a Modern Language Model from Scratch

how-to-train-your-gpt – a repository containing 12 chapters and 3,671 lines of code that guides you through constructing a LLaMA‑3‑level model.

how-to-train-your-gpt project screenshot

The project tackles two common learning obstacles: overly shallow tutorials that only cover API calls, and dense academic papers filled with equations. It adopts a middle path, using child‑friendly analogies to explain concepts before presenting production‑grade code.

The learning path covers:

Chapters 0‑1: Foundations – What is GPT? How do you set up the environment? What is the difference between a GPU and a CPU?

Chapter 2: Tokenizer – How the word “unbelievably” is split into tokens using the BPE algorithm.
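
The splitting idea can be sketched with a toy greedy longest-match over a tiny hand-made subword vocabulary. This is a deliberate simplification, not the chapter's actual BPE implementation (real BPE learns its merges from corpus statistics), and the vocabulary here is invented for illustration:

```python
# Toy illustration of BPE-style subword splitting: greedy longest-match
# against a tiny hand-made vocabulary (NOT the project's real tokenizer).
def split_into_subwords(word, vocab):
    """Greedily match the longest known subword starting from the left."""
    pieces = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try the longest candidate first
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])          # unknown character: emit it alone
            i += 1
    return pieces

vocab = {"un", "believ", "ably", "cat"}
print(split_into_subwords("unbelievably", vocab))  # ['un', 'believ', 'ably']
```

Common words like "cat" sit in the vocabulary as a single piece, while a rare word like "unbelievably" falls apart into known fragments, which is exactly the behavior the chapter explains.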

Chapter 3: Embeddings – Why “cat” and “dog” are close in vector space; the arithmetic behind king − man + woman = queen.
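
The analogy arithmetic is plain vector arithmetic. A minimal sketch with hand-chosen 3‑dimensional "embeddings" (real embeddings are learned and hundreds of dimensions wide; these toy vectors are constructed so the analogy works exactly):

```python
# Toy 3-d "embeddings" chosen by hand so the famous analogy works exactly.
# Dimensions loosely read as: [male, female, royal].
man   = [1.0, 0.0, 0.0]
woman = [0.0, 1.0, 0.0]
king  = [1.0, 0.0, 1.0]
queen = [0.0, 1.0, 1.0]

# king - man + woman, computed elementwise
result = [k - m + w for k, m, w in zip(king, man, woman)]
print(result)           # [0.0, 1.0, 1.0]
print(result == queen)  # True
```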

Chapter 4: Positional Encoding (RoPE) – Why models like LLaMA and Mistral use rotary encoding instead of simple positional indices.
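
The core move in rotary encoding is rotating each pair of embedding dimensions by an angle proportional to the token's position. A minimal one-pair sketch (real RoPE applies many frequency bands across the head dimension; the `theta` value here is arbitrary):

```python
import math

# Minimal sketch of rotary positional encoding: rotate one 2-d feature
# pair by an angle proportional to the token's position.
def rotate_pair(x, y, position, theta=0.1):
    angle = position * theta
    return (x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle))

# The same feature pair points in different directions at different
# positions, so dot products between tokens encode their relative offset.
print(rotate_pair(1.0, 0.0, position=0))  # (1.0, 0.0) -- no rotation at pos 0
print(rotate_pair(1.0, 0.0, position=5))
```

Because rotation preserves vector length, the encoding changes direction without distorting magnitudes, which is one reason it behaves better than adding raw positional indices.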

Chapter 5: Attention (Core) – A 713‑line walkthrough of Q, K, V, scaling, causal masks, and an eight‑step worked example with real numbers.
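
The mechanism can be condensed into a short sketch: scaled dot products between queries and keys, a causal cutoff so each token only sees its past, a softmax, and a weighted sum of values. The numbers below are made up for illustration, not the chapter's worked example:

```python
import math

# Tiny scaled dot-product attention with a causal mask, dependency-free.
def causal_attention(Q, K, V):
    n, d = len(Q), len(Q[0])
    out = []
    for i in range(n):
        # scores for query i against keys 0..i (causal: no future tokens)
        scores = [sum(Q[i][k] * K[j][k] for k in range(d)) / math.sqrt(d)
                  for j in range(i + 1)]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]   # numerically stable softmax
        weights = [e / sum(exps) for e in exps]
        # output i is the attention-weighted average of the visible values
        out.append([sum(w * V[j][k] for j, w in enumerate(weights))
                    for k in range(d)])
    return out

Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(causal_attention(Q, K, V))  # row 0 can only attend to itself
```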

Chapter 6: Transformer Block – Explanations of RMSNorm, SwiGLU, residual connections, and the pre‑norm vs. post‑norm debate.
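
Both normalization and activation can be written in a few lines. The sketch below omits the learned scale weights and projection matrices a real transformer block carries, to keep the arithmetic visible:

```python
import math

def rms_norm(x, eps=1e-6):
    # Scale by the root-mean-square of the vector. Unlike LayerNorm,
    # RMSNorm performs no mean subtraction.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

def silu(v):
    # SiLU(v) = v * sigmoid(v), the "Swish" in SwiGLU
    return v / (1.0 + math.exp(-v))

def swiglu(x, gate):
    # SwiGLU gates one projection with the SiLU of another; in the real
    # block, x and gate come from separate learned projections.
    return [silu(g) * v for g, v in zip(gate, x)]

print(rms_norm([3.0, 4.0]))
print(swiglu([1.0, 2.0], [0.0, 1.0]))  # a zero gate shuts its channel off
```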

Chapters 7‑9: Full Model, Training, Inference – Implementation of a 124 M‑parameter model, AdamW with cosine warm‑up, mixed‑precision training, KV cache, temperature, top‑k/top‑p sampling, and beam search.
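
Two of the sampling knobs listed above compose naturally: temperature rescales the logits before softmax, and top‑k discards everything outside the k most likely tokens. A minimal sketch (the logit values are invented; real decoders work over full vocabularies):

```python
import math, random

# Temperature scaling plus top-k filtering over raw logits. Lower
# temperature sharpens the distribution; top-k zeroes out all but the
# k most likely tokens before renormalizing.
def sample_top_k(logits, k=2, temperature=1.0, rng=random):
    scaled = [l / temperature for l in logits]
    cutoff = sorted(scaled, reverse=True)[k - 1]     # k-th largest logit
    kept = [s if s >= cutoff else float("-inf") for s in scaled]
    m = max(kept)
    exps = [math.exp(s - m) if s != float("-inf") else 0.0 for s in kept]
    total = sum(exps)
    probs = [e / total for e in exps]
    # sample a token index from the filtered, renormalized distribution
    return rng.choices(range(len(probs)), weights=probs)[0]

random.seed(0)
logits = [2.0, 1.0, 0.1, -1.0]
print(sample_top_k(logits, k=2, temperature=0.7))  # always index 0 or 1
```

Top‑p (nucleus) sampling follows the same filter-then-renormalize pattern, but keeps the smallest set of tokens whose cumulative probability exceeds p instead of a fixed count.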

The repository uses a modern LLM architecture (RoPE, RMSNorm, SwiGLU, Pre‑Norm) rather than the older GPT‑2 design.

After completing the tutorial you will be able to manually compute attention scores, read contemporary ML papers, debug training pipelines, and understand why KV cache can accelerate inference by up to 500×.
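
The KV-cache speedup has a simple back-of-the-envelope explanation: without a cache, generating token n recomputes key/value projections for the entire prefix, so total projection work grows quadratically with sequence length; with a cache, each step projects only the newest token. A toy operation count (the real speedup depends on model size and hardware):

```python
# Count key/value projection operations with and without a KV cache.
def projections_computed(seq_len, use_cache):
    total = 0
    for step in range(1, seq_len + 1):
        if use_cache:
            total += 1      # project only the newest token
        else:
            total += step   # reproject the entire prefix every step
    return total

n = 1000
print(projections_computed(n, use_cache=False))  # 500500
print(projections_computed(n, use_cache=True))   # 1000
```

At 1,000 tokens, the naive count is roughly 500× the cached one, which is where headline figures like "up to 500×" come from.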

git clone https://github.com/raiyanyahya/how-to-train-your-gpt.git
cd how-to-train-your-gpt
pip install torch tiktoken datasets numpy matplotlib
open chapters/00_overview.md

Project 2: Translate AI‑Coding Terminology into Plain Language

dictionary-of-ai-coding – a TypeScript‑based glossary that explains AI‑coding terms in everyday language; it already serves over 62,000 developers.

dictionary-of-ai-coding project screenshot

The author states that AI programming is not inherently difficult; much of the confusion is deliberately created by a VC‑funded ecosystem that profits from keeping people bewildered.

The glossary organizes core terms into seven categories:

Model‑Related – A model is just a set of parameters; it needs a harness (toolchain) to become an AI assistant. Tokens are the basic units; common words map to a single token, while rare words are split.

Context and Conversation – The context window defines how many tokens the model can “see”; exceeding it causes forgetting. Stateless models treat each request independently, relying on a session to maintain continuity.

Tools and Environment – Tools are external capabilities the model can invoke; a “tool call” is simply the model outputting a structured string. MCP (Model Context Protocol) standardizes such interactions.
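
The "structured string" point is worth making concrete: a tool call is nothing more than text the harness can parse and act on. A minimal hypothetical example (the exact schema varies by provider and by MCP implementation; this one is invented for illustration):

```python
import json

# What the model actually emits is structured text; the harness parses
# it and decides whether to execute the named tool.
model_output = '{"tool": "get_weather", "arguments": {"city": "Tokyo"}}'

call = json.loads(model_output)
print(call["tool"])               # get_weather
print(call["arguments"]["city"])  # Tokyo
```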

Failure Modes – Hallucination is the model confidently generating false information; attention degradation occurs when long contexts cause the model to remember the beginning and end best, forgetting the middle; sycophancy is the tendency to agree with the user even when wrong.

Handoff and Collaboration – Handoff describes passing tasks between AI agents, requiring context transfer. The file AGENTS.md in the repo provides background for AI assistants.

Working Modes – “Vibe coding” means coding by intuition with AI assistance; “human‑in‑the‑loop” emphasizes that humans intervene at critical points.

Consulting the dictionary can answer practical questions such as why context degrades (see “Attention degradation”), why token usage drives high bills, why prompts yield nondeterministic results, or why the model sometimes fabricates answers.

Using the Two Projects Together

The first project teaches the low‑level principles of LLMs, while the second demystifies the surrounding terminology. Studying them in tandem provides a more complete education than consuming either resource alone.

Project 1: https://github.com/raiyanyahya/how-to-train-your-gpt | ⭐ 468 | Jupyter Notebook
Project 2: https://github.com/mattpocock/dictionary-of-ai-coding | ⭐ 1087 | TypeScript
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

AI · LLM · attention · GitHub · GPT · Tokenizer
Written by

Geek Labs

Daily shares of interesting GitHub open-source projects. AI tools, automation gems, technical tutorials, open-source inspiration.
