Explore the LLM Architecture Gallery: Visualizing Seven Years of Model Evolution

The LLM Architecture Gallery, created by Sebastian Raschka, offers an interactive visual compendium of open‑weight large language models from 2019 to 2026, detailing their core parameters, architectural innovations, and the broader trends shaping modern AI research.

Strolling Through the Architecture Gallery

The LLM Architecture Gallery, launched by AI researcher Sebastian Raschka, quickly sparked discussion on Hacker News and earned praise from figures like Andrej Karpathy. All metadata is openly hosted on GitHub, allowing developers to contribute feedback.

The web interface visualizes model architectures from the 2019 baseline models up to the latest open‑weight releases in spring 2026. Clicking a model name opens a high‑resolution architecture panel that highlights embeddings, positional encodings, normalization methods, feed‑forward networks, attention‑head counts, hidden‑layer dimensions, and context lengths.
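To make those panel fields concrete, the sketch below defines a hypothetical record with the same kinds of attributes in Python. The dataclass name and schema are invented for this example and do not reflect the gallery's actual data format; the values are loosely modeled on a small 2019-era GPT-style configuration.

```python
# Hypothetical record type for the fields an architecture panel displays.
# The schema is invented for illustration; it is not the gallery's format.
from dataclasses import dataclass

@dataclass
class ArchitectureRecord:
    name: str
    positional_encoding: str   # e.g. "learned", "RoPE"
    normalization: str         # e.g. "LayerNorm", "RMSNorm"
    attention: str             # e.g. "MHA", "GQA", "MLA"
    n_layers: int
    n_heads: int
    hidden_dim: int
    context_length: int

# Values loosely modeled on a small 2019-era GPT-style model.
example = ArchitectureRecord(
    name="gpt2-small-like",
    positional_encoding="learned",
    normalization="LayerNorm",
    attention="MHA",
    n_layers=12,
    n_heads=12,
    hidden_dim=768,
    context_length=1024,
)
print(example)
```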

Each model also includes a compact data table summarizing scale, release date, decoder type, and attention mechanism. Inline explanations cover advanced concepts such as Grouped‑Query Attention (GQA), Multi‑Head Latent Attention (MLA), Sliding‑Window Attention (SWA), and Gated DeltaNet.
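For readers who want to see what one of these mechanisms looks like in code, here is a minimal, self-contained sketch of Grouped-Query Attention in PyTorch. It is not taken from the gallery or from any particular model; the hyperparameters (d_model=512, n_heads=8, n_kv_heads=2) are arbitrary illustrative choices.

```python
# Minimal Grouped-Query Attention (GQA) sketch: many query heads share a
# smaller set of key/value heads, shrinking the KV cache.
# Requires PyTorch >= 2.0 for scaled_dot_product_attention.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedQueryAttention(nn.Module):
    def __init__(self, d_model=512, n_heads=8, n_kv_heads=2):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads = n_heads
        self.n_kv_heads = n_kv_heads
        self.head_dim = d_model // n_heads
        self.q_proj = nn.Linear(d_model, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.out_proj = nn.Linear(n_heads * self.head_dim, d_model, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Each group of query heads reuses the same key/value head.
        repeat = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(repeat, dim=1)
        v = v.repeat_interleave(repeat, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out_proj(out.transpose(1, 2).reshape(b, t, -1))

x = torch.randn(1, 16, 512)
print(GroupedQueryAttention()(x).shape)  # torch.Size([1, 16, 512])
```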

Understanding the Evolution of LLMs

Over the past seven years, top open‑weight models have retained the macro‑architectural motifs of early designs, primarily scaling up attention layers and feed‑forward networks. Improvements stem largely from bigger compute budgets and newer training methods such as reinforcement learning.
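To ground that claim, below is a minimal sketch of the recurring decoder-block motif in PyTorch: pre-norm attention followed by a feed-forward network, each wrapped in a residual connection. The dimensions are placeholders and the block is deliberately simplified; it is not a reproduction of any model in the gallery.

```python
# Simplified pre-norm decoder block: the macro-architectural motif that modern
# open-weight models largely keep, with scaling mostly meaning wider/deeper
# versions of this same block. Dimensions are illustrative placeholders.
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, ffn_mult=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, ffn_mult * d_model),
            nn.GELU(),
            nn.Linear(ffn_mult * d_model, d_model),
        )

    def forward(self, x):
        # Causal self-attention sub-layer with a residual connection.
        t = x.size(1)
        causal_mask = torch.triu(
            torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1
        )
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal_mask, need_weights=False)
        x = x + attn_out
        # Feed-forward sub-layer with a residual connection.
        return x + self.ffn(self.norm2(x))

print(DecoderBlock()(torch.randn(1, 16, 512)).shape)  # torch.Size([1, 16, 512])
```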

Micro‑architectural innovations focus on memory efficiency and computational speed. For example, Llama 4 adopts a mixture‑of‑experts design similar to DeepSeek V3 but keeps Grouped‑Query Attention, whereas DeepSeek V3 uses Multi‑Head Latent Attention to reduce key‑value cache memory.
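A quick back-of-the-envelope calculation illustrates why the attention mechanism matters for key-value cache memory. All of the numbers below (layers, heads, head dimension, latent size, context length) are assumptions made up for this example, not the published Llama 4 or DeepSeek V3 configurations.

```python
# Rough KV-cache sizing under assumed (not real) model configurations.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_val=2):
    # Factor of 2 covers keys and values; bytes_per_val=2 assumes fp16/bf16.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_val

ctx = 32_768
mha = kv_cache_bytes(n_layers=64, n_kv_heads=64, head_dim=128, context_len=ctx)
gqa = kv_cache_bytes(n_layers=64, n_kv_heads=8,  head_dim=128, context_len=ctx)
# Latent-attention-style compression stores one small latent per token per
# layer instead of full per-head keys/values; 512 is an assumed latent size.
mla_like = 64 * 512 * ctx * 2

for name, size in [("MHA", mha), ("GQA", gqa), ("latent (MLA-like)", mla_like)]:
    print(f"{name}: {size / 2**30:.1f} GiB per sequence")
# MHA: 64.0 GiB, GQA: 8.0 GiB, latent (MLA-like): 2.0 GiB under these assumptions.
```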

Other notable designs include Mistral Large 3, from French AI startup Mistral, which uses a scaled‑up expert network with Multi‑Head Latent Attention; Alibaba's Qwen 3, which inserts Gated DeltaNet among traditional attention layers; and Nvidia's Nemotron 3 Nano, which blends the Mamba‑2 state‑space model with standard attention for faster inference while preserving coherent text generation.
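The hybrid idea can be summarized as a layer schedule: most blocks use a fast sequence mixer (a state-space or linear-attention layer) and only a few use full attention. The sketch below is purely illustrative; the ratio and block labels are hypothetical and do not describe the actual Nemotron 3 Nano or Qwen 3 configurations.

```python
# Purely illustrative hybrid layer schedule: mostly state-space/linear-attention
# blocks, with an occasional full-attention block. The ratio is hypothetical.
def hybrid_schedule(n_layers=24, attention_every=6):
    return [
        "full_attention" if (i + 1) % attention_every == 0 else "ssm_or_linear"
        for i in range(n_layers)
    ]

print(hybrid_schedule())
```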

Who Is Sebastian Raschka?

Sebastian Raschka has more than a decade of experience across academia and industry: he was formerly an assistant professor of statistics at UW‑Madison and is now an LLM research engineer at Lightning AI. He emphasizes hands‑on learning, encouraging developers to explore model code line by line.

His GitHub account has attracted tens of thousands of followers, and his LLMs‑from‑scratch repository has over 10,000 forks. In 2024 he published the book Build a Large Language Model (From Scratch), which guides readers through data preparation, architecture design, pre‑training, and fine‑tuning using Python and PyTorch on a standard laptop.

The book demystifies attention mechanisms, Transformer architecture, and tokenization by translating academic concepts into clear code. It is complemented by a 17‑hour video series and a 2026 follow‑up, Build a Reasoning Model (From Scratch), focusing on adding logical reasoning capabilities.

For deeper exploration, see the following resources:

https://sebastianraschka.com/

https://github.com/rasbt/llm-architecture-gallery

https://sebastianraschka.com/llm-architecture-gallery/

https://news.ycombinator.com/item?id=47388676

https://x.com/rasbt/status/2033167146302210058

Written by SuanNi, a community for AI developers that aggregates large-model development services, models, and compute power.
