Demystifying LLMs: From Tokens to Agents – An Engineer’s Deep Dive
This article provides a comprehensive, engineering‑focused breakdown of large language models, covering their Transformer roots, tokenization, context windows, prompt engineering, tool integration via MCP, and autonomous agents, while offering practical examples and actionable insights for developers.
1. The Bottom Layer: LLMs (Large Language Models)
LLMs, or large language models, are the core engine of modern AI, typically built on the Transformer architecture introduced by Google researchers in the 2017 paper "Attention Is All You Need" and later popularized by OpenAI.
Late 2022: GPT‑3.5 (the model behind the original ChatGPT), the first truly usable large model.
Mar 2023: GPT‑4, raising the AI capability ceiling.
Today: GPT family remains a benchmark, while Claude, Gemini and others compete.
The essence of a large model is token‑by‑token generation: given everything so far, it predicts the next token, appends it, and repeats. It is, effectively, a text‑completion game.
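To make the completion game concrete, here is a minimal sketch of the decoding loop. The `model`, `tokenizer`, and `eos_id` names are hypothetical stand-ins for any LLM stack, and this greedily takes the top token; real systems usually sample from the probability distribution instead.

```python
# Minimal sketch of autoregressive decoding (hypothetical model/tokenizer objects).

def generate(model, tokenizer, prompt: str, max_new_tokens: int = 50) -> str:
    token_ids = tokenizer.encode(prompt)           # text -> token IDs
    for _ in range(max_new_tokens):
        logits = model(token_ids)                  # a score for every possible next token
        next_id = max(range(len(logits)), key=logits.__getitem__)  # greedy: pick the top token
        token_ids.append(next_id)                  # append it and feed everything back in
        if next_id == tokenizer.eos_id:            # stop at the end-of-sequence token
            break
    return tokenizer.decode(token_ids)             # token IDs -> text
```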
2. Tokens and Tokenizers – The Model’s Translation Layer
Models operate on numbers, not characters. The tokenizer converts text to numeric token IDs and back.
Split text into the smallest units (tokens).
Map each token to a numeric ID.
Key insight: a token is not a word. Examples:
Chinese "程序员" → tokens "程序" + "员".
English "helpful" → tokens "help" + "ful".
Some special characters can take three or more tokens each.
Practical estimates:
1 token ≈ 0.75 English words.
1 token ≈ 1.5–2 Chinese characters.
400 k tokens ≈ 600–800 k Chinese characters (a thick book).
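You can check these estimates yourself with OpenAI's open-source tiktoken library (one tokenizer among many; other model families use their own vocabularies, so counts vary):

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # the encoding used by GPT-4-era models

for text in ["helpful", "程序员", "a longer English sentence to count"]:
    ids = enc.encode(text)
    print(f"{text!r}: {len(ids)} tokens -> {ids}")

# Round-trip: token IDs decode back to the original text.
assert enc.decode(enc.encode("helpful")) == "helpful"
```

The exact splits depend on the encoding, which is why token counts, not character counts, are what context limits and API billing measure.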
3. Temporary Memory: Context and Context Window
Context is the total information the model receives for a task, including the current user query, dialogue history, tokens being generated, tool list, system prompts, etc.
Context Window defines the maximum number of tokens the model can hold, e.g.:
GPT‑4.1: ~1.05 M tokens.
Gemini 3 Pro: 1 M tokens.
Claude Opus 4.6: 1 M tokens.
~1 M tokens ≈ 1.5 M Chinese characters, enough for the entire Harry Potter series.
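Because the window is a hard token budget, chat applications typically trim history before every call. A minimal sketch, where count_tokens is a stand-in for a real tokenizer-based counter such as the tiktoken snippet above:

```python
def fit_to_window(system_prompt: str, history: list[str],
                  budget: int = 128_000, count_tokens=len) -> list[str]:
    """Drop the oldest turns until system prompt + history fit the token budget.

    `count_tokens=len` counts characters and is only a placeholder; in practice
    you would count with the model's own tokenizer.
    """
    kept = list(history)
    while kept and count_tokens(system_prompt) + sum(count_tokens(m) for m in kept) > budget:
        kept.pop(0)  # the oldest message is evicted first
    return [system_prompt] + kept
```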
To handle documents far larger than the window, Retrieval‑Augmented Generation (RAG) splits the source into chunks, retrieves the passages most relevant to the query, and feeds only those to the model, working within the window limit while also reducing cost.
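A minimal RAG sketch, assuming a hypothetical embed() function that maps text to a vector (in practice an embedding model such as text-embedding-3-small):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query: str, chunks: list[str], embed, top_k: int = 3) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]

def build_prompt(query: str, chunks: list[str], embed) -> str:
    """Pack only the retrieved passages, not the whole document, into the prompt."""
    context = "\n\n".join(retrieve(query, chunks, embed))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
```

In production the chunk embeddings are precomputed and stored in a vector database rather than recomputed on every query.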
4. The Art of Prompt Engineering
A prompt is the specific instruction or question given to the model. There are two main types:
User Prompt: the end‑user’s query, e.g., “Write a poem for me.”
System Prompt: developer‑defined persona or rules, invisible to the end user.
Examples:
Vague prompt: “Write a poem” → unpredictable style.
Precise prompt: “Write a five‑line poem about autumn leaves in a bright tone” → accurate output.
System prompt example: “You are a patient math teacher who guides students to think instead of giving direct answers.”
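In API terms, the two prompt types are just different roles in the message list. A sketch in the OpenAI-style chat format (most providers accept a close variant); the commented-out call is illustrative, not a complete program:

```python
messages = [
    # System prompt: set by the developer, invisible to the end user.
    {"role": "system",
     "content": "You are a patient math teacher who guides students to think "
                "instead of giving direct answers."},
    # User prompt: whatever the end user actually typed.
    {"role": "user",
     "content": "I'm stuck on 3x + 5 = 20. What is x?"},
]
# response = client.chat.completions.create(model="gpt-4.1", messages=messages)
```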
Industry reality: the hype around prompt engineering is fading, because the skill barrier is low (clear wording is most of it) and models have become strong enough to infer intent from vague prompts.
5. Perceiving the World: Tools and MCP
On their own, large models cannot access real‑time external information. The solution is to integrate Tools: functions that take input, perform an operation, and return results.
Typical workflow (a code sketch follows the list):
1. The user’s question is sent to the platform.
2. The platform forwards the question plus the available tool list to the model.
3. The model generates a tool‑call command.
4. The platform invokes the corresponding tool.
5. The tool returns its result to the platform.
6. The platform passes the result back to the model.
7. The model composes a natural‑language answer for the user.
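A platform-side sketch of that loop. The call_model function, the reply shape, and the tool registry are hypothetical stand-ins; real platforms use provider-specific function-calling APIs:

```python
import json

# Hypothetical tool registry: tool name -> plain Python function.
TOOLS = {
    "get_weather": lambda city: json.dumps({"city": city, "condition": "rain"}),
}

def answer(question: str, call_model) -> str:
    messages = [{"role": "user", "content": question}]          # step 1
    while True:
        reply = call_model(messages, tools=list(TOOLS))         # steps 2-3: model sees tools
        if reply["type"] == "tool_call":                        # model asked for a tool
            result = TOOLS[reply["name"]](**reply["args"])      # step 4: platform runs it
            messages.append({"role": "tool", "content": result})  # steps 5-6: result goes back
        else:
            return reply["content"]                             # step 7: final answer
```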
Roles:
Model: selects tools and aggregates results.
Tool: executes concrete operations.
Platform: orchestrates the entire pipeline.
Problem: each platform defines its own tool interface (OpenAI, Anthropic, Google, etc.).
Ultimate solution: Model Context Protocol (MCP) – a unified standard for tool integration, similar to a Type‑C connector, allowing a tool written once to work across all MCP‑compatible platforms.
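With MCP, a tool is defined once against the protocol instead of once per platform. A minimal server using the FastMCP helper from the official MCP Python SDK (API as of the SDK's early releases; check the spec for current details):

```python
# pip install mcp
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("weather-tools")  # server name shown to MCP clients

@mcp.tool()
def get_weather(city: str) -> str:
    """Return a weather report for the given city."""
    return f"Weather in {city}: 18°C, light rain"  # stub; a real tool would call an API

if __name__ == "__main__":
    mcp.run()  # serves MCP over stdio; any MCP-compatible client can now call get_weather
```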
6. Autonomous Agents and Agent Skills
An Agent can autonomously plan, invoke tools, and keep working until a task is completed.
Complex task example: “What’s the weather today, and are there umbrella stores nearby?” The agent will (see the loop sketch after this list):
1. Call a location service to get coordinates.
2. Call a weather service for those coordinates.
3. Based on the weather, call a store‑search service.
4. Combine all the information into a final answer.
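What separates an agent from plain tool calling is the loop: it keeps planning and acting until the task is done. A sketch with hypothetical call_model and tool functions:

```python
def run_agent(task: str, call_model, tools: dict, max_steps: int = 10) -> str:
    """Plan-act loop: the model decides each next action until it declares done."""
    scratchpad = [f"Task: {task}"]
    for _ in range(max_steps):
        step = call_model(scratchpad, tools=list(tools))  # model plans the next action
        if step["action"] == "finish":
            return step["answer"]                          # task complete
        observation = tools[step["action"]](**step["args"])  # e.g., location -> weather -> stores
        scratchpad.append(f"{step['action']} returned: {observation}")  # feed result back
    return "Stopped: step limit reached."                  # guard against infinite loops
```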
Pain point: without reusable instructions, you end up re‑defining the same rules for every task.
Solution: Agent Skill – a Markdown‑formatted instruction document that pre‑defines metadata, goals, steps, judgment rules, output format, and examples for the agent.
Structure of an Agent Skill:
Metadata layer: name and description.
Instruction layer: objectives, execution steps, validation rules, output format, examples.
Creating an Agent Skill (practical steps):
1. Create a folder named after the skill inside the .claude/skills directory.
2. Inside the folder, add a file named SKILL.md (uppercase).
3. Write the complete instruction content into SKILL.md.
When the agent matches a request, it automatically loads and executes the corresponding skill.
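A hypothetical SKILL.md for a changelog-writing skill, following the metadata-plus-instructions structure described above (the frontmatter fields follow Claude's Agent Skills convention; adjust for your platform):

```markdown
---
name: changelog-writer
description: Turns a list of merged commits into a user-facing changelog entry.
---

# Goal
Produce a concise changelog entry grouped by Features, Fixes, and Breaking Changes.

# Steps
1. Read the commit list supplied by the user.
2. Classify each commit into one of the three groups.
3. Rewrite each item in plain language, dropping internal ticket numbers.

# Validation
- Every commit appears exactly once.
- A group is omitted if it would be empty.

# Output format
Markdown with one `##` heading per group and one bullet per change.
```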
7. Complete Knowledge Map
LLM = core engine.
Token = basic data unit.
Context = temporary memory (unit: token).
Context Window = memory capacity limit.
Prompt = specific instruction (User / System).
Tool = function that lets the model perceive the external world.
MCP = unified tool‑integration standard.
Agent = system that plans, calls tools, and completes tasks autonomously.
Agent Skill = the “manual” for an agent.
Understanding these fundamentals empowers developers to navigate current and future AI products such as Claude Code, Codex, and Gemini CLI, all of which operate within this framework.
Core principles remain unchanged despite rapid technological iteration.
Code Mala Tang
Read source code together, write articles together, and enjoy spicy hot pot together.