Tagged articles
4 articles
AgentGuide
Apr 12, 2026 · Artificial Intelligence

What Is a Token? A Deep Dive into Tokenization Algorithms for LLMs

The article defines tokens (now officially rendered as “词元” in Chinese), explains why large language models require numeric input, and details three main tokenization strategies (word‑based, character‑based, and subword) along with the subword methods BPE, WordPiece, and Unigram, highlighting their advantages and drawbacks.

BPE · LLM · Unigram
0 likes · 6 min read
ShiZhen AI
Dec 1, 2025 · Artificial Intelligence

AI Comic Episode 3: What Exactly Is a Token?

This episode explains that a token is the smallest text chunk an LLM processes—ranging from characters to subwords—covers why subword tokenization avoids vocabulary explosion, compares token counts across languages, describes the computational cost of sequential generation, and introduces visual tokens for multimodal models.

AI fundamentals · Multimodal · large language models
0 likes · 7 min read
Code Mala Tang
Mar 27, 2025 · Artificial Intelligence

How Do BPE, WordPiece, and SentencePiece Shape Modern NLP Tokenization?

This article explains the fundamentals, workflows, examples, and trade‑offs of three major subword tokenization algorithms—Byte Pair Encoding, WordPiece, and SentencePiece—helping practitioners choose the right method for their large language model pipelines.

BPE · NLP · SentencePiece
0 likes · 12 min read
Infra Learning Club
Oct 31, 2024 · Artificial Intelligence

What Is a Token in Large Language Models?

The article explains that a token is the unit processed by large language models, describes three common tokenizer methods—word‑level, character‑level, and sub‑word level—with English and Chinese examples, discusses their advantages and limitations, and shows how OpenAI’s tokenizer varies across model versions.

NLP · Token · character-level
0 likes · 5 min read