Mastering LLM Tokens: How They Work, What They Cost, and How to Choose the Right Model
This article explains what tokens are in large language models, how they are counted and priced, compares tokenization methods across major models, and provides practical guidelines and code examples for optimizing token usage and selecting the appropriate model for different scenarios.
Tokens are the smallest textual units processed by large language models (LLMs). A token can be a word, sub‑word, character, or punctuation mark and is represented by a unique integer ID that is embedded into a vector for the neural network.
Variable length and numeric representation
Variable length: 1 token ≠ 1 character. For example, the Chinese phrase "人工智能" ("artificial intelligence") may be tokenized as ["人工", "智能"] (2 tokens) or as ["人", "工", "智", "能"] (4 tokens).
Numeric ID: each token maps to a unique integer ID (e.g., "AI" → 31924) before being converted to an embedding; see the sketch after this list.
Billing basis: API usage is charged per input and output token (e.g., ¥1 per million tokens).
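To make the text-to-ID mapping concrete, here is a minimal sketch using the open-source gpt2 tokenizer from Hugging Face Transformers as a stand-in; every production model ships its own vocabulary, so the exact IDs will differ:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative open-source tokenizer
ids = tokenizer.encode("AI")                       # text -> integer token IDs
print(ids)                                         # e.g., [20185]; IDs are vocabulary-specific
print(tokenizer.convert_ids_to_tokens(ids))        # IDs -> token strings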
Token density and context windows
Chinese vs. English density: 1 Chinese character ≈ 0.6 token and 1 English character ≈ 0.3 token, because tokenizers merge high-frequency character sequences into single tokens.
Context window: each model caps the number of tokens per request; GPT-4 Turbo supports 128 K tokens, roughly 65 000 Chinese characters.
Why token awareness matters
Cost is directly proportional to token count; all paid LLM APIs charge by token.
Different models impose different token limits (e.g., GPT‑4 Turbo caps at 128 K tokens).
Tokenization rules vary by model, and providers expose APIs to compute token counts.
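Since both limits and billing are token-based, it helps to count tokens locally before sending a request. A minimal sketch with OpenAI's open-source tiktoken library (cl100k_base is the GPT-4-era encoding; other providers require their own tokenizers):
import tiktoken
enc = tiktoken.get_encoding("cl100k_base")  # GPT-4-era encoding
def count_tokens(text: str) -> int:
    # Count tokens the way a GPT-4-class model would
    return len(enc.encode(text))
prompt = "订单号DD20240815何时发货?"  # "When will order DD20240815 ship?"
n = count_tokens(prompt)
print(n)
assert n <= 128_000, "prompt exceeds a 128K context window"  # check before sending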
Token‑cost calculation example
Scenario: a user asks "订单号DD20240815何时发货?" ("When will order DD20240815 ship?").
Token split (DeepSeek tokenizer):
["订单号", "DD", "2024", "08", "15", "何时", "发货"] → 7 tokens.
Model reply: "订单已发货,物流单号SF123456" ("The order has shipped; tracking number SF123456") → 6 tokens.
Total cost:
Input 7 tokens + output 6 tokens = 13 tokens.
DeepSeek‑V3 price: ¥0.1 per million tokens (¥0.0000001 per token).
Cost = 13 × 0.0000001 = ¥0.0000013.
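The same arithmetic as a minimal sketch, with the price hard-coded from this example (real deployments should pull current prices from the provider's pricing page):
PRICE_PER_TOKEN = 0.1 / 1_000_000  # ¥0.1 per million tokens, as above
def request_cost(input_tokens: int, output_tokens: int) -> float:
    # Cost in ¥ for one request; input and output billed at the same rate here
    return (input_tokens + output_tokens) * PRICE_PER_TOKEN
print(f"¥{request_cost(7, 6):.7f}")  # ¥0.0000013, matching the hand calculation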
Tokenization strategies of major models
ChatGPT uses BPE (byte-pair encoding), splitting long words accurately (e.g., "人工智能", "artificial intelligence", → 2 tokens).
DeepSeek uses WordPiece, capturing morphemes well (e.g., "学习能力", "learning ability", splits into "学习" + "能力").
Alibaba Qwen uses SentencePiece, handling rare words effectively (e.g., the gaming slang "氪金", roughly "pay-to-win spending", remains a single token).
Industry insight: a customer‑service system consumes tens of millions of tokens per month; optimizing tokenization can cut costs by about 20%.
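To see how much segmentation varies in practice, you can compare open-source tokenizers side by side. A sketch using gpt2 (BPE) and bert-base-chinese (WordPiece) as freely available stand-ins for the commercial tokenizers named above:
from transformers import AutoTokenizer
text = "人工智能的学习能力"  # "the learning ability of artificial intelligence"
for name in ["gpt2", "bert-base-chinese"]:  # byte-level BPE vs. WordPiece
    tok = AutoTokenizer.from_pretrained(name)
    tokens = tok.tokenize(text)
    print(f"{name}: {len(tokens)} tokens -> {tokens}")
Different vocabularies can easily produce 2x differences in token count for the same Chinese string, which is where savings of the kind cited above come from.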
Hands‑on experiments
Online tokenizer demo: https://platform.openai.com/tokenizer. Example: "区块链" ("blockchain") → [24775, 28638, 245, 64414].
Code example (Transformers)
from transformers import AutoTokenizer
# gpt2 is an open-source stand-in; its byte-level BPE splits Chinese into byte pieces.
# A Chinese-optimized tokenizer (e.g., Qwen's) yields word-like tokens such as
# ['大', '模型', 'Token', '是', '什么', '?']
tokenizer = AutoTokenizer.from_pretrained("gpt2")
text = "大模型Token是什么?"  # "What is a token in a large model?"
tokens = tokenizer.tokenize(text)
print(f"Token count: {len(tokens)}")
Model selection based on token limits
≤64 K tokens: choose Qwen2‑7B (open‑source) or GPT‑4 Turbo (multimodal).
64 K–200 K tokens: use Claude 3.7 for strong long‑text understanding.
≥200 K tokens: opt for Gemini 1.5 Pro (requires higher budget).
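Putting those thresholds into code, a hypothetical routing helper (cutoffs and model names are taken directly from the list above; pick_model is an illustrative name, not a real API):
def pick_model(estimated_tokens: int) -> str:
    # Route a request to a model tier by estimated token count
    if estimated_tokens <= 64_000:
        return "Qwen2-7B"        # or GPT-4 Turbo for multimodal input
    elif estimated_tokens <= 200_000:
        return "Claude 3.7"      # strong long-text understanding
    else:
        return "Gemini 1.5 Pro"  # largest context, higher budget
print(pick_model(50_000))   # Qwen2-7B
print(pick_model(150_000))  # Claude 3.7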
Trivia
Training data scale: GPT‑3 consumed 300 billion tokens, roughly equivalent to 3 million years of human reading.
128 K context can process an entire novel like "The Three‑Body Problem" (~65 k Chinese characters) in one request.
Chinese incurs a 40%–100% token tax compared to English for the same content.
Emoji splitting: "❤️" is actually the heart character "❤" plus an invisible variation selector, so it becomes 2 tokens, which may mislead sentiment analysis.
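The emoji behavior is easy to verify with any byte-level tokenizer; a quick sketch with tiktoken (exact counts vary by vocabulary):
import tiktoken
enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode("❤️")  # U+2764 heart + U+FE0F variation selector
print(len(ids), ids)    # typically more than 1 token for a single visible emoji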
API token pricing comparison (selected providers)
ByteDance Doubao (128 K): input ¥0.8/M, output ¥0.8/M – best for ultra-long Chinese tasks.
DeepSeek-V3 (64 K): input ¥1–4/M (off-peak discounts), output ¥1–4/M – good for high-precision Chinese tasks.
Alibaba Tongyi Qianwen (128 K): input ¥2.4/M, output ¥6/M – suited for everyday Q&A and translation.
OpenAI GPT-4 (128 K): input ¥70–210/M, output ¥210–420/M – ideal for English creative writing and code generation.
Huawei ModelArts (128 K): input ¥1/M, output ¥4/M – first year includes 1 M free tokens.
Note: 1 M tokens ≈ 600 k Chinese characters; a typical intelligent‑customer‑service interaction costs about 100–500 tokens.
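A back-of-the-envelope monthly estimate built from the numbers in this note; every input below is an assumption to replace with your own workload:
TOKENS_PER_INTERACTION = 300   # midpoint of the 100-500 range above
INTERACTIONS_PER_DAY = 10_000  # hypothetical workload
PRICE_PER_M_TOKENS = 2.4       # e.g., Tongyi Qianwen input price, ¥/M
monthly_tokens = TOKENS_PER_INTERACTION * INTERACTIONS_PER_DAY * 30
monthly_cost = monthly_tokens / 1_000_000 * PRICE_PER_M_TOKENS
print(f"{monthly_tokens:,} tokens/month ≈ ¥{monthly_cost:,.0f}")  # 90,000,000 ≈ ¥216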
References
[1] Qborfy – https://qborfy.com
[2] OpenAI Tokenizer – https://platform.openai.com/tokenizer
[3] The Tokenizer Playground – https://huggingface.co/spaces/Xenova/the-tokenizer-playground