How Embeddings Transform Simple Character Codes into Powerful Vectors for LLMs
This article explains how embeddings convert basic character indices into high‑dimensional vectors, describes their training via gradient descent, introduces the embedding matrix, and shows how these vectors enable modern language models to capture semantic relationships and be reused across tasks.
Embedding Basics
In earlier examples a simple neural network used numeric codes such as a = 1 and b = 2 to represent characters. While convenient, these codes carry no semantic meaning and cannot capture relationships between characters.
Embeddings solve this problem by mapping characters, words, or symbols to a fixed‑length numeric vector that is learned during model training.
From Numbers to Vectors
A vector is an ordered list of numbers, e.g., a length‑10 vector [0.1, 0.2, 0.3, …, 1.0]. Each position in the vector is fixed; swapping positions creates a different vector, just as swapping color channels changes a color representation.
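A minimal NumPy sketch of this point (the values are illustrative):

```python
import numpy as np

# A length-10 vector: an ordered list of numbers.
v = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])

# Swapping two positions yields a different vector, even though
# the same numbers are present: position carries meaning.
w = v.copy()
w[0], w[1] = w[1], w[0]

print(np.array_equal(v, w))  # False
```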
Training Embeddings
The training process mirrors weight training in neural networks: embeddings are initialized randomly and then refined through gradient descent to minimize a loss function.
1. Initialize embeddings: assign a random vector to each token.
2. Feed the vectors into the network.
3. Compute the loss by comparing the network's output with the expected result.
4. Update the vectors via gradient descent to reduce the loss.
5. Iterate many times until the vectors capture useful semantic features (a minimal training sketch follows this list).
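A minimal PyTorch sketch of this loop, assuming a toy next-character task; the sizes, data, and learning rate are illustrative choices, not prescribed by the article:

```python
import torch
import torch.nn as nn

vocab_size, dim = 26, 10  # 26 letters, length-10 vectors

# Step 1: initialize embeddings randomly (nn.Embedding does this by default).
emb = nn.Embedding(vocab_size, dim)
head = nn.Linear(dim, vocab_size)  # tiny network on top of the vectors
opt = torch.optim.SGD(list(emb.parameters()) + list(head.parameters()), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Toy data: predict the next character of "hello" from the current one.
text = "hello"
ids = torch.tensor([ord(c) - ord('a') for c in text])
inputs, targets = ids[:-1], ids[1:]

for step in range(200):                # Step 5: iterate many times
    vectors = emb(inputs)              # Step 2: feed vectors into the network
    logits = head(vectors)
    loss = loss_fn(logits, targets)    # Step 3: compare output with expected result
    opt.zero_grad()
    loss.backward()                    # Step 4: gradient descent updates the vectors
    opt.step()
```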
As training progresses, vectors become “intelligent,” encoding semantic characteristics that can be reused across different models. For example, once the vector for the character “a” is learned, the same vector is used whenever “a” appears.
Embedding Matrix
To manage many vectors efficiently, they are stored in an embedding matrix—a two‑dimensional array where each column corresponds to a token’s vector. If we have 26 letters and each vector has length 10, the matrix size is 10 × 26.
When a token needs to be represented, we simply look up its column in the matrix. This matrix can also store word‑level or sub‑word embeddings, making it a core component of modern language models.
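A minimal sketch of the lookup, assuming the 10 × 26 layout above (one column per letter; the random initialization stands in for learned values):

```python
import numpy as np

# Embedding matrix for 26 letters with length-10 vectors: shape 10 x 26,
# one column per token, matching the layout described above.
rng = np.random.default_rng(0)
E = rng.normal(size=(10, 26))

def embed(ch: str) -> np.ndarray:
    """Look up the column for a single lowercase letter."""
    col = ord(ch) - ord('a')
    return E[:, col]

print(embed('a').shape)  # (10,)
```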
Properties of Embeddings
All vectors must have the same dimensionality; otherwise they cannot be concatenated into a fixed-size input layer.
Embedding vectors capture similarity: vectors for semantically related words (e.g., “king” and “queen”) lie close together in the vector space, as the sketch at the end of this section illustrates.
By converting discrete symbols into continuous vectors, embeddings enable neural networks to understand and generate natural language more effectively.
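To make the similarity property concrete, here is a toy sketch using hand-made three-dimensional vectors; real learned embeddings have far more dimensions, and these values are invented purely for illustration:

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy hand-made vectors (illustrative only, not real learned embeddings):
king  = np.array([0.9, 0.8, 0.1])
queen = np.array([0.8, 0.9, 0.2])
apple = np.array([0.1, 0.0, 0.9])

print(cosine(king, queen))  # ~0.99: related words lie near each other
print(cosine(king, apple))  # ~0.16: unrelated words are farther apart
```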