What Is Embedding in RAG and Why Does It Use 1536 Dimensions?
The article explains that embedding converts text into a 1536‑dimensional floating‑point vector that serves as a semantic fingerprint, describes how the vector is generated, why 1536 dimensions are chosen, how similarity is measured, and provides Java Spring AI code examples along with model‑selection guidance and common interview pitfalls.
Interview Focus
Concept depth : Explain in your own words how text becomes numbers and why vector distance reflects semantic similarity.
Intuition about dimensions : What does 1536‑dimensional mean and how does dimension relate to semantic capacity?
Engineering practice : How to choose an embedding model, and the impact of dimension on storage and computation.
Core Answer
Embedding (vector embedding) is the process of converting a piece of text into a floating‑point array (vector) that acts as the text’s “semantic fingerprint”.
For example, the sentence “今天天气真好” (“The weather is really nice today”) might be encoded as:
[0.012, -0.034, 0.567, -0.189, 0.423, ..., 0.078] ← 1536 floats
The 1536 dimensions simply mean the array contains 1536 elements; each element is a float. All 1536 numbers together encode the semantic information of the input text.
Core logic : Texts that are semantically close produce vectors that are close in high‑dimensional space, so cosine similarity can be used to judge semantic similarity. This is the foundation of vector retrieval in RAG.
Deep Analysis
1. From Text to Vector – What Actually Happens?
1. Input : A piece of text (e.g., “怎么退货”, “How do I return an item?”) is fed to an embedding model.
2. Model internals : The text is tokenized and passed through a multi‑layer Transformer encoder that extracts semantic features. The model is pre‑trained; developers only need to call the API.
3. Output : A fixed‑length float array. Using OpenAI’s text-embedding-3-small model, the output length is 1536.
In practice you call the model via Spring AI or LangChain4j with a single line of code; the heavy lifting is hidden.
2. Why 1536 Dimensions?
The number 1536 has no special mathematical meaning; it is a design choice made by OpenAI for the text-embedding-ada-002 model and carried over to text-embedding-3-small. The trade‑off is:
Higher dimensions → richer semantic representation.
Higher dimensions → higher storage (4 bytes per float, so about 6 KB for one 1536‑dimensional vector) and higher compute cost.
A simplified comparison:
128‑256 dim: low cost, basic classification.
768 dim (e.g., BERT): moderate cost, decent semantics.
1024 dim (e.g., BGE‑M3): good semantics, moderate cost.
1536 dim : good semantics, higher cost – a balanced choice for mainstream RAG.
3072 dim: very strong semantics, high cost.
8192+ dim: research‑grade, very high cost.
In short, more dimensions encode more semantic features but increase storage and computation.
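The storage side of this trade‑off is simple arithmetic, sketched below under the assumption that vectors are stored as raw 32‑bit floats with no index or metadata overhead. The class name, method names, and the corpus size of one million vectors are hypothetical and chosen only for illustration.

```java
import java.util.Locale;

/** Back-of-the-envelope storage estimate for raw float32 embeddings (no index overhead). */
class EmbeddingStorageEstimate {

    /** Bytes for one vector: each float32 component takes 4 bytes. */
    static long bytesPerVector(int dimensions) {
        return (long) dimensions * Float.BYTES;
    }

    /** Bytes for a collection of vectors that all share the same dimension. */
    static long totalBytes(int dimensions, long vectorCount) {
        return bytesPerVector(dimensions) * vectorCount;
    }

    public static void main(String[] args) {
        long vectorCount = 1_000_000L; // hypothetical corpus size, for illustration only
        int[] dims = {256, 768, 1024, 1536, 3072};
        for (int d : dims) {
            double gib = totalBytes(d, vectorCount) / (1024.0 * 1024.0 * 1024.0);
            System.out.printf(Locale.ROOT, "%4d dims: %5d bytes/vector, %.2f GiB for %,d vectors%n",
                    d, bytesPerVector(d), gib, vectorCount);
        }
    }
}
```

At 1536 dimensions this gives 6144 bytes per vector, which is the "~6 KB" figure, and roughly 5.7 GiB for one million vectors before any index structures are added.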
3. How to Compute Semantic Similarity?
After obtaining vectors, similarity is usually measured with cosine similarity because it focuses on direction and is insensitive to vector length. Example:
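Text A “怎么退货” (“How do I return an item?”) vs. Text B “退换货政策” (“Return and exchange policy”) → cosine similarity ≈ 0.96 (very close).
Text A vs. Text C “北京天气预报” (“Beijing weather forecast”) → cosine similarity ≈ 0.12 (far apart).
These scores are illustrative; the exact values depend on the embedding model.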
Other metrics such as Euclidean distance (L2) or inner product exist, but cosine similarity is preferred for text retrieval.
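To make the difference between the three metrics concrete, here is a minimal sketch on tiny hand‑made vectors (not real embeddings). The class and method names are hypothetical. It shows that scaling a vector leaves cosine similarity unchanged while the Euclidean distance and the inner product both change.

```java
/** Three common vector similarity measures, shown on tiny hand-made vectors. */
class SimilarityMetrics {

    /** Inner (dot) product of two vectors of equal length. */
    static double dot(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    /** Cosine similarity: dot product divided by the product of the two lengths. */
    static double cosine(float[] a, float[] b) {
        double denominator = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
        if (denominator == 0.0) {
            throw new IllegalArgumentException("cosine is undefined for a zero vector");
        }
        return dot(a, b) / denominator;
    }

    /** Euclidean (L2) distance between two vectors of equal length. */
    static double euclidean(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = (double) a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f};
        float[] scaled = {2f, 4f, 6f}; // same direction as a, twice the length

        System.out.println("cosine:    " + cosine(a, scaled));    // direction only
        System.out.println("euclidean: " + euclidean(a, scaled)); // grows with the length gap
        System.out.println("dot:       " + dot(a, scaled));       // grows with both lengths
    }
}
```

When every vector is already normalized to length 1, the inner product equals the cosine similarity, so the three metrics produce the same ranking.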
4. Hands‑On Demo with Spring AI
Below is a minimal Spring AI demo that embeds three sentences, prints the vector length and first five values, and computes cosine similarity.
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.stereotype.Service;

import java.util.Arrays;

@Service
public class EmbeddingDemoService {

    private final EmbeddingModel embeddingModel;

    public EmbeddingDemoService(EmbeddingModel embeddingModel) {
        this.embeddingModel = embeddingModel;
    }

    /** Demonstrates the embedding process. */
    public void demo() {
        String text1 = "怎么退货";     // "How do I return an item?"
        String text2 = "退换货政策";   // "Return and exchange policy"
        String text3 = "北京天气预报"; // "Beijing weather forecast"

        float[] vector1 = embeddingModel.embed(text1);
        float[] vector2 = embeddingModel.embed(text2);
        float[] vector3 = embeddingModel.embed(text3);

        System.out.println("Vector dimension: " + vector1.length); // 1536 with text-embedding-3-small
        System.out.println("First 5 values: " + Arrays.toString(Arrays.copyOf(vector1, 5)));

        // Related pair: expect a high score (≈ 0.96 in the example above).
        double sim12 = cosineSimilarity(vector1, vector2);
        // Unrelated pair: expect a low score (≈ 0.12 in the example above).
        double sim13 = cosineSimilarity(vector1, vector3);
        System.out.println("'怎么退货' vs '退换货政策' similarity: " + sim12);
        System.out.println("'怎么退货' vs '北京天气预报' similarity: " + sim13);
    }

    /** Cosine similarity of two vectors of equal length. */
    private double cosineSimilarity(float[] a, float[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            // Widen to double before multiplying to avoid float rounding in the sums.
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
Running the demo prints a vector length of 1536 (with text-embedding-3-small), a sample of the first five values, and two similarity scores. The related pair should score far higher than the unrelated pair; the exact values depend on the model.
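Vector retrieval in RAG builds on the same similarity computation: score every stored vector against the query vector and keep the best matches. The sketch below does this by brute force over hand‑made 3‑dimensional toy vectors rather than real embeddings. The class name, method names, and sample texts are hypothetical; a production system would normally delegate this step to a vector database.

```java
import java.util.ArrayList;
import java.util.List;

/** Brute-force top-k retrieval over in-memory vectors (toy data, not real embeddings). */
class TopKRetrievalSketch {

    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the indices of the k stored vectors most similar to the query, best first. */
    static List<Integer> topK(float[] query, List<float[]> stored, int k) {
        double[] scores = new double[stored.size()];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < stored.size(); i++) {
            scores[i] = cosine(query, stored.get(i));
            order.add(i);
        }
        // Sort indices by descending similarity score.
        order.sort((x, y) -> Double.compare(scores[y], scores[x]));
        int limit = Math.max(0, Math.min(k, order.size()));
        return new ArrayList<>(order.subList(0, limit));
    }

    public static void main(String[] args) {
        List<String> texts = List.of("return policy", "refund steps", "Beijing weather");
        List<float[]> vectors = List.of(
                new float[] {0.9f, 0.1f, 0.0f},
                new float[] {0.8f, 0.3f, 0.1f},
                new float[] {0.0f, 0.2f, 0.9f});
        float[] query = {1.0f, 0.2f, 0.0f}; // stands in for an embedded user question

        for (int index : topK(query, vectors, 2)) {
            System.out.println(texts.get(index));
        }
    }
}
```

With these toy vectors the two return‑related entries rank ahead of the weather entry, which is the behaviour a RAG retriever relies on when it selects context for the prompt.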
5. Common Pitfalls
“Higher dimension is always better” : Not true. Higher dimensions increase storage (≈6 KB per 1536‑dimensional float vector, twice that at 3072) and compute cost; choose a dimension that balances accuracy and resources.
“Embedding only works for text” : Embedding is a generic concept; images, audio, and video can also be embedded (e.g., CLIP for multimodal retrieval).
“All embedding models output 1536 dimensions” : Different models have different output sizes (e.g., BERT 768, BGE‑M3 1024, text‑embedding‑3‑large 3072). In Spring AI, the OpenAI embedding options expose a dimensions setting for models that support it, such as the text‑embedding‑3 series.
6. Choosing an Embedding Model
Typical options and their characteristics:
text-embedding-3-small (OpenAI) – 1536 dim, good price‑performance, best for English RAG and quick prototypes.
text-embedding-3-large (OpenAI) – 3072 dim, strongest semantic expression, suitable when precision matters.
bge-large-zh (BAAI) – 1024 dim, Chinese‑optimized, free and open‑source, ideal for Chinese RAG.
bge-m3 (BAAI) – 1024 dim, multilingual, multi‑granular, good for mixed‑language scenarios.
text-embedding-v3 (Tongyi Qianwen) – 1024 dim, strong on Chinese, available via Alibaba Cloud API.
Selection advice : Use BGE series for Chinese projects to save cost; if you already use OpenAI API, text-embedding-3-small is the most convenient; for Alibaba Cloud users, the Tongyi model integrates smoothly with Spring AI Alibaba.
7. High‑Frequency Follow‑Up Questions
Cosine similarity vs. Euclidean distance : Cosine focuses on direction and ignores vector length, making it robust to text length; Euclidean measures absolute distance and is affected by length. Text retrieval usually prefers cosine.
Can embedding models be fine‑tuned? : Yes, but only when the generic model performs poorly on a specific domain (e.g., heavy jargon). Fine‑tuning requires large sets of similar and dissimilar text pairs.
Can dimensions be reduced? : OpenAI’s newer models support a dimensions parameter that shortens the returned vectors (e.g., 3072 → 512), cutting storage by about 6× at a modest, task‑dependent cost in accuracy.
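Shortening can also be approximated on the client by keeping the leading components of a full vector and re‑normalizing the result to unit length. The sketch below assumes the model was trained so that a truncated prefix still carries meaning, as OpenAI describes for the text‑embedding‑3 series; for other models plain truncation may hurt quality badly. The class and method names are hypothetical.

```java
import java.util.Arrays;

/** Client-side shortening of an embedding: keep a prefix, then rescale to unit length. */
class EmbeddingTruncation {

    static float[] truncateAndNormalize(float[] vector, int targetDims) {
        if (targetDims <= 0 || targetDims > vector.length) {
            throw new IllegalArgumentException("targetDims must be in 1.." + vector.length);
        }
        // Keep only the first targetDims components.
        float[] shortened = Arrays.copyOf(vector, targetDims);

        double squaredNorm = 0.0;
        for (float value : shortened) {
            squaredNorm += (double) value * value;
        }
        double norm = Math.sqrt(squaredNorm);
        if (norm == 0.0) {
            return shortened; // all-zero prefix: nothing to rescale
        }
        // Rescale so that cosine and inner product agree again on the shortened vector.
        for (int i = 0; i < shortened.length; i++) {
            shortened[i] = (float) (shortened[i] / norm);
        }
        return shortened;
    }

    public static void main(String[] args) {
        float[] full = {3f, 4f, 12f, 0f}; // toy vector standing in for a real embedding
        float[] reduced = truncateAndNormalize(full, 2);
        System.out.println(Arrays.toString(reduced));
    }
}
```

Vectors shortened this way must only be compared with other vectors shortened to the same length, so the stored corpus and incoming queries need the same treatment.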
Summary
Embedding converts text into a fixed‑length float vector that serves as a semantic fingerprint; 1536 dimensions indicate the vector contains 1536 floats. Vectors that are close in this high‑dimensional space correspond to semantically similar texts, enabling vector retrieval in RAG. Higher dimensions improve semantic granularity but increase storage and compute costs, making 1536‑dimensional vectors a balanced choice for most production RAG systems. Understanding the encoding pipeline, similarity metrics, model‑dimension trade‑offs, and common pitfalls equips candidates to answer interview questions confidently and to apply embeddings effectively in real projects.
Java Architect Handbook
Focused on Java interview questions and practical article sharing, covering algorithms, databases, Spring Boot, microservices, high concurrency, JVM, Docker containers, and ELK-related knowledge. Looking forward to progressing together with you.
