Clarifying the Key Components of AI Large‑Model Development: Vectors, Vector Models, and RAG

This article explains how vectors encode text or images, how vector (embedding) models generate these numeric representations, why specialized vector databases are needed for similarity search, and how Retrieval‑Augmented Generation (RAG) combines them to produce more reliable answers, while stressing that the same embedding model must be used for both indexing and querying.

Linyb Geek Road

Apr 17, 2026

1. Vectors and Vector Models

Vectors are fixed‑length numeric arrays that represent the semantic meaning of a piece of text or an image. For example, "apple is a fruit" might be encoded as [0.1, 0.3, 0.8, …], while "banana is a fruit" yields a nearby vector and "car is a vehicle" produces a distant one. The rule is simple: the more similar the meanings, the smaller the distance between vectors.
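
To make "distance between vectors" concrete, here is a minimal sketch that uses cosine similarity, one common way to compare embeddings. The three-dimensional vectors are made up for illustration; real embeddings have hundreds or thousands of dimensions, and the function name is our own.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up vectors, not the output of any real model.
apple = [0.1, 0.3, 0.8]    # "apple is a fruit"
banana = [0.2, 0.3, 0.7]   # "banana is a fruit"
car = [0.9, 0.1, 0.1]      # "car is a vehicle"

print(cosine_similarity(apple, banana))  # close to 1: similar meaning
print(cosine_similarity(apple, car))     # much lower: different meaning
```

A higher cosine similarity corresponds to a smaller angular distance, which is the sense of "nearby" used above.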

A vector model (also called an embedding model) is an AI model that converts raw text or images into such vectors. It only encodes meaning; it does not hold a conversation or generate text. Common open‑source models with Chinese support include bge-base-zh, m3e, and text2vec; a widely used commercial model is OpenAI text‑embedding‑ada‑002. Their purpose is to turn text, which cannot be compared numerically, into vectors that can be computed on and searched.
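
The interface of an embedding model is simple: text goes in, a fixed-length vector comes out. The sketch below is a toy stand-in that hashes words into buckets; it is not a real embedding model and captures only word overlap, not meaning. The function name `toy_embed` and the 16-dimensional size are assumptions chosen for illustration.

```python
import hashlib
import math

DIM = 16  # hypothetical, tiny size; real models output hundreds or thousands of values

def toy_embed(text, dim=DIM):
    """Map text of any length to a fixed-length, L2-normalized vector.

    Toy stand-in for a real embedding model: each word is hashed into one
    of `dim` buckets, so the result reflects word overlap, not semantics.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    # An empty input has no words, so return the zero vector unchanged.
    return [x / norm for x in vec] if norm else vec

short = toy_embed("apple is a fruit")
longer = toy_embed("a much longer sentence about apples, bananas, and other fruit")
print(len(short), len(longer))  # both have DIM entries regardless of input length
```

A real model would replace the hashing step with a neural network, but the contract (same output length for every input, same output for the same input) is the part that matters for the rest of the pipeline.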

2. Vector Databases

Traditional databases such as MySQL, and keyword search engines such as Elasticsearch in its classic inverted‑index mode, excel at exact keyword matching and range queries, but they were not designed to find the most semantically similar content or to run similarity search over high‑dimensional (e.g., 1024‑dim) vectors efficiently. Vector databases store vectors together with the original text or metadata, build approximate nearest‑neighbor indexes such as HNSW or IVF, and can typically retrieve the top‑K most similar items in milliseconds while supporting metadata filtering, batch operations, and high concurrency.
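
Conceptually, a vector database answers one question: "which stored vectors are closest to this query?" The sketch below does that by brute force in memory, scoring every row. It is an illustration of the idea only; production systems use indexes such as HNSW or IVF to avoid scanning everything, and the class name `TinyVectorStore` and its methods are hypothetical, not any product's API.

```python
import heapq

def dot(a, b):
    """Dot product; equals cosine similarity when both vectors are L2-normalized."""
    return sum(x * y for x, y in zip(a, b))

class TinyVectorStore:
    """In-memory store that performs exact (brute-force) top-K search."""

    def __init__(self, dim):
        self.dim = dim
        self.rows = []  # each row is (vector, text, metadata)

    def add(self, vector, text, metadata=None):
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
        self.rows.append((vector, text, metadata or {}))

    def search(self, query, k=3, where=None):
        """Return up to k (score, text) pairs, highest score first.

        `where` is an optional dict of metadata values that a row must match.
        """
        if len(query) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(query)}")
        candidates = (
            (dot(query, vec), text)
            for vec, text, meta in self.rows
            if where is None or all(meta.get(key) == val for key, val in where.items())
        )
        return heapq.nlargest(k, candidates, key=lambda pair: pair[0])

store = TinyVectorStore(dim=3)
store.add([0.1, 0.3, 0.8], "apple is a fruit", {"lang": "en"})
store.add([0.2, 0.3, 0.7], "banana is a fruit", {"lang": "en"})
store.add([0.9, 0.1, 0.1], "car is a vehicle", {"lang": "en"})

for score, text in store.search([0.1, 0.3, 0.8], k=2):
    print(f"{score:.2f}  {text}")
```

Brute force costs one comparison per stored vector per query, which is why dedicated indexes matter once a collection grows large.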

3. Retrieval‑Augmented Generation (RAG)

Large language models suffer from outdated knowledge, hallucinations, and a lack of access to private documents. RAG mitigates these problems by first retrieving relevant external information and then passing it to the model as reference material for generating the answer. The offline (indexing) stage consists of chunking documents, embedding each chunk with a vector model, and storing the vectors in a vector database. The online (query) stage follows these steps:

The user asks a question.

The question is embedded by the same vector model that was used for indexing, producing a query vector.

The vector database returns the most relevant chunks.

The question and retrieved chunks are combined into a prompt.

The large model generates the final answer.
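
The offline and online stages above can be sketched end to end in a few functions. Everything here is a simplified assumption: `embed` is a toy hashed-word stand-in for a real embedding model, the index is a plain list instead of a vector database, and `generate` is a placeholder where a real large-model call would go.

```python
import hashlib
import math

def embed(text, dim=256):
    """Toy stand-in for an embedding model (hashed word counts, L2-normalized)."""
    vec = [0.0] * dim
    for word in text.lower().split():
        word = word.strip(".,?!")
        if word:
            digest = hashlib.sha256(word.encode("utf-8")).digest()
            vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

def chunk(document, max_words=12):
    """Split a document into chunks of at most max_words words."""
    words = document.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def build_index(documents):
    """Offline stage: chunk each document, embed each chunk, store the pairs."""
    return [(embed(piece), piece) for doc in documents for piece in chunk(doc)]

def retrieve(index, question, k=2):
    """Online stage: embed the question with the same model and rank the chunks."""
    query = embed(question)
    ranked = sorted(
        index,
        key=lambda row: sum(a * b for a, b in zip(query, row[0])),
        reverse=True,
    )
    return [text for _, text in ranked[:k]]

def build_prompt(question, chunks):
    """Combine the question and the retrieved chunks into one prompt."""
    context = "\n".join(f"- {c}" for c in chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

def generate(prompt):
    """Placeholder: a real system would send the prompt to a large model here."""
    return f"[model answer based on a {len(prompt)}-character prompt]"

documents = [
    "Refunds are accepted within 30 days of purchase.",
    "The office is closed on public holidays.",
]
index = build_index(documents)
question = "When are refunds accepted after purchase?"
top_chunks = retrieve(index, question, k=1)
print(build_prompt(question, top_chunks))
print(generate(build_prompt(question, top_chunks)))
```

Note that `build_index` and `retrieve` call the same `embed` function; that shared call is what keeps the query vector and the stored vectors comparable.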

4. Relationship Among the Three Components

Vector models produce embeddings (the raw material), vector databases store and retrieve those embeddings (the warehouse and search staff), and RAG orchestrates the end‑to‑end QA workflow (the scheduler and answer outlet). Missing any component breaks the pipeline: without a vector model there is nothing to store, without a vector DB retrieval is slow or impossible, and without RAG the stored data cannot be used for answering.

5. Critical Rule: Consistent Vector Model Usage

The same vector model must be used for both indexing and retrieval. Different models map text into different vector spaces, so a query vector produced by one model cannot be meaningfully compared with index vectors produced by another. The result is like "a chicken talking to a duck" (two parties speaking past each other), and the retrieved chunks are irrelevant. Consistency must be maintained in model name, version, dimensionality, and normalization method. Examples:

Index with bge‑m3 → Retrieve with bge‑m3 ✅

Index with m3e‑base → Retrieve with m3e‑base ✅

Index with bge‑base → Retrieve with bge‑small ❌ (different models, so the vectors are not comparable)
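
One way to enforce this rule is to save a small description of the embedding setup next to the index and compare it before every query. The sketch below assumes a hypothetical `EmbeddingConfig` record; the field values (version strings and dimensions) are illustrative, not taken from any model card.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class EmbeddingConfig:
    """The four properties that must match between indexing and retrieval."""
    model_name: str
    model_version: str
    dimensions: int
    normalized: bool

def mismatches(index_cfg, query_cfg):
    """Return the names of the fields that differ between the two configs."""
    a, b = asdict(index_cfg), asdict(query_cfg)
    return [field for field in a if a[field] != b[field]]

def assert_compatible(index_cfg, query_cfg):
    """Raise ValueError if the query-side setup does not match the index."""
    diff = mismatches(index_cfg, query_cfg)
    if diff:
        raise ValueError("embedding config mismatch: " + ", ".join(diff))

# Illustrative values only.
index_cfg = EmbeddingConfig("bge-base", "v1", 768, True)
good_query = EmbeddingConfig("bge-base", "v1", 768, True)
bad_query = EmbeddingConfig("bge-small", "v1", 512, True)

print(mismatches(index_cfg, good_query))  # no differences
print(mismatches(index_cfg, bad_query))   # model name and dimensions differ
```

Calling `assert_compatible` at service startup turns a silent relevance problem into an immediate, readable error.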

6. Dimension Consistency

Vectors must be queried with the same dimensionality they were stored with: a 768‑dimensional index can only be queried with 768‑dimensional vectors, and likewise for 1024 dimensions. A mismatched query is typically rejected with an error rather than returning degraded results.

7. Popular Open‑Source Vector Databases (2026)

Milvus – Zilliz’s open‑source, enterprise‑grade solution; full‑featured, high performance, multiple index types; preferred for production.

Qdrant – Written in Rust; fast, low‑memory, clean API, cloud‑native friendly; ideal for small‑to‑medium projects.

Chroma – Lightweight, Python‑friendly; easiest to get started, great for development and debugging; commonly used in LangChain tutorials and examples.

Weaviate – Rich feature set with GraphQL support; strong vector + structured filtering; suited for complex business systems.

8. Common Pitfalls (Avoidance Guide)

RAG is not fine‑tuning: fine‑tuning changes model weights, while RAG leaves the weights untouched and supplies external data at query time.

Higher vector dimensions are not always better; they increase storage, memory use, and query latency. For Chinese text, 768 or 1024 dimensions are usually sufficient.

Vector databases complement, not replace, relational databases; use MySQL for business data and a vector DB for similarity search.

Switching to a new vector model requires re‑embedding all documents and rebuilding the entire vector store, because vectors from the old model are not comparable with vectors from the new one.
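
The cost of higher dimensionality mentioned in the list above can be estimated with simple arithmetic. This sketch assumes each value is stored as a 4-byte float32 and ignores index and metadata overhead, so real memory use will be higher; the 4096-dimension row is a hypothetical larger model included for comparison.

```python
def raw_vector_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw size of the vectors alone, assuming float32 values and no index overhead."""
    return num_vectors * dimensions * bytes_per_value

for dims in (768, 1024, 4096):
    gib = raw_vector_bytes(1_000_000, dims) / 2**30
    print(f"{dims:>5} dims: {gib:.2f} GiB for 1,000,000 vectors")
```

Storage grows linearly with the number of dimensions, and so does the work for each distance computation, which is why a larger model is worth it only if it measurably improves retrieval quality.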

