
How Embeddings and Vector Databases Empower LLMs for Semantic Search

This article explains what embeddings and vector databases are, why they matter for large language models, and how they can be used to overcome token limits by storing and retrieving relevant text chunks for accurate, context‑aware responses.

dbaplus Community

Jun 10, 2023

Embeddings are high-dimensional numeric vectors that represent data, most commonly text. A model such as OpenAI's Ada converts the input into a series of numbers. These vectors enable semantic search: retrieval based on similarity of meaning rather than on exact keyword matches.

Creating and Using Embeddings

The process is straightforward: send your text to an embedding model, receive a vector, and store it for later use. Large language models (LLMs) have strict token limits (roughly 4,096 to 32k tokens for OpenAI's GPT models at the time of writing), so an entire document rarely fits in one prompt. Embeddings let you find the most relevant pieces of information and inject only those into the model's context window.
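The embed-then-store step can be sketched as follows. This is a minimal illustration, not a real model call: `toy_embed` is a hypothetical stand-in that hashes words into a small count vector so the example runs offline, and `DIM` and `store` are names invented here. A real embedding model would return a much longer vector that actually reflects meaning.

```python
import hashlib

DIM = 8  # assumption: real embedding models return far more dimensions


def toy_embed(text: str, dim: int = DIM) -> list[float]:
    """Hypothetical stand-in for an embedding model call.

    Hashes each word into one of `dim` buckets and counts the hits, so the
    same text always yields the same vector. It does NOT capture meaning.
    """
    vector = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vector[digest[0] % dim] += 1.0
    return vector


# Store each vector next to the text it came from, for later lookup.
store: list[tuple[list[float], str]] = []
for text in ["Embeddings map text to vectors.", "LLMs have token limits."]:
    store.append((toy_embed(text), text))
```

In practice you would swap `toy_embed` for a call to your embedding provider and keep the rest of the flow the same.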

Semantic Similarity Example

Imagine a child with a box of toys who wants to find similar items, like a toy car and a toy bus. This illustrates semantic similarity: items that share meaning or function are close together in the vector space.
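To make "close together in the vector space" concrete, here is a small sketch using cosine similarity. The three-dimensional vectors are hand-picked for illustration (an assumption, not output from any model), with dimensions loosely read as "has wheels", "carries people", and "is edible".

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors: [has wheels, carries people, is edible]
toys = {
    "toy car": [0.9, 0.6, 0.0],
    "toy bus": [0.8, 0.9, 0.0],
    "toy banana": [0.0, 0.0, 1.0],
}

car_vs_bus = cosine_similarity(toys["toy car"], toys["toy bus"])
car_vs_banana = cosine_similarity(toys["toy car"], toys["toy banana"])

print(f"car vs bus:    {car_vs_bus:.2f}")
print(f"car vs banana: {car_vs_banana:.2f}")
```

The car and the bus point in nearly the same direction, so their score is close to 1, while the car and the banana share no dimensions and score 0.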

Handling Large Documents with Embeddings

When dealing with a massive PDF (e.g., a congressional hearing transcript), you split the text into chunks, embed each chunk, and store the vectors in a database. You also keep a mapping from each vector to its original text chunk, for example:

[
  {"vector": [1, 2, 3, 34], "text": "Text chunk 1"},
  {"vector": [2, 3, 4, 56], "text": "Text chunk 2"},
  {"vector": [4, 5, 8, 23], "text": "Text chunk 3"},
  ...
]
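The splitting step might look like the sketch below. The function name `split_into_chunks`, the word-based sizing, and the overlap between neighboring chunks are all assumptions made for illustration; the article does not prescribe a chunking strategy, and real pipelines often count tokens rather than words.

```python
def split_into_chunks(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of up to `chunk_size` words.

    Each chunk repeats the last `overlap` words of the previous one, so a
    sentence cut at a boundary still appears whole in at least one chunk.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


# Placeholder text standing in for a long transcript.
transcript = " ".join(f"word{i}" for i in range(120))
chunks = split_into_chunks(transcript, chunk_size=50, overlap=10)
```

Each resulting chunk would then be embedded and stored alongside its vector.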

To answer a query like "What did they say about xyz?", you embed the question, compare its vector to the stored vectors using cosine similarity (the recommended metric for OpenAI embeddings), and retrieve the most similar chunks.
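The retrieval step can be sketched as a brute-force scan: score every stored vector against the query vector and keep the best matches. The `top_k` helper, the store contents, and the query vector are hypothetical values chosen for illustration. A vector database does the same job at scale, typically with an index instead of a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, store, k=3):
    """Return the k (score, text) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query_vector, vec), text) for vec, text in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical store of (vector, text chunk) pairs with made-up 3-d vectors.
store = [
    ([1.0, 0.0, 0.0], "Text chunk 1"),
    ([0.7, 0.7, 0.0], "Text chunk 2"),
    ([0.0, 0.0, 1.0], "Text chunk 3"),
    ([0.0, 1.0, 0.0], "Text chunk 4"),
]
query_vector = [0.9, 0.1, 0.0]  # assumption: the embedded question

results = top_k(query_vector, store, k=3)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

Here chunks 1, 2, and 4 are returned in that order, because chunk 3 is orthogonal to the query and scores 0.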

Prompting the LLM with Retrieved Context

With the top three relevant chunks in hand, you construct a prompt that includes them as context and instructs the LLM to answer truthfully from that context. The prompt should also tell the model to respond with "I cannot answer this question." when the context does not contain the answer. This approach enables chat‑like interactions over arbitrary data sources such as websites, PDFs, or code repositories.
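A prompt along these lines could be assembled as in the sketch below. The exact wording of the template and the `build_prompt` helper are assumptions; only the fallback sentence comes from the description above.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine retrieved chunks and the user's question into one prompt."""
    context = "\n\n".join(chunks)
    return (
        "Answer the question truthfully using only the context below. "
        "If the context does not contain the answer, reply "
        '"I cannot answer this question."\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_prompt(
    "What did they say about xyz?",
    ["Text chunk 1", "Text chunk 2", "Text chunk 3"],
)
print(prompt)
```

The resulting string is what you would send to the LLM. Keeping the context ahead of the question makes it easy to check that the combined text fits within the model's token limit.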

Note that using embeddings is not the same as fine‑tuning a model: the model's weights stay unchanged, and the retrieved text merely augments the prompt with relevant information.


