Pick the Best Vector Similarity Strategy in LangChain: Euclidean, Inner Product, Cosine
This guide explains how to configure and import the appropriate DistanceStrategy in LangChain, compares Euclidean distance, maximum inner product, and cosine similarity, and outlines their formulas, advantages, and typical use‑cases for vector‑based retrieval.
In Retrieval-Augmented Generation (RAG) projects, selecting the right vector similarity metric is crucial; LangChain lets you specify the method via the DistanceStrategy enum.
Configuring DistanceStrategy in LangChain
When creating a PGVector store, you can set the similarity calculation with the distance_strategy argument, as shown below:
self.pg_vector = PGVector(
embeddings=get_embedding_model(),
collection_name=str(self.knowledge_name),
distance_strategy=DistanceStrategy.MAX_INNER_PRODUCT,
connection=PostgresqlVectorStorageKnowledgeService.engine,
use_jsonb=True,
)
Be sure to import the DistanceStrategy class that matches your vector store; otherwise the chosen strategy will not take effect.
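For context, here is a hedged sketch of querying the store configured above. The query string and k are illustrative; similarity_search_with_score is the generic LangChain vector-store method, and scores are ranked according to the chosen distance_strategy:
# Illustrative query; the question text is made up.
results = self.pg_vector.similarity_search_with_score("What is RAG?", k=4)
for doc, score in results:
    print(score, doc.page_content[:80])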
Import Paths and Enum Definitions
For PGVector, import with:
from langchain_community.vectorstores.pgvector import DistanceStrategy
The enum provides three options:
class DistanceStrategy(str, enum.Enum):
"""Enumerator of the Distance strategies."""
EUCLIDEAN = "l2"
COSINE = "cosine"
MAX_INNER_PRODUCT = "inner"
For FAISS, import with:
from langchain.vectorstores.utils import DistanceStrategy
Its enum includes additional strategies:
class DistanceStrategy(str, Enum):
"""Enumerator of the Distance strategies for calculating distances between vectors."""
EUCLIDEAN_DISTANCE = "EUCLIDEAN_DISTANCE"
MAX_INNER_PRODUCT = "MAX_INNER_PRODUCT"
DOT_PRODUCT = "DOT_PRODUCT"
JACCARD = "JACCARD"
COSINE = "COSINE"
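The strategy is passed the same way when building a FAISS store. A minimal sketch, assuming placeholder sample texts and reusing the author's get_embedding_model() helper from the PGVector snippet:
from langchain.vectorstores import FAISS
from langchain.vectorstores.utils import DistanceStrategy

# Minimal sketch: sample texts are placeholders; distance_strategy is
# forwarded to the FAISS store when the index is built.
db = FAISS.from_texts(
    ["first document", "second document"],
    embedding=get_embedding_model(),  # author's helper from the earlier snippet
    distance_strategy=DistanceStrategy.MAX_INNER_PRODUCT,
)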
01 Euclidean Distance
Euclidean distance measures the straight‑line distance between two vectors by summing the squared differences of each dimension and taking the square root. Smaller values indicate higher similarity.
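As a quick sanity check, the calculation can be reproduced with NumPy (the two vectors below are made up for illustration):
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])

# Square the per-dimension differences, sum them, take the square root.
euclidean = np.sqrt(np.sum((a - b) ** 2))
print(euclidean)  # ~3.742; smaller values mean higher similarity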
It is less suitable for high‑dimensional data due to the "curse of dimensionality". Typical scenarios include:
Machine learning algorithms such as K‑Nearest Neighbors and K‑means clustering.
Image processing for comparing feature vectors.
Recommendation systems when data is low‑dimensional and dense.
Any case requiring an absolute distance measure, e.g., anomaly detection.
02 Maximum Inner Product
Maximum inner product (or dot product) evaluates similarity by the magnitude of the vectors' inner product, reflecting how well their directions align. Larger values mean higher similarity, making it useful for large‑scale high‑dimensional data.
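A one-line NumPy illustration, using the same made-up vectors as above:
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])

# Inner (dot) product: 1*2 + 2*4 + 3*6 = 28; larger values mean higher similarity.
print(np.dot(a, b))  # 28.0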
Common applications include:
Semantic search in NLP, comparing word or sentence embeddings.
Attention mechanisms in neural networks, where queries and keys are matched via inner product.
Recommendation systems measuring match strength between user preferences and item features.
03 Cosine Similarity
Cosine similarity normalizes the inner product by the vectors' lengths, focusing solely on direction. Its value ranges from –1 to 1:
1 : vectors point in the same direction (high similarity)
-1 : vectors point in opposite directions
0 : vectors are orthogonal (no similarity)
Because it ignores vector magnitude, cosine similarity works well with high‑dimensional sparse data such as TF‑IDF, bag‑of‑words, or BERT embeddings (a small numeric check follows the use-case list below). Typical use cases are:
Text similarity calculations for documents or sentences.
Recommendation engines matching user interest vectors to item feature vectors.
Image similarity assessment in feature‑vector based image retrieval.
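The promised numeric check, again with the made-up vectors from the earlier snippets: b is a scaled to twice the length, so the cosine similarity is exactly 1 even though their Euclidean distance is nonzero:
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])  # same direction as a, twice the magnitude

# Normalizing the dot product by both norms makes magnitude drop out.
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(cosine)  # 1.0: identical direction despite different lengths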
Ma Wei Says
Follow me for discussions of software architecture and development, AIGC, and AI agents, plus occasional reflections on working life in IT.