Pick the Best Vector Similarity Strategy in LangChain: Euclidean, Inner Product, Cosine
This guide explains how to configure and import the appropriate DistanceStrategy in LangChain, compares Euclidean distance, maximum inner product, and cosine similarity, and outlines their formulas, advantages, and typical use‑cases for vector‑based retrieval.
In Retrieval-Augmented Generation (RAG) projects, selecting the right vector similarity metric is crucial; LangChain lets you specify the method via the DistanceStrategy enum.
Configuring DistanceStrategy in LangChain
When creating a PGVector store, you can set the similarity calculation with the distance_strategy argument, as shown below:
self.pg_vector = PGVector(
embeddings=get_embedding_model(),
collection_name=str(self.knowledge_name),
distance_strategy=DistanceStrategy.MAX_INNER_PRODUCT,
connection=PostgresqlVectorStorageKnowledgeService.engine,
use_jsonb=True,
)
Be sure to import the DistanceStrategy class that matches your vector store; otherwise the chosen strategy will not take effect.
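For context, here is a hedged sketch of querying the store configured above. The query string and k are illustrative; similarity_search_with_score is the generic LangChain vector-store method, and scores are ranked according to the chosen distance_strategy:
# Illustrative query; the question text is made up.
results = self.pg_vector.similarity_search_with_score("What is RAG?", k=4)
for doc, score in results:
    print(score, doc.page_content[:80])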
Import Paths and Enum Definitions
For PGVector, import with:
from langchain_community.vectorstores.pgvector import DistanceStrategy
The enum provides three options:
class DistanceStrategy(str, enum.Enum):
"""Enumerator of the Distance strategies."""
EUCLIDEAN = "l2"
COSINE = "cosine"
MAX_INNER_PRODUCT = "inner"
For FAISS, import with:
from langchain.vectorstores.utils import DistanceStrategy
Its enum includes additional strategies:
class DistanceStrategy(str, Enum):
"""Enumerator of the Distance strategies for calculating distances between vectors."""
EUCLIDEAN_DISTANCE = "EUCLIDEAN_DISTANCE"
MAX_INNER_PRODUCT = "MAX_INNER_PRODUCT"
DOT_PRODUCT = "DOT_PRODUCT"
JACCARD = "JACCARD"
COSINE = "COSINE"
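The strategy is passed the same way when building a FAISS store. A minimal sketch, assuming placeholder sample texts and reusing the author's get_embedding_model() helper from the PGVector snippet:
from langchain.vectorstores import FAISS
from langchain.vectorstores.utils import DistanceStrategy

# Minimal sketch: sample texts are placeholders; distance_strategy is
# forwarded to the FAISS store when the index is built.
db = FAISS.from_texts(
    ["first document", "second document"],
    embedding=get_embedding_model(),  # author's helper from the earlier snippet
    distance_strategy=DistanceStrategy.MAX_INNER_PRODUCT,
)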
01 Euclidean Distance
Euclidean distance measures the straight‑line distance between two vectors by summing the squared differences of each dimension and taking the square root. Smaller values indicate higher similarity.
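As a quick sanity check, the calculation can be reproduced with NumPy (the two vectors below are made up for illustration):
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])

# Square the per-dimension differences, sum them, take the square root.
euclidean = np.sqrt(np.sum((a - b) ** 2))
print(euclidean)  # ~3.742; smaller values mean higher similarity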
It is less suitable for high‑dimensional data due to the "curse of dimensionality". Typical scenarios include:
Machine learning algorithms such as K‑Nearest Neighbors and K‑means clustering.
Image processing for comparing feature vectors.
Recommendation systems when data is low‑dimensional and dense.
Any case requiring an absolute distance measure, e.g., anomaly detection.
02 Maximum Inner Product
Maximum inner product (or dot product) evaluates similarity by the magnitude of the vectors' inner product, reflecting how well their directions align. Larger values mean higher similarity, making it useful for large‑scale high‑dimensional data.
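A one-line NumPy illustration, using the same made-up vectors as above:
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])

# Inner (dot) product: 1*2 + 2*4 + 3*6 = 28; larger values mean higher similarity.
print(np.dot(a, b))  # 28.0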
Common applications include:
Semantic search in NLP, comparing word or sentence embeddings.
Attention mechanisms in neural networks, where queries and keys are matched via inner product.
Recommendation systems measuring match strength between user preferences and item features.
03 Cosine Similarity
Cosine similarity normalizes the inner product by the vectors' lengths, focusing solely on direction. Its value ranges from –1 to 1:
1 : vectors point in the same direction (high similarity)
-1 : vectors point in opposite directions
0 : vectors are orthogonal (no similarity)
Because it ignores vector magnitude, cosine similarity works well with high‑dimensional sparse data such as TF‑IDF, bag‑of‑words, or BERT embeddings (a small numeric check follows the use-case list below). Typical use cases are:
Text similarity calculations for documents or sentences.
Recommendation engines matching user interest vectors to item feature vectors.
Image similarity assessment in feature‑vector based image retrieval.
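The promised numeric check, again with the made-up vectors from the earlier snippets: b is a scaled to twice the length, so the cosine similarity is exactly 1 even though their Euclidean distance is nonzero:
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])  # same direction as a, twice the magnitude

# Normalizing the dot product by both norms makes magnitude drop out.
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(cosine)  # 1.0: identical direction despite different lengths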
Ma Wei Says
Follow me for discussions of software architecture and development, AIGC, and AI agents, plus occasional reflections on working life in IT.