
Neural Search in Apache Solr: Dense Vector Fields, HNSW Graphs, and K‑Nearest Neighbor Implementation

This article explains how Apache Solr implements neural search using dense vector fields, K‑Nearest Neighbor algorithms, and Hierarchical Navigable Small World graphs, detailing the underlying Lucene support, configuration options, query syntax, and integration with AI‑driven vector representations.

Architects Research Society

Jul 24, 2023

Introduction

Sease, through the work of Alessandro Benedetti (Apache Lucene/Solr PMC member and committer) and Elia Porciani (Sease R&D software engineer), contributed the first milestone of neural search to the Apache Solr project. The contribution builds on Apache Lucene's K‑Nearest Neighbor (KNN) search capabilities.

Neural Search Overview

Neural search is the industry application of the academic field of neural information retrieval. It aims to improve four core stages of search with neural network techniques: query representation, document representation, matching, and scoring.

Artificial Intelligence, Deep Learning, and Vector Representations

Recent advances in AI and deep learning enable the generation of dense vector embeddings for both queries and documents, which can be used for similarity search.

Dense Vector Representations

Traditional inverted indexes model text as sparse vectors: there is one dimension per term in the vocabulary, and for any given document most of them are zero. Dense vectors, in contrast, encode semantic meaning in a fixed‑size vector with far fewer dimensions, most of them non‑zero, typically produced by models such as BERT.
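To make the contrast concrete, here is a minimal sketch in plain Java. The class and method names are illustrative only, and the dense vector passed to density() would in practice come from an embedding model, which is outside the scope of this example.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: contrasts a sparse term vector with a dense vector. */
final class VectorShapes {

    /** Sparse view: one dimension per term, and only the non-zero counts are stored. */
    static Map<String, Integer> sparseTermVector(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Fraction of non-zero entries in a vector; close to 1.0 for a dense embedding. */
    static double density(float[] vector) {
        int nonZero = 0;
        for (float v : vector) {
            if (v != 0.0f) {
                nonZero++;
            }
        }
        return vector.length == 0 ? 0.0 : (double) nonZero / vector.length;
    }
}
```

The sparse map grows with the vocabulary of the text, while the dense array has the same length for every document regardless of how many words it contains.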

Approximate Nearest Neighbor (ANN) Search

Computing the exact distance between a query vector and every document vector is costly on large indexes. ANN algorithms instead return results whose distance to the query is at most a constant factor larger than the true nearest‑neighbor distance, trading a small loss in recall for much lower latency.
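For reference, the exact search that ANN methods try to avoid can be sketched in a few lines of plain Java. This is a brute-force baseline written for illustration, not code taken from Lucene or Solr; the class name and the choice of squared Euclidean distance are assumptions.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Brute-force exact k-nearest-neighbor search: compares the query with every document. */
final class ExactKnn {

    static double squaredEuclidean(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Returns the indices of the k closest document vectors, nearest first. */
    static int[] topK(float[][] docs, float[] query, int k) {
        Integer[] ids = new Integer[docs.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        // Every document is scored, so the cost grows linearly with the index size.
        Arrays.sort(ids, Comparator.comparingDouble((Integer i) -> squaredEuclidean(docs[i], query)));
        int n = Math.max(0, Math.min(k, ids.length));
        int[] result = new int[n];
        for (int i = 0; i < n; i++) {
            result[i] = ids[i];
        }
        return result;
    }
}
```

Each query touches every vector in the index, which is the cost that graph-based structures such as HNSW are designed to cut down.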

Hierarchical Navigable Small World (HNSW) Graphs

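Solr leverages HNSW graphs, an efficient ANN structure, to navigate high‑dimensional vector spaces. Each vector becomes a vertex connected to its nearest neighbors, and the graph is organized in layers, with the upper layers holding progressively fewer vertices. Construction is governed by hyper‑parameters that limit how many connections each vertex may have and how many candidate neighbors are examined when a new vertex is inserted.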
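The core navigation step can be sketched as a greedy walk over a single layer. The example below is a deliberately simplified illustration in plain Java, not Lucene's implementation: it assumes a prebuilt adjacency list and tracks only one current vertex, whereas a full HNSW search repeats this walk from the top layer downwards and keeps a list of several candidates on the bottom layer.

```java
/** Simplified single-layer greedy search over a proximity graph (illustrative only). */
final class GreedyGraphSearch {

    static double squaredDistance(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /**
     * Starts at entryPoint and repeatedly moves to the neighbor closest to the query.
     * Stops when no neighbor is closer than the current vertex (a local minimum).
     */
    static int greedySearch(float[][] vectors, int[][] neighbors, int entryPoint, float[] query) {
        int current = entryPoint;
        double currentDistance = squaredDistance(vectors[current], query);
        while (true) {
            int next = current;
            double nextDistance = currentDistance;
            for (int candidate : neighbors[current]) {
                double d = squaredDistance(vectors[candidate], query);
                if (d < nextDistance) {
                    next = candidate;
                    nextDistance = d;
                }
            }
            if (next == current) {
                return current; // no neighbor improves on the current vertex
            }
            current = next;
            currentDistance = nextDistance;
        }
    }
}
```

Only the neighbors along the walk are compared with the query, which is why the search is fast and also why it is approximate: the walk can stop at a local minimum that is not the true nearest neighbor.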

Apache Lucene Implementation

The entry point is org.apache.lucene.document.KnnVectorField, which carries the vector together with its dimension and similarity function. During indexing, the field schema is propagated through org.apache.lucene.index.IndexingChain to org.apache.lucene.index.FieldInfo. The default vectors format in Lucene 9.0 is Lucene90HnswVectorsFormat (a format used by the codec rather than a codec in itself); its writer, Lucene90HnswVectorsWriter, constructs the HNSW graph via org.apache.lucene.util.hnsw.HnswGraphBuilder.

Apache Solr Implementation

Available from Solr 9.0, the first contribution adds a DenseVectorField type and a KNN query parser. Example field type definition:

<fieldType name="knn_vector" class="solr.DenseVectorField" vectorDimension="4" similarityFunction="cosine"/>
<field name="vector" type="knn_vector" indexed="true" stored="true"/>

The field supports indexing and storing dense float vectors (e.g., [1.0, 2.5, 3.7, 4.1]). Supported similarity functions are euclidean, dot_product, and cosine.
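The three functions can be written out in plain Java as follows. This is a sketch of the underlying math only, with illustrative class and method names; it computes raw values and does not reproduce how Lucene converts them into scores. Note that dot product and cosine coincide when both vectors have unit length, which is the case dot_product is intended for.

```java
/** Raw similarity and distance computations behind the three supported functions. */
final class VectorSimilarity {

    static double dotProduct(float[] a, float[] b) {
        checkSameDimension(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    /** Squared Euclidean distance: smaller means more similar. */
    static double squaredEuclidean(float[] a, float[] b) {
        checkSameDimension(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Cosine of the angle between a and b: the dot product normalized by both lengths. */
    static double cosine(float[] a, float[] b) {
        double norms = Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b));
        if (norms == 0.0) {
            throw new IllegalArgumentException("cosine is undefined for a zero vector");
        }
        return dotProduct(a, b) / norms;
    }

    private static void checkSameDimension(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
    }
}
```

Cosine ignores vector length and compares direction only, while Euclidean distance is sensitive to both, so the choice should match how the embedding model was trained.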

Custom Codec Parameters

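Advanced users can tune the HNSW hyper‑parameters per field type. The codec factory is declared in solrconfig.xml, while the field type and field definitions belong in the schema. The attribute names below follow the original contribution; released Solr versions may name them differently (the reference guide documents a knnAlgorithm attribute rather than codecFormat), so check the documentation for your version: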

<codecFactory class="solr.SchemaCodecFactory"/>
<fieldType name="knn_vector" class="solr.DenseVectorField" vectorDimension="4" similarityFunction="cosine" codecFormat="Lucene90HnswVectorsFormat" hnswMaxConnections="10" hnswBeamWidth="40"/>
<field name="vector" type="knn_vector" indexed="true" stored="true"/>

Indexing Vectors

Vectors can be indexed via JSON, XML, or SolrJ. Example JSON:

[{ "id": "1", "vector": [1.0, 2.5, 3.7, 4.1] },
 { "id": "2", "vector": [1.5, 5.5, 6.7, 65.1] }]

Example SolrJ code:

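import java.util.Arrays;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.common.SolrInputDocument;
// getSolrClient() stands for however the application obtains its SolrClient.
final SolrClient client = getSolrClient();
final SolrInputDocument d1 = new SolrInputDocument();
d1.setField("id", "1");
// One float per dimension; the length must match vectorDimension in the schema.
d1.setField("vector", Arrays.asList(1.0f, 2.5f, 3.7f, 4.1f));
final SolrInputDocument d2 = new SolrInputDocument();
d2.setField("id", "2");
d2.setField("vector", Arrays.asList(1.5f, 5.5f, 6.7f, 65.1f));
client.add(Arrays.asList(d1, d2));
client.commit(); // make the new documents visible to searches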

KNN Query Parser

The KNN parser searches the DenseVectorField for the k nearest vectors to a target vector:

&q={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]
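When the request is assembled by hand, the braces, brackets, and spaces in the local-params syntax have to be URL-encoded. The following plain-Java sketch builds the query string shown above and encodes it as a request parameter; the class and method names are hypothetical helpers, not part of Solr or SolrJ.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

/** Hypothetical helper that formats a knn query and URL-encodes it. */
final class KnnQueryBuilder {

    /** Builds the local-params query, e.g. {!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]. */
    static String knnQuery(String field, int topK, float[] vector) {
        StringJoiner values = new StringJoiner(", ", "[", "]");
        for (float v : vector) {
            values.add(Float.toString(v));
        }
        return "{!knn f=" + field + " topK=" + topK + "}" + values;
    }

    /** Encodes a single name=value pair for use in a request URL. */
    static String asUrlParam(String name, String value) {
        return name + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

A call such as KnnQueryBuilder.asUrlParam("q", KnnQueryBuilder.knnQuery("vector", 10, target)) yields a parameter that can be appended to a select URL. Client libraries that set parameters programmatically normally take care of the encoding themselves.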

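The parser can also be used as a filter query, in which case the documents matched by the main query are intersected with the topK nearest neighbors of the target vector, e.g.,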

&q=id:(1 2 3)&fq={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]

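The parser can also supply the second‑pass score of a re‑ranking query, for example rq={!rerank reRankQuery=$rqq reRankDocs=4 reRankWeight=1} with rqq={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]. Here reRankDocs, not topK, determines how many first‑pass results are eligible for rescoring. A first‑pass document receives a vector score only if it is among the topK nearest neighbors of the target vector in the whole index, so the KNN search still runs over the entire index.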
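One way to picture re‑ranking with the knn parser is the following plain-Java simulation. It is a sketch under stated assumptions, not Solr code: it assumes the rerank parser adds reRankWeight times the second-pass score to the first-pass score, considers only the first reRankDocs results, and gives a vector score only to documents that are among the topK nearest neighbors of the whole index. All names are illustrative, and the final re-sorting step is omitted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Simulates how first-pass scores combine with knn scores during re-ranking. */
final class KnnRerankSketch {

    /**
     * @param firstPass    doc id to first-pass score, in first-pass rank order
     * @param globalTopK   doc id to vector score for the topK nearest docs in the whole index
     * @param reRankDocs   how many first-pass results are eligible for rescoring
     * @param reRankWeight multiplier applied to the vector score before it is added
     * @return doc id to combined score, in the original first-pass order
     */
    static Map<String, Double> rerank(Map<String, Double> firstPass,
                                      Map<String, Double> globalTopK,
                                      int reRankDocs,
                                      double reRankWeight) {
        Map<String, Double> combined = new LinkedHashMap<>();
        int position = 0;
        for (Map.Entry<String, Double> entry : firstPass.entrySet()) {
            double score = entry.getValue();
            Double knnScore = globalTopK.get(entry.getKey());
            // Only eligible documents that are also global nearest neighbors get a boost.
            if (position < reRankDocs && knnScore != null) {
                score += reRankWeight * knnScore;
            }
            combined.put(entry.getKey(), score);
            position++;
        }
        return combined;
    }
}
```

A first-pass document that falls outside the global topK keeps its original score, so a topK that is too small can leave most of the first-pass results untouched.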

Conclusion

This article covered how neural search is implemented in Apache Solr: the theoretical background, the Lucene internals, Solr configuration, indexing formats, and query usage.
