How to Build Hybrid Vector and Full‑Text Search with PHPVector in PHP 8.2

This guide introduces PHPVector, a pure‑PHP vector database that combines HNSW‑based approximate nearest‑neighbor search with BM25 full‑text ranking, showing installation, document insertion, vector and text queries, hybrid ranking modes, configuration options, distance metrics, tuning tips, and persistence mechanisms.

Open Source Tech Hub
Open Source Tech Hub
Open Source Tech Hub
How to Build Hybrid Vector and Full‑Text Search with PHPVector in PHP 8.2

Overview

PHPVector is a pure‑PHP vector database that implements Hierarchical Navigable Small World (HNSW) for approximate nearest‑neighbor search and BM25 for full‑text ranking. The two engines can be combined into a hybrid search pipeline.

Requirements

PHP 8.2 or newer

No external PHP extensions required for core functionality

Optional ext‑pcntl to enable asynchronous document writes and reduce insertion latency

Installation

composer require ezimuel/phpvector

Quick Start

1. Insert Documents

A Document holds a dense embedding vector, optional raw text (used by BM25), and arbitrary metadata. The id field is optional; if omitted a random UUID v4 is generated.

use PHPVector\Document;
use PHPVector\VectorDatabase;

$db = new VectorDatabase();

$db->addDocuments([
    new Document(
        id: 1,
        vector: [0.12, 0.85, 0.44, 0.67],
        text: 'PHP vector database with HNSW index',
        metadata: ['url' => 'https://example.com/1', 'lang' => 'en']
    ),
    new Document(
        id: 2,
        vector: [0.91, 0.23, 0.78, 0.05],
        text: 'Approximate nearest neighbour search in PHP',
        metadata: ['url' => 'https://example.com/2', 'lang' => 'en']
    ),
    new Document(
        id: 3,
        vector: [0.33, 0.61, 0.19, 0.88],
        text: 'BM25 full‑text ranking algorithm explained',
        metadata: ['url' => 'https://example.com/3', 'lang' => 'en']
    ),
    // No id → UUID v4 is generated automatically
    new Document(
        vector: [0.55, 0.42, 0.71, 0.30],
        text: 'Hybrid search with Reciprocal Rank Fusion'
    ),
]);

2. Vector Search

Find the k most similar documents to a query vector using the HNSW index.

$queryVector = [0.10, 0.80, 0.50, 0.60];
$results = $db->vectorSearch(vector: $queryVector, k: 2);
foreach ($results as $result) {
    echo sprintf("[%d] score=%.4f  %s
", $result->rank, $result->score, $result->document->metadata['url']);
}
// Example output:
// [1] score=0.9987  https://example.com/1
// [2] score=0.8341  https://example.com/3

3. Full‑Text Search

Rank documents by BM25 relevance to a textual query.

$results = $db->textSearch(query: 'nearest neighbour PHP', k: 2);
foreach ($results as $result) {
    echo sprintf("[%d] score=%.4f  %s
", $result->rank, $result->score, $result->document->metadata['url']);
}
// Example output:
// [1] score=1.2430  https://example.com/2
// [2] score=0.8761  https://example.com/1

4. Hybrid Search

Combine vector similarity scores and BM25 scores into a single ranking list.

Reciprocal Rank Fusion (RRF)

RRF is rank‑based, independent of score scales, and requires no parameter tuning.

use PHPVector\HybridMode;

$results = $db->hybridSearch(
    vector: $queryVector,
    text: 'vector database PHP',
    k: 3,
    mode: HybridMode::RRF,
);
foreach ($results as $result) {
    echo sprintf("[%d] score=%.4f  %s
", $result->rank, $result->score, $result->document->metadata['url']);
}

Weighted Combination

Normalize both scores to the [0, 1] range and apply explicit weights.

$results = $db->hybridSearch(
    vector: $queryVector,
    text: 'vector database PHP',
    k: 3,
    mode: HybridMode::Weighted,
    vectorWeight: 0.7,
    textWeight: 0.3,
);

Configuration

Both HNSW and BM25 engines are fully configurable via objects passed to the VectorDatabase constructor.

use PHPVector\BM25\Config as BM25Config;
use PHPVector\BM25\SimpleTokenizer;
use PHPVector\Distance;
use PHPVector\HNSW\Config as HNSWConfig;
use PHPVector\VectorDatabase;

$db = new VectorDatabase(
    hnswConfig: new HNSWConfig(
        M: 16,                     // max connections per node per layer (higher → better recall, more memory)
        efConstruction: 200,       // construction beam width (higher → higher graph quality, slower inserts)
        efSearch: 50,              // search beam width (higher → better recall, slower queries)
        distance: Distance::Cosine, // Cosine | Euclidean | DotProduct | Manhattan
        useHeuristic: true        // diversified neighbor selection (recommended)
    ),
    bm25Config: new BM25Config(
        k1: 1.5,   // term‑frequency saturation (recommended 1.2–2.0)
        b: 0.75    // length normalization (0 = none, 1 = full)
    ),
    tokenizer: new SimpleTokenizer(
        stopWords: SimpleTokenizer::DEFAULT_STOP_WORDS,
        minTokenLength: 2,
    ),
);

Distance Metrics

Distance::Cosine – best for text embeddings and normalized vectors

Distance::Euclidean – suitable for raw, unnormalized vectors

Distance::DotProduct – works with unit‑normalized vectors and is faster than cosine

Distance::Manhattan – robust to outliers, ideal for sparse vectors

HNSW Tuning Cheat Sheet

Increase recall – raise efSearch or efConstruction Speed up queries – lower efSearch Reduce memory usage – lower M Better graph for clustered data – keep

useHeuristic: true

Persistence

PHPVector stores each database in a dedicated directory containing the HNSW graph, BM25 index, and per‑document files. This layout provides low memory usage on load and low insertion latency.

Folder Structure

/var/data/mydb/
  meta.json        // distance metric, dimension, document‑ID map
  hnsw.bin         // HNSW graph (vectors + connections)
  bm25.bin         // BM25 inverted index
  docs/
    0.bin          // document 0 (id, text, metadata)
    1.bin          // document 1
    …

Saving

Pass a path argument to the constructor to enable persistence. Each call to addDocument() writes a document file (asynchronously if ext‑pcntl is available). Call save() once to flush the HNSW graph and BM25 index to disk.

use PHPVector\Document;
use PHPVector\VectorDatabase;

$db = new VectorDatabase(path: '/var/data/mydb');
$db->addDocuments([
    new Document(id: 1, vector: [0.12, 0.85, 0.44], text: 'PHP vector search', metadata: ['source' => 'blog']),
    new Document(id: 2, vector: [0.91, 0.23, 0.78], text: 'Approximate nearest neighbour'),
    // ... insert thousands of documents
]);
$db->save();

Loading

Use VectorDatabase::open() to load a previously saved database.

$db = VectorDatabase::open('/var/data/mydb');
AIVector SearchHNSWBM25PHPHybrid Search
Open Source Tech Hub
Written by

Open Source Tech Hub

Sharing cutting-edge internet technologies and practical AI resources.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.