How to Use Vektor: A Zero‑RAM Native PHP Vector Database for Fast ANN Search
Vektor is a high‑performance, pure‑PHP vector database that stores data on disk, uses the HNSW algorithm for approximate nearest‑neighbor search, and operates with zero RAM overhead, offering both an embeddable library and a standalone HTTP API server with simple installation and configuration steps.
Overview
Vektor is a high‑performance, pure‑file, embedded vector database written entirely in native PHP. It implements a Zero‑RAM Overhead design: the dataset is never fully loaded into memory, and all data is accessed directly from disk. Approximate nearest‑neighbor (ANN) queries are powered by the Hierarchical Navigable Small World (HNSW) graph algorithm.
Features
Pure PHP implementation : No external dependencies or C extensions; works on any PHP 8.2+ runtime.
Zero memory overhead : Data is read from binary files; memory usage stays constant regardless of dataset size.
HNSW index : Graph‑based structure provides fast ANN search.
Binary storage : Vectors, graph connections and metadata are stored in compact binary files.
Embedded or server mode : Use as a PHP library or run as an independent HTTP API server.
Thread‑safe : File‑locking (flock) guarantees safe concurrent reads and writes.
Cosine similarity : Optimized distance metric for high‑dimensional embeddings (default 1536 dimensions, configurable).
System Requirements
PHP 8.2 or higher
Composer for dependency management
Installation
Composer (recommended for existing projects): composer require centamiv/vektor Standalone API server:
git clone https://github.com/centamiv/vektor.git
cd vektor
composer install --no-dev
mkdir -p data
chmod -R 775 dataConfiguration
Server mode reads settings from a .env file.
Copy the example file: cp .env.example .env Edit .env to set the API token and vector dimensions:
# .env
VEKTOR_API_TOKEN=your_secure_random_string_here
VEKTOR_DIMENSIONS=1536VEKTOR_API_TOKEN : When set, all endpoints except /up require an Authorization: Bearer <token> header. Leave empty for a public API.
VEKTOR_DIMENSIONS : Defines the vector dimensionality (default 1536). Changing this value requires clearing the data/ directory and re‑initialising.
Usage Options
1️⃣ HTTP API Server
Start the server (development or testing): php -S 0.0.0.0:8000 -t public If an API token is configured, include it in the request header:
Authorization: Bearer your_secure_random_string_hereAPI Endpoints
GET /up – Health check (no authentication). Returns {"status": "up"}.
GET /info – Returns database statistics such as file size, record count and dimensions.
POST /insert – Inserts a vector. Example payload:
{
"id": "my-doc-id",
"vector": [0.1, 0.2, 0.3, ...],
"metadata": {
"source": "docs/intro.md",
"chunk": 3
}
}POST /search – Performs a vector search. Example payload:
{
"vector": [0.1, 0.2, 0.3, ...],
"k": 5,
"include_metadata": true,
"include_vector": false
}Optional parameters: include_vector, include_metadata.
POST /delete – Deletes a vector by its ID.
POST /optimize – Triggers database optimisation (cleans soft‑deleted space and rebuilds the index).
2️⃣ Embedded Library (Direct PHP Usage)
use Centamiv\Vektor\Services\Indexer;
use Centamiv\Vektor\Services\Searcher;
use Centamiv\Vektor\Services\Optimizer;
use Centamiv\Vektor\Core\Config;
// Optional: customise data directory and dimensions (must be set before initialisation)
Config::setDataDir(__DIR__ . '/my-data');
Config::setDimensions(768);
// Insert a vector
$indexer = new Indexer();
$indexer->insert(
"article-001",
[0.021, -0.534, 0.789, /* ... 1536‑dimensional vector */],
["category" => "php", "lang" => "zh"]
);
// Search
$searcher = new Searcher();
$results = $searcher->search($queryVector, 10, includeMetadata: true);
// Optimise (clean up space)
$optimizer = new Optimizer();
$optimizer->run();Database File Structure (Core Design)
vector.bin– Append‑only storage of raw floating‑point vectors. meta.bin – Disk‑based binary search tree mapping IDs to file offsets. payload.bin – Append‑only storage of serialized metadata (JSON). graph.bin – HNSW graph structure used for fast navigation and ANN search.
All files use a strict binary layout and file‑locking to achieve true zero‑RAM overhead.
Open Source Tech Hub
Sharing cutting-edge internet technologies and practical AI resources.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
