Boost PHP AI Performance with MemVector: In‑Process Vector Search & Reranking
MemVector is a high‑performance PHP extension that brings native AI capabilities—embedding, vector similarity search, and cross‑encoder reranking—directly into the PHP process, eliminating external services and delivering sub‑10 ms latency for full RAG pipelines on commodity CPUs.
Overview
MemVector implements vector similarity search, text embedding, and cross‑encoder reranking directly inside the PHP process, using C++17 and AVX2 SIMD acceleration. It requires no external vector database, embedding API, or network round‑trip; typical operations finish within 10 ms.
Quick Start
Local GGUF embedding model (requires --with-llama):
// Using local embedding (requires --with-llama)
$emb = new MemVectorEmbedding('/models/all-MiniLM-L6-v2.Q8_0.gguf');
$store = new MemVectorStore(null, ['dimensions' => $emb->dimensions()]);
$store->set('php', $emb->embed('PHP is a server‑side scripting language'), 'lang');
$store->set('python', $emb->embed('Python is used for machine learning'), 'lang');
$store->set('gravity', $emb->embed('Gravity pulls objects toward the earth'), 'science');
$store->set('dna', $emb->embed('DNA encodes genetic information'), 'science');
$results = $store->search($emb->embed('programming languages'), 2);
// $results => [['key' => 'php', 'score' => 0.82, 'metadata' => 'lang'],
//               ['key' => 'python', 'score' => 0.79, 'metadata' => 'lang']]

Two‑stage retrieval (vector search + cross‑encoder reranking):
// Two‑stage retrieval (requires --with-llama)
$rr = new MemVectorReranker('/models/bge-reranker-v2-m3-Q8_0.gguf');
$candidates = $store->search($emb->embed('programming languages'), 50);
$results = $rr->rerank('programming languages', $candidates, 5);

Custom vectors without llama.cpp:
$store = new MemVectorStore('/data/vectors', ['dimensions' => 1536]);
$store->set('doc_1', $openai_embedding, '{"title": "Introduction"}');
$results = $store->search($query_embedding, 10);

Features
Key‑Value API : set(key, vector), get(key), delete(key), batchSet() (upsert supported)
Text Embedding : optional llama.cpp integration for local GGUF models
Cross‑Encoder Reranking : fast vector search followed by high‑precision reranking
Storage Modes : in‑memory, disk (mmap‑backed), shared memory (cross‑process)
HNSW Index : built on‑the‑fly for approximate nearest‑neighbor search
Vector Quantization : F16, Int8, binary, product quantization (PQ)
Distance Metrics : cosine, dot product, Euclidean, Manhattan
Lock‑Free Concurrency : uses std::atomic for thread‑safe reads/writes
AVX2 SIMD : accelerates cosine and dot‑product calculations
Multi‑Segment Architecture : fixed‑capacity segments grow automatically
JSONL Import/Export : dump() and load() for backup and migration
Memory Limits : caps total mmap usage across segments
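The cosine metric that the AVX2 path accelerates is just a normalized dot product. As a point of reference (this plain‑PHP function is not part of the extension, only an illustration of the math), the same computation looks like this:

```php
<?php
// Cosine similarity between two equal-length vectors: dot(a, b) / (|a| * |b|).
// MemVector computes this in C++ with AVX2; this loop is the scalar equivalent.
function cosine_similarity(array $a, array $b): float {
    $dot = 0.0; $na = 0.0; $nb = 0.0;
    foreach ($a as $i => $x) {
        $dot += $x * $b[$i];       // accumulate dot product
        $na  += $x * $x;           // squared norm of $a
        $nb  += $b[$i] * $b[$i];   // squared norm of $b
    }
    return $dot / (sqrt($na) * sqrt($nb));
}

// Identical direction scores 1.0; orthogonal vectors score 0.0.
var_dump(cosine_similarity([1.0, 0.0], [1.0, 0.0])); // float(1)
var_dump(cosine_similarity([1.0, 0.0], [0.0, 1.0])); // float(0)
```

Scores returned by search() with the cosine metric fall in the same [-1, 1] range, which is why the Quick Start results show values like 0.82.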
Performance
Embedding Generation
| Method | Single‑call latency | Cost |
|---|---|---|
| OpenAI API (text‑embedding‑3‑small) | 50‑200 ms (network) | $0.02 / 1M tokens |
| Cohere API (embed‑english‑v3.0) | 50‑200 ms (network) | $0.10 / 1M tokens |
| MemVector + local GGUF model | 5‑15 ms (in‑process) | Free |

Local embedding is 10‑40× faster than cloud APIs and incurs no per‑token charge.
Vector Search
| Method | Query latency | Remarks |
|---|---|---|
| Pinecone / Qdrant / Weaviate (cloud) | 10‑50 ms (network) | Managed service |
| Pinecone / Qdrant / Weaviate (self‑hosted) | 5‑20 ms (network) | Separate process, TCP/gRPC overhead |
| PostgreSQL + pgvector | 5‑50 ms (query + network) | Shared DB, connection‑pool overhead |
| MemVector (in‑process) | 0.1‑5 ms | No network, no serialization |

Full RAG Pipeline (embed + search + rerank)
| Method | Total latency | Network round‑trips |
|---|---|---|
| OpenAI embed + Pinecone search | 100‑400 ms | 2 |
| OpenAI embed + Pinecone search + Cohere rerank | 200‑600 ms | 3 |
| MemVector (all in‑process) | 10‑30 ms | 0 |

Memory Consumption
| Component | RSS |
|---|---|
| MemVector extension (no model) | ~1 MB |
| + embedding model (all‑MiniLM‑L6‑v2, 24 MB) | ~33 MB |
| + rerank model (bge‑reranker‑v2‑m3, 636 MB) | ~200 MB |
| + 100 k vectors (384‑dim, f32) | ~150 MB |
| + 100 k vectors (384‑dim, int8 quant) | ~40 MB |

Model weights are loaded via mmap() and shared across PHP‑FPM or OpenSwoole workers through the OS page cache.
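The roughly 4× saving from int8 quantization comes from storing one byte per dimension plus a per‑vector scale factor instead of a 4‑byte float. The extension's exact quantization scheme is not documented here; the following plain‑PHP sketch of a common symmetric int8 scheme is purely illustrative:

```php
<?php
// Symmetric int8 quantization: map each component into [-127, 127] using a
// per-vector scale. One byte per dimension instead of four is where the
// ~150 MB -> ~40 MB drop for 100k 384-dim vectors comes from.
function quantize_int8(array $v): array {
    $scale = max(array_map('abs', $v)) / 127.0;            // largest magnitude maps to 127
    $q = array_map(fn($x) => (int) round($x / $scale), $v);
    return [$q, $scale];
}

function dequantize_int8(array $q, float $scale): array {
    return array_map(fn($x) => $x * $scale, $q);
}

[$q, $scale] = quantize_int8([0.12, -0.50, 0.33]);
$approx = dequantize_int8($q, $scale);
// $approx is close to the original; each component carries a small rounding error.
```

The trade‑off is a slight loss of precision in similarity scores, which two‑stage retrieval with a reranker largely absorbs.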
Requirements
PHP 8.1+
C++17 compiler (GCC 7+ or Clang 5+)
Optional: llama.cpp for embedding and reranking
Build
phpize
./configure --enable-memvector
make
make test

Install llama.cpp (optional)
Linux – build from source:
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build -DBUILD_SHARED_LIBS=ON -DGGML_CUDA=OFF
cmake --build build --config Release -j$(nproc)
sudo cmake --install build --prefix /usr/local
sudo ldconfig

To enable GPU acceleration, replace -DGGML_CUDA=OFF with -DGGML_CUDA=ON (requires the CUDA toolkit).
Build with llama.cpp support
phpize
./configure --enable-memvector --with-llama=/usr/local
make
make test

If llama.cpp is installed in a non‑standard location, specify --with-llama=DIR.
Model Downloads
Embedding model (recommended 24 MB, 384‑dim):
curl -L -o all-MiniLM-L6-v2-Q8_0.gguf \
  https://huggingface.co/leliuga/all-MiniLM-L6-v2-GGUF/resolve/main/all-MiniLM-L6-v2.Q8_0.gguf

Reranking model (GGUF cross‑encoder):
curl -L -o bge-reranker-v2-m3-Q8_0.gguf \
  https://huggingface.co/gpustack/bge-reranker-v2-m3-GGUF/resolve/main/bge-reranker-v2-m3-Q8_0.gguf

Configuration Options
--enable-memvector: enable the extension (required)
--enable-memvector-avx2: enable AVX2 SIMD (auto‑detected by default)
--with-llama[=DIR]: enable llama.cpp support for embedding and reranking
Runtime feature detection via compile‑time constants:
if (defined('MEMVECTOR_SHM')) {
// shared‑memory mode available
$store = new MemVectorStore('mystore', ['storage' => 'shm', 'dimensions' => 128]);
}
if (defined('MEMVECTOR_LLAMA')) {
// llama.cpp support compiled in
$emb = new MemVectorEmbedding('/path/to/embedding-model.gguf');
$rr = new MemVectorReranker('/path/to/reranker-model.gguf');
}

PHP API
Class: MemVectorStore
Constructor __construct(?string $dir = null, ?array $options = null) – creates or opens a vector store. Storage mode is chosen by the storage option: memory, disk, or shm.
Methods
set(string $key, array $vector, ?string $metadata = null): bool – insert or replace a vector (upsert semantics).
batchSet(array $batch): int – insert or replace multiple vectors; returns the number of successfully stored items.
get(string $key): ?array – retrieve a stored vector and its metadata.
delete(string $key): bool – remove a vector.
search(array $queryVector, int $k): array – return the top‑k nearest keys with similarity scores.
Constraints: keys are unique strings up to 63 characters; vectors must match the configured dimensionality; optional metadata is a JSON string up to 4096 bytes.
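To make these semantics concrete, the toy class below mimics the documented contract in plain PHP: repeated set() replaces (upsert), get() returns null for missing keys, and search() returns the top‑k by cosine score. It is a behavioral sketch only; the extension implements this in C++ with SIMD and an HNSW index rather than the brute‑force scan used here.

```php
<?php
// Plain-PHP model of the MemVectorStore contract, for illustration only.
final class ToyVectorStore {
    /** @var array<string, array{vector: array, metadata: ?string}> */
    private array $items = [];

    public function set(string $key, array $vector, ?string $metadata = null): bool {
        $this->items[$key] = ['vector' => $vector, 'metadata' => $metadata]; // upsert
        return true;
    }

    public function get(string $key): ?array {
        return $this->items[$key] ?? null; // null when the key is absent
    }

    public function delete(string $key): bool {
        if (!isset($this->items[$key])) return false;
        unset($this->items[$key]);
        return true;
    }

    /** Brute-force top-k cosine search (the extension uses HNSW instead). */
    public function search(array $query, int $k): array {
        $scored = [];
        foreach ($this->items as $key => $item) {
            $scored[] = [
                'key'      => $key,
                'score'    => $this->cosine($query, $item['vector']),
                'metadata' => $item['metadata'],
            ];
        }
        usort($scored, fn($a, $b) => $b['score'] <=> $a['score']); // best first
        return array_slice($scored, 0, $k);
    }

    private function cosine(array $a, array $b): float {
        $dot = 0.0; $na = 0.0; $nb = 0.0;
        foreach ($a as $i => $x) { $dot += $x * $b[$i]; $na += $x * $x; $nb += $b[$i] * $b[$i]; }
        return $dot / (sqrt($na) * sqrt($nb));
    }
}

$s = new ToyVectorStore();
$s->set('a', [1.0, 0.0], 'first');
$s->set('b', [0.0, 1.0], 'second');
$s->set('a', [0.9, 0.1], 'first-v2');   // second set() on 'a' replaces: upsert
$top = $s->search([1.0, 0.0], 1);       // $top[0]['key'] === 'a'
```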
Class: MemVectorEmbedding
Provides embed(string $text): array to obtain a vector from a local GGUF model (requires --with-llama).
Class: MemVectorReranker
Provides rerank(string $query, array $candidates, int $topK): array to perform cross‑encoder reranking of candidate vectors.