Tagged articles
21 articles
DeepHub IMBA
Apr 30, 2026 · Artificial Intelligence

Why Real RAG Systems Need Both BM25 and Vector Search

The article analyzes how BM25 excels at exact token matching while vector embeddings capture semantic intent, explains their distinct failure modes, and shows that a hybrid retriever—combined with metadata filtering, proper chunking, and reciprocal rank fusion—delivers the most reliable results for RAG pipelines.

BM25 · Embedding · Hybrid Retrieval
17 min read
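The chunking step mentioned in this summary can be illustrated with a minimal fixed-size, overlapping splitter. This is a sketch under stated assumptions, not the article's method: the input is already tokenized (whitespace splitting in the example), and `chunk_tokens` with its 200-token window and 50-token overlap is a hypothetical helper with illustrative defaults.

```python
def chunk_tokens(tokens, size=200, overlap=50):
    """Split a token list into windows of `size` tokens; consecutive windows share `overlap` tokens.

    Hypothetical helper: the defaults are illustrative, not taken from the article.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        # Stop once a window reaches the end, so no chunk is a pure subset of the previous one.
        if start + size >= len(tokens):
            break
    return chunks


words = "BM25 matches exact tokens while embeddings capture intent".split()
for chunk in chunk_tokens(words, size=4, overlap=2):
    print(chunk)
```

The overlap keeps a phrase that straddles a boundary intact in at least one chunk, at the cost of indexing some tokens twice.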
360 Zhihui Cloud Developer
Apr 9, 2026 · Databases

Master PostgreSQL Full-Text Search: From Basics to Advanced Chinese Tokenization

This article explains PostgreSQL's native full-text search and its core concepts, tsvector and tsquery; demonstrates the built-in functions and operators; compares the built-in parser with the zhparser and pg_search extensions for Chinese tokenization; and provides best-practice tips for indexing, triggers, and performance optimization.

BM25 · Chinese Tokenization · Full-Text Search
14 min read
AI Engineer Programming
Apr 8, 2026 · Artificial Intelligence

TF‑IDF vs BM25: Statistical Foundations of Text Retrieval for RAG

The article explains how TF‑IDF and BM25 compute term importance, compares their strengths and weaknesses, and shows how these sparse retrieval methods integrate with dense retrieval techniques such as DPR, SPLADE, and ColBERT in Retrieval‑Augmented Generation systems, concluding with a hybrid retrieval decision matrix.

BM25 · Hybrid Retrieval · RAG
14 min read
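To make the TF-IDF versus BM25 contrast concrete, here is a minimal Python sketch of both scores over a pre-tokenized toy corpus. It assumes the classic log(N/df) form of TF-IDF and a Lucene-style BM25 IDF, log(1 + (N - df + 0.5)/(df + 0.5)), with the commonly used defaults k1 = 1.2 and b = 0.75; other variants exist, and the article's exact formulas may differ.

```python
import math
from collections import Counter


def tfidf_score(query, doc, corpus):
    """Classic TF-IDF: raw term frequency times log(N / df)."""
    tf = Counter(doc)
    n = len(corpus)
    score = 0.0
    for term in query:
        df = sum(1 for d in corpus if term in d)
        if df:
            score += tf[term] * math.log(n / df)
    return score


def bm25_score(query, doc, corpus, k1=1.2, b=0.75):
    """Okapi BM25 with a Lucene-style, always-positive IDF."""
    tf = Counter(doc)
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n
    score = 0.0
    for term in query:
        df = sum(1 for d in corpus if term in d)
        if not df:
            continue
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        f = tf[term]
        # Term frequency saturates via k1; b scales the penalty for long documents.
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score


corpus = [
    ["bm25", "ranking", "function"],
    ["dense", "vector", "search"],
    ["bm25"] * 10,  # term repeated to show saturation
]
query = ["bm25"]
for doc in corpus:
    print(round(tfidf_score(query, doc, corpus), 3), round(bm25_score(query, doc, corpus), 3))
```

Repeating a term ten times multiplies the TF-IDF score by ten, while the BM25 score rises only modestly, because its term-frequency factor is bounded by k1 + 1 and is discounted for longer-than-average documents.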
Wu Shixiong's Large Model Academy
Apr 7, 2026 · Artificial Intelligence

Why Hybrid Retrieval Beats Pure Vector Search: BM25, RRF, and Real‑World Experiments

This article dissects the shortcomings of pure vector retrieval, explains how BM25 complements it, compares weighted‑sum and Reciprocal Rank Fusion (RRF) strategies, shows experimental results that identify optimal weight and k values, and provides practical engineering tips for deploying hybrid search in RAG systems.

BM25 · Hybrid Retrieval · Parameter Tuning
24 min read
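The two fusion strategies compared here differ in what they consume. A weighted sum needs comparable scores, so raw BM25 and vector scores are usually normalized first. The sketch below assumes min-max normalization and a hypothetical weight `alpha = 0.5` on the vector side; the article's tuned weight values are not reproduced, and the document IDs and scores are made up.

```python
def min_max(scores):
    """Rescale a {doc_id: score} dict to [0, 1]; a constant list maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def weighted_fusion(bm25_scores, vector_scores, alpha=0.5):
    """Rank docs by alpha * vector + (1 - alpha) * BM25 after min-max normalization.

    A document missing from one result list contributes 0 on that side (an assumption).
    """
    bm = min_max(bm25_scores)
    vec = min_max(vector_scores)
    fused = {d: alpha * vec.get(d, 0.0) + (1 - alpha) * bm.get(d, 0.0)
             for d in set(bm) | set(vec)}
    # Highest fused score first; ties broken by ID for a deterministic order.
    return sorted(fused, key=lambda d: (-fused[d], d))


bm25_scores = {"a": 12.0, "b": 3.0, "c": 0.0}    # hypothetical raw BM25 scores
vector_scores = {"b": 0.9, "c": 0.8, "d": 0.1}   # hypothetical cosine similarities
print(weighted_fusion(bm25_scores, vector_scores, alpha=0.5))  # → ['b', 'a', 'c', 'd']
```

Because min-max rescaling depends on the range of each individual result list, one outlier score can compress all the others; this sensitivity is one reason rank-based fusion such as RRF is often preferred.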
Wu Shixiong's Large Model Academy
Mar 26, 2026 · Artificial Intelligence

Why Hybrid Retrieval Beats Pure Vector Search: BM25, RRF, and Real‑World Gains

This article explains why combining BM25 with dense vector search using Reciprocal Rank Fusion (RRF) improves recall for both exact‑term and semantic queries in a financial‑insurance document corpus, details the underlying algorithms, parameter choices such as k=60, provides Python implementations, and shows measurable performance gains in production.

BM25 · FAISS · Hybrid Retrieval
28 min read
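Reciprocal Rank Fusion scores each document as the sum of 1/(k + rank) over every ranked list it appears in, with k = 60 being the parameter value the summary mentions. A minimal Python sketch of that formula, assuming each retriever returns an ordered list of document IDs (the IDs below are hypothetical):

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of document IDs: score(d) = sum of 1 / (k + rank) over lists containing d."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["policy-17", "clause-03", "faq-42"]   # hypothetical BM25 ranking
dense_hits = ["faq-42", "policy-17", "guide-08"]   # hypothetical vector ranking
print(reciprocal_rank_fusion([bm25_hits, dense_hits]))
# → ['policy-17', 'faq-42', 'clause-03', 'guide-08']
```

Only ranks enter the formula, so RRF needs no score normalization across retrievers; a larger k flattens the gap between top-ranked and lower-ranked documents.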
Open Source Tech Hub
Mar 25, 2026 · Artificial Intelligence

How to Build Hybrid Vector and Full‑Text Search with PHPVector in PHP 8.2

This guide introduces PHPVector, a pure‑PHP vector database that combines HNSW‑based approximate nearest‑neighbor search with BM25 full‑text ranking, showing installation, document insertion, vector and text queries, hybrid ranking modes, configuration options, distance metrics, tuning tips, and persistence mechanisms.

AI · BM25 · HNSW
10 min read
Mingyi World Elasticsearch
Mar 11, 2026 · Backend Development

How to Achieve One‑Line Semantic Search for Nearby Clean Coffee Shops with Elasticsearch

This article walks through building a practical Elasticsearch demo that lets users type a single query like “nearby clean coffee shop” and get results by combining dense‑vector semantic search, geo filtering, BM25, and a hybrid RRF‑style ranking, with both LLM‑based structuring and a fallback hash‑based embedding.

BM25 · Flask · Hybrid Search
10 min read
Architecture and Beyond
Feb 1, 2026 · Artificial Intelligence

5 High‑ROI Strategies to Supercharge RAG Retrieval Performance

This article outlines five practical engineering strategies—multi‑vector retrieval, manual splitting and labeling, scalar enhancement, context augmentation, and dense‑sparse vector integration—that together address common RAG retrieval bottlenecks and dramatically improve recall stability and answer quality.

BM25 · Engineering · LLM
17 min read
ITPUB
Dec 29, 2025 · Databases

Boost PostgreSQL Full‑Text Search 3× Faster with VectorChord‑BM25

VectorChord-BM25 is a PostgreSQL extension that adds native BM25 ranking and tokenization. The article reports up to three-fold higher queries-per-second (QPS) throughput than Elasticsearch at comparable relevance scores, and includes detailed installation steps, usage examples, and performance analysis.

BM25 · Database Extension · Full-Text Search
17 min read
Tech Freedom Circle
Nov 5, 2025 · Artificial Intelligence

Elasticsearch: BM25, TF‑IDF, Dense Vectors, kNN, L2 & Cosine Distances, RRF

This article provides a comprehensive technical guide to Elasticsearch’s core retrieval models—BM25 and TF‑IDF—while detailing modern vector‑based search using dense_vector, kNN, L2 and cosine distances, and demonstrates how to combine keyword and semantic results through hybrid search and Reciprocal Rank Fusion (RRF) with practical configuration examples.

BM25 · Elasticsearch · RRF
42 min read
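The relationship between the L2 and cosine measures covered here can be checked directly: for unit-length vectors, the squared L2 distance equals 2 - 2·cos(θ), so on normalized embeddings both produce the same nearest-neighbor ordering. A plain-Python sketch with toy vectors (no Elasticsearch involved; the helper names are illustrative):

```python
import math


def l2_distance(a, b):
    # Euclidean distance between two equal-length vectors.
    return math.dist(a, b)


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def normalize(v):
    # Scale a vector to unit length.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


q, d = normalize([1.0, 2.0]), normalize([2.0, 1.0])
print(l2_distance(q, d) ** 2, 2 - 2 * cosine_similarity(q, d))  # both approximately 0.4
```

Cosine ignores vector magnitude, while L2 does not; the two only agree on ranking once vectors are normalized.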
JavaEdge
Oct 2, 2024 · Artificial Intelligence

Boost RAG Retrieval Accuracy with Contextual Embeddings and BM25

This article presents a contextual retrieval technique that combines contextual embeddings and contextual BM25 to reduce RAG miss rates by up to 67%, explains the underlying methods, implementation steps, cost considerations, experimental results, and practical deployment guidance.

AI · BM25 · Contextual Retrieval
17 min read
Zhengcaiyun Technology (政采云技术)
May 12, 2022 · Fundamentals

Understanding Lucene Query Process and Core Principles

This article explains Lucene's query types, the step‑by‑step query execution flow—including entry, rewrite, weight creation, scoring, and result collection—while providing code examples and performance considerations to help developers troubleshoot and optimize search performance.

BM25 · Elasticsearch · Java
15 min read
DeWu Technology
Dec 4, 2020 · Fundamentals

Introduction to Search Engine Technology and Information Retrieval

The article surveys core search‑engine technology—document hierarchy, flat and vertical inverted indexes, query operators for building and merging score lists, and ranking models from Boolean and BM25 to language‑model approaches like Indri—providing a foundational overview of information retrieval.

BM25 · information retrieval · inverted index
14 min read
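The inverted-index and query-operator ideas in this summary can be sketched in a few lines: map each term to a sorted posting list of document IDs, then answer a Boolean AND by intersecting posting lists, shortest first. This toy version assumes lowercase whitespace tokenization and omits term positions and scoring; the function names are illustrative.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to a sorted list of the IDs of documents containing it."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):  # IDs increase, so posting lists stay sorted
        for term in set(text.lower().split()):
            index[term].append(doc_id)
    return dict(index)


def intersect(p1, p2):
    """Linear merge of two sorted posting lists."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out


def boolean_and(index, terms):
    """Documents containing every term; shortest posting lists are merged first."""
    postings = sorted((index.get(t.lower(), []) for t in terms), key=len)
    if not postings:
        return []
    result = list(postings[0])
    for plist in postings[1:]:
        result = intersect(result, plist)
    return result


docs = ["BM25 ranking model", "Vector search model", "BM25 and vector hybrid search"]
index = build_index(docs)
print(boolean_and(index, ["bm25", "search"]))  # → [2]
print(boolean_and(index, ["model"]))            # → [0, 1]
```

Starting from the shortest posting list keeps intermediate results small, which is why real engines order AND operands by document frequency.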
Tencent Cloud Developer
Jul 22, 2020 · Backend Development

Practical Optimization of Elasticsearch Search Ranking

The article explains how to systematically improve Elasticsearch search relevance by fine‑tuning Query DSL with filters, phrase matching, and boosts, incorporating static scoring via function_score, adjusting BM25 similarity parameters, and using diagnostics like _explain to iteratively achieve higher ranking quality.

BM25 · Boost · Elasticsearch
17 min read
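The BM25 similarity parameters tuned here are k1, which controls how quickly term frequency saturates, and b, which controls document-length normalization (Elasticsearch's defaults are k1 = 1.2 and b = 0.75). A small sketch of just the term-frequency component, assuming the standard BM25 form with IDF omitted and a hypothetical helper name, shows what each knob does:

```python
def bm25_tf_component(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    # Standard BM25 term-frequency factor (IDF omitted).
    length_norm = 1 - b + b * doc_len / avg_doc_len
    return tf * (k1 + 1) / (tf + k1 * length_norm)


# b = 0 switches off length normalization: document length no longer matters.
print(bm25_tf_component(3, 50, 100, b=0), bm25_tf_component(3, 500, 100, b=0))
# With the default b, the same tf counts for less in a longer-than-average document.
print(bm25_tf_component(3, 50, 100), bm25_tf_component(3, 500, 100))
# k1 = 0 makes the factor binary: any match contributes 1 regardless of tf.
print(bm25_tf_component(1, 100, 100, k1=0), bm25_tf_component(20, 100, 100, k1=0))
```

Lowering b helps when long documents are legitimately more relevant; raising k1 lets repeated terms keep adding score for longer before saturating at k1 + 1.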
360 Quality & Efficiency
Nov 15, 2019 · Information Security

Improving Product Quality through Code Vulnerability Inspection and Deep Code‑Search Techniques

The article explains how static source‑code scanning, binary analysis, and advanced code‑search technologies—including incremental indexing, deduplication, real‑time Sphinx indexing, and BM25 ranking—can be combined to detect and remediate product‑level vulnerabilities early, thereby significantly raising software quality and reducing risk.

BM25 · Sphinx · code search
13 min read