Tagged articles

BM25

25 articles · Page 1 of 1

Jun 17, 2026 · Artificial Intelligence

What Is Hybrid Search in RAG and Why Choose It Over Pure Vector Retrieval?

Hybrid search combines dense vector retrieval with sparse keyword search, merges the two result lists with Reciprocal Rank Fusion (RRF), and optionally reranks them, so that semantic matching and exact term matching cover each other's blind spots. The article presents this as the production‑grade standard for RAG systems in 2025‑2026.

BM25 · Elasticsearch · Hybrid Search

0 likes · 19 min read

Code Mala Tang

Jun 5, 2026 · Artificial Intelligence

Give Your Notes a Memory Layer with QMD: 3‑Command, 26k‑Star Local Search Engine

QMD is an open‑source, MIT‑licensed local search engine written in TypeScript that combines BM25, vector embeddings via a GGUF model, and an LLM reranker, allowing natural‑language queries over thousands of markdown files without network calls, and can be installed with just three commands.

BM25 · LLM rerank · TypeScript

0 likes · 13 min read

DeepHub IMBA

May 27, 2026 · Artificial Intelligence

Testing Four Non‑Vector RAG Approaches: BM25, GraphRAG, Tree Search, and Agentic Search

The article evaluates four non‑vector Retrieval‑Augmented Generation methods—BM25 lexical search, GraphRAG graph traversal, Tree‑Search document navigation, and an Agentic search loop—using a small JSON‑based corpus, showing each method’s strengths, weaknesses, and when to combine them for production‑grade retrieval.

Agentic Search · BM25 · GraphRAG

0 likes · 12 min read

Su San Talks Tech

May 25, 2026 · Artificial Intelligence

Mastering RAG: Chunking, Embeddings, BM25 & Multi‑Index Retrieval in Python

This tutorial explains Retrieval‑Augmented Generation (RAG) from fundamentals to a full pipeline, covering text chunking strategies, VoyageAI embeddings, vector‑store implementation, BM25 lexical search, and a multi‑index retriever that fuses semantic and lexical results with Reciprocal Rank Fusion.

BM25 · Chunking · Python

0 likes · 48 min read

AI Engineer Programming

May 15, 2026 · Artificial Intelligence

Hybrid Retrieval in RAG: Combining BM25 Precision with Dense Vector Semantics

The article examines why pure vector retrieval in RAG lacks lexical precision and traceable relevance scores, explains BM25's strengths, and presents hybrid retrieval architectures—including RRF and linear combination fusion—as well as the trade‑offs of externalizing the fusion process.

BM25 · Hybrid Search · Information Retrieval

0 likes · 9 min read

DeepHub IMBA

Apr 30, 2026 · Artificial Intelligence

Why Real RAG Systems Need Both BM25 and Vector Search

The article analyzes how BM25 excels at exact token matching while vector embeddings capture semantic intent, explains their distinct failure modes, and shows that a hybrid retriever—combined with metadata filtering, proper chunking, and reciprocal rank fusion—delivers the most reliable results for RAG pipelines.

BM25 · Embedding · Hybrid Retrieval

0 likes · 17 min read

James' Growth Diary

Apr 22, 2026 · Artificial Intelligence

Boost RAG Performance: Chunking Strategies, Rerank, and Hybrid Retrieval Explained

This article breaks down why RAG pipelines often underperform and shows how proper chunking, overlap settings, hybrid vector‑plus‑BM25 retrieval, and a Rerank step can dramatically improve recall and precision, with concrete code examples and tuning tips.

BM25 · Chunking · Hybrid Retrieval

0 likes · 14 min read

360 Zhihui Cloud Developer

Apr 9, 2026 · Databases

Master PostgreSQL Full-Text Search: From Basics to Advanced Chinese Tokenization

This article explains PostgreSQL's native full‑text search, its core concepts of tsvector and tsquery, demonstrates how to use built‑in functions and operators, compares built‑in, zhparser, and pg_search extensions for Chinese tokenization, and provides best‑practice tips for indexing, triggers, and performance optimization.

BM25 · Chinese Tokenization · Full-Text Search

0 likes · 14 min read

AI Engineer Programming

Apr 8, 2026 · Artificial Intelligence

TF‑IDF vs BM25: Statistical Foundations of Text Retrieval for RAG

The article explains how TF‑IDF and BM25 compute term importance, compares their strengths and weaknesses, and shows how these statistical sparse methods integrate with neural retrieval techniques such as DPR, SPLADE, and ColBERT in Retrieval‑Augmented Generation systems, concluding with a hybrid retrieval decision matrix.

BM25 · Hybrid Retrieval · Information Retrieval

0 likes · 14 min read

Wu Shixiong's Large Model Academy

Apr 7, 2026 · Artificial Intelligence

Why Hybrid Retrieval Beats Pure Vector Search: BM25, RRF, and Real‑World Experiments

This article dissects the shortcomings of pure vector retrieval, explains how BM25 complements it, compares weighted‑sum and Reciprocal Rank Fusion (RRF) strategies, shows experimental results that identify optimal weight and k values, and provides practical engineering tips for deploying hybrid search in RAG systems.

BM25 · Hybrid Retrieval · RAG Systems

0 likes · 24 min read

Wu Shixiong's Large Model Academy

Mar 26, 2026 · Artificial Intelligence

Why Hybrid Retrieval Beats Pure Vector Search: BM25, RRF, and Real‑World Gains

This article explains why combining BM25 with dense vector search using Reciprocal Rank Fusion (RRF) improves recall for both exact‑term and semantic queries in a financial‑insurance document corpus, details the underlying algorithms, parameter choices such as k=60, provides Python implementations, and shows measurable performance gains in production.

BM25 · FAISS · Hybrid Retrieval

0 likes · 28 min read

Open Source Tech Hub

Mar 25, 2026 · Artificial Intelligence

How to Build Hybrid Vector and Full‑Text Search with PHPVector in PHP 8.2

This guide introduces PHPVector, a pure‑PHP vector database that combines HNSW‑based approximate nearest‑neighbor search with BM25 full‑text ranking, showing installation, document insertion, vector and text queries, hybrid ranking modes, configuration options, distance metrics, tuning tips, and persistence mechanisms.

AI · BM25 · HNSW

0 likes · 10 min read

Mingyi World Elasticsearch

Mar 11, 2026 · Backend Development

How to Achieve One‑Line Semantic Search for Nearby Clean Coffee Shops with Elasticsearch

This article walks through building a practical Elasticsearch demo that lets users type a single query like “nearby clean coffee shop” and get results by combining dense‑vector semantic search, geo filtering, BM25, and a hybrid RRF‑style ranking, with both LLM‑based structuring and a fallback hash‑based embedding.

BM25 · Flask · Hybrid Search

0 likes · 10 min read

Architecture and Beyond

Feb 1, 2026 · Artificial Intelligence

5 High‑ROI Strategies to Supercharge RAG Retrieval Performance

This article outlines five practical engineering strategies—multi‑vector retrieval, manual splitting and labeling, scalar enhancement, context augmentation, and dense‑sparse vector integration—that together address common RAG retrieval bottlenecks and dramatically improve recall stability and answer quality.

BM25 · LLM · RAG

0 likes · 17 min read

AI Insight Log

Jan 15, 2026 · Artificial Intelligence

How Claude Code’s New MCP Tool Search Slashes Tokens and Solves Context Explosion

Claude Code introduces MCP Tool Search, a lazy‑loading mechanism that loads only the tools a task needs, saving over 67,000 tokens of context in large MCP setups. It prevents context bloat, improves performance, and offers developers regex and BM25 search options with defer_loading support.

BM25 · Claude Code · Context Management

0 likes · 6 min read

ITPUB

Dec 29, 2025 · Databases

Boost PostgreSQL Full‑Text Search 3× Faster with VectorChord‑BM25

VectorChord‑BM25 is a PostgreSQL extension that adds native BM25 ranking and tokenization, delivering up to three‑fold query‑per‑second improvements over Elasticsearch while maintaining comparable relevance scores. The article includes detailed installation steps, usage examples, and performance analysis.

BM25 · Database Extension · Full-Text Search

0 likes · 17 min read

Tech Freedom Circle

Nov 5, 2025 · Artificial Intelligence

Elasticsearch: BM25, TF‑IDF, Dense Vectors, kNN, L2 & Cosine Distances, RRF

This article provides a comprehensive technical guide to Elasticsearch’s core retrieval models—BM25 and TF‑IDF—while detailing modern vector‑based search using dense_vector, kNN, L2 and cosine distances, and demonstrates how to combine keyword and semantic results through hybrid search and Reciprocal Rank Fusion (RRF) with practical configuration examples.

BM25 · Elasticsearch · RRF

0 likes · 42 min read

JD Tech Talk

Nov 26, 2024 · Artificial Intelligence

Design and Implementation of an Automated Logistics QA Bot Using Retrieval, Rerank, and Data Augmentation Techniques

This article describes a low‑cost, privacy‑preserving chatbot for logistics that combines data cleaning, large‑model‑based data augmentation, BM25 and vector retrieval, a DNN rerank model, and LLM‑driven answer rewriting to deliver accurate, compliant automated responses.

AI · BM25 · QA bot

0 likes · 11 min read

Baobao Algorithm Notes

Oct 17, 2024 · Artificial Intelligence

How Contextual Retrieval Slashes RAG Failures by Up to 67% and Cuts Costs

Anthropic’s Contextual Retrieval augments traditional RAG with contextual embeddings and BM25, reducing retrieval failure rates by 49% (up to 67% with reranking), improving accuracy across domains, and lowering latency and cost through Claude’s prompt‑caching feature.

AI · BM25 · Contextual Retrieval

0 likes · 11 min read

JavaEdge

Oct 2, 2024 · Artificial Intelligence

Boost RAG Retrieval Accuracy with Contextual Embeddings and BM25

This article presents a contextual retrieval technique that combines contextual embeddings and contextual BM25 to reduce RAG miss rates by up to 67%, explains the underlying methods, implementation steps, cost considerations, experimental results, and practical deployment guidance.

AI · BM25 · Contextual Retrieval

0 likes · 17 min read

Zhengcaiyun Tech (政采云技术)

May 12, 2022 · Fundamentals

Understanding Lucene Query Process and Core Principles

This article explains Lucene's query types and the step‑by‑step query execution flow (entry, rewrite, weight creation, scoring, and result collection), with code examples and performance considerations to help developers troubleshoot and optimize search performance.

BM25 · Elasticsearch · Java

0 likes · 15 min read

TiPaiPai Technical Team

May 8, 2021 · Backend Development

How to Build a High‑Precision Exam Question Search Engine with BM25 and LTR

This article explains the architecture and key algorithms behind a specialized exam‑question search engine, covering query parsing, BM25‑based relevance scoring, text‑image hybrid retrieval, Learning‑to‑Rank models, and practical optimizations for long queries and large K‑12 datasets.

BM25 · Learning-to-Rank · Ranking

0 likes · 12 min read

DeWu Technology

Dec 4, 2020 · Fundamentals

Introduction to Search Engine Technology and Information Retrieval

The article surveys core search‑engine technology—document hierarchy, flat and vertical inverted indexes, query operators for building and merging score lists, and ranking models from Boolean and BM25 to language‑model approaches like Indri—providing a foundational overview of information retrieval.

BM25 · Information Retrieval · Search Engine

0 likes · 14 min read

Tencent Cloud Developer

Jul 22, 2020 · Backend Development

Practical Optimization of Elasticsearch Search Ranking

The article explains how to systematically improve Elasticsearch search relevance by fine‑tuning Query DSL with filters, phrase matching, and boosts, incorporating static scoring via function_score, adjusting BM25 similarity parameters, and using diagnostics like _explain to iteratively achieve higher ranking quality.

BM25 · Boost · Elasticsearch

0 likes · 17 min read

360 Quality & Efficiency

Nov 15, 2019 · Information Security

Improving Product Quality through Code Vulnerability Inspection and Deep Code‑Search Techniques

The article explains how static source‑code scanning, binary analysis, and advanced code‑search technologies—including incremental indexing, deduplication, real‑time Sphinx indexing, and BM25 ranking—can be combined to detect and remediate product‑level vulnerabilities early, thereby significantly raising software quality and reducing risk.

BM25 · Code search · Sphinx

0 likes · 13 min read