Tagged articles

ANN

30 articles · Page 1 of 1

Jun 17, 2026 · Artificial Intelligence

Why Using MySQL for RAG Leads to a Brutal Search Pitfall—and How Vector DB + ANN Saves You

The article explains why RAG systems cannot rely on MySQL for embedding storage, shows the O(n) brute‑force search latency for hundreds of thousands of chunks, and demonstrates how vector databases with ANN indexes such as HNSW or IVFFLAT provide millisecond‑level response, high recall, and scalable storage.

ANN · HNSW · RAG

19 min read

AI Engineer Programming

May 30, 2026 · Artificial Intelligence

Should You Pre‑filter or Post‑filter in RAG Vector Search?

The article examines RAG vector retrieval filtering strategies, comparing pre‑filtering (filter before vector search) and post‑filtering (filter after ANN search), and introduces single‑stage filtering, discussing their principles, trade‑offs, suitable scenarios, and architectural implications for accuracy and performance.

ANN · RAG · metadata filtering

15 min read

Linyb Geek Road

May 5, 2026 · Artificial Intelligence

Optimizing Retrieval and Generation Latency in High‑Concurrency RAG Agents

The article dissects latency in high‑concurrency RAG Agent pipelines, showing how retrieval, re‑ranking, and LLM generation each contribute milliseconds of delay, and presents system‑level tactics—from ANN index tuning and partitioned search to vLLM PagedAttention, continuous batching, speculative decoding, model quantization, routing, semantic caching, and pipeline parallelism—to dramatically cut end‑to‑end response time.

ANN · LLM · RAG

15 min read

AI Engineer Programming

Apr 26, 2026 · Artificial Intelligence

From Bag‑of‑Words to Semantics: How Embeddings Turn Meaning into Numbers (Part 2)

The article explains how embedding techniques encode semantic information into numeric vectors, covering Word2Vec and GloVe fundamentals, BERT anisotropy, SimCSE contrastive learning, alignment and uniformity metrics, ANN index structures such as HNSW, IVF and PQ, Matryoshka representation learning, practical deployment challenges, and evaluation best practices.

ANN · BERT · Embedding

23 min read

Senior Tony

Apr 11, 2026 · Databases

Why Vectors Need a Dedicated Database and How Milvus Solves It

This article explains what vectors are, why traditional relational databases struggle with high‑dimensional similarity queries, and how the open‑source Milvus vector database efficiently stores, indexes, and retrieves massive vectors for AI applications such as semantic search, image matching, and recommendation.

AI Applications · ANN · Databases

5 min read

Open Source Tech Hub

Mar 12, 2026 · Databases

How to Use Vektor: A Zero‑RAM Native PHP Vector Database for Fast ANN Search

Vektor is a high‑performance, pure‑PHP vector database that stores data on disk, uses the HNSW algorithm for approximate nearest‑neighbor search, and operates with zero RAM overhead, offering both an embeddable library and a standalone HTTP API server with simple installation and configuration steps.

ANN · Embedding Search · HNSW

7 min read

dbaplus Community

Dec 15, 2025 · Databases

Understanding Milvus Vector Indexes: Structures, Quantization, and Future Trends

This article explains the core concepts of vector database indexing, details the composition of Milvus indexes—including data structures, quantization methods, and specific algorithms like IVF, HNSW, DISKANN, PQ, RABITQ, PRQ, SCANN, AISAQ, and MINHASH_LSH—and offers speculation on future developments.

ANN · HNSW · Milvus

32 min read

Volcano Engine Developer Services

Dec 5, 2025 · Artificial Intelligence

Why Vectors Power Scalable AI Search and How S3 Vectors Redefines Storage

This article explains how high‑dimensional vectors enable semantic AI search, compares exact and approximate nearest‑neighbor algorithms, examines the challenges of large‑scale vector storage, and evaluates AWS S3 Vectors' architecture, pricing, and hybrid solutions for cost‑effective, high‑performance retrieval.

AI semantics · ANN · S3 Vectors

17 min read

Xiaohongshu Tech REDtech

May 22, 2025 · Artificial Intelligence

Scalable Overload-Aware Graph-Based Index Construction for 10‑Billion‑Scale Vector Similarity Search (SOGAIC)

The paper introduces SOGAIC, a scalable overload‑aware graph‑based index construction system for billion‑scale vector similarity search that uses adaptive overlapping partitioning and load‑balanced distributed scheduling to cut construction time by 47.3% while maintaining high recall.

ANN · Distributed Scheduling · Large Scale

13 min read

StarRocks

Feb 11, 2025 · Databases

How StarRocks Supercharges Vector Search: 7× Faster Queries and 1/3 Cost

This article explains the principles and practical implementation of vector retrieval in StarRocks, covering approximate nearest‑neighbor algorithms, index design, query planning, performance optimizations, real‑world case studies, and future challenges, showing how query latency dropped from 15 seconds to 2 seconds while cutting costs to a third.

ANN · HNSW · IVFPQ

25 min read

Baidu Tech Salon

Nov 22, 2024 · Artificial Intelligence

How GPU‑Accelerated ANN Search Cuts Costs and Boosts Throughput in High‑Volume Retrieval

This article analyzes a GPU‑based approximate nearest neighbor (ANN) retrieval solution built on NVIDIA's RAFT library, detailing algorithm selection, offline indexing tricks, batch online search design, performance results on a 25‑million‑vector workload, and cost‑saving implications for large‑scale search services.

ANN · GPU · IVF_INT8

21 min read

Baidu Geek Talk

Nov 20, 2024 · Artificial Intelligence

Boosting ANN Search with GPU: Inside RAFT’s IVF_INT8 Implementation

This article examines how Baidu and NVIDIA leveraged the open‑source RAFT library to build a GPU‑accelerated approximate nearest neighbor (ANN) retrieval system, detailing algorithm choices, offline indexing, online batch processing, performance results, and practical guidelines for deploying ANN on GPUs.

ANN · GPU · IVF_INT8

20 min read

JD Tech

May 17, 2024 · Artificial Intelligence

Optimizing JD Advertising Retrieval Platform: Balancing Compute, Data Scale, and Iterative Efficiency

The article details how JD's advertising retrieval platform tackles the core challenge of balancing limited compute resources with massive data by optimizing compute allocation, improving model scoring efficiency, and enhancing iteration speed through distributed execution graphs, adaptive algorithms, and platform‑level infrastructure improvements.

ANN · Advertising · Scalable Architecture

24 min read

Volcano Engine Developer Services

Apr 24, 2024 · Databases

How Vector Search Powers LLMs: Inside ByteHouse’s High‑Performance Vector Database

With the rise of LLMs, vector search and vector databases have become essential for extending model memory, and this article explains the principles, algorithms, design choices, implementation details, and performance results of ByteHouse’s cloud‑native vector retrieval engine.

ANN · ByteHouse · LLM integration

14 min read

Meituan Technology Team

Apr 11, 2024 · Artificial Intelligence

GPU-Accelerated Mixed Vector-Scalar Retrieval System for Meituan Takeaway Search

Meituan Waimai’s search team built a GPU‑accelerated, mixed vector‑and‑scalar retrieval engine that supports billions of items, achieving over 99% recall and up to 89% latency reduction by combining pre‑filtering, optimized data layouts, multi‑GPU parallelism, and FP16 precision.

ANN · FAISS · GPU

20 min read

Rare Earth Juejin Tech Community

Jan 12, 2024 · Artificial Intelligence

Understanding Vector Databases, ANN Algorithms, and Their Integration with Large Language Models

This article explains the fundamentals of vector databases, how high‑dimensional vector data is generated and stored, reviews common ANN search algorithms such as Flat, k‑means and LSH, discusses benchmarking and product selection, and demonstrates practical integration of vector stores with LLMs using LangChain and Python code.

ANN · LLM integration · Python

17 min read

Sohu Tech Products

Nov 1, 2023 · Databases

Engineering Practices of Douyin's Vector Database: From Retrieval Challenges to Cloud‑Native Solutions

Douyin tackled vector‑retrieval challenges by optimizing HNSW and creating a high‑performance IVF algorithm, implementing custom scalar quantization, SIMD acceleration, and a DSL‑driven engine that merges filtering with search, then built a cloud‑native, storage‑compute‑separated vector database (VikingDB) delivering sub‑10 ms latency, real‑time updates, multi‑tenant support, and secure, scalable retrieval for LLM‑driven applications.

ANN · LLM integration · Storage Compute Separation

18 min read

Baidu Geek Talk

Oct 31, 2023 · Artificial Intelligence

Interview on Baidu's Open‑Source Large‑Scale Vector Search Engine Puck

Baidu has open‑sourced its high‑performance, trillion‑scale vector search engine Puck—originally built for ultra‑large image‑search workloads, winner of multiple BIGANN categories, now supporting diverse embeddings alongside the medium‑size Tinker algorithm—to accelerate community innovation, improve code quality, and broaden AI retrieval applications across search, recommendation and cloud services.

AI · ANN · Baidu

12 min read

DataFunTalk

Oct 30, 2023 · Databases

Engineering Practices and Evolution of Douyin’s Cloud‑Native Vector Database

This article outlines Douyin’s step‑by‑step engineering evolution of its cloud‑native vector database, covering the background of vector search, core concepts, algorithmic optimizations, storage‑compute separation, streaming updates, multi‑tenant orchestration, and future applications such as large language model integration.

ANN · Cloud Native · Douyin

17 min read

Alibaba Cloud Developer

Sep 21, 2023 · Artificial Intelligence

How Vector Search Powers AI: From Embeddings to Real‑World Applications

This article explains how vector search converts unstructured data such as speech, images, video, and text into high‑dimensional embeddings, explores common algorithms like Brute‑Force, ANN, and HNSW, and presents optimization techniques that dramatically improve recall and query‑per‑second performance for large‑scale AI retrieval systems.

AI · ANN · Embedding

27 min read

Baidu Geek Talk

Sep 4, 2023 · Artificial Intelligence

Puck: Baidu’s Open‑Source High‑Performance ANN Retrieval Engine

Puck, Baidu’s open‑source Approximate Nearest Neighbor engine built on the proprietary Puck and Tinker algorithms, delivers high recall, accuracy and throughput across tiny to trillion‑scale datasets, outperforms rivals in benchmarks—including first‑place BIGANN 2021—while offering a simple, extensible API, proven reliability in dozens of Baidu services, and an Apache 2.0 license encouraging community contributions.

ANN · Baidu · benchmark

7 min read

dbaplus Community

Aug 26, 2023 · Databases

What Is a Vector Database? A Simple Guide from Kids to Engineers

This article demystifies vector databases by first explaining the concept with a five‑year‑old analogy, then expanding to technical details for developers, covering how embeddings work, the differences from relational databases, ANN search, indexing, similarity metrics, and why vector stores outperform raw NumPy arrays for large‑scale similarity retrieval.

ANN · Databases · machine learning

9 min read

21CTO

May 16, 2023 · Databases

How Cassandra’s New Vector Search Transforms AI Applications

This article explains how Cassandra’s newly added vector data type and ANN search capabilities empower AI developers to store, index, and query high‑dimensional embeddings at scale, enabling use cases such as image retrieval, recommendation, and large‑language‑model integration.

AI · ANN · Cassandra

10 min read

Zhuanzhuan Tech

Sep 21, 2022 · Artificial Intelligence

Vector Retrieval and Product Quantization with Faiss

This article explains the challenges of large‑scale vector retrieval, compares Faiss index types such as brute‑force, graph‑based and product quantization, and details how product quantization works, its memory‑speed trade‑offs, hierarchical quantization, and practical hyper‑parameter tuning.

ANN · Embedding · FAISS

9 min read

Baidu Geek Talk

Feb 14, 2022 · Artificial Intelligence

How Baidu’s PUCK Dominated the First BigANN Vector Search Competition

The inaugural BigANN competition, organized by NeurIPS, showcased large‑scale ANN research, and Baidu's self‑developed PUCK algorithm secured top scores across all four tracks by leveraging multi‑layer quantization, two‑level inverted indexing, and extensive system‑level optimizations.

ANN · BigANN · PUCK

8 min read

Kuaishou Tech

Dec 10, 2021 · Artificial Intelligence

Kuaishou and Tsinghua University Win NeurIPS'21 Billion-Scale ANN Challenge with FAISS‑Optimized KST_ANN Solution

On December 6, Kuaishou and Tsinghua University’s joint team secured first place in the NeurIPS'21 Billion‑Scale Approximate Nearest Neighbor Search Challenge by leveraging a FAISS‑optimized, memory‑efficient KST_ANN algorithm that achieved over 6% higher recall on multiple billion‑scale datasets, showcasing the practical impact of large‑scale vector retrieval in AI‑driven services.

AI · ANN · FAISS

5 min read

Kuaishou Tech

Nov 29, 2021 · Artificial Intelligence

Starry Vector Retrieval Platform: Architecture, Features, and Performance

The article describes the design, challenges, architecture, key features, algorithm optimizations, and future roadmap of Kuaishou's Starry vector retrieval platform, which delivers high‑performance, high‑reliability, and easy‑to‑use large‑scale ANN search for diverse business scenarios.

AI platform · ANN · Performance Optimization

14 min read

Baidu Geek Talk

May 10, 2021 · Industry Insights

How Baidu’s GNOIMI Powers Billion‑Scale Rich Media Retrieval

Baidu’s rich‑media retrieval system combines CNN‑based feature extraction with an Approximate Nearest Neighbor engine called GNOIMI, employing hierarchical clustering, product quantization, and optimized indexing to achieve sub‑millisecond search over billions of images, videos and audio, supporting anti‑spam, recommendation and risk‑control across dozens of services.

ANN · GNOIMI · HNSW

16 min read

ITPUB

Jul 25, 2020 · Backend Development

How SimSvr Achieves Billion‑Scale Real‑Time ANN Search for Recommendations

SimSvr is a high‑performance, distributed feature‑retrieval component designed for recommendation systems that supports billion‑scale indexes, sub‑millisecond query latency, real‑time and batch updates, multi‑model AB‑testing, and advanced filtering, all while running on Tencent's production workloads.

ANN · Recommendation Systems · feature retrieval

17 min read

DataFunTalk

Apr 6, 2020 · Artificial Intelligence

Introducing DeepMatch: An Open‑Source Library for Deep Retrieval Matching Algorithms

DeepMatch is an open‑source Python library that implements several mainstream deep‑learning based recall‑matching algorithms, provides easy installation via pip, detailed usage examples with code, and supports exporting user and item vectors for ANN search, making it ideal for rapid experimentation and learning in recommendation systems.

ANN · Python · Recommendation Systems

10 min read