Backend Development 15 min read

How Elasticsearch Achieves Lightning‑Fast Search with Inverted Indexes

This article explains how Elasticsearch uses inverted indexes, term dictionaries, and compression techniques like FOR and Roaring Bitmaps to enable rapid full‑text search, contrasting its approach with traditional relational databases and offering practical indexing tips for large‑scale applications.

Open Source Linux

Dec 29, 2021

How Elasticsearch Achieves Lightning‑Fast Search with Inverted Indexes

About Search

Imagine searching a poetry database for verses containing the character "前". A traditional relational database would require a full table scan using a SQL statement such as

select name from poems where content like "%前%";

, which is slow and inefficient for large datasets.

Search engines address this limitation by building an inverted index that maps terms to the documents that contain them.

Search Engine Principles

The indexing pipeline typically includes:

Crawling and stop‑word removal.

Tokenization to extract keywords.

Construction of an inverted index.

User query processing.

Elasticsearch is essentially a wrapper around Lucene, so its inverted‑index implementation follows Lucene's design.

Inverted Index

An inverted index consists of a term dictionary (sorted list of terms), a term index (often a Finite State Transducer, FST, which acts like a compressed trie), and postings lists that store the document IDs for each term.

Each segment in a shard holds up to 2^31 documents, assigning a unique integer ID to every document. The term dictionary is stored on disk in blocks with common‑prefix compression, while the term index resides in memory as an FST, offering O(len(str)) lookup time and minimal memory footprint.

Postings List Tricks

To keep postings lists compact, Elasticsearch employs:

FOR (Frame of Reference) compression: documents are grouped into blocks (typically 256 docs), each block stores the delta‑encoded IDs using the minimal number of bits, dramatically reducing storage size.

Roaring Bitmaps for filter caches: bitmaps allow fast AND/OR operations for filter queries and are stored in a compressed form that adapts to sparsity.

FOR compression example: an ID list [73, 300, 302, 332, 343, 372] becomes delta values [73, 227, 2, 30, 11, 29]; each delta fits in a single byte, saving space.

Roaring Bitmaps split each 32‑bit ID into high and low 16‑bit parts, grouping by high bits and using either an ArrayContainer (for < 4096 entries) or a BitmapContainer (for larger groups) to store the low bits efficiently.

Union Queries

When a filter cache is available, Elasticsearch directly applies bitmap operations to compute intersections and unions. Without a cache, it falls back to a skip‑list algorithm that traverses the compressed postings lists, skipping irrelevant blocks and avoiding unnecessary decompression.

Technical Summary

Elasticsearch speeds up search by using an inverted index (term index → term dictionary → postings list) with FST compression.

FOR compression reduces postings‑list disk usage and improves query latency.

Filter results are cached using Roaring Bitmaps, balancing memory consumption and query speed.

Union queries leverage bitmap operations when possible; otherwise, skip‑list traversal minimizes CPU overhead.

Elasticsearch Indexing Tips

Explicitly disable indexing for fields that do not need it; by default all fields are indexed.

For string fields that do not require analysis, set index": false" and analyzer": "keyword" to avoid unnecessary tokenization.

Avoid highly random document IDs (e.g., UUIDs) because they degrade query performance.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Search Engine Elasticsearch Lucene inverted index compression Postings List

Written by

Open Source Linux

Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.