Databases 14 min read

Understanding Elasticsearch Fast Retrieval: Inverted Index, Term Dictionary, and Compression Techniques

This article explains how Elasticsearch achieves fast data retrieval by comparing it with traditional relational databases, detailing search engine fundamentals, the structure of Lucene's inverted index—including term dictionaries, postings lists, and term indexes—and the compression techniques such as Frame of Reference and Roaring Bitmaps that optimize storage and query performance.

Architect

Sep 4, 2021

Understanding Elasticsearch Fast Retrieval: Inverted Index, Term Dictionary, and Compression Techniques

The article shares a technical overview of Elasticsearch (ES) focusing on fast retrieval, beginning with a comparison between traditional relational databases and ES for searching poems containing the character "前".

It demonstrates the relational approach with a SQL query:

select name from poems where content like "%前%";

and contrasts it with ES's inverted index mechanism, which allows direct location of matching documents.

Key concepts of the inverted index are explained: term (keyword), postings list (document IDs for a term), term dictionary (sorted list of terms for binary search), and term index (often a trie/FST structure) that maps terms to dictionary offsets.

The article delves into Lucene's implementation details, describing how ES stores data in shards and segments, each segment holding up to 2^31 documents, and how term dictionaries are compressed using block-level prefix sharing.

Compression techniques are covered extensively. Frame of Reference (FOR) uses delta encoding within 256‑document blocks, storing the minimal bit width for each block, while Roaring Bitmaps provide efficient bitmap compression for filter caches, handling both dense and sparse cases with ArrayContainer and BitmapContainer.

For joint queries, the article explains how ES leverages filter caches (bitmap operations) when available, and otherwise employs skip‑list traversal of postings lists to intersect results, reducing both I/O and CPU overhead.

In summary, ES achieves high‑performance search by employing an inverted index structure (term dictionary → term index → postings list), compressing postings with FOR, using FST for term dictionaries, and applying Roaring Bitmaps for fast filter caching, while also offering practical indexing tips such as disabling unnecessary fields, careful ID design, and understanding trade‑offs between storage and query speed.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Search Engine Elasticsearch Lucene inverted index compression Roaring Bitmap Postings List

Written by

Architect

Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.