
Understanding Elasticsearch Inverted Index: Fast Retrieval, Compression, and Query Techniques

This article explains how Elasticsearch uses inverted index structures (term dictionaries, term indexes, and postings lists), together with compression methods such as Frame‑of‑Reference and Roaring Bitmaps, to search faster and store data more compactly than a traditional relational database, and to evaluate multi-condition (AND) queries efficiently.

IT Architects Alliance

Sep 29, 2021


Recent projects have used Elasticsearch (ES) for data storage and search, which prompted a closer look at how ES achieves such fast retrieval. The article focuses on the indexing internals and deliberately leaves aside ES's distributed architecture and API usage.

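The article first contrasts the full-table scan of a traditional relational database with ES's inverted index approach, using a simple SQL example: select name from poems where content like '%前%'; (find poems whose content contains the character 前, "before"). Because the pattern starts with a wildcard, an ordinary B‑tree index cannot narrow the search, so the database must examine every row. The article then outlines the basic steps of a search engine: crawling, stop‑word filtering, tokenization, building an inverted index, and query processing.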
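The contrast can be made concrete with a toy sketch in Python (an assumption, since the article shows no code for this). The documents, the stop-word list, and the simple alphabetic tokenizer are hypothetical simplifications; real ES analyzers, especially for Chinese text, are far more involved.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; real analyzers ship their own.
STOP_WORDS = {"the", "a", "of", "and", "in", "on"}


def tokenize(text: str) -> list[str]:
    """Lowercase, split into alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each term to the sorted list of doc IDs that contain it."""
    postings: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def scan_search(docs: dict[int, str], word: str) -> list[int]:
    """Roughly what LIKE '%word%' does: test every document's text."""
    return sorted(d for d, text in docs.items() if word.lower() in text.lower())


def index_search(index: dict[str, list[int]], word: str) -> list[int]:
    """A single dictionary lookup replaces the full scan."""
    return index.get(word.lower(), [])


docs = {
    1: "Moonlight before the bed",
    2: "The river flows east",
    3: "Before the gate the moonlight falls",
}
index = build_inverted_index(docs)
print(index_search(index, "moonlight"))  # [1, 3]
print(scan_search(docs, "moonlight"))    # [1, 3]
```

Both calls return the same IDs here, but the scan's cost grows with the number of documents, while the index lookup only has to find one term. The two are also not semantically identical: the scan matches substrings (searching for "east" would also match "feast"), whereas the index matches whole tokens.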

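The core of ES's search speed is the inverted index built by Lucene, which has three parts: a term dictionary (all terms, in sorted order), a term index that lets a lookup jump straight to the right region of the dictionary (implemented as a Finite State Transducer, or FST, compact enough to be held in memory), and postings lists. Each term (keyword) maps to a postings list of the IDs of the documents that contain it, and these IDs are compressed to save space.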
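How the term index narrows a lookup before the term dictionary is consulted can be sketched as follows. This is a simplified, hypothetical stand-in rather than Lucene's structure: instead of an FST over term prefixes, the in-memory "term index" here maps only each term's first character to a slice of the sorted dictionary, which is then binary-searched. A real FST shares prefixes and suffixes across all terms and is far more compact.

```python
import bisect


class TermDictionary:
    """Sorted term dictionary plus a tiny in-memory prefix index.

    The prefix index is a simplified stand-in for Lucene's FST-based term
    index: it only narrows the search to the block of terms sharing a
    first character, then binary-searches inside that block.
    Assumes every term is a non-empty string.
    """

    def __init__(self, postings: dict[str, list[int]]):
        self.terms = sorted(postings)                      # term dictionary
        self.postings = [postings[t] for t in self.terms]  # postings lists
        # term index: first character -> (start, end) slice of self.terms
        self.prefix_index: dict[str, tuple[int, int]] = {}
        for i, term in enumerate(self.terms):
            start, _ = self.prefix_index.get(term[0], (i, i))
            self.prefix_index[term[0]] = (start, i + 1)

    def lookup(self, term: str) -> list[int]:
        block = self.prefix_index.get(term[:1])
        if block is None:
            return []  # no term starts with this character
        lo, hi = block
        i = bisect.bisect_left(self.terms, term, lo, hi)
        if i < hi and self.terms[i] == term:
            return self.postings[i]
        return []


td = TermDictionary({"apple": [1, 4], "apricot": [2], "banana": [3, 4]})
print(td.lookup("apricot"))  # [2]
print(td.lookup("cherry"))   # []
```

The point of the design is that the small index stays in memory while the full dictionary can live on disk; only the narrowed slice has to be searched.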

Two main compression techniques are discussed:

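Frame‑of‑Reference (FOR) stores each sorted postings list as deltas between consecutive doc IDs, splits those deltas into fixed‑size blocks, and bit‑packs each block using only as many bits as its largest delta needs. Small gaps therefore cost a few bits each instead of a full 32‑bit integer, which greatly reduces storage.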
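The encoding can be sketched in a few lines of Python. The sample doc IDs and the block size of 3 are hypothetical (chosen so the blocks are easy to check by hand), the default of 128 is an assumption for illustration, and a Python int stands in for the packed byte buffer a real implementation would write.

```python
def for_encode(doc_ids: list[int], block_size: int = 128):
    """Frame-of-Reference sketch: delta-encode sorted doc IDs, then
    bit-pack each block with the bit width of its largest delta."""
    if not doc_ids:
        return []
    deltas = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]
    blocks = []
    for start in range(0, len(deltas), block_size):
        block = deltas[start:start + block_size]
        width = max(block).bit_length() or 1  # bits per value in this block
        packed = 0
        for i, delta in enumerate(block):
            packed |= delta << (i * width)
        blocks.append((width, len(block), packed))
    return blocks


def for_decode(blocks) -> list[int]:
    """Unpack each block and turn the deltas back into doc IDs."""
    doc_ids, current = [], 0
    for width, count, packed in blocks:
        mask = (1 << width) - 1
        for i in range(count):
            current += (packed >> (i * width)) & mask
            doc_ids.append(current)
    return doc_ids


ids = [73, 300, 302, 332, 343, 372]
blocks = for_encode(ids, block_size=3)
print([width for width, _, _ in blocks])         # [8, 5]
print(sum(w * n for w, n, _ in blocks), "bits")  # 39 bits, vs 192 for six 32-bit ints
assert for_decode(blocks) == ids
```

Because each block records its own bit width, one large gap only inflates its own block rather than the whole list.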

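Roaring Bitmaps are used for the filter cache. Doc IDs are partitioned by their high 16 bits; each partition stores its low 16 bits as a sorted array when it holds few values (up to 4,096) or as a fixed 65,536‑bit bitmap when it is dense. This keeps memory low for sparse filters while still allowing fast bitwise AND/OR operations between cached filters.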
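A minimal Python sketch of the two-container idea follows. It is not Lucene's or any library's implementation: the class and helper names are hypothetical, doc IDs are assumed to be non-negative 32-bit integers, Python ints stand in for the fixed 8 KB bitmaps, and optimizations found in real Roaring implementations (such as run-length containers) are omitted.

```python
import bisect

ARRAY_MAX = 4096  # an array container holding more values becomes a bitmap


def _to_bits(container) -> int:
    """Return a container as a 65,536-bit bitmap (a Python int)."""
    if isinstance(container, list):
        bits = 0
        for low in container:
            bits |= 1 << low
        return bits
    return container


def _iter_bits(bits: int):
    """Yield the positions of set bits in ascending order."""
    while bits:
        lowest = bits & -bits
        yield lowest.bit_length() - 1
        bits ^= lowest


class RoaringSketch:
    """Minimal Roaring-style set of 32-bit doc IDs.

    The high 16 bits pick a container; the low 16 bits are kept either in
    a sorted list (sparse) or in a bitmap (dense).
    """

    def __init__(self, doc_ids=()):
        self.containers: dict[int, object] = {}
        for doc_id in doc_ids:
            self.add(doc_id)

    def add(self, doc_id: int) -> None:
        high, low = doc_id >> 16, doc_id & 0xFFFF
        container = self.containers.get(high)
        if container is None:
            self.containers[high] = [low]
        elif isinstance(container, list):
            i = bisect.bisect_left(container, low)
            if i == len(container) or container[i] != low:
                container.insert(i, low)
            if len(container) > ARRAY_MAX:  # too dense: switch to a bitmap
                self.containers[high] = _to_bits(container)
        else:
            self.containers[high] = container | (1 << low)

    def __and__(self, other: "RoaringSketch") -> "RoaringSketch":
        """Intersect container by container; chunks on one side only are skipped."""
        result = RoaringSketch()
        for high, mine in self.containers.items():
            theirs = other.containers.get(high)
            if theirs is None:
                continue
            for low in _iter_bits(_to_bits(mine) & _to_bits(theirs)):
                result.add((high << 16) | low)
        return result

    def to_list(self) -> list[int]:
        return [(high << 16) | low
                for high in sorted(self.containers)
                for low in _iter_bits(_to_bits(self.containers[high]))]


a = RoaringSketch([1, 5, 70000, 70001])
b = RoaringSketch([5, 70001, 200000])
print((a & b).to_list())  # [5, 70001]
```

The 4,096 threshold is where an array of 16-bit values would reach the same 8 KB as a bitmap, so each container always uses whichever form is smaller.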

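For combined queries (several conditions joined with AND), ES first checks whether cached filter bitmaps exist and, if so, intersects them directly. Otherwise it intersects the postings lists using skip lists: while walking the lists in step, it uses skip data to jump over whole blocks that cannot contain the next candidate doc ID, so those blocks never have to be decompressed.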
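The fallback path (no cached bitmap) can be sketched as a leapfrog intersection over block-structured postings lists. This is a simplified, hypothetical model: each list keeps one uncompressed skip entry per block (the block's largest doc ID) instead of Lucene's multi-level skip lists, the block size of 4 is arbitrary, and `_decode` merely counts how often a block would have to be decompressed. The filter-cache lookup that precedes this step is omitted.

```python
class BlockCursor:
    """Cursor over a postings list stored as fixed-size blocks.

    skip[i] holds the largest doc ID in block i, so advance() can jump over
    blocks that cannot contain the target without decoding them.
    """

    def __init__(self, doc_ids: list[int], block_size: int = 4):
        self.blocks = [doc_ids[i:i + block_size]
                       for i in range(0, len(doc_ids), block_size)]
        self.skip = [block[-1] for block in self.blocks]
        self.block_no = 0
        self.pos = 0
        self.decoded = None
        self.decode_count = 0

    def _decode(self) -> None:
        # Stand-in for decompressing a FOR-encoded block.
        self.decoded = self.blocks[self.block_no]
        self.decode_count += 1

    def advance(self, target: int):
        """Return the first doc ID >= target (targets must not decrease), or None."""
        while self.block_no < len(self.blocks) and self.skip[self.block_no] < target:
            self.block_no += 1  # skip the whole block without decoding it
            self.decoded, self.pos = None, 0
        if self.block_no == len(self.blocks):
            return None
        if self.decoded is None:
            self._decode()
        while self.decoded[self.pos] < target:
            self.pos += 1
        return self.decoded[self.pos]


def intersect(postings_lists: list[list[int]], block_size: int = 4):
    """Leapfrog AND: return (matching doc IDs, blocks decoded per list)."""
    if not postings_lists:
        return [], []
    cursors = [BlockCursor(p, block_size) for p in postings_lists]
    result, target = [], 0
    while True:
        matched = True
        for cursor in cursors:
            doc = cursor.advance(target)
            if doc is None:
                return result, [c.decode_count for c in cursors]
            if doc > target:
                target, matched = doc, False
                break
        if matched:
            result.append(target)
            target += 1


evens = list(range(0, 100, 2))  # 50 doc IDs -> 13 blocks of up to 4
docs, decoded = intersect([evens, [3, 50, 98]])
print(docs)     # [50, 98]
print(decoded)  # [3, 1]: only 3 of the 13 blocks of `evens` were decoded
```

The short list drives the search: each of its doc IDs becomes a target, and the long list skips straight to the one block that could contain it.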

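The article also gives practical indexing tips: explicitly disable indexing for fields that are never searched, map strings that only need exact matching as non‑analyzed fields (the keyword type in current ES versions), and prefer IDs with a predictable pattern over random UUIDs, which Lucene looks up less efficiently.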
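The first two tips translate directly into index mappings. The Python dict below is the kind of request body one could send when creating an index (for example with PUT /poems); the index name and field names are hypothetical, and available options depend on the ES version in use (the keyword type replaced the older "index": "not_analyzed" string setting in ES 5.x).

```python
import json

# Hypothetical mapping for a "poems" index; field names are illustrative.
poems_index_body = {
    "mappings": {
        "properties": {
            # Searched as full text: analyzed into terms in the inverted index.
            "content": {"type": "text"},
            # Exact-match filtering only: keyword fields are not analyzed.
            "author": {"type": "keyword"},
            # Kept in _source but never searched: no inverted index entries.
            "source_url": {"type": "keyword", "index": False},
        }
    }
}

print(json.dumps(poems_index_body, indent=2))
```

Every field left out of the inverted index is one less set of term dictionary entries and postings lists to build, store, and merge.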

In summary, ES relies on Lucene's inverted index, where a lookup goes term index → term dictionary → postings list, and adds FST compression for the term index, FOR block compression for postings lists, and Roaring Bitmap filter caching to deliver high‑performance search while keeping memory and disk usage under control.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

Search Engine Elasticsearch Lucene inverted index compression Roaring Bitmap Postings List
