Big Data 14 min read

Understanding Elasticsearch Inverted Index and Efficient Search Retrieval

This article explains how Elasticsearch uses inverted indexes, term dictionaries, and postings lists—along with compression techniques like Frame of Reference and Roaring Bitmaps—to achieve fast, memory‑efficient search queries, and provides practical tips for optimizing indexing and query performance.

IT Architects Alliance

Oct 6, 2021

Understanding Elasticsearch Inverted Index and Efficient Search Retrieval

Recent projects have employed Elasticsearch (ES) for data storage and search analysis, prompting a technical deep‑dive into how ES achieves rapid retrieval.

Using a poetry‑search example, the article contrasts traditional relational database scans with ES's inverted index approach, where a query for the character "前" can directly locate matching documents.

The core components of ES's search engine are outlined: content crawling and stop‑word filtering, tokenization, building an inverted index, and user query processing.

Key concepts such as term (keyword), postings list (document ID list for a term), term dictionary , and term index (a trie‑based structure) are described, emphasizing Lucene's role as the underlying library.

To handle massive term collections efficiently, ES employs a term dictionary with binary search and stores the term index in memory using Finite State Transducers (FST), achieving low space usage and O(len(str)) lookup time.

For postings list compression, ES uses the Frame of Reference (FOR) algorithm, which delta‑encodes sorted integer IDs within 256‑document blocks and stores the required bit‑width as a header, dramatically reducing disk space.

Filter queries benefit from Roaring Bitmaps , a compressed bitmap format that enables fast AND/OR operations while minimizing memory consumption.

When combining multiple postings lists, ES prefers using filter cache bitmaps; otherwise, it falls back to a skip‑list algorithm that skips over non‑matching blocks, saving CPU cycles during decompression.

Practical recommendations include explicitly disabling indexing on unnecessary fields, defining non‑analyzed string fields, and choosing predictable IDs to improve query performance.

Overall, the article demonstrates how ES leverages inverted index structures, advanced compression, and caching to deliver high‑performance search in big‑data scenarios.

Example SQL query illustrating a traditional approach:

select name from poems where content like "%前%";

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Search Engine Elasticsearch Lucene inverted index compression Postings List

Written by

IT Architects Alliance

Discussion and exchange on system, internet, large‑scale distributed, high‑availability, and high‑performance architectures, as well as big data, machine learning, AI, and architecture adjustments with internet technologies. Includes real‑world large‑scale architecture case studies. Open to architects who have ideas and enjoy sharing.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.