Fundamentals 16 min read

Understanding Elasticsearch Inverted Index: Posting Lists, Term Dictionary, and Compression Techniques

This article explains how Elasticsearch achieves fast search by using inverted indexes, detailing the structure of posting lists, term dictionaries, term indexes, and compression methods such as Frame of Reference and Roaring Bitmaps, as well as techniques for efficient union and intersection queries.

Selected Java Interview Questions

Jul 1, 2021

Understanding Elasticsearch Inverted Index: Posting Lists, Term Dictionary, and Compression Techniques

Recent projects often use Elasticsearch (ES) for storing and searching data. This article focuses on how ES can retrieve results quickly, rather than covering its distributed architecture or API usage.

1. Introduction

We start by comparing a simple search scenario—finding poems containing the character "前"—using a traditional relational database versus ES.

select name from poems where content like "%前%";

In a relational database this requires a sequential scan of all rows, which is inefficient and does not support prefix matching (e.g., "A", "AB", "ABC"). Search engines like ES address these limitations.

Search Engine Principles

The search process can be summarized in four steps: crawling and stop‑word filtering, tokenization to extract keywords, building an inverted index, and processing user queries.

2. Inverted Index

The core of ES’s fast retrieval is the inverted index, which consists of three main components: postings list, term dictionary, and term index.

2.1 Key Concepts

Term

In ES a keyword is called a term .

Postings List

A postings list contains the IDs of all documents that contain a given term. For example, the term "前" might have a postings list represented as {静夜思, 望庐山瀑布}. Document IDs range from 0 to (2^31)-1, with each segment holding up to 2^31 documents.

2.2 Internal Structure

The term dictionary stores all terms in sorted order, enabling binary search similar to a B+‑tree index in MySQL.

The term index is implemented as a trie (dictionary tree) and stored in memory as a Finite State Transducer (FST), allowing fast location of a term’s dictionary offset.

2.3 Compression Techniques

Frame of Reference (FOR)

Postings are divided into blocks of 256 documents. Within each block, IDs are delta‑encoded (each ID stored as the difference from the previous one) and the minimal bit‑width needed for the block is stored as a header. Example:

Original ID list: [73, 300, 302, 332, 343, 372] Delta‑encoded list: [73, 227, 2, 30, 11, 29]. Since all values are < 255, each can be stored in a single byte.

Roaring Bitmaps (for filter cache)

For filter queries ES uses Roaring Bitmaps. Document IDs are split into high‑16‑bit and low‑16‑bit parts. High bits act as keys; low bits are stored in containers chosen based on cardinality:

ArrayContainer for len < 4096 BitmapContainer for larger lists

Threshold example: 2^16 = 65536 bits = 8 KB; storing 4 K values as 2‑byte integers also consumes 8 KB, so the container choice balances space.

2.4 Union / Intersection Queries

If a filter cache exists, ES directly uses bitmap AND/OR operations for fast set intersections/unions.

When no cache is available, ES traverses compressed postings lists using a skip‑list, which can jump over entire blocks, avoiding unnecessary decompression and reducing CPU cost.

3. Summary

Elasticsearch uses inverted indexes to locate target documents quickly, sacrificing memory for significant search speed gains.

Lucene stores the structure as term → term dictionary → postings list , compressing the term dictionary with FST.

FOR provides block‑level delta compression for postings, dramatically reducing disk usage.

Roaring Bitmaps cache filter results efficiently, balancing memory consumption and query speed.

Union queries leverage bitmap operations when possible; otherwise, skip‑list traversal minimizes decompression overhead.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Search Engine Elasticsearch inverted index compression Term Dictionary Postings List

Written by

Selected Java Interview Questions

A professional Java tech channel sharing common knowledge to help developers fill gaps. Follow us!

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.