How Elasticsearch Achieves Lightning‑Fast Search with Inverted Indexes and Compression
This article explains how Elasticsearch uses inverted indexes, term dictionaries, postings lists, and compression techniques like Frame‑of‑Reference and Roaring Bitmaps to deliver rapid, memory‑efficient search, and provides practical tips for indexing and query optimization.
Recently I have been working on projects that use Elasticsearch (ES) for data storage and search analytics, so I spent some time studying how it works. This article is a technical share focusing on how ES achieves fast retrieval.
About Search
Imagine a scenario where we need to find ancient poems containing the character "前". Using a traditional relational database we would write a SQL query like:
```sql
select name from poems where content like "%前%";
```
This query performs a sequential scan over every record, which is far too slow for search at scale.
Search Engine Principles
1. Content crawling and stop-word filtering.
2. Tokenization to extract keywords.
3. Building an inverted index from the keywords.
4. Search: the user enters keywords, which are matched against the index.
The core of this process is the inverted index, which ES implements via Lucene.
Inverted Index
After building the inverted index, a query for "前" can directly locate matching poems. The index consists of three main components:
Term: the keyword (called a term in ES).
Postings list: the sorted list of IDs of the documents that contain the term.
Term dictionary and term index: a sorted dictionary of all terms (often organized as a B-tree or trie), plus a smaller index over it that maps terms to their postings.
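The three components above can be illustrated with a minimal Python sketch. The corpus and doc IDs are made up, and each character stands in for a real tokenizer's output:

```python
from collections import defaultdict

# Hypothetical mini-corpus: doc IDs mapped to lines of ancient poems.
docs = {
    1: "床前明月光",
    2: "明月几时有",
    3: "前不见古人",
}

# Build the inverted index: each character acts as a term (a stand-in
# for real tokenization), mapped to a sorted postings list of the
# doc IDs that contain it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text:
        index[term].add(doc_id)
postings = {term: sorted(ids) for term, ids in index.items()}

# A query for "前" now reads one postings list instead of
# scanning every document.
print(postings["前"])  # → [1, 3]
```

This is the essential trade the article describes: extra space for the index in exchange for a lookup that touches only the matching documents.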
Lucene stores the term dictionary on disk in blocks with prefix compression and keeps the term index in memory as a Finite State Transducer (FST) for fast lookup.
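The prefix compression applied inside on-disk term-dictionary blocks can be sketched as follows. This is a simplified illustration of the idea, not Lucene's actual encoding: each term is stored as the length of the prefix it shares with the previous term, plus its remaining suffix:

```python
def prefix_compress(sorted_terms):
    """Encode each term as (shared-prefix length, suffix) relative to
    the previous term -- the idea behind block prefix compression."""
    out = []
    prev = ""
    for term in sorted_terms:
        # Length of the common prefix with the previous term.
        n = 0
        while n < min(len(prev), len(term)) and prev[n] == term[n]:
            n += 1
        out.append((n, term[n:]))
        prev = term
    return out

def prefix_decompress(encoded):
    """Rebuild the original sorted term list."""
    terms, prev = [], ""
    for n, suffix in encoded:
        term = prev[:n] + suffix
        terms.append(term)
        prev = term
    return terms

terms = ["search", "searched", "searching", "seed", "select"]
assert prefix_decompress(prefix_compress(terms)) == terms
```

Because the dictionary is sorted, neighboring terms share long prefixes, so most entries shrink to a couple of characters; the in-memory FST exploits shared prefixes (and suffixes) even more aggressively.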
Techniques for Postings List
Postings lists need to be compressed and support fast union/intersection queries.
Compression (FOR)
Lucene compresses postings lists with Frame-of-Reference (FOR): because doc IDs are stored in sorted order, each ID is replaced by its delta from the previous one. IDs are grouped into blocks (e.g., 256 docs per block), and each block's header records the bit width needed for its largest delta, so every ID in the block is stored in just that many bits rather than a fixed 32.
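A minimal sketch of the encoding step, with made-up doc IDs and a tiny block size for readability. Real Lucene packs the deltas bit-by-bit into a byte stream; here we only compute the deltas and the per-block bit width:

```python
def for_encode(doc_ids, block_size=256):
    """Frame-of-Reference sketch: delta-encode a sorted postings list
    in blocks; each block records the bit width needed for its
    largest delta (the per-block header)."""
    blocks = []
    prev = 0
    for i in range(0, len(doc_ids), block_size):
        block = doc_ids[i:i + block_size]
        deltas = []
        for doc_id in block:
            deltas.append(doc_id - prev)  # gaps, not absolute IDs
            prev = doc_id
        width = max(d.bit_length() for d in deltas)
        blocks.append((width, deltas))
    return blocks

ids = [73, 300, 302, 332, 343, 372]
print(for_encode(ids, block_size=3))
# → [(8, [73, 227, 2]), (5, [30, 11, 29])]
```

Here six 32-bit IDs collapse into one block of 8-bit values and one of 5-bit values; the denser the postings list, the smaller the deltas and the better the compression.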
Roaring Bitmaps (Filter Cache)
For filter queries ES caches results as Roaring Bitmaps, which partition the doc-ID space into 65,536-value chunks and store each chunk as either a sorted array (when sparse) or a bitmap (when dense), giving compact storage and fast bitwise AND/OR operations.
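The chunk-splitting idea can be sketched in a few lines. This is a toy model of the data layout only (real roaring implementations also support run containers and optimized set operations):

```python
from collections import defaultdict

ARRAY_LIMIT = 4096  # roaring's threshold: ≤4096 values → sorted array

def roaring_build(doc_ids):
    """Roaring-bitmap sketch: split each 32-bit ID into a 16-bit chunk
    key (high bits) and a 16-bit offset (low bits); sparse chunks stay
    as sorted arrays, dense chunks become 8 KB bitmaps."""
    chunks = defaultdict(list)
    for doc_id in sorted(doc_ids):
        chunks[doc_id >> 16].append(doc_id & 0xFFFF)
    containers = {}
    for key, values in chunks.items():
        if len(values) <= ARRAY_LIMIT:
            containers[key] = ("array", values)
        else:
            bitmap = bytearray(8192)  # 65536 bits
            for v in values:
                bitmap[v >> 3] |= 1 << (v & 7)
            containers[key] = ("bitmap", bitmap)
    return containers

c = roaring_build([1, 2, 65536 + 7])
print({k: kind for k, (kind, _) in c.items()})  # → {0: 'array', 1: 'array'}
```

The 4096 cutoff is the break-even point: below it, 16-bit array entries are smaller than a fixed 8 KB bitmap; above it, the bitmap wins and supports constant-time membership tests.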
Union/Intersection
If a filter cache is present, the cached bitmaps are combined directly with bitwise AND/OR. Otherwise, ES intersects the postings lists using skip pointers, jumping over runs of non-matching IDs and avoiding the cost of decompressing entire blocks.
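The skip-based intersection can be sketched as follows. Here `bisect` stands in for skip pointers: instead of stepping through a postings list one ID at a time, each list jumps straight to the first ID greater than or equal to the current candidate (this is a simplification; Lucene's skip lists jump at block granularity):

```python
from bisect import bisect_left

def intersect(postings_lists):
    """Intersect sorted postings lists by advancing each list to the
    current candidate ID, driven by the shortest (rarest) list."""
    postings_lists = sorted(postings_lists, key=len)  # rarest term leads
    positions = [0] * len(postings_lists)
    result = []
    for candidate in postings_lists[0]:
        ok = True
        for i in range(1, len(postings_lists)):
            plist = postings_lists[i]
            # Jump forward to the first ID >= candidate.
            positions[i] = bisect_left(plist, candidate, positions[i])
            if positions[i] == len(plist) or plist[positions[i]] != candidate:
                ok = False
                break
        if ok:
            result.append(candidate)
    return result

a = [1, 3, 13, 101, 105, 108, 255, 256, 257]
b = [1, 101, 500, 1000]
print(intersect([a, b]))  # → [1, 101]
```

Driving the loop with the shortest list bounds the number of candidates, so the cost is dominated by the rarest term rather than the most common one.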
Summary
ES uses inverted indexes (term → dictionary → postings) to locate documents quickly, trading higher space usage for much faster search.
Lucene’s term index, term dictionary, and FST compression enable fast in‑memory lookups.
FOR compression dramatically reduces postings list size.
Roaring Bitmaps cache filter results, saving memory and speeding up frequent filters.
When a filter cache exists, bitwise operations handle unions/intersections; otherwise, skip‑list traversal reduces CPU cost.
Elasticsearch Indexing Tips
Explicitly disable indexing for fields that do not need it; ES indexes fields by default.
For string fields that do not require full-text analysis, map them as not_analyzed (the keyword type in modern ES versions); analysis is enabled by default.
Prefer predictable, low‑entropy IDs; random UUIDs hinder query performance.
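The first two tips can be illustrated with a hypothetical index mapping, expressed here as a Python dict ready to be sent as the mapping body (the index and field names are made up):

```python
# Hypothetical mapping for a "poems" index. "index": False disables
# indexing for a field only ever returned, not searched; "keyword"
# is the modern equivalent of not_analyzed for exact-match strings.
poems_mapping = {
    "mappings": {
        "properties": {
            "content": {"type": "text"},                  # analyzed, full-text search
            "author": {"type": "keyword"},                # exact match, no analysis
            "raw_source": {"type": "text", "index": False},  # stored, never searched
        }
    }
}
```

Every field you leave analyzed and indexed costs tokenization work at write time and index space on disk, so pruning the mapping is one of the cheapest optimizations available.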
Author: Richard_Yi (source: juejin.cn)
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles. We focus on operations transformation, accompanying you through your operations career as we grow together.