How Elasticsearch Achieves Lightning‑Fast Search with Inverted Indexes and Compression
This article explains how Elasticsearch uses inverted indexes, term dictionaries, and advanced compression techniques like Frame of Reference and Roaring Bitmaps to enable rapid, scalable search over massive datasets, contrasting its approach with traditional relational database queries and detailing practical optimization tips.
Introduction
Recently I have been working on projects that use Elasticsearch (ES) for data storage and search analytics, so I studied ES. This article is based on a technical sharing session.
What Will Be Covered
Differences between traditional relational databases and ES
Search engine principles
Inverted index details
Postings list tricks
Union queries
Search Basics
Example: searching ancient poems containing the character “前”. Using a relational database would require a sequential scan like:
select name from poems where content like "%前%";This approach scans all rows and cannot efficiently return partial matches such as “A”, “AB”, “ABC”. Hence dedicated search engines like ES are needed.
Search Engine Principles
The typical steps are:
Content crawling and stop‑word filtering
Tokenization and keyword extraction
Building an inverted index
User query processing
The inverted index is the core of ES, implemented via Lucene.
Inverted Index
Using the earlier poem example, the inverted index allows direct lookup of documents containing the term “前”. The index consists of a term dictionary, term index, and postings list.
Key Concepts
Term
A keyword in ES is called a term .
Postings List
A postings list is the set of document IDs that contain a given term, e.g. {静夜思, 望庐山瀑布}. IDs are stored as integers for efficient compression.
Term Dictionary and Term Index
All terms are sorted and stored in a term dictionary . The term index (a trie/FST) maps term prefixes to dictionary offsets, enabling fast binary search similar to B‑tree indexes.
Internal Structure of the Index
Lucene compresses the term dictionary using block storage with common‑prefix compression and stores the term index in memory as a Finite State Transducer (FST), which is space‑efficient and provides O(len) lookup.
Postings List Optimizations
Two main techniques are used:
Frame of Reference (FOR) – documents are divided into blocks (e.g., 256 docs per block) and each block stores delta‑encoded IDs with a header indicating the bit width.
Roaring Bitmaps – used for filter caches; they store dense bitmaps efficiently and support fast AND/OR operations.
FOR Compression Example
Given IDs [73, 300, 302, 332, 343, 372], delta encoding yields [73, 227, 2, 30, 11, 29], allowing each value to fit in a single byte.
Roaring Bitmap Compression
Doc IDs are split into high 16‑bit and low 16‑bit parts. High bits are aggregated, and low bits are stored using either an ArrayContainer (for < 4096 entries) or a BitmapContainer (otherwise), reducing space.
Union Queries
If a filter cache exists, ES uses bitmap operations to compute intersections/unions. Without a cache, a skip‑list algorithm traverses postings lists block by block, skipping non‑matching IDs and avoiding decompression of irrelevant blocks.
Summary
ES uses inverted indexes to achieve fast document retrieval, trading higher space usage for query speed.
Lucene’s term dictionary, term index (FST), and postings list form the backbone of the index.
FOR compression dramatically reduces postings list size on disk.
Roaring Bitmaps provide efficient in‑memory filter caching.
Union queries leverage bitmap operations when cached, otherwise use skip‑list traversal to minimize CPU and I/O.
Practical Tips for Using Elasticsearch
Explicitly disable indexing for fields that do not need it; ES indexes fields by default.
For string fields that do not require analysis, set them as not_analyzed.
Avoid highly random IDs (e.g., UUIDs); use monotonic or patterned IDs for better performance.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Java Interview Crash Guide
Dedicated to sharing Java interview Q&A; follow and reply "java" to receive a free premium Java interview guide.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
