Elasticsearch vs MySQL: How Inverted Indexes Enable Faster Complex Queries
This article explains why Elasticsearch handles complex conditional queries more efficiently than MySQL, covering inverted indexes, term dictionaries, skip lists, and Roaring bitmaps, and discusses the trade‑offs, such as slower writes.
Why Elasticsearch Handles Complex Queries
For a query with several predicates, MySQL typically filters rows through a single index per table (composite indexes and the index merge optimization help only in limited cases); the remaining predicates are then checked row by row against the fetched rows, which costs I/O and CPU. Elasticsearch, built on Lucene, keeps a separate inverted index for each field, so every predicate can be resolved by an index look‑up and the per‑field results combined. This makes Elasticsearch a common choice for order, log, and other multi‑condition search scenarios.
Core Concepts Compared with MySQL
Index (Elasticsearch) ≈ Database (MySQL)
Type (deprecated in ES 7.x, removed in 8.0) ≈ Table
Document ≈ Row ; a document consists of Fields (columns)
Mapping defines field types, analogous to a relational Schema
Elasticsearch uses its own Query DSL instead of SQL.
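As a rough illustration of the Query DSL, the sketch below builds the JSON body of a two‑condition search as a Python dictionary. The field names `author` and `score` and the commented SQL statement are assumptions made for the example, and the `term` clauses assume `author` is mapped as an exact‑match (keyword) field.

```python
import json

# Hypothetical fields, chosen only for illustration.
# Rough SQL equivalent: SELECT * FROM books WHERE author = 'Tom' AND score = 2.2
query = {
    "query": {
        "bool": {
            # Clauses under "filter" must all match (logical AND)
            # and do not contribute to relevance scoring.
            "filter": [
                {"term": {"author": "Tom"}},
                {"term": {"score": 2.2}},
            ]
        }
    }
}

# JSON body that would be sent to the index's _search endpoint.
body = json.dumps(query, indent=2)
print(body)
```

Each `term` clause can be answered from the index of its own field, which is the property the rest of the article builds on.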
Inverted Index and Term Structures
For each searchable field Elasticsearch builds an inverted index that maps terms (e.g., an ISBN or an author name) to a posting list of document IDs containing that term.
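A minimal sketch of the idea, using made‑up documents: each field gets its own mapping from term to a sorted list of document IDs. This is a toy model in plain Python, not how Lucene lays the data out on disk.

```python
from collections import defaultdict

# Toy documents; IDs and field values are invented for illustration.
docs = {
    1: {"author": "Tom", "isbn": "111"},
    2: {"author": "Ann", "isbn": "222"},
    3: {"author": "Tom", "isbn": "333"},
}

def build_inverted_index(docs, field):
    """Map each term of `field` to a sorted posting list of document IDs."""
    index = defaultdict(list)
    for doc_id in sorted(docs):              # ascending IDs keep posting lists sorted
        term = docs[doc_id].get(field)
        if term is not None:
            index[term].append(doc_id)
    return dict(index)

author_index = build_inverted_index(docs, "author")
print(author_index)  # {'Tom': [1, 3], 'Ann': [2]}
```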
Terms are stored in a Term Dictionary in sorted order. Because the dictionary can be too large to hold in memory, Lucene keeps a much smaller Term Index over it: a compressed prefix structure, often described as a Burst‑Trie, which Lucene implements as a finite state transducer (FST). The term index holds only term prefixes, each pointing to a block of the term dictionary, so a look‑up can jump to the relevant block with little disk I/O.
When a query requests a specific term, Elasticsearch uses the term index to locate the correct block of the term dictionary, searches within that block for the term, and then reads the associated posting list from disk.
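The two‑step look‑up can be sketched as follows. This is a simplification under stated assumptions: the "term index" here is just the first term of each fixed‑size block, found by binary search, whereas Lucene uses a compressed prefix structure; the terms, posting lists, and `BLOCK_SIZE` are invented for the example.

```python
from bisect import bisect_left, bisect_right

# Sorted term dictionary: (term, posting list). Values are illustrative.
term_dict = [
    ("ada", [4]), ("alice", [1, 7]), ("bob", [2]), ("carol", [3, 9]),
    ("dave", [5]), ("tim", [6]), ("tom", [3, 8, 9]), ("zoe", [10]),
]
BLOCK_SIZE = 3  # assumption: tiny blocks so the example stays readable

# Small in-memory "term index": only the first term of each block is kept.
block_starts = [term_dict[i][0] for i in range(0, len(term_dict), BLOCK_SIZE)]

def lookup(term):
    """Return the posting list for `term`, or [] if the term is absent."""
    # Step 1: the small index picks the one block that could hold the term.
    block_no = bisect_right(block_starts, term) - 1
    if block_no < 0:
        return []
    start = block_no * BLOCK_SIZE
    block = term_dict[start:start + BLOCK_SIZE]   # stands in for one disk read
    # Step 2: binary search inside that block only.
    keys = [t for t, _ in block]
    i = bisect_left(keys, term)
    if i < len(keys) and keys[i] == term:
        return block[i][1]
    return []

print(lookup("tom"))     # [3, 8, 9]
print(lookup("nobody"))  # []
```

Only `block_starts` needs to stay in memory; the full dictionary is touched one block at a time.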
Skip‑List Intersection for Multi‑Condition Queries
Posting lists are stored with a multi‑level skip list. To compute the intersection of two conditions (e.g., score = 2.2 AND author = "Tom"), Elasticsearch:
Selects the shorter posting list.
Iterates its document IDs.
Uses the skip list to jump forward in the longer list until it reaches an ID ≥ the current one.
Example posting lists:
Score: [2,3,4,5,7,9,10,11]
Author: [3,8,9,12,13]
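Using the skip‑list algorithm, the intersection yields [3, 9], the only IDs present in both lists. A naive nested comparison costs O(N × M) and a plain linear merge costs O(N + M); skip pointers improve on the merge by letting the traversal jump over whole runs of the longer list, so in practice the work is driven mainly by the length of the shorter list.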
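The procedure can be sketched in a few lines. This is a simplified, single‑level version with a skip pointer every `interval` entries (an assumed parameter); Lucene's real skip lists are multi‑level and stored alongside compressed postings. With the example lists above it returns [3, 9], since both 3 and 9 appear in both lists.

```python
def intersect(a, b, interval=3):
    """Intersect two sorted posting lists, skipping ahead in the longer one."""
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    # Positions in the longer list that carry a skip pointer.
    skips = list(range(0, len(long_), interval))
    result = []
    pos = 0   # current position in the longer list
    s = 0     # next skip pointer to consider
    for target in short:
        # Jump over whole blocks while the skip entry is still <= target.
        while s < len(skips) and long_[skips[s]] <= target:
            pos = max(pos, skips[s])
            s += 1
        # Short linear scan inside the block we landed in.
        while pos < len(long_) and long_[pos] < target:
            pos += 1
        if pos == len(long_):
            break                      # longer list exhausted
        if long_[pos] == target:
            result.append(target)
    return result

score = [2, 3, 4, 5, 7, 9, 10, 11]
author = [3, 8, 9, 12, 13]
print(intersect(score, author))  # [3, 9]
```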
Roaring Bitmap Caching (Bitset Strategy)
Elasticsearch also caches the document‑ID sets matched by frequently used filters in memory as Roaring Bitmaps, a compressed bitmap format that adapts its storage to both sparse and dense sets.
Key properties:
The 32‑bit integer space is divided into 2^16 (= 65 536) containers based on the high 16 bits.
If a container holds ≤ 4 096 entries, it stores them as an ordered unsigned short array (2 bytes per entry, so at most 8 KB).
If a container holds > 4 096 entries, it stores a full 2^16‑bit bitset (fixed 8 KB). The Roaring format also defines a run‑length‑encoded (RLE) container for long runs of consecutive IDs.
This design avoids the 512 MB memory cost of a plain bitset for a full 2^32 range while still supporting fast logical AND operations on posting lists.
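The container rules and the memory figures can be checked with a small sketch. It models only the array and bitset containers described above (no run‑length containers), and the function names and sample IDs are invented for the example; it is not the layout of any particular Roaring library.

```python
from bisect import bisect_left

ARRAY_LIMIT = 4096  # above this many entries a container becomes a bitset

def build_containers(doc_ids):
    """Group 32-bit IDs by their high 16 bits and pick a container per group."""
    groups = {}
    for doc_id in sorted(set(doc_ids)):
        groups.setdefault(doc_id >> 16, []).append(doc_id & 0xFFFF)
    containers = {}
    for high, lows in groups.items():
        if len(lows) <= ARRAY_LIMIT:
            containers[high] = ("array", lows)        # sorted 16-bit values
        else:
            bits = bytearray(65536 // 8)              # fixed 8 KB bitset
            for low in lows:
                bits[low >> 3] |= 1 << (low & 7)
            containers[high] = ("bitset", bits)
    return containers

def container_bytes(container):
    """Approximate payload size: 2 bytes per array entry, 8192 for a bitset."""
    kind, data = container
    return 2 * len(data) if kind == "array" else len(data)

def contains(containers, doc_id):
    container = containers.get(doc_id >> 16)
    if container is None:
        return False
    kind, data = container
    low = doc_id & 0xFFFF
    if kind == "array":
        i = bisect_left(data, low)
        return i < len(data) and data[i] == low
    return bool(data[low >> 3] & (1 << (low & 7)))

sparse = [5, 70000, 70001]
dense = list(range(131072, 131072 + 5000))   # 5 000 IDs sharing high bits = 2
containers = build_containers(sparse + dense)
sizes = {high: (c[0], container_bytes(c)) for high, c in containers.items()}
print(sizes)  # {0: ('array', 2), 1: ('array', 4), 2: ('bitset', 8192)}

# A plain bitset over the whole 32-bit range: 2^32 bits = 512 MiB.
print(2**32 // 8 // 2**20)  # 512
```

The sparse IDs cost a few bytes each, while the dense run is capped at 8 KB, which is why the format scales to both kinds of posting list.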
Performance Trade‑offs
The rich indexing structures (inverted index, term dictionary, skip lists, Roaring bitmap cache) give Elasticsearch low latency for complex Boolean queries, because every predicate is answered from an index rather than by scanning rows. However, they introduce overhead during data ingestion:
Indexing is slower than MySQL because each field must be tokenized, written to the term dictionary, and stored in posting lists.
Newly indexed documents become searchable only after a refresh cycle (default 1 s), so Elasticsearch provides near real‑time search rather than immediate visibility.
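The visibility delay can be pictured with a toy model. `ToyShard` is a hypothetical class written for this illustration and does not mirror Elasticsearch internals; it only shows that a document sits in a buffer until a refresh makes it searchable.

```python
class ToyShard:
    """Conceptual model only: a write buffer plus a searchable document list."""

    def __init__(self):
        self.buffer = []       # indexed, but not yet visible to search
        self.searchable = []   # visible to search

    def index(self, doc):
        self.buffer.append(doc)

    def refresh(self):
        # Elasticsearch runs this step periodically (default interval 1 s).
        self.searchable.extend(self.buffer)
        self.buffer.clear()

    def search(self, predicate):
        return [d for d in self.searchable if predicate(d)]

shard = ToyShard()
shard.index({"author": "Tom"})
before = shard.search(lambda d: d["author"] == "Tom")
shard.refresh()
after = shard.search(lambda d: d["author"] == "Tom")
print(before, after)  # [] [{'author': 'Tom'}]
```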
Conclusion
Elasticsearch's architecture (field‑level inverted indexes, prefix‑based term indexes, skip‑list intersection, and Roaring bitmap caching) makes it far more suitable than MySQL for queries that combine multiple conditions on large data sets. The trade‑off is higher write latency and a short delay before newly indexed data is searchable.
dbaplus Community