
Deep Dive into Index Implementations of MySQL, InnoDB, MyISAM, and Lucene

This article explains the index mechanisms used by MySQL's MyISAM and InnoDB storage engines and by Lucene, the library that provides Elasticsearch's inverted index, and discusses how these structures affect storage, memory usage, and query performance.

Big Data Technology & Architecture

Unlike MySQL's familiar B+Tree indexes, Elasticsearch (built on Lucene) keeps a compressed term index in memory. This speeds up search but costs memory, which is why the index is so heavily optimized and compressed.

MyISAM Index Implementation: MyISAM stores indexes in .MYI files, separate from the table data, as B+Trees whose leaf nodes hold the addresses of records. Primary and secondary indexes share this structure; the only difference is that primary-key values must be unique, while secondary-key values may repeat.
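The MyISAM layout can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a sorted list stands in for the B+Tree leaf level, a list position stands in for a record address, and the names `data_file`, `build_index`, and `lookup` are hypothetical.

```python
from bisect import bisect_left

# Hypothetical "data file": rows kept in insertion order. The list position
# stands in for a record address (a byte offset in a real data file).
data_file = [
    (30, "carol", "c@example.com"),
    (10, "alice", "a@example.com"),
    (20, "bob", "b@example.com"),
]

def build_index(rows, column):
    """Return sorted (key, record_address) pairs, a stand-in for B+Tree leaves."""
    return sorted((row[column], addr) for addr, row in enumerate(rows))

def lookup(index, rows, key):
    """Binary-search the index, then follow the address into the data file."""
    keys = [k for k, _ in index]
    pos = bisect_left(keys, key)
    if pos < len(index) and index[pos][0] == key:
        return rows[index[pos][1]]
    return None

# Primary and secondary indexes have the same shape: key -> record address.
primary_index = build_index(data_file, 0)    # index on id
secondary_index = build_index(data_file, 1)  # index on name

print(lookup(primary_index, data_file, 20))
print(lookup(secondary_index, data_file, "carol"))
```

Both indexes resolve a key to an address in one step, which is why the two are structurally interchangeable in this model.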

InnoDB Index Implementation: InnoDB also uses B+Trees, but the data file itself is the primary index: a clustered index whose leaf nodes hold the full records. Secondary indexes store the primary key value rather than a record address, so a lookup through a secondary index takes two steps: search the secondary index to get the primary key, then search the clustered index to get the row.
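The two-step lookup can be sketched as follows. This is a minimal model, not InnoDB's actual code: sorted lists stand in for the B+Trees, and `clustered`, `secondary_by_name`, `search`, and `find_by_name` are hypothetical names chosen for the example.

```python
from bisect import bisect_left

# Clustered index: entries sorted by primary key, each holding the full row.
clustered = [
    (10, {"id": 10, "name": "alice", "city": "Oslo"}),
    (20, {"id": 20, "name": "bob", "city": "Lima"}),
    (30, {"id": 30, "name": "carol", "city": "Pune"}),
]

# Secondary index on name: entries hold the primary key value, not an address.
secondary_by_name = sorted((row["name"], pk) for pk, row in clustered)

def search(sorted_pairs, key):
    """Binary search over (key, value) pairs sorted by key."""
    keys = [k for k, _ in sorted_pairs]
    pos = bisect_left(keys, key)
    if pos < len(sorted_pairs) and sorted_pairs[pos][0] == key:
        return sorted_pairs[pos][1]
    return None

def find_by_name(name):
    pk = search(secondary_by_name, name)  # step 1: secondary index -> primary key
    if pk is None:
        return None
    return search(clustered, pk)          # step 2: clustered index -> full row

print(find_by_name("bob"))
```

A lookup by `id` needs only the second search, which is the reason primary-key queries are the cheapest access path in this design.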

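Understanding these differences helps in choosing primary keys for InnoDB. A long primary key inflates every secondary index, because each secondary entry stores a copy of it. A non-monotonic key (one that does not steadily increase) forces inserts into the middle of the clustered index, which causes page splits and degrades performance.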
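A toy simulation can make the cost of non-monotonic keys concrete. The model below is an assumption-laden sketch, not InnoDB's real page management: pages hold at most four keys, an insert past the right edge of the last page opens a fresh page without moving rows, and any other overflow splits the page in half. The function name `simulate` and the capacity are hypothetical.

```python
import random
from bisect import bisect_right, insort

def simulate(keys, capacity=4):
    """Insert unique keys into sorted, fixed-capacity pages.

    Returns (splits, pages), where splits counts overflows that had to
    move existing keys into a new page.
    """
    pages = [[]]
    splits = 0
    for key in keys:
        # Target page: the last page whose first key is <= key.
        first_keys = [page[0] for page in pages[1:]]
        i = bisect_right(first_keys, key)
        page = pages[i]
        if len(page) < capacity:
            insort(page, key)
        elif i == len(pages) - 1 and key > page[-1]:
            pages.append([key])  # right-edge append: nothing moves
        else:
            insort(page, key)
            mid = len(page) // 2
            pages[i:i + 1] = [page[:mid], page[mid:]]  # split: half the keys move
            splits += 1
    return splits, pages

sequential = list(range(1000))
shuffled = sequential[:]
random.Random(0).shuffle(shuffled)

for label, keys in (("sequential", sequential), ("random", shuffled)):
    splits, pages = simulate(keys)
    fill = len(keys) / (len(pages) * 4)
    print(f"{label}: splits={splits}, pages={len(pages)}, fill={fill:.2f}")
```

In this model, sequential keys never trigger a split and leave every page full, while shuffled keys split repeatedly and leave pages partly empty.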

Lucene Index Implementation: Lucene uses an inverted index rather than a B+Tree, built from three parts: a Term Index, a Term Dictionary, and Posting Lists. The Term Dictionary stores terms in sorted order and is block-compressed, so a term can be located by binary search. The Term Index is a trie-based structure over term prefixes that points to the right place in the Term Dictionary. Each Posting List holds the IDs of the documents that contain one term.
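The three parts can be sketched in Python. This is a simplified model, not Lucene's on-disk format: a sparse list of block heads stands in for the trie-based Term Index, whitespace splitting stands in for a real analyzer, and all names (`docs`, `build_inverted_index`, `build_term_index`, `lookup`, `BLOCK_SIZE`) are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

BLOCK_SIZE = 2  # terms per dictionary block; tiny for illustration

docs = {
    1: "mysql uses b+tree indexes",
    2: "lucene uses an inverted index",
    3: "innodb uses a clustered index",
}

def build_inverted_index(documents):
    """Return (sorted term dictionary, term -> sorted posting list)."""
    postings = defaultdict(list)
    for doc_id in sorted(documents):
        for term in set(documents[doc_id].split()):
            postings[term].append(doc_id)
    return sorted(postings), dict(postings)

def build_term_index(term_dictionary):
    """First term of each block -> block start offset (stand-in for the trie)."""
    return [(term_dictionary[i], i)
            for i in range(0, len(term_dictionary), BLOCK_SIZE)]

def lookup(term, term_index, term_dictionary, postings):
    """Term Index -> dictionary block -> posting list."""
    heads = [head for head, _ in term_index]
    block_no = bisect_right(heads, term) - 1
    if block_no < 0:
        return []
    start = term_index[block_no][1]
    block = term_dictionary[start:start + BLOCK_SIZE]
    return postings[term] if term in block else []

term_dictionary, postings = build_inverted_index(docs)
term_index = build_term_index(term_dictionary)

print(lookup("index", term_index, term_dictionary, postings))
print(lookup("uses", term_index, term_dictionary, postings))
```

Only the small Term Index has to be searched in full; it narrows the lookup to one dictionary block, and the posting list is read only when the term is found there.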

For term lookups, Lucene's structures can be faster and more memory-efficient than a MySQL B+Tree, largely because Lucene caches the Term Index in memory as a finite-state transducer (FST) and compresses the Term Dictionary.
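To show why a sorted dictionary compresses well, here is front coding, a classic dictionary-compression technique in which each term stores only the length of the prefix it shares with the previous term plus the remaining suffix. It is chosen purely for illustration; the text above does not say which scheme Lucene applies, and the function names are hypothetical.

```python
def front_encode(sorted_terms):
    """Encode each term as (shared_prefix_length, suffix) relative to the previous term."""
    encoded = []
    prev = ""
    for term in sorted_terms:
        n = 0
        while n < min(len(prev), len(term)) and prev[n] == term[n]:
            n += 1
        encoded.append((n, term[n:]))
        prev = term
    return encoded

def front_decode(encoded):
    """Rebuild the original terms from (shared_prefix_length, suffix) pairs."""
    terms = []
    prev = ""
    for n, suffix in encoded:
        term = prev[:n] + suffix
        terms.append(term)
        prev = term
    return terms

terms = ["index", "indexes", "indexing", "innodb", "inverted"]
encoded = front_encode(terms)
print(encoded)

raw_chars = sum(len(t) for t in terms)
stored_chars = sum(len(suffix) for _, suffix in encoded)
print(f"characters stored: {stored_chars} of {raw_chars}")
```

Because neighboring terms in a sorted dictionary tend to share prefixes, most entries shrink to a short suffix, at the cost of decoding a block sequentially.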

Overall, the article highlights how different storage engines and search libraries implement indexing, the trade‑offs in memory and disk usage, and practical guidance for optimizing queries and schema design.


indexing, database, Elasticsearch, lucene, InnoDB, mysql
Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
