
Master ElasticSearch: Core Concepts, Architecture, and Search Process Explained

This article provides a comprehensive overview of ElasticSearch, covering its role as a distributed full‑text search engine built on Lucene, key concepts such as index, type, document, field, shard and replica, the analysis pipeline, inverted index mechanics, and the two‑phase query‑fetch search workflow.

Mike Chen's Internet Architecture

Jul 10, 2025


What is ElasticSearch?

ElasticSearch is a distributed full-text search and analytics engine built on Apache Lucene, widely used in big-data scenarios such as log analysis and large-scale site search.

Core Concepts

Index: a collection of documents with similar characteristics, defined by a mapping (the fields and their types) and stored as inverted-index files. Because an index is split into shards, its data may reside on one or many nodes.

Type: a logical grouping of similar documents, once described as analogous to a table in a relational database. Since 6.0 an index can hold only one type, and types were removed from the APIs in 8.0, so this concept is now legacy.

Document: the basic unit that is indexed and searched, represented as JSON; analogous to a row.

Field: the smallest unit inside a document, a key-value pair comparable to a column.

Shard: a slice of an index that enables horizontal scaling; each shard is a self-contained Lucene index.

Replica: a copy of a primary shard, kept on a different node, that provides fault tolerance and can also serve read requests.
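To make shards and replicas concrete, here is a minimal Python sketch of two mechanisms, under simplifying assumptions. A document is routed to a primary shard with the classic formula shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document ID (newer versions refine this with a routing factor). Elasticsearch hashes with Murmur3; the sketch uses zlib.crc32 purely as a stand-in. The allocate function is a hypothetical round-robin placement that enforces just one real rule: two copies of the same shard never share a node, and copies that cannot be placed stay unassigned.

```python
import zlib
from collections import defaultdict


def route(routing_value: str, num_primary_shards: int) -> int:
    """Pick the primary shard for a document (simplified routing formula).

    Stand-in hash: Elasticsearch uses Murmur3, not CRC32.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


def allocate(num_primaries: int, num_replicas: int, nodes: list[str]):
    """Hypothetical round-robin placement of shard copies onto nodes.

    Copy 0 of each shard is the primary, the rest are replicas. No two
    copies of one shard land on the same node; copies that do not fit
    are reported as unassigned.
    """
    placement = defaultdict(list)  # node -> [(shard, role), ...]
    unassigned = []                # [(shard, role), ...]
    for shard in range(num_primaries):
        for copy in range(1 + num_replicas):
            role = "primary" if copy == 0 else "replica"
            if copy >= len(nodes):
                # Not enough distinct nodes for this copy.
                unassigned.append((shard, role))
            else:
                node = nodes[(shard + copy) % len(nodes)]
                placement[node].append((shard, role))
    return dict(placement), unassigned


placement, unassigned = allocate(3, 1, ["node-a", "node-b", "node-c"])
for node, copies in sorted(placement.items()):
    print(node, copies)
print("doc-42 -> shard", route("doc-42", 3))
```

With three nodes and one replica, each node ends up holding one primary and one replica of a different shard. Because the shard number depends on number_of_primary_shards, changing that count would send existing documents to different shards, which is why the primary count is chosen when the index is created. Unassigned replicas correspond to a yellow cluster health status.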

Analysis Process

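ElasticSearch processes text with an analyzer, which chains three kinds of components in this order:

Character filter: preprocesses the raw text before tokenization (e.g., strips HTML tags). An analyzer may have zero or more.

Tokenizer: splits the text into tokens; every analyzer has exactly one. The default Standard tokenizer splits English on word boundaries such as spaces and punctuation and splits Chinese into single characters, so dedicated Chinese tokenizers (dictionary- or machine-learning-based) can be added as plugins.

Token filter: further processes the tokens (e.g., lowercasing, stop-word removal). An analyzer may have zero or more.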
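The three-stage pipeline can be sketched in plain Python. This is an illustration of the order of operations only, not Elasticsearch code: html_strip, tokenize, lowercase, stop, and analyze are hypothetical helpers, the tokenizer approximates word boundaries with the regex \w+, and the stop-word set is a small made-up subset.

```python
import re
from html import unescape

# Hypothetical, deliberately small stop-word list for illustration.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "is"}


def html_strip(text: str) -> str:
    """Character filter: replace HTML tags with spaces, decode entities."""
    return unescape(re.sub(r"<[^>]+>", " ", text))


def tokenize(text: str) -> list[str]:
    """Tokenizer: approximate word boundaries with runs of word characters."""
    return re.findall(r"\w+", text)


def lowercase(tokens: list[str]) -> list[str]:
    """Token filter: normalize case."""
    return [t.lower() for t in tokens]


def stop(tokens: list[str]) -> list[str]:
    """Token filter: drop stop words."""
    return [t for t in tokens if t not in STOP_WORDS]


def analyze(text, char_filters=(html_strip,), tokenizer=tokenize,
            token_filters=(lowercase, stop)):
    """Run character filters, then exactly one tokenizer, then token filters."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze("<p>The Quick Brown Fox &amp; the Lazy Dog</p>"))
# ['quick', 'brown', 'fox', 'lazy', 'dog']
```

Note that the token filters run in sequence, so lowercasing before stop-word removal lets "The" be removed as "the".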

Built-in analyzers include Standard, Simple, Stop, Whitespace, Keyword, Pattern, and language-specific analyzers.
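The differences between several built-in analyzers are easiest to see side by side. The functions below are rough Python approximations of their documented behavior, not the Lucene implementations: Whitespace splits on whitespace and keeps case, Simple splits on any non-letter and lowercases, Stop adds stop-word removal to Simple (the stop-word set here is a hypothetical small subset), Keyword emits the whole input as one token, and Pattern splits on a regex (non-word characters by default) and lowercases. Standard is left out because its Unicode word-boundary rules do not reduce to a short regex.

```python
import re

# Hypothetical small subset of English stop words, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "is", "in"}


def whitespace_analyzer(text):
    # Split on whitespace only; case and punctuation are preserved.
    return text.split()


def simple_analyzer(text):
    # Split on any non-letter character (digits, hyphens, spaces) and lowercase.
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def stop_analyzer(text):
    # Simple analyzer plus stop-word removal.
    return [t for t in simple_analyzer(text) if t not in STOP_WORDS]


def keyword_analyzer(text):
    # No-op: the whole input becomes a single token.
    return [text]


def pattern_analyzer(text, pattern=r"\W+"):
    # Lowercase, then split on the regex; drop empty pieces.
    return [t for t in re.split(pattern, text.lower()) if t]


sample = "The 2 QUICK Brown-Foxes"
for analyzer in (whitespace_analyzer, simple_analyzer, stop_analyzer,
                 keyword_analyzer, pattern_analyzer):
    print(f"{analyzer.__name__:20} {analyzer(sample)}")
```

The Keyword analyzer is typical for exact-match values such as IDs or tags, while the others suit free text.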

Inverted Index

The inverted index maps each term to the list of IDs of the documents that contain it (its postings list), which makes full-text lookups fast. A forward index does the opposite: it maps each document ID to its content, so finding a term would require scanning every document.
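A toy Python version shows the contrast between the two structures. It assumes terms come from a plain whitespace split instead of an analyzer, and it ignores what Lucene additionally stores in postings (term frequencies, positions, compression). The AND query intersects postings lists, which is exactly the lookup a forward index cannot do without scanning every document.

```python
from collections import defaultdict

docs = {
    1: "the quick brown fox",
    2: "the lazy brown dog",
    3: "a quick dog",
}

# Forward index: document ID -> terms it contains.
forward_index = {doc_id: text.split() for doc_id, text in docs.items()}

# Inverted index: term -> sorted postings list of document IDs.
_postings = defaultdict(set)
for doc_id, terms in forward_index.items():
    for term in terms:
        _postings[term].add(doc_id)
inverted_index = {term: sorted(ids) for term, ids in _postings.items()}


def search_all(*terms):
    """Return IDs of documents containing every term (AND query)."""
    postings = [set(inverted_index.get(t, [])) for t in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


print(inverted_index["brown"])     # [1, 2]
print(search_all("quick", "dog"))  # [3]
```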

Search Workflow

Search executes in two phases:

Query phase

The coordinating node forwards the request to one copy (primary or replica) of every shard in the target indices.

Each shard executes the query locally and keeps a priority queue of its top from + size matching documents.

Each shard returns only the document IDs and sort values (such as scores) from its queue; the coordinating node merges them into one globally sorted list and keeps the window from from to from + size as the requested page.
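The query phase can be simulated in a few lines of Python. This is a sketch under assumptions: each shard's matches are given as hypothetical (doc_id, score) pairs, scoring itself is out of scope, and shard_query and query_phase are illustrative names. Each shard keeps only its local top from + size hits with heapq.nlargest, and the coordinating node merges the candidates and slices out the page.

```python
import heapq


def shard_query(hits, from_, size):
    """Query phase on one shard: keep only the local top (from_ + size) hits.

    hits is a list of (doc_id, score) pairs that matched on this shard.
    """
    return heapq.nlargest(from_ + size, hits, key=lambda hit: hit[1])


def query_phase(shards, from_, size):
    """Coordinating node: merge every shard's queue and cut out one page.

    Returns (shard_id, doc_id, score) triples, best score first.
    """
    candidates = []
    for shard_id, hits in shards.items():
        for doc_id, score in shard_query(hits, from_, size):
            candidates.append((score, shard_id, doc_id))
    # Descending score; shard_id and doc_id only break ties deterministically
    # (an arbitrary choice for this sketch).
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    return [(shard_id, doc_id, score)
            for score, shard_id, doc_id in candidates[from_:from_ + size]]


# Hypothetical per-shard matches: shard id -> [(doc_id, score), ...]
shards = {
    0: [("a", 1.2), ("b", 0.4), ("c", 2.5)],
    1: [("d", 3.1), ("e", 0.9)],
    2: [("f", 1.7)],
}
print(query_phase(shards, from_=0, size=2))  # [(1, 'd', 3.1), (0, 'c', 2.5)]
print(query_phase(shards, from_=2, size=2))  # [(2, 'f', 1.7), (0, 'a', 1.2)]
```

The design also explains why deep pagination is expensive: every shard must return from + size entries, so the coordinating node sorts shards × (from + size) of them. With 5 shards, from = 10000 and size = 10, that is 5 × 10010 = 50050 entries to keep just 10, which is why from + size is capped by the index.max_result_window setting (10000 by default).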

Fetch phase

The coordinating node requests the full documents (_source) for only the selected IDs from the shards that hold them, then returns the final result set to the client.
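A matching Python sketch of the fetch phase follows, again under assumptions: SHARD_STORES is a hypothetical in-memory stand-in for real shards, and fetch_phase is an illustrative name. It takes the ordered page chosen in the query phase, groups the IDs by shard so each shard is contacted at most once, loads _source only for those documents, and rebuilds the response in the global order.

```python
from collections import defaultdict

# Hypothetical per-shard document stores: shard id -> {doc_id: _source}
SHARD_STORES = {
    0: {"a": {"title": "Lucene basics"}, "c": {"title": "Inverted indexes"}},
    1: {"d": {"title": "Search phases"}},
    2: {"f": {"title": "Shard allocation"}},
}


def fetch_phase(page):
    """Load _source only for the hits selected in the query phase.

    page is an ordered list of (shard_id, doc_id, score) triples.
    """
    # Group the selected IDs so each shard is contacted at most once.
    ids_by_shard = defaultdict(list)
    for shard_id, doc_id, _ in page:
        ids_by_shard[shard_id].append(doc_id)

    sources = {}
    for shard_id, doc_ids in ids_by_shard.items():
        store = SHARD_STORES[shard_id]
        for doc_id in doc_ids:
            sources[(shard_id, doc_id)] = store[doc_id]

    # Rebuild the response in the order decided by the query phase.
    return [
        {"_id": doc_id, "_score": score, "_source": sources[(shard_id, doc_id)]}
        for shard_id, doc_id, score in page
    ]


hits = fetch_phase([(1, "d", 3.1), (0, "c", 2.5)])
for hit in hits:
    print(hit["_id"], hit["_score"], hit["_source"]["title"])
```

Shard 2 is never touched here, because none of its documents made it into the page; splitting the search this way avoids shipping full documents that would only be discarded.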


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

