
Introduction to Elasticsearch: Architecture, Core Concepts, and Common Operations

This article provides a comprehensive overview of Elasticsearch, covering its distributed architecture, fundamental concepts such as nodes, shards, and indices, as well as practical guidance on index design, bulk writing, query‑fetch workflow, scroll queries, aliases, and basic optimization tips.

360 Smart Cloud

Elasticsearch is an open‑source search engine built on Apache Lucene that offers a distributed, near‑real‑time architecture with a standard RESTful API, capable of both single‑node and cluster deployments.

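The system consists of several key components. A node is a single running Elasticsearch instance. The master node manages cluster-wide state, such as creating or deleting indices and allocating shards to nodes. Data nodes store shards and execute indexing and search operations on them. Any node can act as a coordinating node, which forwards a request to the relevant shards and merges their results. An index is a collection of documents, loosely comparable to a database. Each index is divided into primary shards, and each primary can have replica shards; sharding enables horizontal scaling, while replicas provide redundancy and extra read capacity.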

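Basic concepts also include types (deprecated in version 7.x and removed in 8.0), documents (JSON objects composed of fields), and cluster health states. Green means every primary and replica shard is allocated, Yellow means all primary shards are allocated but at least one replica is not, and Red means at least one primary shard is unallocated.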
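The health states can be illustrated with a small model. The following is a minimal sketch, not Elasticsearch's own implementation; the ShardCopy class and cluster_health function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class ShardCopy:
    """One copy of a shard (hypothetical model, not an Elasticsearch type)."""
    index: str
    shard: int
    primary: bool   # True for the primary copy, False for a replica
    assigned: bool  # True if the copy is allocated to a node


def cluster_health(copies):
    """Return 'red', 'yellow', or 'green' for a list of shard copies."""
    # Red: at least one primary is unassigned, so some data is unavailable.
    if any(c.primary and not c.assigned for c in copies):
        return "red"
    # Yellow: all primaries are assigned, but at least one replica is not.
    if any(not c.primary and not c.assigned for c in copies):
        return "yellow"
    # Green: every primary and every replica is assigned.
    return "green"


single_node = [
    ShardCopy("logs", 0, primary=True, assigned=True),
    ShardCopy("logs", 0, primary=False, assigned=False),
]
print(cluster_health(single_node))
```

This also shows why a single-node cluster with one replica configured reports Yellow: a replica is never allocated to the same node as its primary.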

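Index design involves settings and mappings. Settings define, among other things, the number of primary shards (which cannot be changed in place once the index is created) and the number of replicas (which can be adjusted later). Mappings define the type of each field. Relying on dynamic mapping is discouraged: inferred field types may not match how the data is actually queried, which can hurt performance, waste storage, and degrade relevance.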
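As an illustration, the request body for creating an index with explicit settings and mappings might be assembled as follows. This is a sketch under assumptions: the helper name build_index_body and the fields title, status, and created_at are hypothetical, and the body uses the typeless mapping format of version 7.x and later. It would be sent with PUT to the index name.

```python
import json


def build_index_body(primary_shards, replicas, properties):
    """Build a create-index request body (hypothetical helper)."""
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas,
        },
        "mappings": {
            # "strict" rejects documents containing unmapped fields,
            # which effectively turns dynamic mapping off.
            "dynamic": "strict",
            "properties": properties,
        },
    }


body = build_index_body(3, 1, {
    "title": {"type": "text"},        # analyzed, for full-text search
    "status": {"type": "keyword"},    # exact values, for filtering
    "created_at": {"type": "date"},
})
print(json.dumps(body, indent=2))
```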

Templates automatically apply settings and mappings to any newly created index whose name matches a pattern, such as logstash-* for Logstash indices.
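A template body could look like the sketch below. It assumes the composable index template format (PUT to _index_template/ followed by a template name, available from version 7.8); older clusters use a different legacy format. The helper names are hypothetical, and template_applies only approximates Elasticsearch's wildcard matching with Python's fnmatch.

```python
from fnmatch import fnmatchcase


def build_index_template(patterns, settings, mappings, priority=100):
    """Build a composable index template body (hypothetical helper)."""
    return {
        "index_patterns": list(patterns),
        "priority": priority,  # higher priority wins when templates overlap
        "template": {"settings": settings, "mappings": mappings},
    }


def template_applies(template, index_name):
    """Approximate check: does index_name match any pattern in the template?"""
    return any(fnmatchcase(index_name, p) for p in template["index_patterns"])


template = build_index_template(
    ["logstash-*"],
    {"number_of_shards": 1, "number_of_replicas": 1},
    {"properties": {"message": {"type": "text"}}},
)
print(template_applies(template, "logstash-2024.01.01"))
```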

For write operations, Elasticsearch provides a bulk API that batches multiple indexing actions into a single request, which greatly improves throughput compared with sending one request per document. A reasonable starting size for each bulk request is 5 MB to 15 MB; the best value depends on hardware and document characteristics and should be found by testing.
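The bulk API expects newline-delimited JSON: an action line followed by the document source, each terminated by a newline. The sketch below splits a stream of documents into bodies of roughly bounded size. The function name bulk_chunks and the 5 MB default are assumptions for illustration; each body would be sent with POST to _bulk using the content type application/x-ndjson.

```python
import json


def bulk_chunks(index, docs, max_bytes=5 * 1024 * 1024):
    """Yield NDJSON bulk bodies of at most about max_bytes each.

    docs is an iterable of (doc_id, source_dict) pairs. A single pair
    larger than max_bytes is still emitted, alone in its own body.
    """
    lines, size = [], 0
    for doc_id, doc in docs:
        action = json.dumps({"index": {"_index": index, "_id": doc_id}})
        pair = action + "\n" + json.dumps(doc) + "\n"
        pair_size = len(pair.encode("utf-8"))
        # Flush the current body before it would exceed the limit.
        if lines and size + pair_size > max_bytes:
            yield "".join(lines)
            lines, size = [], 0
        lines.append(pair)
        size += pair_size
    if lines:
        yield "".join(lines)


docs = [(str(i), {"n": i}) for i in range(10)]
for body in bulk_chunks("logs", docs, max_bytes=200):
    print(len(body.encode("utf-8")), "bytes")
```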

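Search execution consists of two phases. In the query phase, each shard runs the request locally and returns only the IDs and sort values (such as scores) of its top matches to the coordinating node, which merges them into a single globally sorted list. In the fetch phase, the coordinating node retrieves the full documents for that final list from the shards that hold them.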
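The two phases can be simulated in a few lines. This is a toy model under stated assumptions, not Elasticsearch's implementation: each shard is a plain dictionary, scores are precomputed, and the function names shard_query and search are hypothetical.

```python
import heapq


def shard_query(shard, size):
    """Query phase on one shard: return its top (score, doc_id) pairs."""
    return heapq.nlargest(
        size, ((doc["score"], doc_id) for doc_id, doc in shard.items())
    )


def search(shards, size):
    """Coordinate a query-then-fetch search over a list of shards."""
    # Query phase: collect lightweight entries, never full documents.
    candidates = []
    for shard_no, shard in enumerate(shards):
        for score, doc_id in shard_query(shard, size):
            candidates.append((score, doc_id, shard_no))
    # Merge the per-shard results into the global top `size`.
    top = heapq.nlargest(size, candidates)
    # Fetch phase: load full documents only for the global winners.
    return [shards[shard_no][doc_id]["source"] for _, doc_id, shard_no in top]


shards = [
    {"a": {"score": 1.0, "source": {"title": "A"}},
     "b": {"score": 3.0, "source": {"title": "B"}}},
    {"c": {"score": 2.0, "source": {"title": "C"}}},
]
print(search(shards, 2))
```

Because every shard must return its own top candidates before the merge, the work in the query phase grows with the requested offset plus page size, which is why deep pagination becomes expensive.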

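Scroll (cursor) queries enable efficient retrieval of large result sets without deep pagination. The initial search returns a _scroll_id together with the first batch of hits, and each subsequent request supplies that _scroll_id to fetch the next batch until no more hits are returned.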
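A scroll loop might be structured as in the sketch below. The send(method, path, body) callable is a hypothetical transport that performs the HTTP request and returns the parsed JSON response; it is stubbed here so the example runs without a cluster. The page size and one-minute keep-alive are illustrative assumptions.

```python
def scroll_all(send, index, query, page_size=1000, keep_alive="1m"):
    """Yield every hit for a query by following the scroll cursor."""
    resp = send("POST", f"/{index}/_search?scroll={keep_alive}",
                {"size": page_size, "query": query})
    scroll_id = resp.get("_scroll_id")
    try:
        # An empty page of hits signals the end of the result set.
        while resp["hits"]["hits"]:
            yield from resp["hits"]["hits"]
            resp = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            # The ID may change between requests, so always use the latest.
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Release the search context instead of waiting for it to expire.
        if scroll_id:
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})


def make_fake_send(pages):
    """Build a stub transport that serves canned pages (for demonstration)."""
    remaining = list(pages)
    calls = []

    def send(method, path, body):
        calls.append((method, path))
        if method == "DELETE":
            return {"succeeded": True}
        return {"_scroll_id": "abc", "hits": {"hits": remaining.pop(0)}}

    return send, calls


send, calls = make_fake_send([[{"_id": "1"}, {"_id": "2"}], [{"_id": "3"}], []])
for hit in scroll_all(send, "logs", {"match_all": {}}):
    print(hit["_id"])
```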

Index aliases act as lightweight pointers to one or more indices, facilitating seamless index re‑creation and zero‑downtime migrations.
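A zero-downtime switch is typically done by moving the alias from the old index to the new one in a single request. The sketch below builds such a body; the helper name build_alias_swap and the index names are hypothetical. The body would be sent with POST to _aliases, where the listed actions are applied atomically, so clients querying the alias never see a moment with no index behind it.

```python
import json


def build_alias_swap(alias, old_index, new_index):
    """Build an _aliases body that repoints alias from old to new index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


swap = build_alias_swap("products", "products_v1", "products_v2")
print(json.dumps(swap, indent=2))
```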

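Basic DSL optimization recommendations include choosing the right query clause (match for analyzed full-text fields, term for exact values, bool to combine clauses), placing yes/no conditions in filter context so that they skip relevance scoring and can be cached, and selecting suitable field types, such as keyword for fields that are only matched exactly, sorted, or aggregated.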
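These recommendations can be combined in one bool query, as in the sketch below. The helper name and the fields title, status, and created_at are hypothetical; status is assumed to be mapped as keyword and created_at as date.

```python
import json


def build_filtered_search(text, status, since):
    """Build a search body that scores on text and filters on exact values."""
    return {
        "query": {
            "bool": {
                # must: contributes to the relevance score.
                "must": [{"match": {"title": text}}],
                # filter: yes/no conditions, not scored and cacheable.
                "filter": [
                    {"term": {"status": status}},
                    {"range": {"created_at": {"gte": since}}},
                ],
            }
        }
    }


search_body = build_filtered_search("distributed search", "published", "2024-01-01")
print(json.dumps(search_body, indent=2))
```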

The article concludes that while it covers essential Elasticsearch concepts and common practices, deeper topics such as internal mechanics, advanced performance tuning, and newer version features remain open for further study.

