Understanding Elasticsearch Indexes, Mappings, and Shard Architecture
The article explains Elasticsearch indexes as logical namespaces for JSON documents, details how mappings define immutable field types such as text, keyword, and numeric, describes primary and replica shard roles, and offers practical guidance on alias usage, shard sizing, replica settings, and performance optimizations to ensure cluster stability.
This article introduces the core concepts of Elasticsearch (ES) indexes, their structure, and how they affect cluster stability and performance.
Background: As ES usage grows, many teams run into inconsistent index designs, non-standard creation scripts, and an unclear understanding of index structure. The platform therefore provides template-based index creation, approval workflows, and dynamic shard expansion without downtime.
What is an index? An index is a logical namespace that stores a collection of documents (similar to a table in relational databases). Each document is a JSON object and the smallest searchable unit. Indexes enable storage, retrieval, and aggregation of data.
Official definition: "The index is the fundamental unit of storage in Elasticsearch, a logical namespace for storing data that share similar characteristics."
Index structure details:
Aliases: Provide an indirection layer; an alias can point to one or more indices, enabling zero-downtime reindexing and dynamic shard expansion.
Mappings: Define field types. The type of an existing field cannot be changed in place; changing it requires reindexing into a new index. ES also supports dynamic mapping, which infers types for fields that are not mapped explicitly.
Field types:
Text: Analyzed for full-text search; not suitable for sorting or aggregations. Use multi-fields to also index a keyword version.
Keyword: Not analyzed; ideal for exact matches, sorting, and aggregations. Recommended for fields that do not require full-text search.
Numeric: Includes long, integer, float, etc. Choose the smallest appropriate type to save space and improve performance.
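To make the text/keyword split concrete, here is a minimal Python sketch that builds a search request body using both halves of a multi-field. It assumes a `name` field mapped as `text` with a `keyword` sub-field; the helper name `build_name_search` and the aggregation name `by_name` are hypothetical. The sketch only builds a dictionary and sends nothing to a cluster.

```python
import json


def build_name_search(text: str, size: int = 10) -> dict:
    """Build a search body that uses both parts of a text/keyword multi-field.

    Assumes `name` is mapped as `text` with a `keyword` sub-field (hypothetical mapping).
    """
    return {
        "size": size,
        # Full-text matching runs against the analyzed `text` field.
        "query": {"match": {"name": text}},
        # Sorting uses the un-analyzed `keyword` sub-field.
        "sort": [{"name.keyword": {"order": "asc"}}],
        # Aggregations also use the keyword sub-field.
        "aggs": {"by_name": {"terms": {"field": "name.keyword", "size": 5}}},
    }


if __name__ == "__main__":
    # Print the body that would be sent to the index's _search endpoint.
    print(json.dumps(build_name_search("running shoes"), indent=2))
```

The query targets `name` because only the analyzed field supports full-text matching, while the sort and the `terms` aggregation target `name.keyword` because they need the exact, un-analyzed value.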
Example index creation:
PUT /index_demo
{
  "aliases": {"index_demo_alias": {}},
  "mappings": {
    "properties": {
      "id": {"type": "long"},
      "name": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
      },
      "status": {"type": "keyword"},
      "createDate": {"type": "long"}
    }
  },
  "settings": {
    "index": {
      "refresh_interval": "5s",
      "number_of_shards": "3",
      "number_of_replicas": "1"
    }
  }
}
Key alias operations (add, replace):
POST /_aliases
{
  "actions": [
    {"add": {"index": "test_index", "alias": "test_alias"}}
  ]
}
Shard architecture:
Primary shard: Stores the actual data; the number of primary shards is fixed at index creation.
Replica shard: A copy of a primary shard that provides high availability and read scaling; the replica count can be changed dynamically.
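Because the primary shard count is fixed at creation, expanding shards without downtime is commonly done by creating a new index with more primaries, copying the data with the `_reindex` API, and then moving the alias in a single `_aliases` request. The following is a minimal Python sketch of the two request bodies involved; it only builds dictionaries and sends nothing. The function names and the index name `index_demo_v2` are hypothetical, and this is not necessarily how the platform described here implements shard expansion.

```python
import json


def build_reindex(source_index: str, dest_index: str) -> dict:
    """Body for POST /_reindex: copy documents from source_index into dest_index."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


def build_alias_swap(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST /_aliases that moves `alias` from old_index to new_index.

    Both actions travel in one request, so clients never see the alias
    pointing at neither index or at both.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # Hypothetical names: index_demo_v2 is assumed to exist already,
    # created with a larger number_of_shards than index_demo.
    print(json.dumps(build_reindex("index_demo", "index_demo_v2"), indent=2))
    print(json.dumps(build_alias_swap("index_demo_alias", "index_demo", "index_demo_v2"), indent=2))
```

Applications that read and write through the alias rather than the concrete index name need no change when the swap happens.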
Shard planning recommendations:
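Estimate total data volume, then set the number of primary shards to roughly the total data size divided by a target shard size of 10 to 50 GB (up to 100 GB per shard for logs).
Keep the shard count a multiple of the number of data nodes so that shards spread evenly across them.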
Increase replicas for read‑heavy workloads, but be aware of write overhead.
Avoid excessive shards; each shard consumes 10‑30 MB of heap memory and file handles.
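The sizing rules above can be turned into a small calculation. This is a minimal Python sketch under stated assumptions: the function name `plan_primary_shards` is hypothetical, and the default target of 30 GB per shard is simply a midpoint of the 10 to 50 GB range, not a recommendation from the article.

```python
import math


def plan_primary_shards(total_data_gb: float,
                        target_shard_gb: float = 30.0,
                        data_nodes: int = 3) -> int:
    """Estimate a primary shard count from data volume and node count.

    1. Divide total data by the target shard size and round up.
    2. Round that result up to the next multiple of the data node count.
    """
    if total_data_gb <= 0 or target_shard_gb <= 0 or data_nodes <= 0:
        raise ValueError("all inputs must be positive")
    by_size = math.ceil(total_data_gb / target_shard_gb)
    return math.ceil(by_size / data_nodes) * data_nodes


if __name__ == "__main__":
    for total_gb, nodes in [(600, 3), (100, 3), (1000, 5)]:
        shards = plan_primary_shards(total_gb, data_nodes=nodes)
        print(f"{total_gb} GB on {nodes} nodes: {shards} primaries, "
              f"about {total_gb / shards:.1f} GB per shard")
```

Rounding up to a multiple of the node count makes shards slightly smaller than the target. For small indices this can push shard size below 10 GB, in which case fewer shards may be the better trade-off.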
Performance tips:
Prefer keyword over text when exact matching is sufficient.
Use multi‑fields to have both text and keyword versions.
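Set ignore_above so that overly long strings are not indexed. Dynamic mapping sets it to 256 on the keyword sub-field it generates; an explicitly mapped keyword field has no practical limit unless you set one. Longer values are skipped by the index but remain in _source.
For numeric identifiers that are only matched exactly and never used in range queries, consider mapping them as keyword.
Lengthen refresh_interval, or set it to -1 during bulk loads, when near-real-time visibility is not required; fewer refreshes mean less I/O.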
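The refresh tip can be sketched as a pair of settings updates around a bulk load. This is a minimal Python sketch that only builds the request tuples; the helper name `bulk_load_refresh_plan` is hypothetical, and the restore value of `5s` is taken from the example index settings rather than being a general recommendation.

```python
import json


def bulk_load_refresh_plan(index: str, normal_interval: str = "5s") -> list:
    """Return (method, path, body) tuples to run before and after a bulk load.

    The first request turns periodic refresh off ("-1"); the second restores
    the normal interval once the load has finished.
    """
    path = f"/{index}/_settings"
    return [
        ("PUT", path, {"index": {"refresh_interval": "-1"}}),
        ("PUT", path, {"index": {"refresh_interval": normal_interval}}),
    ]


if __name__ == "__main__":
    for method, path, body in bulk_load_refresh_plan("index_demo"):
        print(method, path)
        print(json.dumps(body, indent=2))
```

While refresh is disabled, newly indexed documents do not become searchable until a refresh happens, so this fits offline or batch loads rather than indexes serving live queries.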
Conclusion: Proper index design (choosing appropriate field types, planning shard and replica counts, and using aliases) directly affects ES cluster stability and performance. Teams should treat index creation as a critical part of the technical solution.
DeWu Technology
A platform for sharing and discussing tech knowledge, guiding you toward the cloud of technology.