Master Elasticsearch Index Design: From Mapping to Sharding Best Practices
This article provides a comprehensive guide to Elasticsearch index architecture, covering fundamental concepts, index mapping, field types, alias usage, shard and replica strategies, shard planning, resource impact, and practical recommendations for optimizing performance and stability in production environments.
Background
As Elasticsearch adoption grows, clusters face increasing pressure for stability, manageability, and operational efficiency. Inconsistent index configurations and copy‑pasted scripts often lead to sub‑optimal performance and stability problems.
Typical index definition
An index consists of aliases, mappings, and settings. The example below creates an index with a multi‑field mapping and common settings:
PUT /index_demo
{
  "aliases": {
    "index_demo_alias": {}
  },
  "mappings": {
    "properties": {
      "id": { "type": "long" },
      "name": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 256 }
        }
      },
      "status": { "type": "keyword" },
      "createDate": { "type": "long" }
    }
  },
  "settings": {
    "index": {
      "refresh_interval": "5s",
      "number_of_shards": 3,
      "number_of_replicas": 1
    }
  }
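}
The ignore_above parameter (set to 256 here, which is also the value dynamic mapping assigns to the keyword sub-fields it generates; an explicitly mapped keyword field has effectively no limit by default) tells Elasticsearch not to index keyword values longer than that many characters. The full value is still kept in _source, but oversized terms stay out of the index, which reduces storage size.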
Alias
An alias can point to one or more indices or data streams, enabling a single logical name for queries, zero‑downtime reindexing, and runtime index switching.
PUT /test_index
{
  "settings": { "number_of_shards": 1, "number_of_replicas": 1 },
  "aliases": { "test_alias": {} },
  "mappings": {
    "properties": {
      "field1": { "type": "text" },
      "createdAt": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss" }
    }
  }
}

POST /_aliases
{
  "actions": [
    { "add": { "index": "test_index", "alias": "test_alias" } }
  ]
}

POST /_aliases
{
  "actions": [
    { "add": { "index": "existing_index", "alias": "test_alias" } },
    { "remove": { "index": "old_index", "alias": "old_test_alias" } }
  ]
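}
Attach an alias when the index is created, or add one afterwards, and have applications address the alias rather than the concrete index name. This is recommended because dynamic shard‑expansion mechanisms, which add capacity by switching to a new index, rely on a stable alias.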
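For the zero-downtime switch described above, the remove and add actions are normally sent in a single _aliases request, which Elasticsearch applies atomically, so the alias never points at nothing. The following is a minimal Python sketch that only builds the request body. The helper name build_alias_swap and the index names are hypothetical, and sending the request is left to whatever HTTP client is in use:

```python
import json


def build_alias_swap(alias: str, old_index: str, new_index: str) -> dict:
    """Build the body for POST /_aliases that moves `alias` to `new_index`.

    Both actions travel in one request, so readers of the alias see either
    the old index or the new one, never neither.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # Hypothetical index names, used only for illustration.
    body = build_alias_swap("test_alias", "test_index_v1", "test_index_v2")
    print(json.dumps(body, indent=2))
```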
Mapping
Mapping defines the document structure. Field types are immutable after creation; changing a type requires reindexing. Elasticsearch also supports dynamic mapping, which infers types from incoming documents.
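To make the inference step concrete, here is a simplified Python sketch of the default dynamic mapping rules. It is an approximation under stated assumptions, not Elasticsearch's implementation: date detection and numeric detection of strings are ignored, and the function name infer_dynamic_mapping is hypothetical.

```python
def infer_dynamic_mapping(value):
    """Return roughly the mapping that dynamic mapping would pick for one JSON value.

    Simplified: date detection and numeric detection of strings are not modeled.
    Returns None when no field would be added (null or an array with no values).
    """
    # bool must be tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        # Strings become text with a keyword sub-field capped at 256 characters.
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    if isinstance(value, dict):
        properties = {}
        for key, item in value.items():
            mapping = infer_dynamic_mapping(item)
            if mapping is not None:
                properties[key] = mapping
        return {"properties": properties}
    if isinstance(value, list):
        # Arrays take the type of their first non-null element.
        for item in value:
            if item is not None:
                return infer_dynamic_mapping(item)
    return None


if __name__ == "__main__":
    doc = {"id": 1, "name": "widget", "price": 9.5, "active": True}
    print(infer_dynamic_mapping(doc))
```

Because an inferred type cannot be changed afterwards without reindexing, fields whose type matters are usually better declared explicitly in the mapping.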
Field types and recommendations
text: Analyzed for full‑text search. Not suitable for sorting or aggregations. Add a keyword sub‑field when exact matches or aggregations are needed.
keyword: Not analyzed; the whole value is indexed as a single term. Ideal for exact‑match queries, sorting, and aggregations. Supports the ignore_above mapping parameter, and term queries against it accept the case_insensitive option.
numeric (e.g., long, integer, float): Choose the smallest type that covers the required range and precision. For decimal values with a fixed number of decimal places, such as prices, consider scaled_float.
General guidance:
Prefer keyword for fields that do not require full‑text analysis.
If the query pattern is uncertain, define both text and keyword via multi‑fields.
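Avoid aggregating on text fields: doing so requires enabling fielddata, which loads terms onto the heap and can consume large amounts of memory.
Select language‑appropriate analyzers (e.g., ik_smart from the IK analysis plugin for Chinese) instead of relying on the default standard analyzer.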
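The advice on numeric types can be sketched in a few lines of Python. This is an illustrative helper under stated assumptions, not part of any Elasticsearch client: the function names are hypothetical, the ranges are those of the byte, short, integer, and long field types, and the scaled_float function only approximates the idea that the field keeps the value multiplied by scaling_factor and rounded to a whole number.

```python
# Inclusive value ranges of Elasticsearch's integer field types.
INTEGER_TYPES = [
    ("byte", -2**7, 2**7 - 1),
    ("short", -2**15, 2**15 - 1),
    ("integer", -2**31, 2**31 - 1),
    ("long", -2**63, 2**63 - 1),
]


def smallest_integer_type(min_value: int, max_value: int) -> str:
    """Return the narrowest integer field type that holds the whole range."""
    for name, low, high in INTEGER_TYPES:
        if low <= min_value and max_value <= high:
            return name
    raise ValueError("range does not fit in a 64-bit long")


def scaled_float_stored_value(value: float, scaling_factor: int) -> int:
    """Approximate what scaled_float keeps: value * scaling_factor, rounded."""
    return round(value * scaling_factor)


if __name__ == "__main__":
    print(smallest_integer_type(0, 150))
    print(smallest_integer_type(0, 3_000_000_000))
    # A price with two decimal places and scaling_factor 100.
    print(scaled_float_stored_value(19.99, 100))
```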
Shard structure – primary and replica
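Each index is split into primary shards, which hold the original data, and replica shards, which are full copies of a primary. The primary shard count is fixed at index creation; changing it later requires reindexing into a new index or using the split and shrink APIs. The replica count, by contrast, can be changed at any time.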
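As a quick worked example of how the two settings combine, the Python sketch below counts the shard copies and the disk footprint of one index. The function names are hypothetical; the arithmetic simply reflects that every replica is a full copy of its primary.

```python
def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Primaries plus all replica copies for one index."""
    return number_of_shards * (1 + number_of_replicas)


def total_storage_gb(primary_data_gb: float, number_of_replicas: int) -> float:
    """Disk needed across the cluster, since each replica duplicates the primary data."""
    return primary_data_gb * (1 + number_of_replicas)


if __name__ == "__main__":
    # index_demo from the earlier example: 3 primaries, 1 replica.
    print(total_shard_copies(3, 1))
    # 90 GB of primary data is an assumed figure for illustration.
    print(total_storage_gb(90, 1))
```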
Shard planning
Choosing the right number of shards balances memory, CPU, and I/O usage. Recommended practices:
Estimate total data size and divide by a target shard size (10‑50 GB for most workloads; up to 100 GB for log data).
Read‑heavy workloads benefit from larger shards (20‑40 GB) and more replicas.
Write‑heavy workloads benefit from smaller shards (10‑20 GB) to improve indexing throughput.
Time‑based indices (daily/weekly) often use ILM policies to roll over to a new index once a size or age threshold is reached.
Small‑volume indices typically need 1‑2 primary shards.
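index.routing.allocation.total_shards_per_node caps how many shards of the index (primaries and replicas combined) a single node may hold. Setting it to roughly the index's total shard count divided by the number of data nodes, rounded up, spreads the index evenly across nodes; a value that is too low can leave shards unassigned when a node fails. For example, 3 primaries with 1 replica (6 shards) on 3 data nodes gives a limit of 2:
PUT /my_index/_settings
{
  "index.routing.allocation.total_shards_per_node": 2
}
Resource impact of index design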
Each shard consumes roughly 10‑30 MB of heap for metadata and additional file descriptors. Excessive shards increase heap pressure, I/O, and the risk of “too many open files” errors. Within a shard, many Lucene segments can cause fragmentation, leading to extra I/O, higher GC overhead, and slower queries.
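The sizing guidance and the per-shard overhead can be combined into a rough planning calculation. The Python sketch below is a back-of-the-envelope estimate, not a sizing tool: the function name plan_shards is hypothetical, the 10‑30 MB heap figure is taken from the paragraph above, and the per-node cap is computed as total shards divided by data nodes, rounded up, which is an assumption of this sketch.

```python
import math


def plan_shards(total_data_gb, target_shard_gb, replicas, data_nodes,
                heap_mb_per_shard=(10, 30)):
    """Estimate shard counts and heap overhead for one index.

    heap_mb_per_shard is the (low, high) per-shard heap estimate in MB.
    """
    # At least one primary, otherwise enough primaries to stay near the target size.
    primaries = max(1, math.ceil(total_data_gb / target_shard_gb))
    total_shards = primaries * (1 + replicas)
    # Even spread: shards of this index that each data node would hold at most.
    shards_per_node = math.ceil(total_shards / data_nodes)
    low, high = heap_mb_per_shard
    return {
        "primaries": primaries,
        "total_shards": total_shards,
        "total_shards_per_node": shards_per_node,
        "heap_mb_range": (total_shards * low, total_shards * high),
    }


if __name__ == "__main__":
    # Assumed workload: 300 GB of data, 30 GB target shards, 1 replica, 3 data nodes.
    print(plan_shards(300, 30, replicas=1, data_nodes=3))
```

Running the same calculation with a much smaller target shard size shows how quickly the shard count, and with it the heap estimate, grows.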
Conclusion
Effective Elasticsearch index design requires careful selection of field types, thoughtful mapping, consistent alias usage, and appropriate shard and replica counts. Following the guidelines above helps maintain cluster stability, reduces resource consumption, and ensures predictable performance.
dbaplus Community
