Master Elasticsearch Index Design: From Mapping to Sharding Best Practices

This article provides a comprehensive guide to Elasticsearch index architecture, covering fundamental concepts, index mapping, field types, alias usage, shard and replica strategies, shard planning, resource impact, and practical recommendations for optimizing performance and stability in production environments.

dbaplus Community

May 13, 2025

Background

As Elasticsearch adoption grows, clusters face increasing pressure for stability, manageability, and operational efficiency. Inconsistent index configurations and copy‑pasted scripts often lead to sub‑optimal performance and stability problems.

Typical index definition

An index consists of aliases, mappings, and settings. The example below creates an index with a multi‑field mapping and common settings:

PUT /index_demo
{
  "aliases": {
    "index_demo_alias": {}
  },
  "mappings": {
    "properties": {
      "id": { "type": "long" },
      "name": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 256 }
        }
      },
      "status": { "type": "keyword" },
      "createDate": { "type": "long" }
    }
  },
  "settings": {
    "index": {
      "refresh_interval": "5s",
      "number_of_shards": 3,
      "number_of_replicas": 1
    }
  }
}

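The ignore_above parameter tells Elasticsearch not to index keyword values longer than the given length. The value used here, 256 characters, is what dynamic mapping applies to string fields; an explicitly mapped keyword field has no practical limit unless you set one. Values that exceed the limit stay in _source but cannot be matched, sorted, or aggregated through that field, which keeps very long strings out of the index and reduces storage.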
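As a rough illustration of the ignore_above behaviour described above, the sketch below partitions a list of strings the way a keyword field with a 256-character limit would treat them. The function name split_by_ignore_above is hypothetical, and the model is simplified: it only compares character counts and does not talk to a cluster.

```python
def split_by_ignore_above(values, ignore_above=256):
    """Partition strings into (indexed, skipped) for a keyword field.

    Simplified model: a value is indexed in the keyword field only when its
    character count does not exceed ignore_above. Skipped values would still
    be present in _source; they are just not searchable through this field.
    """
    indexed = [v for v in values if len(v) <= ignore_above]
    skipped = [v for v in values if len(v) > ignore_above]
    return indexed, skipped


names = ["short name", "x" * 256, "y" * 257]
indexed, skipped = split_by_ignore_above(names)
print(len(indexed), "indexed,", len(skipped), "skipped")
```

A value of exactly 256 characters is still indexed; only longer values are skipped.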

Alias

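An alias can point to one or more indices or data streams, enabling a single logical name for queries, zero‑downtime reindexing, and runtime index switching. The three requests below create an index with an alias, attach an alias to an existing index, and add and remove aliases in a single request: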

PUT /test_index
{
  "settings": { "number_of_shards": 1, "number_of_replicas": 1 },
  "aliases": { "test_alias": {} },
  "mappings": {
    "properties": {
      "field1": { "type": "text" },
      "createdAt": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss" }
    }
  }
}

POST /_aliases
{
  "actions": [
    { "add": { "index": "test_index", "alias": "test_alias" } }
  ]
}

POST /_aliases
{
  "actions": [
    { "add": { "index": "existing_index", "alias": "test_alias" } },
    { "remove": { "index": "old_index", "alias": "old_test_alias" } }
  ]
}

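Whether the alias is added at creation time or later, have applications query through the alias rather than a concrete index name. With a stable alias, the underlying index can be replaced (for example, by reindexing into a new index with more primary shards) without changing client code. All actions in a single _aliases request are applied atomically, so the switch causes no downtime.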
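The index switch behind an alias can be sketched as a small helper that builds the request body for POST /_aliases. This is a minimal sketch, not a client implementation: build_alias_swap and the index name test_index_v2 are hypothetical, and the body would still need to be sent with whatever HTTP or Elasticsearch client you use.

```python
import json


def build_alias_swap(alias, old_index, new_index):
    """Build a POST /_aliases body that moves `alias` from old_index to new_index.

    Both actions travel in one request, so the alias never points at
    zero indices or at both indices in between.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# test_index_v2 is a hypothetical replacement index created beforehand.
body = build_alias_swap("test_alias", "test_index", "test_index_v2")
print(json.dumps(body, indent=2))
```

Keeping the remove and add actions in the same request is the design choice that matters here; two separate requests would leave a short window in which the alias resolves to nothing.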

Mapping

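Mapping defines the document structure. New fields can be added to an existing mapping, but the type of an existing field cannot be changed; fixing a wrong type means creating a new index with the corrected mapping and reindexing into it. Elasticsearch also supports dynamic mapping, which infers field types from incoming documents.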
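To make the type inference concrete, here is a simplified Python model of what default dynamic mapping picks for each JSON value. It is an approximation under stated assumptions: date detection and numeric detection on strings are left out, so every string becomes text with a keyword sub-field, and the function name infer_dynamic_mapping is hypothetical.

```python
def infer_dynamic_mapping(value):
    """Return the mapping default dynamic mapping would roughly choose.

    Simplified: string date/numeric detection is not modelled.
    Returns None when no field would be added (null or empty array).
    """
    if value is None:
        return None
    if isinstance(value, bool):  # must precede int: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    if isinstance(value, list):
        # Arrays take the type of their first non-null element.
        for item in value:
            if item is not None:
                return infer_dynamic_mapping(item)
        return None
    if isinstance(value, dict):
        properties = {}
        for key, item in value.items():
            mapping = infer_dynamic_mapping(item)
            if mapping is not None:
                properties[key] = mapping
        return {"properties": properties}
    raise TypeError(f"unsupported JSON value: {value!r}")


doc = {"id": 42, "name": "Widget", "price": 9.99, "active": True}
for field, mapping in infer_dynamic_mapping(doc)["properties"].items():
    print(field, "->", mapping["type"])
```

The sketch shows why relying on dynamic mapping is risky for fields such as status: a string silently becomes text plus keyword, which may not be the type you would have chosen explicitly.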

Field types and recommendations

text : Analyzed for full‑text search. Not suitable for sorting or aggregations. Use a keyword sub‑field when exact matches or aggregations are needed.

keyword : Not analyzed, indexed as a single term. Ideal for exact match queries, sorting, and aggregations. Accepts the ignore_above mapping parameter, and term queries against it can set case_insensitive for case-insensitive matching.

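numeric (e.g., long, integer, float): Choose the smallest type that covers the value range and satisfies precision requirements. For decimal values with a fixed number of decimal places, such as prices, consider scaled_float, which stores the value as a long multiplied by a fixed scaling_factor.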
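For whole numbers, "smallest type" comes down to checking the expected value range against the signed ranges of Elasticsearch's integer types (byte, short, integer, long). The helper below is a minimal sketch of that check; the function name smallest_integer_type is hypothetical.

```python
# Signed ranges of the Elasticsearch integer field types, smallest first.
INTEGER_TYPES = [
    ("byte", -2**7, 2**7 - 1),
    ("short", -2**15, 2**15 - 1),
    ("integer", -2**31, 2**31 - 1),
    ("long", -2**63, 2**63 - 1),
]


def smallest_integer_type(min_value, max_value):
    """Return the narrowest integer field type that holds [min_value, max_value]."""
    if min_value > max_value:
        raise ValueError("min_value must not exceed max_value")
    for name, low, high in INTEGER_TYPES:
        if low <= min_value and max_value <= high:
            return name
    raise ValueError("range does not fit in a 64-bit signed integer")


print(smallest_integer_type(0, 100))            # small status codes
print(smallest_integer_type(0, 1747094400000))  # epoch-millisecond timestamps
```

Leave headroom for growth when estimating the range, since widening a field later means reindexing.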

General guidance:

Prefer keyword for fields that do not require full‑text analysis.

If the query pattern is uncertain, define both text and keyword via multi‑fields.

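Avoid aggregating or sorting on text fields. Doing so requires enabling fielddata, which loads the field's terms into heap and can consume large amounts of memory; use a keyword field or sub-field instead.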

Select language‑appropriate analyzers instead of relying on the default standard analyzer. For Chinese, for example, ik_smart is available once the third-party IK Analysis plugin is installed.

Shard structure – primary and replica

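Each index is split into primary shards (the original data) and replica shards (full copies of a primary). The primary shard count is fixed at index creation; changing it later requires reindexing into a new index or using the split or shrink APIs, which also produce a new index. The replica count, by contrast, can be changed at any time through the index settings.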
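The primary shard count is fixed because a document's shard is derived from a hash of its routing value (the document ID by default) and the number of primary shards. The sketch below is a simplified model of that idea: it uses zlib.crc32 as a stand-in for the hash Elasticsearch actually uses (Murmur3) and a plain modulo instead of the full routing formula, and route_to_shard is a hypothetical name.

```python
import zlib


def route_to_shard(routing_key, number_of_primary_shards):
    """Pick a shard for a routing key (simplified: crc32 hash, plain modulo)."""
    if number_of_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    hashed = zlib.crc32(routing_key.encode("utf-8"))
    return hashed % number_of_primary_shards


doc_ids = [f"doc-{i}" for i in range(100)]

# Compare where each document lives with 3 primaries versus 4 primaries.
moved = [d for d in doc_ids if route_to_shard(d, 3) != route_to_shard(d, 4)]
print(f"{len(moved)} of {len(doc_ids)} documents would land on a different shard")
```

Because most documents would map to a different shard under the new count, existing data cannot simply stay where it is, which is why a change in primary shard count always results in a new index.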

Shard planning

Choosing the right number of shards balances memory, CPU, and I/O usage. Recommended practices:

Estimate total data size and divide by a target shard size (10‑50 GB for most workloads; up to 100 GB for log data).

Read‑heavy workloads benefit from larger shards (20‑40 GB) and more replicas.

Write‑heavy workloads benefit from smaller shards (10‑20 GB) to improve indexing throughput.

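Time‑based indices (daily/weekly) often use ILM policies to roll over to a new index once the current one reaches a size or age threshold, which keeps shard sizes within the target range.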

Small‑volume indices typically need 1‑2 primary shards.
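The first recommendation above is simple arithmetic: divide the expected data size by the target shard size and round up. A minimal sketch follows, assuming a hypothetical helper name plan_primary_shards and a default target of 30 GB chosen from the middle of the 10-50 GB range.

```python
import math


def plan_primary_shards(total_data_gb, target_shard_gb=30):
    """Estimate the primary shard count for an index of total_data_gb.

    target_shard_gb defaults to 30, an assumed midpoint of the
    10-50 GB range; pass a different target for your workload.
    """
    if total_data_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(total_data_gb / target_shard_gb))


print(plan_primary_shards(600))                      # general workload
print(plan_primary_shards(600, target_shard_gb=15))  # write-heavy, smaller shards
print(plan_primary_shards(5))                        # small-volume index
```

The estimate covers primary data only; replicas multiply the number of shard copies the cluster has to host but do not change the primary count.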

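The index.routing.allocation.total_shards_per_node setting caps how many shards of an index (primaries and replicas combined) a single node may hold. To spread an index evenly, set it to roughly the total number of shard copies divided by the number of data nodes, rounded up. For example, 3 primaries with 1 replica on 3 data nodes gives 6 / 3 = 2. Be aware that a value with no headroom can leave shards unassigned when a node is lost: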

PUT /my_index/_settings
{
  "index.routing.allocation.total_shards_per_node": 2
}
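One way to choose a value for this setting, assuming the goal is an even spread across data nodes, is to divide the total number of shard copies by the number of data nodes and round up. The helper name shards_per_node_limit and its optional headroom parameter are hypothetical.

```python
import math


def shards_per_node_limit(primaries, replicas, data_nodes, headroom=0):
    """Suggest a total_shards_per_node value for one index.

    total copies = primaries * (1 + replicas); the result is the even
    share per node, rounded up, plus optional headroom (an assumption of
    this sketch) so shards can still be placed if a node is lost.
    """
    if primaries < 1 or replicas < 0 or data_nodes < 1 or headroom < 0:
        raise ValueError("invalid cluster or index parameters")
    total_copies = primaries * (1 + replicas)
    return math.ceil(total_copies / data_nodes) + headroom


# index_demo from earlier: 3 primaries, 1 replica, on an assumed 3 data nodes.
print(shards_per_node_limit(primaries=3, replicas=1, data_nodes=3))
```

For the earlier index_demo settings on three data nodes, this yields 2, matching the value used in the request above.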

Resource impact of index design

Each shard consumes roughly 10‑30 MB of heap for metadata, plus its own set of file descriptors. Excessive shard counts therefore increase heap pressure, I/O, and the risk of “too many open files” errors. Within a shard, a large number of small Lucene segments causes fragmentation, leading to extra I/O, higher GC overhead, and slower queries.
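Taking the article's rough per-shard figure at face value, the heap cost of a given shard count can be estimated as a range. This is a back-of-the-envelope sketch only: the 10 and 30 MB defaults come from the estimate above, actual overhead varies with mappings, segment counts, and Elasticsearch version, and estimate_shard_heap_mb is a hypothetical name.

```python
def estimate_shard_heap_mb(shard_count, low_mb_per_shard=10, high_mb_per_shard=30):
    """Return a (low, high) heap estimate in MB for shard_count shards."""
    if shard_count < 0:
        raise ValueError("shard_count must not be negative")
    return shard_count * low_mb_per_shard, shard_count * high_mb_per_shard


low, high = estimate_shard_heap_mb(1000)
print(f"1000 shards: about {low / 1024:.1f} to {high / 1024:.1f} GB of heap")
```

Even as a coarse estimate, this shows how quickly thousands of small shards can claim a large share of a node's heap before any query runs.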

Conclusion

Effective Elasticsearch index design requires careful selection of field types, thoughtful mapping, consistent alias usage, and appropriate shard and replica counts. Following the guidelines above helps maintain cluster stability, reduces resource consumption, and ensures predictable performance.

