
Elasticsearch Index Design and Sharding Principles

This article outlines practical guidelines for designing Elasticsearch indices. It compares a single index with time‑based indices, covers mapping settings, shard allocation, and deduplication methods, and gives concrete examples for managing search infrastructure.

Architecture Digest

Jul 9, 2018


1. Elasticsearch Index Design

The first design decision is whether to use a single index or a set of time‑based indices. A single index has clear limitations: the mapping of an existing field cannot be changed without reindexing, scalability is limited, and the approach suits only small, static datasets.

1.1 Single vs Time‑Based Index

Time‑based indices require two decisions: the interval, chosen according to data volume and how often the data changes, and the implementation, which is done through index templates.
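As a minimal sketch of the interval decision, the function below maps a document date to the name of the time‑based index it would land in. The prefix "logs", the dotted date suffix, and the three supported intervals are illustrative assumptions, not conventions taken from the original session.

```python
from datetime import date


def time_based_index_name(prefix: str, day: date, interval: str = "month") -> str:
    """Return the name of the index that holds documents dated `day`.

    The naming scheme (prefix-YYYY.MM.DD) is an assumption for illustration.
    """
    if interval == "day":
        suffix = day.strftime("%Y.%m.%d")
    elif interval == "month":
        suffix = day.strftime("%Y.%m")
    elif interval == "year":
        suffix = day.strftime("%Y")
    else:
        raise ValueError(f"unsupported interval: {interval}")
    return f"{prefix}-{suffix}"


if __name__ == "__main__":
    # High-volume data might use daily indices, low-volume data monthly ones.
    print(time_based_index_name("logs", date(2018, 6, 30), "day"))
    print(time_based_index_name("logs", date(2018, 6, 30), "month"))
```

A shorter interval keeps each index small but produces more indices; a longer one does the opposite, which is why data volume drives the choice.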

1.2 Index Definition Considerations

The index is defined through an index template that contains settings, aliases, and strict mappings. Do not define multiple types in one index: indices created in ES 6.x and later support only a single type.
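The original template is not reproduced here, so the following is only a sketch of what such a template body could look like, assuming the ES 6.x template format (a single mapping type, here named `_doc`). The pattern, alias, and field names (`title`, `url`, `publish_time`) are hypothetical. In ES 7.x and later the type level under `mappings` is omitted.

```python
import json


def build_index_template(pattern: str, alias: str, shards: int, replicas: int) -> dict:
    """Build a request body for PUT _template/<name> (ES 6.x style, assumed)."""
    return {
        "index_patterns": [pattern],
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
        # Every index created from this template joins the alias.
        "aliases": {alias: {}},
        "mappings": {
            "_doc": {  # single type only; multiple types are not allowed
                "dynamic": "strict",  # reject documents with unmapped fields
                "properties": {
                    "title": {"type": "text"},         # hypothetical field
                    "url": {"type": "keyword"},        # hypothetical field
                    "publish_time": {"type": "date"},  # hypothetical field
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_index_template("blog-*", "blog", 5, 1), indent=2))
```

The printed JSON can be pasted into a console request; the shard and replica values shown are placeholders, not recommendations.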

Additional best practices: disable _source only when the raw document is never needed (updates, reindexing, and highlighting all depend on it); turn off _all when the query fields are known in advance; set dynamic: "strict" to keep dirty data out; use the keyword type for exact‑match fields; and use index aliases so that indices can be switched with zero downtime.
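The zero‑downtime switch mentioned above relies on the `_aliases` API applying all actions in one request atomically. A minimal sketch of the request body follows; the alias and index names are hypothetical.

```python
def alias_swap_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Build a body for POST /_aliases that moves `alias` to a new index.

    Both actions are sent in one request, so clients querying the alias
    never see a moment where it points at nothing.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # Hypothetical names: switch the "blog" alias from v1 to v2.
    print(alias_swap_actions("blog", "blog_v1", "blog_v2"))
```

Applications query the alias rather than a concrete index, so a rebuilt index can replace the old one without any client change.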

2. Elasticsearch Shard Allocation Principles

This section answers common community questions about how many indices, shards, and replicas to use and how large a shard should be. It presents a six‑step workflow: (1) define the index; (2) evaluate the data volume; (3) estimate the index size and disk usage; (4) calculate the shard count, keeping each shard under 30 GB and scaling the count with the number of nodes; (5) assess the number of indices and types; (6) iterate as needed.
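Steps 2 to 4 of the workflow come down to simple arithmetic, sketched below. The 30 GB ceiling is from the text; the choice to round the shard count up to a multiple of the node count is one possible reading of "scaling with node count" and is an assumption, as are the function names and the example figures.

```python
import math

GIB = 1024 ** 3  # bytes per GiB


def estimate_disk_bytes(doc_count: int, avg_doc_bytes: int, replicas: int = 1) -> int:
    """Rough disk usage: primary data plus one full copy per replica."""
    return doc_count * avg_doc_bytes * (1 + replicas)


def estimate_primary_shards(doc_count: int, avg_doc_bytes: int, node_count: int,
                            max_shard_gib: float = 30.0) -> int:
    """Primary shard count that keeps each shard under `max_shard_gib`.

    Assumption: the result is rounded up to a multiple of `node_count`
    so that shards can be spread evenly across nodes.
    """
    primary_bytes = doc_count * avg_doc_bytes
    by_size = max(1, math.ceil(primary_bytes / (max_shard_gib * GIB)))
    return math.ceil(by_size / node_count) * node_count


if __name__ == "__main__":
    # Hypothetical workload: one billion documents of about 1 KiB each.
    print(estimate_primary_shards(1_000_000_000, 1024, node_count=4))
    print(estimate_disk_bytes(1_000_000_000, 1024, replicas=1) / GIB)
```

The estimate ignores indexing overhead and compression, so it is a starting point for the final "iterate" step rather than a final answer.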

3. Data Deduplication Strategies

Three approaches are compared: assigning a unique ID (which can increase storage and cause high cardinality), running an aggregation to find duplicate hash values, and using distinct queries. The original includes an example aggregation DSL wrapped in GET *_index/_search { ... }, along with notes on performance and storage trade‑offs.
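The aggregation approach can be sketched as follows. This is not the DSL from the original session, which is elided; it is an illustrative version that uses a terms aggregation with `min_doc_count: 2` to return only hash values shared by more than one document. The field name `content_hash`, the SHA‑1 choice, and the helper names are assumptions, and the hash field would need to be mapped as keyword.

```python
import hashlib
import json


def content_hash(doc: dict, fields: list) -> str:
    """Hash the chosen fields of a document in a key-order-independent way."""
    canonical = json.dumps({f: doc.get(f) for f in fields},
                           sort_keys=True, ensure_ascii=False)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def duplicate_hash_query(hash_field: str, max_buckets: int = 100) -> dict:
    """Body for GET <index>/_search listing hash values that occur 2+ times."""
    return {
        "size": 0,  # no hits needed, only the aggregation buckets
        "aggs": {
            "duplicate_hashes": {
                "terms": {
                    "field": hash_field,
                    "min_doc_count": 2,
                    "size": max_buckets,
                }
            }
        },
    }


if __name__ == "__main__":
    doc = {"title": "ES sharding", "url": "http://example.com/a", "views": 3}
    print(content_hash(doc, ["title", "url"]))
    print(json.dumps(duplicate_hash_query("content_hash"), indent=2))
```

The same hash could instead be used as the document ID so that duplicates overwrite each other at index time, at the cost of the storage and cardinality concerns noted above.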

4. Summary

The notes capture the core insights from an Elasticsearch sharing session held on 2018‑06‑30, offering actionable steps and configuration snippets that can be directly applied in production environments.


