
Why Cross-Index Queries Matter in Elasticsearch and How to Implement Them

This article explains why cross-index queries are essential in Elasticsearch, outlines the technical principles behind them, presents classic use cases (business systems, big-data pipelines and log management), and provides practical methods, code examples and performance considerations for implementing them effectively.

dbaplus Community

May 24, 2020


Introduction

Elasticsearch can split a single index into multiple shards and can also query many indices in one request. Both capabilities come from its distributed cluster architecture.

Technical limits of Elasticsearch indices

Official guidance: keep each primary shard under 50 GB; in practice, 20-40 GB is a common target.

Maximum documents per shard: about 2.1 billion (Lucene's hard limit of 2,147,483,519, just under 2^31), which is rarely reached.

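Every shard costs heap memory, file handles and CPU, and a search must visit every shard it targets, so too many (especially small) shards waste resources and increase query latency.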

Shard count should be estimated before index creation based on data volume and query patterns.
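The sizing rules above can be turned into a rough back-of-the-envelope estimate. The sketch below is illustrative only: it assumes a 30 GB target per primary shard (the middle of the 20-40 GB range) and checks the result against Lucene's per-shard document limit; estimate_primary_shards is a hypothetical helper, not an Elasticsearch API.

```python
import math

# Lucene's hard per-shard document limit (Integer.MAX_VALUE - 128).
LUCENE_MAX_DOCS = 2_147_483_519


def estimate_primary_shards(total_gb: float, total_docs: int,
                            target_shard_gb: float = 30.0) -> int:
    """Rough primary-shard count: driven by size, then checked against the doc limit.

    target_shard_gb=30 is an assumed target, not an official default.
    """
    by_size = math.ceil(total_gb / target_shard_gb)
    by_docs = math.ceil(total_docs / LUCENE_MAX_DOCS)
    return max(1, by_size, by_docs)


if __name__ == "__main__":
    # Hypothetical index: 200 GB of primary data, 500 million documents.
    print(estimate_primary_shards(200, 500_000_000))  # 7
```

Size usually dominates; the document-count check only matters for very large numbers of tiny documents.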

Advantages over traditional sharding

Relational databases require manual sharding; many NoSQL systems provide only one‑dimensional partitioning. Elasticsearch offers two dimensions: multiple index names and multiple shards per index, enabling flexible query targeting across indices.

Typical application scenarios

Business systems

Time-based indices (monthly, quarterly or yearly) confine real-time updates to the current index while keeping historical data in older ones, so a query can scan only the periods it needs instead of the whole dataset.
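When indices follow a fixed naming scheme, choosing the ones that cover a date range is simple string construction. The sketch assumes a hypothetical orders-YYYY.MM monthly naming convention; monthly_indices is an illustrative helper, not part of any Elasticsearch client.

```python
from datetime import date


def monthly_indices(prefix: str, start: date, end: date) -> list[str]:
    """Names of the monthly indices (prefix-YYYY.MM) that overlap [start, end]."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}-{year:04d}.{month:02d}")
        month += 1
        if month == 13:  # roll over into the next year
            year, month = year + 1, 1
    return names


if __name__ == "__main__":
    # A Q1 report touches only three monthly indices, not the full history.
    targets = monthly_indices("orders", date(2020, 1, 15), date(2020, 3, 31))
    print("/" + ",".join(targets) + "/_search")
    # /orders-2020.01,orders-2020.02,orders-2020.03/_search
```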

Big‑data platforms

Batch jobs that recompute results often rewrite whole indices. With time-based indices, only the indices for the affected periods need rebuilding, which lowers rebuild cost and balances storage against query performance.
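To illustrate why rebuilds get cheaper, the sketch below maps the records a batch job has recomputed to the monthly indices that hold them; only those indices need rewriting. The metrics-YYYY.MM naming and the indices_to_rebuild helper are hypothetical.

```python
from datetime import date


def indices_to_rebuild(prefix: str, changed: list[date]) -> list[str]:
    """Monthly indices (prefix-YYYY.MM) containing at least one recomputed record."""
    # Zero-padded names sort chronologically as plain strings.
    return sorted({f"{prefix}-{d.year:04d}.{d.month:02d}" for d in changed})


if __name__ == "__main__":
    # Corrections landed in two months only, so two small indices are rewritten
    # instead of one index holding every year of data.
    changed = [date(2020, 4, 2), date(2020, 4, 28), date(2020, 5, 1)]
    print(indices_to_rebuild("metrics", changed))  # ['metrics-2020.04', 'metrics-2020.05']
```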

Log management

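ELK stacks often create daily or hourly indices to handle massive log volumes (tens to hundreds of terabytes per day). Queries usually target recent logs, so older indices can simply be left out of a search, and expired data can be removed by deleting whole indices, which is far cheaper than deleting individual documents.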
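A retention job then only needs to work with index names. The sketch below assumes the logstash-YYYY.MM.DD daily naming that Logstash produces by default; indices_to_delete is a hypothetical helper that selects indices older than a retention window.

```python
from datetime import date, datetime, timedelta


def indices_to_delete(names: list[str], today: date, keep_days: int,
                      prefix: str = "logstash-") -> list[str]:
    """Daily indices named prefix + YYYY.MM.DD dated before today - keep_days."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        if not name.startswith(prefix):
            continue  # not a daily log index; leave it alone
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date in the expected format
        if day < cutoff:
            expired.append(name)
    return expired


if __name__ == "__main__":
    existing = ["logstash-2020.05.20", "logstash-2020.05.22",
                "logstash-2020.05.24", ".kibana"]
    # keep_days=3 as of 2020-05-24: indices dated before 2020-05-21 are expired.
    print(indices_to_delete(existing, date(2020, 5, 24), keep_days=3))
    # ['logstash-2020.05.20']
```

Each returned name could then be dropped with the delete index API, e.g. DELETE /logstash-2020.05.20.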

Ways to perform cross‑index queries

Direct type

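Specify multiple index names explicitly, separated by commas. By default, the request fails with an index_not_found_exception if any listed index does not exist; setting ignore_unavailable=true skips missing indices instead.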

GET /index_01,index_02/_search
{
  "query": {
    "match": {
      "test": "data"
    }
  }
}
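The difference between failing and skipping can be modelled locally. This is a simplified simulation, not client code: resolve_explicit, search_path and IndexNotFound are hypothetical names; the only real Elasticsearch names involved are the ignore_unavailable query parameter and the index_not_found_exception it suppresses.

```python
from urllib.parse import urlencode


class IndexNotFound(Exception):
    """Local stand-in for Elasticsearch's index_not_found_exception."""


def resolve_explicit(requested: list[str], existing: set[str],
                     ignore_unavailable: bool = False) -> list[str]:
    """Model how an explicit, comma-separated index list is resolved."""
    missing = [name for name in requested if name not in existing]
    if missing and not ignore_unavailable:
        raise IndexNotFound(f"no such index [{missing[0]}]")
    # With ignore_unavailable, missing names are silently dropped.
    return [name for name in requested if name in existing]


def search_path(requested: list[str], ignore_unavailable: bool = False) -> str:
    """Build the request path, e.g. /index_01,index_02/_search."""
    path = "/" + ",".join(requested) + "/_search"
    if ignore_unavailable:
        path += "?" + urlencode({"ignore_unavailable": "true"})
    return path


if __name__ == "__main__":
    existing = {"index_01", "index_02"}
    print(search_path(["index_01", "index_03"], ignore_unavailable=True))
    print(resolve_explicit(["index_01", "index_03"], existing, ignore_unavailable=True))
```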

Fuzzy type

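Use wildcard patterns to match index names. The * can sit at the end (prefix match, e.g. index_*), at the start (suffix match, e.g. *_01), or at both ends (e.g. *dex*). A pattern that matches no index returns an empty result instead of an error, because allow_no_indices defaults to true.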

GET /index_*/_search
{
  "query": {
    "match": {
      "test": "data"
    }
  }
}
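To make the matching rules concrete, the sketch below resolves an index expression against a list of existing index names. It is a simplified model for illustration (resolve_patterns is hypothetical): it handles * wildcards and the -name exclusion syntax that Elasticsearch accepts in multi-target expressions, but ignores aliases, hidden indices and the expand_wildcards option.

```python
import re


def resolve_patterns(expression: str, existing: list[str]) -> list[str]:
    """Simplified model of resolving a comma-separated index expression.

    '*' matches any run of characters; a leading '-' removes matches of that
    pattern from what earlier parts selected. Parts are applied left to right.
    """
    selected: list[str] = []
    for part in expression.split(","):
        exclude = part.startswith("-")
        pattern = part[1:] if exclude else part
        # Escape everything except '*', which becomes '.*'.
        regex = re.compile(".*".join(re.escape(chunk) for chunk in pattern.split("*")))
        matches = [name for name in existing if regex.fullmatch(name)]
        if exclude:
            selected = [name for name in selected if name not in matches]
        else:
            selected += [name for name in matches if name not in selected]
    return selected


if __name__ == "__main__":
    existing = ["index_01", "index_02", "index_03", "logs_01"]
    print(resolve_patterns("index_*", existing))
    print(resolve_patterns("index_*,-index_02", existing))
    print(resolve_patterns("missing_*", existing))  # [] rather than an error
```

In a real request the same expression goes straight into the path, e.g. GET /index_*,-index_02/_search.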

Computed type

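Use date math in the index name to target indices relative to the current time. The expression must be wrapped in angle brackets, for example <logstash-{now/d}>, and its special characters must be URL-encoded in the request path.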

# <logstash-{now/d}> resolves to today's index (UTC), e.g. logstash-2024.03.22
# Unencoded form: GET /<logstash-{now/d}>/_search
GET /%3Clogstash-%7Bnow%2Fd%7D%3E/_search
{
  "query": {
    "match": {
      "test": "data"
    }
  }
}
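Typing the encoded form by hand is error-prone, so a client can generate it. The sketch below builds the path for searching the last N daily indices, one date-math expression per day, and previews the concrete names they resolve to, assuming the default yyyy.MM.dd format and UTC. last_n_days_path and preview_names are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import quote


def last_n_days_path(prefix: str, days: int) -> str:
    """URL-encoded _search path for <prefix-{now/d-(N-1)d}>, ..., <prefix-{now/d}>."""
    expressions = []
    for back in range(days - 1, -1, -1):
        offset = f"-{back}d" if back else ""
        expressions.append(f"<{prefix}-{{now/d{offset}}}>")
    # safe="" also encodes '/', '<', '>', '{', '}' and the separating commas.
    return "/" + quote(",".join(expressions), safe="") + "/_search"


def preview_names(prefix: str, days: int, now: datetime) -> list[str]:
    """Concrete index names those expressions resolve to (UTC, yyyy.MM.dd)."""
    today = now.astimezone(timezone.utc).date()
    return [f"{prefix}-{today - timedelta(days=back):%Y.%m.%d}"
            for back in range(days - 1, -1, -1)]


if __name__ == "__main__":
    print(last_n_days_path("logstash", 1))
    # /%3Clogstash-%7Bnow%2Fd%7D%3E/_search
    print(preview_names("logstash", 3, datetime(2024, 3, 22, 10, 0, tzinfo=timezone.utc)))
    # ['logstash-2024.03.20', 'logstash-2024.03.21', 'logstash-2024.03.22']
```

Because rounding happens in UTC, a local time just after midnight in a zone east of UTC still resolves to the previous day's index.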

Underlying technical principles

Index sharding

An index is a logical collection composed of one or more primary shards.

Each shard is a self-contained Lucene index that stores a subset of the documents.

The number of primary shards is fixed when the index is created; changing it requires reindexing into a new index (or using the _split/_shrink APIs, which also create a new index).
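The shard count is fixed because document placement depends on it. The sketch uses the simplified formula shard = hash(routing) % number_of_primary_shards to show how many documents would land on a different shard if the count changed. It is an assumption-laden model: crc32 stands in for Elasticsearch's Murmur3 hash, and the real formula also includes a routing-shards factor.

```python
import zlib


def shard_for(routing: str, primary_shards: int) -> int:
    """Simplified routing: hash(routing) % primary_shards.

    crc32 is only a deterministic stand-in; Elasticsearch uses Murmur3 and an
    extra routing-shards factor in its real formula.
    """
    return zlib.crc32(routing.encode("utf-8")) % primary_shards


def moved_fraction(doc_ids: list[str], old_shards: int, new_shards: int) -> float:
    """Fraction of documents whose target shard changes when the shard count changes."""
    moved = sum(shard_for(d, old_shards) != shard_for(d, new_shards) for d in doc_ids)
    return moved / len(doc_ids)


if __name__ == "__main__":
    ids = [f"doc-{i}" for i in range(10_000)]
    print(f"5 -> 6 shards: {moved_fraction(ids, 5, 6):.0%} of documents would move")
```

Since most documents would need to move, Elasticsearch rebuilds the data into a new index rather than resizing shards in place.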

Query process

When a client sends a search request, the node that receives it acts as the coordinating node and forwards the query to one copy (primary or replica) of every targeted shard. Each shard executes the query locally and returns the IDs and sort values of its top hits; the coordinating node merges these into a global ranking, fetches the full documents for the final page, and returns the response. A cross-index query is therefore just a larger set of parallel shard queries, merged in exactly the same way as a single-index query.
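The merge step can be sketched in a few lines. This models only the query phase and the reduce on the coordinating node, using a hypothetical Hit record and made-up scores; the fetch phase and Elasticsearch's real scoring are left out. The shards in the example belong to two different indices, and the merge treats them identically.

```python
import heapq
from typing import NamedTuple


class Hit(NamedTuple):
    score: float
    index: str
    doc_id: str


def shard_query(hits: list[Hit], size: int) -> list[Hit]:
    """Query phase on one shard: only its local top `size` hits leave the shard."""
    return heapq.nlargest(size, hits, key=lambda h: h.score)


def coordinate(shards: list[list[Hit]], size: int) -> list[Hit]:
    """Coordinating node: gather each shard's top hits and keep the global top `size`."""
    candidates = [hit for shard_hits in shards for hit in shard_query(shard_hits, size)]
    return heapq.nlargest(size, candidates, key=lambda h: h.score)


if __name__ == "__main__":
    # Three shards spread over two hypothetical indices.
    shards = [
        [Hit(1.2, "index_01", "a"), Hit(0.4, "index_01", "b")],
        [Hit(2.5, "index_02", "c"), Hit(0.9, "index_02", "d")],
        [Hit(1.7, "index_02", "e")],
    ]
    for hit in coordinate(shards, size=2):
        print(hit.index, hit.doc_id, hit.score)
```

Each shard has to return size hits (from + size when paging) because, in the worst case, the whole global page comes from a single shard; this is one reason deep paging over many shards is expensive.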

Practical considerations

Index-shard equivalence: searching one index with 20 primary shards is functionally similar to searching four indices with five primary shards each, since both fan out to 20 shard-level searches. Keeping both the number of indices and the number of shards modest improves resource utilization and query latency.

Coordinating node separation: in high-concurrency environments, dedicated coordinating-only nodes (separate from data nodes) take over request distribution and result merging, relieving the data nodes and improving throughput.

Routing mechanism: by default, a document's shard is chosen by hashing its _id. If a custom routing value is supplied both when indexing and when searching, the search visits only the shard that value maps to in each index, which is useful for cross-index searches that need only a subset of the data.
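Both the shard fan-out and the effect of routing can be counted with a small model. The sketch below lists the (index, shard) pairs one search visits, with and without a routing value. It rests on stated assumptions: shards_to_search is a hypothetical helper, the index names are made up, and crc32 modulo the shard count stands in for Elasticsearch's real Murmur3-based routing formula.

```python
import zlib
from typing import Optional


def shard_for(routing: str, primary_shards: int) -> int:
    """Simplified routing: crc32 stands in for Elasticsearch's Murmur3-based formula."""
    return zlib.crc32(routing.encode("utf-8")) % primary_shards


def shards_to_search(layout: dict[str, int],
                     routing: Optional[str] = None) -> list[tuple[str, int]]:
    """(index, shard) pairs one search fans out to.

    layout maps index name -> number of primary shards. Replicas are alternative
    copies of the same shard, so they do not add to the count.
    """
    if routing is None:
        # No routing value: every shard of every targeted index is searched.
        return [(index, shard) for index, n in layout.items() for shard in range(n)]
    # With a routing value: only the one shard per index that the value maps to.
    return [(index, shard_for(routing, n)) for index, n in layout.items()]


if __name__ == "__main__":
    one_big = {"orders": 20}
    four_small = {f"orders-2020.0{q}": 5 for q in range(1, 5)}
    print(len(shards_to_search(one_big)), len(shards_to_search(four_small)))  # 20 20
    print(shards_to_search(four_small, routing="customer-42"))  # one pair per index
```

In a real cluster the same value is passed as the routing query parameter when indexing (PUT /orders-2020.04/_doc/1?routing=customer-42) and when searching (GET /orders-2020.0*/_search?routing=customer-42).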

Conclusion

Cross-index queries in Elasticsearch are a natural extension of its shard-based architecture, providing flexible data access for business, big-data and log-management workloads. More advanced options are also available, such as cross-cluster search, which can also query clusters running different (compatible) versions.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
