Why Cross-Index Queries Matter in Elasticsearch and How to Implement Them
This article explains why Elasticsearch cross-index queries are essential, outlines their technical principles, showcases classic use cases such as business analytics, big‑data pipelines and log management, and provides practical methods, code examples, and performance considerations for effective implementation.
Introduction
Elasticsearch supports flexible sharding within a single index as well as cross-index queries that span multiple indices, both built on its cluster architecture.
Technical limits of Elasticsearch indices
Official guidance: keep a primary shard below about 50 GB; in practice 20-40 GB often works well.
Hard limit of roughly 2.1 billion documents per shard (just under 2^31, a Lucene constraint), rarely reached in practice.
Too many shards increase resource consumption and degrade query latency.
Shard count should be estimated before index creation based on data volume and query patterns.
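The sizing guidance above can be turned into a rough estimate. The sketch below is only a back-of-the-envelope calculation: the function name and the 30 GB default target are assumptions chosen from the 20-40 GB range mentioned above, and it ignores query patterns, replicas, and growth.

```python
import math


def estimate_primary_shards(expected_data_gb: float, target_shard_gb: float = 30.0) -> int:
    """Rough primary shard count: total data divided by target shard size, rounded up.

    Hypothetical helper for illustration; the 30 GB default is an assumption.
    """
    if expected_data_gb < 0 or target_shard_gb <= 0:
        raise ValueError("data size must be >= 0 and target shard size must be > 0")
    # Every index needs at least one primary shard.
    return max(1, math.ceil(expected_data_gb / target_shard_gb))


print(estimate_primary_shards(600))  # 600 GB at ~30 GB per shard -> 20
```

Treat the result as a starting point to validate against real query latency, not as a fixed rule.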
Advantages over traditional sharding
Relational databases require manual sharding; many NoSQL systems provide only one‑dimensional partitioning. Elasticsearch offers two dimensions: multiple index names and multiple shards per index, enabling flexible query targeting across indices.
Typical application scenarios
Business systems
Time-based indices (monthly, quarterly, yearly) confine real-time updates to the current index while preserving historical data, so queries can scan only the relevant periods instead of the full dataset.
Big‑data platforms
Batch jobs that recompute results can rewrite entire indices; using time‑based indices reduces rebuild cost and balances storage with query performance.
Log management
ELK stacks often create daily or hourly indices for massive log volumes (tens to hundreds of terabytes per day). Queries typically target recent logs, and older indices can be excluded or deleted efficiently.
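As an illustration of targeting only recent logs, the sketch below builds a comma-separated list of daily index names for a short time window. The `logstash-YYYY.MM.DD` naming pattern and the helper name are assumptions for this example; adapt them to whatever naming scheme the cluster actually uses.

```python
from datetime import date, timedelta


def recent_daily_indices(prefix: str, today: date, days: int) -> str:
    """Comma-separated daily index names covering the last `days` days, newest first.

    Hypothetical helper; assumes indices are named <prefix>-YYYY.MM.DD.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    names = []
    for offset in range(days):
        day = today - timedelta(days=offset)
        names.append(prefix + "-" + day.strftime("%Y.%m.%d"))
    return ",".join(names)


target = recent_daily_indices("logstash", date(2024, 3, 22), 3)
print(f"GET /{target}/_search")
# GET /logstash-2024.03.22,logstash-2024.03.21,logstash-2024.03.20/_search
```

Because the names are listed explicitly, a missing day would make the request fail by default; adding the `ignore_unavailable=true` query parameter lets the search skip indices that do not exist.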
Ways to perform cross‑index queries
Direct type
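Specify multiple index names explicitly, separated by commas. By default the request fails if any of the named indices does not exist; the ignore_unavailable=true query parameter relaxes this.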
GET /index_01,index_02/_search
{
  "query": {
    "match": {
      "test": "data"
    }
  }
}
Fuzzy type
Use a wildcard pattern (e.g., index_*) to match index names. The pattern resolves to whichever indices currently match, so no specific index has to exist. The wildcard can appear as a prefix, a suffix, or both.
GET /index_*/_search
{
  "query": {
    "match": {
      "test": "data"
    }
  }
}
Computed type
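Use date math in the index name, such as <logstash-{now/d}>, to target indices relative to the current date. The expression must be wrapped in angle brackets, and its special characters must be percent-encoded when used in a request path.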
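Date math index names are written inside angle brackets and have to be percent-encoded before they go into a request path. The sketch below shows one way to produce that encoding with Python's standard library; the helper name is hypothetical.

```python
from urllib.parse import quote


def encode_date_math_index(expression: str) -> str:
    """Wrap a date math expression in angle brackets and percent-encode it for a URL path.

    Hypothetical helper for illustration.
    """
    # safe="" makes quote() encode "/" as well, which date math requires.
    return quote("<" + expression + ">", safe="")


encoded = encode_date_math_index("logstash-{now/d}")
print(f"GET /{encoded}/_search")
# GET /%3Clogstash-%7Bnow%2Fd%7D%3E/_search
```

With the default date format, an expression like this resolves to a name such as logstash-2024.03.22 on the day the request runs.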
# Resolves to an index such as logstash-2024.03.22; the characters < { / } > are percent-encoded
GET /%3Clogstash-%7Bnow%2Fd%7D%3E/_search
{
  "query": {
    "match": {
      "test": "data"
    }
  }
}
Underlying technical principles
Index sharding
An index is a logical collection composed of one or more primary shards.
Each shard stores the actual documents.
The number of primary shards is fixed at index creation; changing it means reindexing or using the split or shrink APIs, both of which write to a new index.
Query process
When a client sends a search request, a coordinating node distributes the query to the relevant shard copies. Each shard processes the query locally and returns results to the coordinating node, which merges them and returns the final response. Cross‑index queries are therefore a set of parallel shard queries merged in the same way as single‑index queries.
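The merge step described above can be pictured with a small simulation. This is a deliberately simplified sketch, not Elasticsearch's actual implementation: the hit dictionaries, scores, and function name are made up for illustration, and details such as the separate fetch phase are ignored.

```python
import heapq
from itertools import chain


def merge_shard_hits(shard_results, size):
    """Merge per-shard hit lists into one global top-`size` list by descending score.

    Hypothetical helper that mimics what a coordinating node does conceptually.
    """
    all_hits = chain.from_iterable(shard_results)
    return heapq.nlargest(size, all_hits, key=lambda hit: hit["_score"])


# Each inner list stands for the local results of one shard; the shards may
# belong to different indices, which makes no difference to the merge.
shard_results = [
    [{"_index": "index_01", "_id": "a", "_score": 2.4},
     {"_index": "index_01", "_id": "b", "_score": 1.1}],
    [{"_index": "index_02", "_id": "c", "_score": 3.0}],
    [{"_index": "index_02", "_id": "d", "_score": 0.7}],
]

top = merge_shard_hits(shard_results, size=2)
print([(hit["_index"], hit["_id"]) for hit in top])
# [('index_02', 'c'), ('index_01', 'a')]
```

The point of the sketch is that the merge only sees shard-level result lists, so a query over two indices behaves like a query over one index with the same total set of shards.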
Practical considerations
Index‑shard equivalence: One index with 20 shards is functionally similar to four indices each with five shards, but keeping both the number of indices and shards modest improves resource utilization and query latency.
Coordinating node separation: In high‑concurrency environments, deploying dedicated coordinating nodes (separate from data nodes) reduces merge overhead and improves throughput.
Routing mechanism: By default, documents are routed based on the hash of _id. Custom routing keys can limit the number of shards queried, which is useful for cross‑index searches that only need a subset of data.
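To see why custom routing narrows a search, consider how a routing value maps to a shard. The sketch below shows the general idea of hash-then-modulo only. It is an assumption-laden simplification: Elasticsearch uses its own Murmur3-based hash and additional routing parameters, whereas CRC32 is used here purely as a stand-in, and the function name is hypothetical.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key to a shard number with a hash-then-modulo scheme.

    Stand-in hash: Elasticsearch does NOT use CRC32; this only illustrates the idea.
    """
    if num_primary_shards < 1:
        raise ValueError("an index has at least one primary shard")
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


# The same routing key always lands on the same shard of a given index, so a
# search that supplies that key only needs to visit one shard per index.
first = pick_shard("user-42", 5)
second = pick_shard("user-42", 5)
print(first == second)  # True
```

In a real request the key is passed with the `routing` query parameter on both indexing and search; documents indexed without that key may live on other shards and would be missed by a routed search.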
Conclusion
Cross‑index queries in Elasticsearch are a natural extension of its shard‑based architecture, providing flexible data access for business, big‑data, and log‑management workloads. Advanced features such as cross‑cluster and cross‑version queries are also available.
dbaplus Community
