Why _count and _stats Return Different Document Numbers in Elasticsearch—and How to Fix It
The article explains why Elasticsearch's _count and _stats APIs can return vastly different document totals, especially when nested fields are involved, and provides step‑by‑step analysis, code examples, and practical solutions such as index refresh and data‑model adjustments.
1. Problem Introduction
When querying an Elasticsearch index with the _count and _stats APIs, the returned document counts can differ dramatically. For example, GET /achieve_base/_count returns 11163 while GET /achieve_base/_stats returns 300276, a discrepancy that becomes especially noticeable with nested fields.
2. API Differences
_count API – According to the official Elasticsearch documentation, this API returns the number of documents that match a query. It counts only top‑level documents; the hidden Lucene documents created for nested objects are not included.
GET /<target>/_count
_stats API – Provides index‑level statistics, including storage size and document count. The document count is taken at the Lucene level, meaning it includes every original document and every Lucene document generated by nested fields. The response reports these figures in two sections: primaries (primary shards only) and total (primary + replica shards).
GET /<target>/_stats
3. Impact of nested Fields
Each element in a nested array is stored as an independent Lucene document. Consequently, a single source document that contains multiple nested elements will cause the _stats API to count additional Lucene documents.
Example:
DELETE /test_nested_index
PUT /test_nested_index
{
  "mappings": {
    "properties": {
      "nested_field": {
        "type": "nested",
        "properties": {"name": {"type": "keyword"}}
      }
    }
  }
}
POST /test_nested_index/_doc/1
{
  "nested_field": [
    {"name": "nested_doc1"},
    {"name": "nested_doc2"},
    {"name": "nested_doc3"}
  ]
}
Running the APIs yields:
GET /test_nested_index/_count → 1
GET /test_nested_index/_stats → 4
After bulk‑inserting additional documents, the _stats result becomes 56, calculated as 27 × 2 + 2 = 56, which is consistent with each nested element adding a Lucene document.
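The counting rule can be sketched in Python. This is a rough model, not something Elasticsearch exposes: the function name estimate_lucene_docs and the nested_paths argument are made up for illustration, and the model assumes one Lucene document for the root plus one for every object stored under a path mapped as nested.

```python
def _count_nested(obj, nested_paths, prefix):
    """Count objects that sit under a nested-mapped path, recursively."""
    total = 0
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            items = value if isinstance(value, list) else [value]
            if path in nested_paths:
                # Assumption: each object here becomes its own hidden Lucene document.
                total += sum(1 for item in items if isinstance(item, dict))
            for item in items:
                total += _count_nested(item, nested_paths, path)
    return total


def estimate_lucene_docs(source, nested_paths):
    """Estimate Lucene documents for one source document: 1 root + nested objects."""
    return 1 + _count_nested(source, nested_paths, "")


doc = {
    "nested_field": [
        {"name": "nested_doc1"},
        {"name": "nested_doc2"},
        {"name": "nested_doc3"},
    ]
}
print(estimate_lucene_docs(doc, {"nested_field"}))  # 4: one root plus three nested objects
```

Summing this estimate over every source document gives a figure to compare against the _stats document count, while the number of source documents corresponds to _count.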
4. Solutions
1. Refresh the Index
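Documents indexed since the last refresh are not yet visible to search, so counts taken right after a write can be stale. Execute a refresh to rule this out before comparing the two APIs:
POST /achieve_base/_refresh
Then re‑run _count and _stats. A refresh does not remove the gap caused by nested fields; whatever difference remains is explained by the nested documents.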
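The order of the three calls can be sketched with Python's standard library. This only builds the requests and does not send them; the helper name build_check_requests and the base URL http://localhost:9200 are assumptions for illustration, and a real cluster may also need authentication. The /_stats/docs form asks the stats API for the docs metric only.

```python
from urllib.request import Request


def build_check_requests(base_url, index):
    """Return the refresh, count, and stats requests in the order to run them."""
    base = base_url.rstrip("/")
    return [
        Request(f"{base}/{index}/_refresh", method="POST"),   # make recent writes visible
        Request(f"{base}/{index}/_count", method="GET"),      # top-level documents
        Request(f"{base}/{index}/_stats/docs", method="GET"), # Lucene-level documents
    ]


# Assumed local, unsecured cluster; pass each request to urllib.request.urlopen to send it.
for req in build_check_requests("http://localhost:9200", "achieve_base"):
    print(req.get_method(), req.full_url)
```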
2. Account for the Effect of nested Fields
Remember that _count reports the number of original documents, while _stats reports the total number of Lucene documents, including those generated by nested fields.
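A small Python helper can make the gap explicit once both responses are in hand. This is a sketch under assumptions: hidden_nested_docs is a made‑up name, the two dictionaries are hand‑written and trimmed to the fields the helper reads (they are not captured output), and the figures 1 and 4 are taken from the test_nested_index example, with the stats figure assumed to be the primaries value.

```python
def hidden_nested_docs(count_response, stats_response, index):
    """Return how many Lucene documents are not top-level documents."""
    top_level = count_response["count"]
    # Use primaries so replica copies are not counted a second time.
    lucene_level = stats_response["indices"][index]["primaries"]["docs"]["count"]
    return lucene_level - top_level


# Trimmed, hand-written responses for illustration only.
count_response = {"count": 1}
stats_response = {
    "indices": {
        "test_nested_index": {
            "primaries": {"docs": {"count": 4, "deleted": 0}}
        }
    }
}

print(hidden_nested_docs(count_response, stats_response, "test_nested_index"))  # 3
```

Reading the primaries section rather than total keeps replicas out of the comparison, since _count never counts a document once per copy.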
3. Optimize the Data Model
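If nested fields cause an explosion in document count, consider alternatives such as the flattened type or storing nested data as separate top‑level documents. Both change how the data can be queried (flattened, for example, does not keep each array element as an independent object), so check the query requirements before switching.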
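The second alternative, separate top‑level documents, might look like the following Python sketch. The function name split_nested, the parent_id field, and the "<id>-<n>" child id scheme are all hypothetical choices made for illustration, not an Elasticsearch convention.

```python
def split_nested(doc_id, source, nested_field):
    """Split one source document into a parent and one top-level document per element."""
    # Parent keeps every field except the former nested array.
    parent = {k: v for k, v in source.items() if k != nested_field}
    docs = [(doc_id, parent)]
    for i, item in enumerate(source.get(nested_field, [])):
        # Hypothetical linkage: each child carries the parent's id in "parent_id".
        docs.append((f"{doc_id}-{i}", {**item, "parent_id": doc_id}))
    return docs


source = {
    "title": "example",
    "nested_field": [{"name": "nested_doc1"}, {"name": "nested_doc2"}],
}
for new_id, body in split_nested("1", source, "nested_field"):
    print(new_id, body)
```

With this layout every element is an ordinary document, so _count and _stats would be expected to agree, at the cost of a larger top‑level count and of having to join parent and child data in the application.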
5. Conclusion
Elasticsearch’s _count and _stats APIs use different counting granularities. _count returns the number of original documents, whereas _stats returns the Lucene‑level document count, which includes nested documents. Understanding this distinction helps avoid confusion and use the APIs correctly, especially when dealing with complex data structures.
Mingyi World Elasticsearch
