Elasticsearch Bulk Writes Succeed but _source Is Empty: 6‑Hour Debugging Story & Pitfall Guide
When Elasticsearch Bulk API reports successful writes and Count shows documents, but Search and Get return empty _source, a six‑hour investigation reveals the root cause is a disabled _source mapping and provides a step‑by‑step debugging checklist and fix.
Phenomenon
The bulk call returns (10, []): ten documents indexed, no errors. The Count API reports count: 10. The Search API returns hits whose _source fields are empty objects ({}), and the Get API returns _source: null or no _source at all.
Impact
Down‑stream validation that reads the written data fails.
Users repeatedly retry, assuming network, permission or API problems.
Investigation steps taken
1. Bulk format changes
Switched action from { "index": {}, "doc": {...} } to { "_index": "xxx", "_source": {...} } and added _type: "_doc" (7.x).
Adjusted chunk_size, refresh, request_timeout.
Result: Bulk still reported success, Count remained correct, Search still returned empty _source.
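For reference, the second action shape above is the one the Python client's helpers.bulk accepts. A minimal sketch, assuming the elasticsearch Python package; the function name build_actions, the index name, and the sample documents are hypothetical and not taken from the original code:

```python
from typing import Any, Dict, Iterable, Iterator


def build_actions(index_name: str, docs: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one helpers.bulk action per document (hypothetical helper)."""
    for doc in docs:
        # "_index" and "_source" are metadata keys understood by helpers.bulk;
        # when "_op_type" is omitted, the operation defaults to "index".
        yield {"_index": index_name, "_source": doc}


if __name__ == "__main__":
    sample_docs = [{"title": "a"}, {"title": "b"}]  # illustrative data only
    for action in build_actions("your_index", sample_docs):
        print(action)
    # With a live cluster (assumed to run locally), the actions would be sent like this:
    # from elasticsearch import Elasticsearch, helpers
    # client = Elasticsearch("http://localhost:9200")
    # success, errors = helpers.bulk(client, build_actions("your_index", sample_docs))
```

An index with _source disabled still accepts well-formed actions like these, which is consistent with the format change having no effect on the symptom.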
2. Refresh timing
Used refresh='wait_for' / refresh=True, called indices.refresh(), and added a time.sleep() of 2 to 3 seconds before querying.
Result: No change; Count was visible, confirming that refresh was not the issue.
3. ES version / API compatibility
Tried both index(..., document=doc) and index(..., body=doc), branched logic for 7.x vs 8.x/9.x.
Result: Still no _source content.
4. Per‑document Index calls
Iterated over data and called client.index() for each document, adding retry and logging.
Result: Same symptom – success responses, correct Count, empty _source.
5. Expanded validation
Added Get calls for each document and extensive logging.
Result: Get also returned empty _source, confirming that the document existed but contained no stored source.
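The distinction that mattered here, "document exists" versus "document exists but has no stored source", can be made explicit in the validation code. A minimal sketch that works on the Get response body as a plain dict; the function name and the three labels are hypothetical:

```python
from typing import Any, Mapping


def describe_get_response(resp: Mapping[str, Any]) -> str:
    """Classify a Get API response body (hypothetical helper)."""
    if not resp.get("found", False):
        return "missing"  # the document does not exist
    if not resp.get("_source"):
        # "_source" is absent, null, or an empty object.
        # Note: a document that was deliberately indexed as {} also lands here.
        return "found_no_source"
    return "ok"


if __name__ == "__main__":
    print(describe_get_response({"_index": "your_index", "_id": "1", "found": True}))
```

A result of "found_no_source" for every document points away from the write path and toward index metadata.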
Root cause
The index mapping disables _source:
{
  "mappings": {
    "_source": { "enabled": false },
    "properties": { ... }
  }
}
Consequences:
Documents are indexed, inverted, and searchable, so Count works.
Bulk/Index calls report success because the indexing pipeline succeeds.
The original JSON is not stored; therefore Search and Get cannot return _source.
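A check for this condition can be written against the body returned by GET /your_index/_mapping, which is keyed by index name. A minimal sketch, assuming the typeless response shape of 7.x and later; the function name is hypothetical, and it inspects only the enabled flag shown above:

```python
from typing import Any, List, Mapping


def indices_with_source_disabled(mapping_response: Mapping[str, Any]) -> List[str]:
    """Return the names of indices whose mapping sets _source.enabled to false."""
    disabled = []
    for index_name in mapping_response:
        body = mapping_response[index_name]
        source_cfg = body.get("mappings", {}).get("_source", {})
        # _source is enabled by default, so only an explicit false counts.
        if source_cfg.get("enabled", True) is False:
            disabled.append(index_name)
    return sorted(disabled)


if __name__ == "__main__":
    sample = {  # illustrative response, not captured from a real cluster
        "your_index": {"mappings": {"_source": {"enabled": False}, "properties": {}}},
        "other_index": {"mappings": {"properties": {}}},
    }
    print(indices_with_source_disabled(sample))
```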
Why _source.enabled=false appears
LLM‑generated mapping templates sometimes include the setting for “search‑only” use cases.
Copied log or monitoring templates often disable _source to save storage.
Older default templates may have the flag turned off.
Correct fix
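Documents already written with _source disabled have to be written again from the original data into a new index, because their source was never stored. When creating the index, explicitly enable _source: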
def create_index(self, index_name, mapping=None):
    if mapping:
        mappings_body = dict(mapping.get('mappings', {}))
        # Core: force enable _source
        mappings_body['_source'] = {'enabled': True}
        self.client.indices.create(index=index_name, body={'mappings': mappings_body})
    else:
        self.client.indices.create(index=index_name)
The single line mappings_body['_source'] = {'enabled': True} overrides any enabled: false in the supplied mapping.
Debugging checklist for “bulk succeeds, count >0, _source empty”
Check the index mapping with GET /your_index/_mapping and verify that _source.enabled is not false.
Confirm index‑creation logic does not set _source.enabled: false (including generated or copied templates).
If mapping is correct, then revisit write‑path details such as bulk format, refresh options, or API version compatibility.
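The checklist can be folded into one diagnostic function. A minimal sketch, assuming a client object that exposes indices.get_mapping, count, and get with keyword arguments as the Python Elasticsearch client does; the function name, the result keys, and the verdict strings are hypothetical:

```python
from typing import Any, Dict


def diagnose_empty_source(client: Any, index_name: str, doc_id: str) -> Dict[str, Any]:
    """Run the checklist against one index and one known document id."""
    mapping = client.indices.get_mapping(index=index_name)
    count = client.count(index=index_name)["count"]
    doc = client.get(index=index_name, id=doc_id)

    # Step 1: does any returned mapping explicitly disable _source?
    source_disabled = False
    for name in mapping:
        source_cfg = mapping[name].get("mappings", {}).get("_source", {})
        if source_cfg.get("enabled", True) is False:
            source_disabled = True

    # Does the fetched document carry a non-empty source?
    doc_has_source = "_source" in doc and bool(doc["_source"])

    if source_disabled:
        verdict = "mapping disables _source: fix index creation and rewrite the data"
    elif count > 0 and not doc_has_source:
        verdict = "mapping looks fine: revisit bulk format, refresh, and API version"
    else:
        verdict = "no empty-source symptom detected"

    return {
        "count": count,
        "source_disabled": source_disabled,
        "doc_has_source": doc_has_source,
        "verdict": verdict,
    }
```

Calling it with a real client and the id of a document that was just written returns a small report that can be logged before any write-path code is changed.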
Practical advice for AI‑assisted debugging
Use hypothesis elimination: a normal Count rules out refresh or shard‑allocation problems.
When Get also lacks _source, discard query‑syntax issues.
Prioritize inspecting index metadata ( _source setting) before adding more code branches.
Leverage official diagnostic APIs:
GET /index/_mapping
GET /index/_settings
GET /index/_doc/{id} (inspect the full response)
After fixing the mapping, remove the extra compatibility and validation layers to keep the codebase concise.
Mingyi World Elasticsearch
