Why Did MongoDB’s Query Planner Suddenly Slow Down? A Deep Dive into Index Cache Failures
The article explains how null or empty query values caused MongoDB to ignore the intended index, leading to massive slow queries and service timeouts, and details the step‑by‑step investigation, plan‑cache invalidation, and the corrective addition of a compound index that restored normal performance.
Simple Introduction
When a MongoDB query contains null or an empty string, the index may become ineffective, causing the query optimizer to select an incorrect execution plan, resulting in slow queries and service anomalies.
Event Timeline
The user center reported frequent Dubbo RPC timeouts for third‑party login. The stack trace showed a RpcException caused by a remote method timeout:
org.apache.dubbo.rpc.RpcException: Failed to invoke the method loginWithThird ... timeout: 5000 msInvestigation Steps
Checked user‑center CAT monitoring – requests were received, confirming the client was not at fault.
Reviewed Kibana logs – no abnormal entries.
Used Arthas to profile the call chain and identified MongoDB query latency as the bottleneck.
MongoDB Query Analysis
Original query method:
public User getUserByDfId(String dfId, String app) {
Criteria criteria = Criteria.where("userBindInfo.dfId").is(dfId)
.and("app").is(app)
.and("status").is(1);
return this.mongoTemplate.findOne(Query.query(criteria), User.class);
}Existing indexes:
{ "key" : { "app" : 1 }, "name" : "app" }
{ "key" : { "userBindInfo.dfId" : 1 }, "name" : "userBindInfo.dfId" }Slow‑log analysis revealed that MongoDB used the app index instead of the userBindInfo.dfId index, causing the query to scan millions of documents.
Root Cause
The plan cache had stored a sub‑optimal plan after a previous failure. When the cache refreshed (due to LRU eviction, index changes, or a MongoDB restart), the optimizer chose the wrong index, and the bad plan was cached again, leading to repeated slow queries.
Fix Implemented
A new compound index was created to improve index matching:
db.getCollection("wormhole_user").createIndex({
"userBindInfo.dfId": 1,
"app": 1,
"status": 1
}, { name: "idx_user_app_userBindInfo.dfId", background: true });After the index was added, query latency returned to normal and the service recovered.
Further Analysis with explain()
Running explain() on the query showed the correct plan using the userBindInfo.dfId index, confirming that the issue was solely due to plan‑cache selection.
MongoDB Query Optimizer Behavior
The optimizer caches the most efficient plan for a given query shape.
Cache entries are invalidated on MongoDB restart, index modifications, or LRU eviction.
Conclusion & Recommendations
Avoid queries that contain null or empty values.
Use explain() to verify execution plans under different conditions.
Inspect the plan cache via MongoDB’s planCache utilities.
Apply hint() to force the use of the appropriate index when necessary.
Author: Zheng Yateng, Senior Backend Engineer, Micro‑Carp Technology Team.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
WeiLi Technology Team
Practicing data-driven principles and believing technology can change the world.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
