Why Did MongoDB’s Query Planner Suddenly Slow Down? A Deep Dive into Index Cache Failures
The article explains how null or empty query values caused MongoDB to ignore the intended index, leading to massive slow queries and service timeouts, and details the step‑by‑step investigation, plan‑cache invalidation, and the corrective addition of a compound index that restored normal performance.
Simple Introduction
When a MongoDB query contains
nullor an empty string, the index may become ineffective, causing the query optimizer to select an incorrect execution plan, resulting in slow queries and service anomalies.
Event Timeline
The user center reported frequent Dubbo RPC timeouts for third‑party login. The stack trace showed a
RpcExceptioncaused by a remote method timeout:
<code>org.apache.dubbo.rpc.RpcException: Failed to invoke the method loginWithThird ... timeout: 5000 ms</code>Investigation Steps
Checked user‑center CAT monitoring – requests were received, confirming the client was not at fault.
Reviewed Kibana logs – no abnormal entries.
Used Arthas to profile the call chain and identified MongoDB query latency as the bottleneck.
MongoDB Query Analysis
Original query method:
<code>public User getUserByDfId(String dfId, String app) {
Criteria criteria = Criteria.where("userBindInfo.dfId").is(dfId)
.and("app").is(app)
.and("status").is(1);
return this.mongoTemplate.findOne(Query.query(criteria), User.class);
}</code>Existing indexes:
<code>{ "key" : { "app" : 1 }, "name" : "app" }
{ "key" : { "userBindInfo.dfId" : 1 }, "name" : "userBindInfo.dfId" }</code>Slow‑log analysis revealed that MongoDB used the
appindex instead of the
userBindInfo.dfIdindex, causing the query to scan millions of documents.
Root Cause
The plan cache had stored a sub‑optimal plan after a previous failure. When the cache refreshed (due to LRU eviction, index changes, or a MongoDB restart), the optimizer chose the wrong index, and the bad plan was cached again, leading to repeated slow queries.
Fix Implemented
A new compound index was created to improve index matching:
<code>db.getCollection("wormhole_user").createIndex({
"userBindInfo.dfId": 1,
"app": 1,
"status": 1
}, { name: "idx_user_app_userBindInfo.dfId", background: true });</code>After the index was added, query latency returned to normal and the service recovered.
Further Analysis with explain()
Running
explain()on the query showed the correct plan using the
userBindInfo.dfIdindex, confirming that the issue was solely due to plan‑cache selection.
MongoDB Query Optimizer Behavior
The optimizer caches the most efficient plan for a given query shape.
Cache entries are invalidated on MongoDB restart, index modifications, or LRU eviction.
Conclusion & Recommendations
Avoid queries that contain
nullor empty values.
Use
explain()to verify execution plans under different conditions.
Inspect the plan cache via MongoDB’s
planCacheutilities.
Apply
hint()to force the use of the appropriate index when necessary.
Author: Zheng Yateng, Senior Backend Engineer, Micro‑Carp Technology Team.
WeiLi Technology Team
Practicing data-driven principles and believing technology can change the world.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.