Implementation and Analysis of MongoDB Nearest Mode for Multi-Data Center Deployment
This article explains how MongoDB's nearest mode achieves proximity‑aware reads across multiple data centers by analyzing the internal mongos and driver code, detailing latency collection, smoothing algorithms, node selection logic, and providing configuration recommendations for latency‑sensitive workloads.
1. Background Introduction
To ensure service availability and data reliability, critical services deploy their storage systems across multiple regions and data centers (for example Beijing, Shanghai, and Shenzhen), each holding a replica of the data, so that the failure of any one region does not affect the business.
When deploying across data centers, network latency must be considered; for example, the ping between Shanghai and Shenzhen is about 30 ms, while intra‑data‑center latency is around 0.1 ms.
Tencent Cloud MongoDB combines L5 proximity access with MongoDB's internal "nearest" mode to serve reads from nearby nodes and avoid cross-region latency penalties. The architecture uses mongos as a proxy in front of mongod storage nodes, which form a primary-secondary replica set distributed across data centers.
2. What is the nearest access mode
2.1 Replica Set Concept
In MongoDB, a replica set is a collection of nodes that store identical data. Clients can access the replica set directly via the driver or through mongos.
Replica sets elect a primary using a Raft-based election protocol and keep secondaries in sync by replaying the primary's oplog.
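For illustration, a three-member replica set spanning three data centers might be initialized from the mongo shell like this (the host names and set name are placeholders, not a real deployment):

```javascript
// hypothetical hosts, one per data center
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo-bj.example.com:27017" },
    { _id: 1, host: "mongo-sh.example.com:27017" },
    { _id: 2, host: "mongo-sz.example.com:27017" }
  ]
})
```

Once initiated, one member is elected primary and the other two replicate from it via the oplog.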
2.2 Read‑Write Splitting and readPreference
MongoDB sends both reads and writes to the primary by default, but provides readPreference to route reads elsewhere. Five modes are available: primary (all reads go to the primary), primaryPreferred (primary first, falling back to secondaries), secondary (reads only from secondaries), secondaryPreferred (secondaries first, falling back to the primary), and nearest (read from the member with the lowest network latency, whether primary or secondary).
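Each mode can be requested per connection via the URI; for example (placeholder hosts):

```text
mongodb://mongos1.example.com:27017,mongos2.example.com:27017/?readPreference=nearest
```

With this string, every read issued on the connection is eligible for nearest-mode routing, while writes still go to the primary.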
2.3 Read‑Write Consistency Guarantee
To ensure that a read from a secondary observes the latest write, the write concern must wait for replication before acknowledging, for example by setting w to the number of replica set members so the data reaches every node first.
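As a hedged sketch in the mongo shell, a write that must reach all members of a hypothetical 3-node set before acknowledging could look like this (collection and field names are made up):

```javascript
// w: 3 waits for all three members; wtimeout bounds the wait in milliseconds
db.orders.insertOne(
  { orderId: 1 },
  { writeConcern: { w: 3, wtimeout: 5000 } }
)
```

The stronger the write concern, the higher the write latency, which is exactly the trade-off the next paragraph warns about.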
If the business model is write-heavy and read-light, this trade-off deserves care: every write then waits on cross-data-center synchronization.
3. Nearest Mode Implementation Details
3.1 mongos Code Analysis
Latency Information Collection
mongos runs a background probe thread that, every 5 seconds, issues the isMaster command to each member of the replica sets it monitors and records the round-trip latency.
```cpp
try {
    ScopedDbConnection conn(ConnectionString(ns.host), socketTimeoutSecs);
    bool ignoredOutParam = false;
    Timer timer;  // start timing
    if (conn->isStillConnected()) {
        conn->isMaster(ignoredOutParam, &reply);  // execute isMaster
    } else {
        log() << "Connection to " << ns.host.toString() << " is closed";
        reply = BSONObj();
    }
    pingMicros = timer.micros();  // record this round's latency
    conn.done();
} catch (const DBException& ex) {
    ...
}
```

Then it smooths the latency using a moving average that applies 1/4 of each delta:
```cpp
if (reply.latencyMicros >= 0) {
    if (latencyMicros == unknownLatency) {
        latencyMicros = reply.latencyMicros;  // first measurement
    } else {
        latencyMicros += (reply.latencyMicros - latencyMicros) / 4;  // smoothed update
    }
}
```

Nearest Node Selection
The selection algorithm sorts the matching nodes by latency, discards any node whose latency exceeds the nearest node's by the threshold (15 ms by default), and returns one of the remaining nodes at random.
```cpp
case ReadPreference::SecondaryOnly:
case ReadPreference::Nearest: {
    BSONForEach(tagElem, criteria.tags.getTagBSON()) {
        uassert(16358, "Tags should be a BSON object", tagElem.isABSONObj());
        BSONObj tag = tagElem.Obj();
        std::vector<const Node*> matchingNodes;
        for (size_t i = 0; i < nodes.size(); i++) {
            if (nodes[i].matches(criteria.pref) && nodes[i].matches(tag)) {
                matchingNodes.push_back(&nodes[i]);
            }
        }
        if (matchingNodes.empty()) continue;
        if (matchingNodes.size() == 1) return matchingNodes.front()->host;
        // sort by smoothed latency, ascending
        std::sort(matchingNodes.begin(), matchingNodes.end(), compareLatencies);
        // drop everything beyond nearest + threshold (15 ms by default)
        for (size_t i = 1; i < matchingNodes.size(); i++) {
            int64_t distance = matchingNodes[i]->latencyMicros - matchingNodes[0]->latencyMicros;
            if (distance >= latencyThresholdMicros) {
                matchingNodes.erase(matchingNodes.begin() + i, matchingNodes.end());
                break;
            }
        }
        if (ReplicaSetMonitor::useDeterministicHostSelection) {
            return matchingNodes[roundRobin++ % matchingNodes.size()]->host;
        } else {
            return matchingNodes[rand.nextInt32(matchingNodes.size())]->host;
        }
    }
    return HostAndPort();
}
```

3.2 mgo Driver Code Analysis
Latency Information Collection
The mgo driver probes each server every 15 seconds with a ping command and uses the maximum of the last six measurements as the server's latency value.
```go
for {
    if loop {
        time.Sleep(delay) // probe every 15 seconds
    }
    socket, _, err := server.AcquireSocket(0, delay)
    if err == nil {
        start := time.Now()
        _, _ = socket.SimpleQuery(&op) // execute ping
        delay := time.Now().Sub(start) // measure the round trip
        server.pingWindow[server.pingIndex] = delay
        server.pingIndex = (server.pingIndex + 1) % len(server.pingWindow)
        server.pingCount++
        // take the maximum of the (up to six) retained samples
        var max time.Duration
        for i := 0; i < len(server.pingWindow) && uint32(i) < server.pingCount; i++ {
            if server.pingWindow[i] > max {
                max = server.pingWindow[i]
            }
        }
        server.pingValue = max // use the max as the latency metric
        logf("Ping for %s is %d ms", server.Addr, max/time.Millisecond)
    } else if err == errServerClosed {
        return
    }
    if !loop {
        return
    }
}
```

Nearest Node Selection
Selection is similar to mongos's, but within the latency window mgo additionally prefers the server with fewer connections in use, adding a degree of load balancing.
```go
func (servers *mongoServers) BestFit(mode Mode, serverTags []bson.D) *mongoServer {
    var best *mongoServer
    for _, next := range servers.slice {
        if best == nil {
            best = next
            best.RLock()
            if serverTags != nil && !next.info.Mongos && !best.hasTags(serverTags) {
                best.RUnlock()
                best = nil
            }
            continue
        }
        next.RLock()
        swap := false
        switch {
        case serverTags != nil && !next.info.Mongos && !next.hasTags(serverTags):
            // must have the requested tags
        case next.info.Master != best.info.Master && mode != Nearest:
            // prefer slaves unless mode is PrimaryPreferred
            swap = (mode == PrimaryPreferred) != best.info.Master
        case absDuration(next.pingValue-best.pingValue) > 15*time.Millisecond:
            // prefer the nearest server
            swap = next.pingValue < best.pingValue
        case len(next.liveSockets)-len(next.unusedSockets) < len(best.liveSockets)-len(best.unusedSockets):
            // prefer servers with fewer connections in use
            swap = true
        }
        if swap {
            best.RUnlock()
            best = next
        } else {
            next.RUnlock()
        }
    }
    if best != nil {
        best.RUnlock()
    }
    return best
}
```

3.3 Official Go Driver Code Analysis
Latency Information Collection
The Go driver runs isMaster every 10 seconds, measures the round-trip time, and updates an exponential moving average with α = 0.2; note that mongos's 1/4-of-the-delta update above is the same kind of smoothing with α = 0.25.
```go
func (s *Server) updateAverageRTT(delay time.Duration) time.Duration {
    if !s.averageRTTSet {
        s.averageRTT = delay // first measurement
    } else {
        alpha := 0.2
        s.averageRTT = time.Duration(alpha*float64(delay) + (1-alpha)*float64(s.averageRTT))
    }
    return s.averageRTT
}
```

Nearest Node Selection
A composite selector combines ReadPrefSelector and LatencySelector. LatencySelector computes the minimum RTT among the candidates, adds a configurable threshold (15 ms by default), and returns all nodes within that window.
```go
func (ls *latencySelector) SelectServer(t Topology, candidates []Server) ([]Server, error) {
    if ls.latency < 0 {
        return candidates, nil
    }
    if len(candidates) == 0 || len(candidates) == 1 {
        return candidates, nil
    }
    // find the minimum known average RTT among the candidates
    min := time.Duration(math.MaxInt64)
    for _, candidate := range candidates {
        if candidate.AverageRTTSet && candidate.AverageRTT < min {
            min = candidate.AverageRTT
        }
    }
    if min == time.Duration(math.MaxInt64) {
        return candidates, nil
    }
    // keep every candidate within min + threshold (15 ms by default)
    max := min + ls.latency
    var result []Server
    for _, candidate := range candidates {
        if candidate.AverageRTTSet && candidate.AverageRTT <= max {
            result = append(result, candidate)
        }
    }
    return result, nil
}
```

After obtaining the qualified list, a random node is chosen as the target:
```go
selected := suitable[rand.Intn(len(suitable))]
selectedS, err := t.FindServer(selected)
if err != nil {
    return nil, err
}
return selectedS, nil
```

Usage Recommendations
For latency-sensitive workloads, the default 15 ms threshold can be overridden via replication.localPingThresholdMs in the mongos configuration, or via SetLocalThreshold on the Go driver's ClientOptions.
4. Summary
MongoDB’s nearest mode enables proximity‑aware reads in multi‑data‑center deployments for both driver‑to‑mongod and mongos‑to‑mongod paths. This article dissected the implementation in Tencent Cloud MongoDB and common Go drivers, and offered configuration tips.
Tencent Database Technology
Tencent's Database R&D team supports internal services such as WeChat Pay, WeChat Red Packets, Tencent Advertising, and Tencent Music, and provides external support on Tencent Cloud for TencentDB products like CynosDB, CDB, and TDSQL. This public account aims to promote and share professional database knowledge, growing together with database enthusiasts.