
How TencentOS NBS Solves Network Latency Mysteries: Real‑Time Trace Without Disruption

Network latency spikes often leave developers guessing whether the culprit lies in user‑space, the kernel stack, or the physical link; this article introduces TencentOS’s NBS (Net Blackboard System), a low‑overhead, zero‑disruption solution that pinpoints delay sources, supports continuous deployment, and outperforms traditional tools like tcpdump and bpftrace.


1. Prologue

Most developers and operators have repeatedly wondered whose fault network jitter is: user-space, the kernel protocol stack, or the transmission link. The kernel is a black box, which makes performance bottlenecks hard to locate, and traditional tools such as tcpdump or BPF are too costly to deploy continuously and rarely preserve the scene at the moment jitter occurs. This article introduces TencentOS's NBS (Net Blackboard System) and shows how it overcomes three major challenges: defining jitter boundaries, locating bottlenecks, and running always-on in production without disrupting the services it observes.

Pain Point 1: Unclear Origin of Network Jitter

Complex network topologies and diverse workloads make it difficult to identify the exact node causing high latency when jitter appears.

Pain Point 2: Blind Optimization Yields Limited Gains

Significant effort spent tweaking configurations, code, or architecture often produces minimal improvement when the true performance choke point is unknown.

Pain Point 3: Intermittent Cluster Jitter Is Hard to Trace

Transient jitter events disappear before they can be captured; tcpdump generates massive amounts of data, and BPF-based tools impose heavy overhead, making continuous online collection impractical.

Pain Point 4: Business Monitoring Lacks Low‑Level Visibility

Business‑level metrics mix user‑space and kernel‑space latency, obscuring the root cause. Upgrading the kernel for better monitoring is blocked by 24/7 service requirements.

2. NBS Practice – Defining Delay Jitter

In one case, business monitoring showed increased latency after a change. A tcpdump capture revealed a 4-second delay between the qdisc layer and the NIC driver, and NBS logs pinpointed the delay to the qdisc stage, confirming the root cause.

Further investigation showed the default qdisc (pfifo_fast) on older kernels caused the issue; switching to fq_codel eliminated the delay.
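To make the fix concrete: the default qdisc for newly created interfaces is governed by the net.core.default_qdisc sysctl (existing interfaces are usually reconfigured with tc). The following minimal C sketch simply writes that sysctl through procfs; it assumes root privileges and the standard /proc/sys layout, and in practice the same change is typically made with the sysctl or tc command-line tools.

```c
/* Minimal sketch: set the default qdisc to fq_codel via procfs.
 * Assumes root privileges and the standard /proc/sys layout. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/sys/net/core/default_qdisc", "w");
    if (!f) {
        perror("open default_qdisc");
        return 1;
    }
    fputs("fq_codel\n", f);   /* replaces pfifo_fast for new interfaces */
    fclose(f);
    return 0;
}
```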

3. NBS Practice – Uncovering Optimization Opportunities

Deploying NBS agents on production machines enabled per-module latency statistics (average, p99, maximum, and the count of events exceeding 5 ms). The data revealed a few milliseconds of hidden kernel latency.
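As a rough illustration of that aggregation step, the sketch below folds one module's latency samples into the four statistics named above; the sample values, the module name, and the helper names are made up for the example.

```c
/* Illustrative aggregation of per-module latency samples into
 * average, p99, max, and a count of events above 5 ms. */
#include <stdio.h>
#include <stdlib.h>

static int cmp_u64(const void *a, const void *b)
{
    unsigned long long x = *(const unsigned long long *)a;
    unsigned long long y = *(const unsigned long long *)b;
    return (x > y) - (x < y);
}

static void report(const char *module, unsigned long long *ns, size_t n)
{
    unsigned long long sum = 0, over_5ms = 0;
    size_t i;

    qsort(ns, n, sizeof(*ns), cmp_u64);            /* sort for percentile lookup */
    for (i = 0; i < n; i++) {
        sum += ns[i];
        if (ns[i] > 5ULL * 1000 * 1000)            /* 5 ms in nanoseconds */
            over_5ms++;
    }
    printf("%s: avg=%lluns p99=%lluns max=%lluns >5ms=%llu\n",
           module, sum / n, ns[(n * 99) / 100], ns[n - 1], over_5ms);
}

int main(void)
{
    /* Fabricated qdisc-stage samples, in nanoseconds. */
    unsigned long long qdisc_ns[] = { 120000, 95000, 7300000, 110000, 101000 };
    report("qdisc", qdisc_ns, sizeof(qdisc_ns) / sizeof(qdisc_ns[0]));
    return 0;
}
```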

After a second‑phase optimization, kernel packet‑processing latency dropped 4.3 % on average, with p99 improving 6.5 % and variance decreasing 9.9 %.

4. NBS Network Latency Tracing Solution

4.1 Overall Architecture

NBS consists of a kernel collection module and a user‑space reporting agent.

Kernel Collection Module

The module uses the kernel’s ftrace framework to hook custom trace functions at selected points, ensuring minimal overhead.
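To give a feel for what an ftrace-based hook looks like, here is a minimal kernel-module sketch that registers a callback on a single function. It assumes a recent kernel (the 5.11+ callback signature) built with CONFIG_FUNCTION_TRACER; the symbol dev_hard_start_xmit is only an example, and the NBS hook points and record format are not shown in this article.

```c
/* Minimal sketch of an ftrace-based hook; not the actual NBS module. */
#include <linux/module.h>
#include <linux/ftrace.h>
#include <linux/ktime.h>
#include <linux/string.h>

static void nbs_hook(unsigned long ip, unsigned long parent_ip,
                     struct ftrace_ops *ops, struct ftrace_regs *fregs)
{
    /* Record a timestamp at this hook point; a real collector would
     * associate it with the packet/flow and emit a trace record. */
    u64 now = ktime_get_ns();
    (void)now;
}

static struct ftrace_ops nbs_ops = {
    .func = nbs_hook,
};

static int __init nbs_init(void)
{
    int ret;

    /* Trace only the chosen symbol so the overhead stays negligible. */
    ret = ftrace_set_filter(&nbs_ops, (unsigned char *)"dev_hard_start_xmit",
                            strlen("dev_hard_start_xmit"), 0);
    if (ret)
        return ret;
    return register_ftrace_function(&nbs_ops);
}

static void __exit nbs_exit(void)
{
    unregister_ftrace_function(&nbs_ops);
}

module_init(nbs_init);
module_exit(nbs_exit);
MODULE_LICENSE("GPL");
```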

Key hook points on the TCP send path:

TCP layer send (write, writev, send syscalls)

qdisc layer (packet queuing)

NIC driver layer (packet transmission)

Corresponding receive‑path hooks:

NIC receive (interrupt‑driven buffer fill)

User‑space receive (read, readv, recv syscalls)

NBS supports dual-endpoint deployment (agents on both ends of a connection); each hook can be enabled or disabled via sysctl, and traffic can be filtered by source/destination port so that background traffic is left untouched.
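The toggle-and-filter idea can be pictured as a small sysctl table plus a cheap check inside the hook. In the sketch below, the net/nbs path, the knob names, and the sentinel-terminated table are assumptions made for illustration, not the actual NBS interface.

```c
/* Illustrative sysctl knobs for enabling a hook and filtering by port. */
#include <linux/module.h>
#include <linux/sysctl.h>

static int nbs_qdisc_hook_enabled;   /* 0 = off, 1 = on */
static int nbs_filter_dport;         /* 0 = match all destination ports */

static struct ctl_table nbs_table[] = {
    {
        .procname     = "qdisc_hook_enabled",
        .data         = &nbs_qdisc_hook_enabled,
        .maxlen       = sizeof(int),
        .mode         = 0644,
        .proc_handler = proc_dointvec,
    },
    {
        .procname     = "filter_dport",
        .data         = &nbs_filter_dport,
        .maxlen       = sizeof(int),
        .mode         = 0644,
        .proc_handler = proc_dointvec,
    },
    { }   /* sentinel, expected by older kernels */
};

static struct ctl_table_header *nbs_sysctl_hdr;

/* Called from a hook: skip packets when the hook is off or filtered out. */
static bool __maybe_unused nbs_should_trace(u16 dport)
{
    if (!nbs_qdisc_hook_enabled)
        return false;
    return nbs_filter_dport == 0 || dport == (u16)nbs_filter_dport;
}

static int __init nbs_sysctl_init(void)
{
    nbs_sysctl_hdr = register_sysctl("net/nbs", nbs_table);
    return nbs_sysctl_hdr ? 0 : -ENOMEM;
}

static void __exit nbs_sysctl_exit(void)
{
    unregister_sysctl_table(nbs_sysctl_hdr);
}

module_init(nbs_sysctl_init);
module_exit(nbs_sysctl_exit);
MODULE_LICENSE("GPL");
```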

User‑Space Reporting Agent

The agent receives raw logs from the kernel module, parses them, aggregates statistics, stores them in a database, and forwards them to the monitoring platform. It ships with pre-configured settings, so deployment requires no code changes or restarts of the target services.
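A stripped-down view of such an agent loop is sketched below. Purely for illustration, it assumes the kernel module exposes fixed-size latency records through a character device at /dev/nbs; the device path and record layout are invented here, and the real agent's parsing, storage, and forwarding steps are reduced to a single counter.

```c
/* Toy user-space agent loop; the /dev/nbs device and record layout are
 * assumptions for this sketch, not the actual NBS interface. */
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

struct nbs_record {                 /* hypothetical fixed-size record */
    uint64_t timestamp_ns;
    uint64_t latency_ns;
    uint32_t hook_id;               /* e.g. TCP send, qdisc, NIC driver */
    uint32_t flow_hash;
};

int main(void)
{
    struct nbs_record rec;
    uint64_t over_5ms = 0;
    int fd = open("/dev/nbs", O_RDONLY);

    if (fd < 0) {
        perror("open /dev/nbs");
        return 1;
    }
    while (read(fd, &rec, sizeof(rec)) == (ssize_t)sizeof(rec)) {
        /* A real agent would aggregate per hook, write to the database,
         * and push results to the monitoring platform. */
        if (rec.latency_ns > 5ULL * 1000 * 1000)
            over_5ms++;
    }
    printf("records with latency > 5 ms: %llu\n",
           (unsigned long long)over_5ms);
    close(fd);
    return 0;
}
```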

4.2 Comparison with Industry Tools

We compared NBS against tcpdump and bpftrace across three test scenarios: single‑point monitoring, multi‑point monitoring, and interference testing.

Results show NBS introduces negligible performance impact, achieving near‑zero interference even under heavy load, whereas tcpdump and bpftrace cause noticeable throughput degradation.

5. History and Future

Early attempts relied on manual tcpdump captures, which were labor-intensive and low-yield. Google's Dapper and Fathom pioneered distributed tracing, but still struggled with kernel-level visibility. The rise of BPF offered low-intrusion measurement, but its runtime overhead made continuous collection expensive.

Since 2024, the NBS project has contributed code upstream to the Linux kernel and continues to evolve, helping teams discover hidden performance bottlenecks and guide precise optimizations.

Tags: Operations, Performance Monitoring, Network Latency, Kernel Tracing, NBS
Written by Tencent Architect

We share insights on storage, computing, networking and explore leading industry technologies together.