Lidar: Baidu’s Open‑Source Network Monitoring Tool That Goes Beyond PingMesh

This article introduces Lidar, Baidu's new open-source, TCP SYN-based network monitoring system. It explains why the classic PingMesh approach is unsuitable for large data-center environments and details Lidar's design, its implementation quirks across Linux and macOS, BPF filtering, source-port rotation for ECMP coverage, and practical usage examples.

BirdNest Tech Talk

PingMesh limitations

PingMesh, described in the SIGCOMM 2015 paper "Pingmesh: A Large‑Scale System for Data Center Network Latency Measurement and Analysis", measures latency and loss by having every server ping every other server (O(n²) complexity). Microsoft ran it in production but limited full‑mesh probing to servers within the same hierarchy level to reduce traffic.
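To make the O(n²) growth concrete, the following sketch compares the number of probes per round in a full mesh with a centrally driven design in which one prober contacts each server once. The function names are illustrative, and the "one probe per target per round" model is a simplifying assumption rather than a description of either system's actual schedule.

```go
package main

import "fmt"

// fullMeshProbes returns the number of directed probe pairs when each of
// n servers pings every other server.
func fullMeshProbes(n int64) int64 {
	return n * (n - 1)
}

// centralProbes returns the number of probes per round when a single
// prober targets each of n servers once (simplifying assumption).
func centralProbes(n int64) int64 {
	return n
}

func main() {
	for _, n := range []int64{100, 10000, 100000} {
		fmt.Printf("n=%d full-mesh=%d central=%d\n",
			n, fullMeshProbes(n), centralProbes(n))
	}
}
```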

Lidar probing

In 2020/2021 Baidu needed a fast, accurate black-box probing solution for its data-center network. Lidar uses a single probing machine (or a pair) that fetches the server list from a topology database, selects high-quality targets, and sends TCP SYN packets to probe reachability.

Three TCP responses, one answer

When a SYN is sent, the target’s TCP stack automatically replies with one of three packets:

SYN‑ACK – port open, service reachable.

RST – port closed or filtered, network reachable.

No reply – unreachable or packet loss.

No agents or server‑side deployment are required; only an IP address and a port are needed.

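Listening ports (e.g., SSH on port 22) give the most reliable answer. Closed ports may not always return an RST, for example when a firewall silently drops the SYN. Rate-limiting policies typically allow only 20-100 packets per second (PPS), so the probe rate should stay within that range.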
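The three outcomes above map directly onto the TCP flags byte of the reply. The sketch below shows one way to classify a probe result; the type and function names are hypothetical and are not taken from the Lidar source.

```go
package main

import "fmt"

// TCP flag bits as defined in the TCP header (RFC 9293).
const (
	flagSYN = 0x02
	flagRST = 0x04
	flagACK = 0x10
	synAck  = flagSYN | flagACK
)

// Result is a hypothetical probe outcome label.
type Result string

const (
	Open       Result = "open (SYN-ACK)"
	Closed     Result = "closed but reachable (RST)"
	NoReply    Result = "no reply (unreachable or lost)"
	Unexpected Result = "unexpected flags"
)

// classify turns "did we get a reply, and with which flags" into a Result.
func classify(replied bool, flags byte) Result {
	switch {
	case !replied:
		return NoReply
	case flags&flagRST != 0:
		return Closed
	case flags&synAck == synAck:
		return Open
	default:
		return Unexpected
	}
}

func main() {
	fmt.Println(classify(true, 0x12)) // SYN+ACK
	fmt.Println(classify(true, 0x14)) // RST+ACK
	fmt.Println(classify(false, 0))   // timeout
}
```

Both SYN-ACK and RST prove the network path works, so a reachability monitor can count either as success and treat only the timeout case as loss.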

Packet capture on macOS vs Linux

Sending SYN packets uses SOCK_RAW + IPPROTO_RAW + IP_HDRINCL on both operating systems.
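With IP_HDRINCL the sender builds the whole packet itself, including the TCP checksum over the pseudo-header. The following is a minimal sketch of constructing just the 20-byte TCP SYN header; the function names, window size, and sequence number are assumptions for illustration, and the IP header plus the actual raw-socket send (which needs root) are left out.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// checksum computes the Internet checksum (RFC 1071) over data.
func checksum(data []byte) uint16 {
	var sum uint32
	for i := 0; i+1 < len(data); i += 2 {
		sum += uint32(binary.BigEndian.Uint16(data[i:]))
	}
	if len(data)%2 == 1 {
		sum += uint32(data[len(data)-1]) << 8
	}
	for sum>>16 != 0 {
		sum = (sum & 0xffff) + (sum >> 16)
	}
	return ^uint16(sum)
}

// tcpChecksum prepends the IPv4 pseudo-header to seg and checksums it.
func tcpChecksum(src, dst net.IP, seg []byte) uint16 {
	buf := make([]byte, 0, 12+len(seg))
	buf = append(buf, src.To4()...)
	buf = append(buf, dst.To4()...)
	buf = append(buf, 0, 6) // zero byte, protocol number for TCP
	buf = append(buf, byte(len(seg)>>8), byte(len(seg)))
	buf = append(buf, seg...)
	return checksum(buf)
}

// buildSYN returns a 20-byte TCP header with only the SYN flag set.
func buildSYN(src, dst net.IP, srcPort, dstPort uint16, seq uint32) []byte {
	tcp := make([]byte, 20)
	binary.BigEndian.PutUint16(tcp[0:], srcPort)
	binary.BigEndian.PutUint16(tcp[2:], dstPort)
	binary.BigEndian.PutUint32(tcp[4:], seq)
	tcp[12] = 5 << 4                            // data offset: 5 words, no options
	tcp[13] = 0x02                              // SYN
	binary.BigEndian.PutUint16(tcp[14:], 65535) // window (arbitrary choice)
	binary.BigEndian.PutUint16(tcp[16:], tcpChecksum(src, dst, tcp))
	return tcp
}

func main() {
	src, dst := net.ParseIP("10.0.0.1"), net.ParseIP("10.0.0.2")
	seg := buildSYN(src, dst, 40000, 80, 1)
	fmt.Printf("% x\n", seg)
}
```

A quick sanity check is that recomputing the checksum over the finished segment yields zero, which is how a receiver validates it.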

Receiving packets differs:

Linux: a raw socket (SOCK_RAW + IPPROTO_TCP) receives a copy of every TCP packet (IP + TCP).

macOS: the kernel intercepts TCP packets, so raw sockets see nothing. The workaround is to open a BPF device, capture at the link layer (Ethernet + IP + TCP), and strip the Ethernet header before parsing IP/TCP.
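The macOS path therefore needs one extra parsing step. The sketch below strips a 14-byte Ethernet header from a captured frame and extracts the TCP ports and flags; the type and function names are hypothetical. It assumes an untagged IPv4 frame, and it does not handle the per-packet header that a BPF device places in front of each frame in its read buffer.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const ethHeaderLen = 14 // dst MAC (6) + src MAC (6) + EtherType (2)

// tcpInfo holds the fields a SYN prober needs from a reply.
type tcpInfo struct {
	SrcPort, DstPort uint16
	Flags            byte
}

// parseFrame strips the Ethernet header and parses IPv4 + TCP.
func parseFrame(frame []byte) (tcpInfo, error) {
	if len(frame) < ethHeaderLen+20 {
		return tcpInfo{}, errors.New("frame too short")
	}
	if binary.BigEndian.Uint16(frame[12:14]) != 0x0800 {
		return tcpInfo{}, errors.New("not IPv4")
	}
	ip := frame[ethHeaderLen:]
	ihl := int(ip[0]&0x0f) * 4 // IP header length in bytes
	if ip[0]>>4 != 4 || ihl < 20 || len(ip) < ihl+20 {
		return tcpInfo{}, errors.New("bad or truncated IP header")
	}
	if ip[9] != 6 {
		return tcpInfo{}, errors.New("not TCP")
	}
	tcp := ip[ihl:]
	return tcpInfo{
		SrcPort: binary.BigEndian.Uint16(tcp[0:2]),
		DstPort: binary.BigEndian.Uint16(tcp[2:4]),
		Flags:   tcp[13],
	}, nil
}

func main() {
	frame := make([]byte, 54)
	frame[12], frame[13] = 0x08, 0x00 // EtherType IPv4
	frame[14] = 0x45                  // IPv4, IHL 5
	frame[23] = 6                     // protocol TCP
	binary.BigEndian.PutUint16(frame[34:], 22)
	binary.BigEndian.PutUint16(frame[36:], 40001)
	frame[47] = 0x12 // SYN+ACK
	fmt.Println(parseFrame(frame))
}
```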

High‑throughput filtering with BPF

On 10 GbE NICs handling hundreds of thousands of packets per second, receiving all TCP packets via raw socket overwhelms the kernel. A classic BPF filter attached with SO_ATTACH_FILTER discards everything except packets matching the probe port.

X = (packet[14] & 0x0f) * 4   // IP IHL in bytes → TCP header offset
A = packet[X+0..1]            // TCP srcPort (target port)
if A != serverPort -> reject
A = packet[X+2..3]            // TCP dstPort (probe source port)
if A < localPort -> reject
if A >= localPort + count -> reject
accept

The filter consists of 12 cBPF instructions hard‑coded in the source; only matching packets reach user space.
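As an illustration of how such a filter looks at the instruction level, here is a sketch that encodes the same logic in eight classic BPF instructions and checks it with a tiny user-space interpreter. This is not Lidar's hard-coded 12-instruction program. The `ipOff` parameter is a hypothetical knob for where the IP header starts in the buffer (0 if the buffer begins at the IP header, 14 if it begins at the Ethernet header), and the interpreter supports only the opcodes used here.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// insn mirrors the layout of the kernel's struct sock_filter.
type insn struct {
	Code   uint16
	Jt, Jf uint8
	K      uint32
}

// Classic BPF opcodes used below (values as in <linux/filter.h>).
const (
	opLdxMsh = 0xb1 // X = 4 * (pkt[K] & 0x0f)
	opLdhInd = 0x48 // A = big-endian uint16 at pkt[X+K]
	opJeqK   = 0x15 // skip Jt insns if A == K, else Jf
	opJgeK   = 0x35 // skip Jt insns if A >= K, else Jf
	opRetK   = 0x06 // return K (0 drops the packet)
)

func buildFilter(ipOff uint32, serverPort, localPort, count uint16) []insn {
	end := uint32(localPort) + uint32(count)
	return []insn{
		{opLdxMsh, 0, 0, ipOff},            // 0: X = IP header length
		{opLdhInd, 0, 0, ipOff},            // 1: A = TCP source port
		{opJeqK, 0, 4, uint32(serverPort)}, // 2: mismatch -> 7
		{opLdhInd, 0, 0, ipOff + 2},        // 3: A = TCP destination port
		{opJgeK, 0, 2, uint32(localPort)},  // 4: below range -> 7
		{opJgeK, 1, 0, end},                // 5: at or past end -> 7
		{opRetK, 0, 0, 0xffff},             // 6: accept
		{opRetK, 0, 0, 0},                  // 7: reject
	}
}

// run interprets prog against pkt; out-of-bounds loads reject the packet.
func run(prog []insn, pkt []byte) uint32 {
	var a, x uint32
	for pc := 0; pc < len(prog); pc++ {
		in := prog[pc]
		switch in.Code {
		case opLdxMsh:
			if int(in.K) >= len(pkt) {
				return 0
			}
			x = 4 * uint32(pkt[in.K]&0x0f)
		case opLdhInd:
			off := int(x + in.K)
			if off+2 > len(pkt) {
				return 0
			}
			a = uint32(binary.BigEndian.Uint16(pkt[off:]))
		case opJeqK:
			if a == in.K {
				pc += int(in.Jt)
			} else {
				pc += int(in.Jf)
			}
		case opJgeK:
			if a >= in.K {
				pc += int(in.Jt)
			} else {
				pc += int(in.Jf)
			}
		case opRetK:
			return in.K
		default:
			return 0
		}
	}
	return 0
}

// makePacket builds a minimal IPv4+TCP packet (no options) for the demo.
func makePacket(srcPort, dstPort uint16) []byte {
	pkt := make([]byte, 40)
	pkt[0] = 0x45 // IPv4, IHL 5
	pkt[9] = 6    // protocol TCP
	binary.BigEndian.PutUint16(pkt[20:], srcPort)
	binary.BigEndian.PutUint16(pkt[22:], dstPort)
	return pkt
}

func main() {
	prog := buildFilter(0, 22, 40000, 100)
	fmt.Println(run(prog, makePacket(22, 40050)) != 0) // reply inside the port range
	fmt.Println(run(prog, makePacket(80, 40050)) != 0) // wrong server port
	fmt.Println(run(prog, makePacket(22, 40100)) != 0) // just past the port range
}
```

On Linux the same instruction array would be handed to the kernel through SO_ATTACH_FILTER, so that rejected packets never reach user space.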

Source‑port rotation for ECMP coverage

With a fixed source port, the ECMP hash over the TCP five-tuple always selects the same path, leaving the other ECMP routes untested. Lidar therefore rotates the source port after each full round of probes. With 100 source ports, one full cycle takes 100 rounds, which gives statistical coverage of typical multi-path topologies.

// After probing all targets once, increment source port
s.currentPort++
if s.currentPort >= s.srcPort+s.portCount {
    s.currentPort = s.srcPort // wrap around
}
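The effect of rotation can be simulated in a few lines. The sketch below hashes the five-tuple with FNV-1a and maps it onto a number of equal-cost paths. FNV-1a is only a stand-in, since real switches use vendor-specific hash functions and inputs, and the addresses, ports, and path count are made up for the demonstration.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
	"net"
)

// pathFor maps a flow to one of nPaths. Assumption: FNV-1a over the
// five-tuple stands in for the switch's real ECMP hash.
func pathFor(src, dst net.IP, srcPort, dstPort uint16, nPaths uint32) uint32 {
	h := fnv.New32a()
	h.Write(src.To4())
	h.Write(dst.To4())
	var ports [4]byte
	binary.BigEndian.PutUint16(ports[0:], srcPort)
	binary.BigEndian.PutUint16(ports[2:], dstPort)
	h.Write(ports[:])
	h.Write([]byte{6}) // protocol TCP
	return h.Sum32() % nPaths
}

// pathsCovered counts the distinct paths hit when the source port is
// rotated through portCount consecutive values starting at basePort.
func pathsCovered(basePort uint16, portCount int, nPaths uint32) int {
	src, dst := net.ParseIP("10.0.0.1"), net.ParseIP("10.0.0.2")
	seen := make(map[uint32]bool)
	for i := 0; i < portCount; i++ {
		seen[pathFor(src, dst, basePort+uint16(i), 22, nPaths)] = true
	}
	return len(seen)
}

func main() {
	fmt.Println("fixed port:", pathsCovered(40000, 1, 8), "of 8 paths")
	fmt.Println("100 ports: ", pathsCovered(40000, 100, 8), "of 8 paths")
}
```

A single source port can only ever exercise one path per target, while a rotating range spreads probes across the available paths.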

Command‑line usage

# Probe a single target on port 80
sudo ./lidar -t 10.0.0.2 -p 80

# Probe multiple targets on port 22
sudo ./lidar -t 10.0.0.2,10.0.0.3,10.0.0.4 -p 22

# High‑rate probe for 30 s
sudo ./lidar -t 10.0.0.2 -p 80 --rate 100 -d 30s

# Send a fixed number of packets
sudo ./lidar -t 10.0.0.2 -p 80 -n 1000

# Verbose mode to see loss per port
sudo ./lidar -t 10.0.0.2 -p 80 -v

Configuration can also be supplied as a JSON file (lidar.json in this example) and passed with -c:

{
  "target_addrs": "10.0.0.1,10.0.0.2",
  "server_port": 80,
  "rate": 10,
  "span": "1s",
  "delay": "3s"
}

sudo ./lidar -c lidar.json
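For tooling that generates or validates such files, the JSON above can be decoded with a small Go struct. This is a sketch based only on the field names shown in the example. The struct names are hypothetical, and treating span and delay as Go duration strings is an assumption, not something confirmed by the Lidar source.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"time"
)

// fileConfig mirrors the JSON keys from the example config.
type fileConfig struct {
	TargetAddrs string `json:"target_addrs"`
	ServerPort  int    `json:"server_port"`
	Rate        int    `json:"rate"`
	Span        string `json:"span"`
	Delay       string `json:"delay"`
}

// probePlan is a hypothetical parsed form of the config.
type probePlan struct {
	Targets     []string
	Port, Rate  int
	Span, Delay time.Duration
}

func parseConfig(data []byte) (probePlan, error) {
	var f fileConfig
	if err := json.Unmarshal(data, &f); err != nil {
		return probePlan{}, fmt.Errorf("decode config: %w", err)
	}
	span, err := time.ParseDuration(f.Span)
	if err != nil {
		return probePlan{}, fmt.Errorf("span: %w", err)
	}
	delay, err := time.ParseDuration(f.Delay)
	if err != nil {
		return probePlan{}, fmt.Errorf("delay: %w", err)
	}
	var targets []string
	for _, t := range strings.Split(f.TargetAddrs, ",") {
		if t = strings.TrimSpace(t); t != "" {
			targets = append(targets, t)
		}
	}
	return probePlan{Targets: targets, Port: f.ServerPort, Rate: f.Rate,
		Span: span, Delay: delay}, nil
}

func main() {
	raw := `{"target_addrs": "10.0.0.1,10.0.0.2", "server_port": 80,
	         "rate": 10, "span": "1s", "delay": "3s"}`
	plan, err := parseConfig([]byte(raw))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", plan)
}
```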

Relation to other nettools

bitflip detects UDP loss and modification, baize performs continuous loss detection, and lidar focuses on TCP SYN reachability. All share the same low‑level stack: raw‑socket packet construction, BPF‑based packet filtering, time‑bucket statistics, and rate control.

Project repository

https://github.com/baidu/nettools

Written by

BirdNest Tech Talk

Author of the rpcx microservice framework, original book author, and chair of Baidu's Go CMC committee.
