
Baize: Millisecond‑Level Open‑Source Network Monitoring that Exposes Even Tiny Packet Loss

Baize is an open‑source, configuration‑driven Go tool for millisecond‑granularity, high‑frequency network‑quality monitoring (5 000 probes per second by default). It covers every ECMP path, detects one‑way packet loss without clock synchronization, adds complementary bit‑flip checks, deploys from a single JSON file, and includes built‑in profiling.

BirdNest Tech Talk

During one incident, an online service started timing out and user complaints surged, yet the monitoring dashboard showed no alerts. Two hours later, the root cause turned out to be intermittent 0.3‰ (0.03 %) packet loss on a single link, which traditional monitoring had missed.

What Baize Solves

Baize is a configuration‑driven tool that runs both roles, probe client and probe server, in a single process. It is designed for long‑term, continuous network‑quality monitoring.

Capability 1 — ECMP Full‑Path Coverage

The number of five‑tuples and the packet rate are configurable; by default, 100 client ports × 10 server ports yield 1 000 concurrent probe flows. A deterministic port‑rotation algorithm exhaustively cycles through these five‑tuples to cover the ECMP hash space. The five‑tuple of every lost packet is logged together with a millisecond‑level timestamp, which makes it possible to pinpoint the faulty device and port.

Capability 2 — One‑Way Packet‑Loss Detection Without Clock Sync

Each probe carries two fields describing the previous window: the number of packets actually sent and the start port. From these fields and the deterministic rotation algorithm, the server reconstructs the full set of port pairs used in that window, compares it with the packets actually received, and computes the one‑way loss rate. No NTP clock synchronization or client‑side state is required.

Capability 3 — Complementary Bit‑Flip Detection

Four payload‑filling ("salt") modes detect silent bit flips that still pass TCP/UDP checksum verification:

Mode 0: all 0xFF – detects 1→0 flips.

Mode 1: all 0x00 – detects 0→1 flips.

Mode 2: 0x5A – fixed‑pattern detection.

Mode 3: 0xAAAA/0x5555 alternating – detects complementary flips that leave the checksum unchanged, identifying the exact byte and bit that changed.

Why "Millisecond‑Level"?

By default, Baize sends 5 000 probes per second and aggregates them over a 1‑second statistics window. At that rate, a loss of just 5 packets in one window is a 0.1 % loss rate, and each individual lost packet counts as 0.02 %. Windows as short as 100 ms are supported for sub‑second monitoring.
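The detection floor follows from the number of probes per window, N = R · T for a probe rate R and window length T: the smallest loss a window can show is one packet out of N. For the defaults, and assuming the same 5 000 pps rate when the window is shortened to 100 ms:

```latex
\begin{aligned}
N &= R\,T, \qquad r_{\min} = \frac{1}{N} \\
R = 5000~\text{pps},\ T = 1~\text{s} &\;\Rightarrow\; N = 5000,\ r_{\min} = \tfrac{1}{5000} = 0.02\,\%,\ \tfrac{5}{5000} = 0.1\,\% \\
R = 5000~\text{pps},\ T = 100~\text{ms} &\;\Rightarrow\; N = 500,\ r_{\min} = \tfrac{1}{500} = 0.2\,\%
\end{aligned}
```

Shorter windows react faster, but each lost packet weighs more in the reported rate.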

A raw IP socket with an attached BPF filter bypasses the kernel's UDP socket layer; the filter discards non‑probe packets in the kernel, so fewer packets are copied to user space.

20 MB socket buffers handle traffic bursts.

Eight parallel readers split the port range into sub‑ranges; each reader runs in its own goroutine with its own BPF filter.

The design is lock‑free: atomic counters track port pairs and sequence numbers, and the hot path avoids heap allocations.
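The reader layout can be modeled with the standard library alone. In this sketch, eight goroutines each own a contiguous sub‑range of client ports and increment a preallocated array of `atomic.Uint64` counters, one per port pair, so the counting path takes no locks and allocates nothing. Buffered channels stand in for the per‑reader raw sockets and BPF filters, which are not shown; the names `probe`, `readerFor`, and `countAll` and the ceiling‑division split are assumptions.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

const (
	numClientPorts = 100 // default client port count
	numServerPorts = 10  // default server port count
	numReaders     = 8   // parallel readers
	// 13 client ports per reader; the last reader gets the remaining 9.
	perReader = (numClientPorts + numReaders - 1) / numReaders
)

// probe is a hypothetical, already-decoded probe: offsets into the port ranges.
type probe struct {
	clientIdx, serverIdx int
}

// counters holds one counter per (client, server) pair, allocated once up front.
var counters [numClientPorts * numServerPorts]atomic.Uint64

// readerFor returns the reader that owns the sub-range containing clientIdx.
func readerFor(clientIdx int) int { return clientIdx / perReader }

// countAll fans probes out to per-reader queues (stand-ins for per-reader
// BPF-filtered sockets); each reader goroutine bumps the atomic counters.
func countAll(probes []probe) {
	queues := make([]chan probe, numReaders)
	for i := range queues {
		queues[i] = make(chan probe, 256)
	}
	var wg sync.WaitGroup
	for r := range queues {
		wg.Add(1)
		go func(q <-chan probe) {
			defer wg.Done()
			for p := range q {
				// Lock-free hot path: a single atomic add, no allocation.
				counters[p.clientIdx*numServerPorts+p.serverIdx].Add(1)
			}
		}(queues[r])
	}
	for _, p := range probes {
		queues[readerFor(p.clientIdx)] <- p
	}
	for _, q := range queues {
		close(q)
	}
	wg.Wait()
}

func main() {
	var probes []probe
	for c := 0; c < numClientPorts; c++ {
		for s := 0; s < numServerPorts; s++ {
			probes = append(probes, probe{c, s}, probe{c, s}) // two probes per pair
		}
	}
	countAll(probes)
	var total uint64
	for i := range counters {
		total += counters[i].Load()
	}
	fmt.Println("total probes counted:", total)
}
```

Giving each reader a disjoint sub‑range means no two readers handle the same port pair; the atomic adds would keep the counters correct even if ranges overlapped, but disjoint ranges also avoid contention on the same cache lines.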

The implementation has run for years on thousands of machines, scaling to 20 000 pps in a full‑mesh, multi‑cluster data‑center environment.

Simple Deployment

All configuration lives in a single JSON file. A minimal client‑side configuration:

{
  "client": {
    "client_addr": "10.0.0.1",
    "server_addrs": "10.0.0.2"
  }
}

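Full‑duplex monitoring example, in which host 10.0.0.1 sends probes to 10.0.0.2 (client role) and receives probes from it (server role):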

{
  "pprof_addr": ":6060",
  "log_dir": "/var/log/baize",
  "log_max_age_days": 7,
  "client": {
    "client_addr": "10.0.0.1",
    "server_addrs": "10.0.0.2",
    "rate_in_span": 5000,
    "span": "1s",
    "delay": "3s",
    "msg_len": 1024
  },
  "server": {
    "server_addr": "10.0.0.1",
    "client_addrs": "10.0.0.2",
    "rate_in_span": 5000,
    "span": "1s",
    "delay": "3s",
    "msg_len": 1024
  }
}

Run with:

sudo ./baize -c /etc/baize/baize.json

A single process runs both client and server. Logs rotate daily and can be followed with tail -f /var/log/baize/baize.log. The built‑in pprof endpoint on :6060 provides live goroutine, heap, and CPU profiles.

baize vs. bitflip: How to Choose

Both tools share the same underlying engine. The differences are:

Configuration: bitflip uses command‑line arguments; baize uses a JSON configuration file.

Run mode: bitflip is single‑role; baize runs both roles in a single process.

Use case: bitflip for temporary troubleshooting; baize for long‑term continuous monitoring.

Log handling: bitflip writes to stdout/stderr; baize rotates logs daily with automatic cleanup.

Ops integration: bitflip is run manually; baize integrates with configuration‑management systems.

Typical Scenarios

High‑frequency inter‑cluster probing: full‑mesh monitoring across clusters within a data center.

Cross‑data‑center and hybrid‑cloud links: probing at 5 000 pps detects anomalies within seconds.

Dedicated‑line SLA monitoring: continuous quality measurement for carrier‑grade links.

Network upgrade verification: compare pre‑ and post‑change metrics to ensure no regression.

Failover validation: after a disaster‑recovery cutover, confirm that paths are free of loss and bit flips before migrating traffic.

Open‑Source and Community

Baize is the second tool in Baidu’s nettools suite, released under the MIT license.

GitHub: https://github.com/baidu/nettools

Documentation: https://nettools.rpcx.io/baize.html

Language: Go 1.26+

Platforms: Linux/macOS, AMD64/ARM64

The open‑source version provides a pluggable Sender interface, so results can be forwarded to ClickHouse, Prometheus, or any other backend.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

BirdNest Tech Talk

Author of the rpcx microservice framework, original book author, and chair of Baidu's Go CMC committee.
