Baize: Millisecond‑Level Open‑Source Network Monitoring that Exposes Even Tiny Packet Loss
Baize is an open‑source, configuration‑driven Go tool that delivers millisecond‑granular, high‑frequency (up to 5 000 packets per second) network quality monitoring, covering full ECMP paths, one‑way packet‑loss detection without clock sync, and complementary bit‑flip checks, with simple JSON deployment and built‑in profiling.
During one incident, an online service timed out and user complaints surged, yet the monitoring dashboard showed no alerts. Two hours later, the root cause was traced to intermittent packet loss of 0.3‰ (0.03 %) on a single link, which traditional monitoring had missed.
What Baize Solves
Baize is a configuration‑driven, single‑process, dual‑role tool designed for long‑term continuous network‑quality monitoring.
Capability 1 — ECMP Full‑Path Coverage
Five‑tuple count and packet rate are configurable; the default is 100 client ports × 10 server ports = 1 000 concurrent probes. A deterministic port‑rotation algorithm exhaustively covers the ECMP hash space. Lost‑packet five‑tuples are printed, allowing pinpointing of the faulty device, port, and millisecond‑level loss timestamp.
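The article does not spell out the rotation rule, so the Go sketch below shows only one way a deterministic mapping from probe index to port pair could cover all 1 000 combinations. The `Rotation` type, the `PairAt` method, and the base ports 40000 and 50000 are illustrative assumptions, not Baize's actual code.

```go
package main

import "fmt"

// PortPair is one (client port, server port) combination. Together with the
// two IP addresses and the protocol it forms a five-tuple.
type PortPair struct {
	Client uint16
	Server uint16
}

// Rotation is a hypothetical port plan: NumClient consecutive client ports
// starting at ClientBase and NumServer server ports starting at ServerBase.
type Rotation struct {
	ClientBase, ServerBase uint16
	NumClient, NumServer   int
}

// PairAt maps a non-negative probe index to a port pair. Consecutive indices
// walk through every client port for one server port, then move on to the
// next server port, and wrap after NumClient*NumServer probes.
func (r Rotation) PairAt(i int) PortPair {
	i %= r.NumClient * r.NumServer
	return PortPair{
		Client: r.ClientBase + uint16(i%r.NumClient),
		Server: r.ServerBase + uint16(i/r.NumClient),
	}
}

func main() {
	r := Rotation{ClientBase: 40000, ServerBase: 50000, NumClient: 100, NumServer: 10}
	seen := make(map[PortPair]bool)
	for i := 0; i < 1000; i++ {
		seen[r.PairAt(i)] = true
	}
	fmt.Println("distinct port pairs:", len(seen))
}
```

Because the mapping in this sketch is a pure function of the index, sender and receiver can each compute it independently, which is the property the loss attribution per five-tuple relies on.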
Capability 2 — One‑Way Packet‑Loss Detection Without Clock Sync
Each probe carries the previous window’s actual packet count and the start port. The server uses these two fields together with the deterministic rotation algorithm to reconstruct the full set of port pairs for the previous window, compares them with the packets actually received, and computes the one‑way loss rate. No NTP or client‑side state is required.
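The server-side comparison described above can be sketched in Go as follows. This is a minimal illustration under stated assumptions: `pairAt` is a stand-in rotation rule, ports are represented as zero-based indices, and `windowLoss` is a hypothetical helper; none of these names come from Baize itself.

```go
package main

import "fmt"

// pair identifies a port pair by zero-based client and server port index.
type pair struct{ client, server uint16 }

// pairAt is a stand-in rotation rule (an assumption for this sketch): it maps
// a probe index to a port pair and wraps after numClient*numServer probes.
func pairAt(i, numClient, numServer int) pair {
	i %= numClient * numServer
	return pair{uint16(i % numClient), uint16(i / numClient)}
}

// windowLoss rebuilds the port pairs the client must have used in the previous
// window from the two fields carried in later probes (start index and packets
// sent), compares them with what was received, and returns the number of
// missing packets per pair plus the overall one-way loss rate.
func windowLoss(start, sent, numClient, numServer int, received map[pair]int) (map[pair]int, float64) {
	missing := make(map[pair]int)
	if sent <= 0 {
		return missing, 0
	}
	expected := make(map[pair]int)
	for i := 0; i < sent; i++ {
		expected[pairAt(start+i, numClient, numServer)]++
	}
	lost := 0
	for p, want := range expected {
		if got := received[p]; got < want {
			missing[p] = want - got
			lost += want - got
		}
	}
	return missing, float64(lost) / float64(sent)
}

func main() {
	// Simulate a window of 16 probes over 4x2 port pairs with one drop.
	received := make(map[pair]int)
	for i := 0; i < 16; i++ {
		if i == 5 {
			continue // this probe is lost in transit
		}
		received[pairAt(3+i, 4, 2)]++
	}
	missing, rate := windowLoss(3, 16, 4, 2, received)
	fmt.Println("missing per pair:", missing, "loss rate:", rate)
}
```

Only the sender's packet count and start position cross the wire here; no timestamps are compared, which is why the two clocks never need to agree.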
Capability 3 — Complementary Bit‑Flip Detection
Four salt‑filling modes detect silent bit‑flips that pass TCP/UDP checksums:
Mode 0: all 0xFF – detects 1→0 flips.
Mode 1: all 0x00 – detects 0→1 flips.
Mode 2: 0x5A – fixed‑pattern detection.
Mode 3: 0xAAAA/0x5555 alternating – detects complementary flips that leave the checksum unchanged, identifying the exact byte and bit that changed.
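To illustrate why a known fill pattern catches what the checksum cannot, the Go sketch below fills a payload, applies a complementary pair of flips, and then locates them. The byte layout chosen for mode 3 (16-bit words alternating 0xAAAA and 0x5555) and the names `fill`, `findFlips`, and `checksum` are assumptions made for this example, not Baize's implementation.

```go
package main

import (
	"fmt"
	"math/bits"
)

// fill writes the pattern for a mode into buf. Mode 3 is assumed here to mean
// 16-bit words alternating between 0xAAAA and 0x5555.
func fill(buf []byte, mode int) {
	for i := range buf {
		switch mode {
		case 0:
			buf[i] = 0xFF
		case 1:
			buf[i] = 0x00
		case 2:
			buf[i] = 0x5A
		default:
			if (i/2)%2 == 0 {
				buf[i] = 0xAA
			} else {
				buf[i] = 0x55
			}
		}
	}
}

// Flip records one changed bit: byte offset, bit position (0 = least
// significant), and the value the bit had before it flipped.
type Flip struct {
	Byte, Bit int
	Was       byte
}

// findFlips compares a received payload with the expected pattern and lists
// every differing bit.
func findFlips(got []byte, mode int) []Flip {
	want := make([]byte, len(got))
	fill(want, mode)
	var flips []Flip
	for i := range got {
		diff := got[i] ^ want[i]
		for diff != 0 {
			b := bits.TrailingZeros8(diff)
			flips = append(flips, Flip{Byte: i, Bit: b, Was: (want[i] >> b) & 1})
			diff &= diff - 1 // clear the lowest set bit
		}
	}
	return flips
}

// checksum is the 16-bit ones' complement sum used by TCP and UDP.
func checksum(b []byte) uint16 {
	var sum uint32
	for i := 0; i+1 < len(b); i += 2 {
		sum += uint32(b[i])<<8 | uint32(b[i+1])
	}
	if len(b)%2 == 1 {
		sum += uint32(b[len(b)-1]) << 8
	}
	for sum>>16 != 0 {
		sum = sum&0xFFFF + sum>>16
	}
	return ^uint16(sum)
}

func main() {
	buf := make([]byte, 8)
	fill(buf, 3)
	before := checksum(buf)
	buf[0] ^= 0x01 // a 0 becomes 1 in the first word
	buf[2] ^= 0x01 // a 1 becomes 0 at the same bit position of the next word
	fmt.Println("checksum unchanged:", checksum(buf) == before)
	fmt.Println("flips found:", findFlips(buf, 3))
}
```

The two flips add and subtract the same amount from the ones' complement sum, so the checksum stays the same, while the pattern comparison still reports both byte offsets and bit positions.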
Why "Millisecond‑Level"?
The default configuration sends 5 000 probes per second with a 1‑second statistics window, so five lost packets in one window already register as a 0.1 % loss rate. The window can be shortened to as little as 100 ms for sub‑second monitoring.
A raw IP socket with a BPF filter bypasses the kernel UDP stack, reducing user‑space copies.
20 MB socket buffers absorb traffic bursts.
Eight parallel readers split the port range into sub‑ranges; each runs in its own goroutine with BPF filtering.
A lock‑free design uses atomic counters for port pairs and sequence numbers and avoids heap allocations on the hot path.
The implementation has run for years on thousands of machines, scaling to 20 000 pps in a full‑mesh, multi‑cluster data‑center environment.
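The combination of parallel readers and atomic counters can be sketched in Go as below. This is a simplified illustration: packets are simulated rather than read from a socket, and the `counters` type, `record`, and `runReaders` are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

const (
	numClient  = 100 // client ports
	numServer  = 10  // server ports
	numReaders = 8   // parallel reader goroutines
)

// counters holds one preallocated slot per port pair, so recording a packet
// is a single atomic add with no locking and no heap allocation.
type counters [numClient * numServer]atomic.Uint64

func (c *counters) record(clientIdx, serverIdx int) {
	c[serverIdx*numClient+clientIdx].Add(1)
}

func (c *counters) total() uint64 {
	var sum uint64
	for i := 0; i < numClient*numServer; i++ {
		sum += c[i].Load()
	}
	return sum
}

// runReaders splits the client port range into numReaders sub-ranges and
// starts one goroutine per sub-range. Each goroutine here simply pretends to
// receive perPair packets on every port pair it owns.
func runReaders(c *counters, perPair int) {
	var wg sync.WaitGroup
	for r := 0; r < numReaders; r++ {
		lo, hi := r*numClient/numReaders, (r+1)*numClient/numReaders
		wg.Add(1)
		go func(lo, hi int) {
			defer wg.Done()
			for ci := lo; ci < hi; ci++ {
				for si := 0; si < numServer; si++ {
					for k := 0; k < perPair; k++ {
						c.record(ci, si)
					}
				}
			}
		}(lo, hi)
	}
	wg.Wait()
}

func main() {
	var c counters
	runReaders(&c, 5)
	fmt.Println("packets counted:", c.total()) // 100 * 10 * 5 = 5000
}
```

In this sketch the readers own disjoint slots, so the atomics mainly matter for any goroutine that reads the counters while packets are still arriving, such as a per-window statistics reporter.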
Simple Deployment
Configuration is driven by a single JSON file.
{
  "client": {
    "client_addr": "10.0.0.1",
    "server_addrs": "10.0.0.2"
  }
}
Full‑duplex monitoring example:
{
  "pprof_addr": ":6060",
  "log_dir": "/var/log/baize",
  "log_max_age_days": 7,
  "client": {
    "client_addr": "10.0.0.1",
    "server_addrs": "10.0.0.2",
    "rate_in_span": 5000,
    "span": "1s",
    "delay": "3s",
    "msg_len": 1024
  },
  "server": {
    "server_addr": "10.0.0.1",
    "client_addrs": "10.0.0.2",
    "rate_in_span": 5000,
    "span": "1s",
    "delay": "3s",
    "msg_len": 1024
  }
}
Run with:
sudo ./baize -c /etc/baize/baize.json
A single process runs both client and server. Logs rotate daily and can be followed with tail -f /var/log/baize/baize.log. Built‑in pprof on :6060 provides live goroutine, heap, and CPU profiles.
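For readers who want to validate such a file before deploying it, the Go sketch below decodes the fields shown above and derives the gap between probes. The struct names, the `parse` and `probeGap` helpers, and the reading of `rate_in_span` as "probes per span" are assumptions based on the field names; Baize's real configuration types may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Role mirrors the keys that appear under "client" and "server" in the
// examples. Address lists are kept as plain strings, as in the examples.
type Role struct {
	ClientAddr  string `json:"client_addr,omitempty"`
	ServerAddrs string `json:"server_addrs,omitempty"`
	ServerAddr  string `json:"server_addr,omitempty"`
	ClientAddrs string `json:"client_addrs,omitempty"`
	RateInSpan  int    `json:"rate_in_span"`
	Span        string `json:"span"`
	Delay       string `json:"delay"`
	MsgLen      int    `json:"msg_len"`
}

// Config mirrors the top-level keys of the full-duplex example.
type Config struct {
	PprofAddr     string `json:"pprof_addr"`
	LogDir        string `json:"log_dir"`
	LogMaxAgeDays int    `json:"log_max_age_days"`
	Client        *Role  `json:"client"`
	Server        *Role  `json:"server"`
}

// parse decodes a configuration document; absent roles stay nil.
func parse(data []byte) (*Config, error) {
	var cfg Config
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, fmt.Errorf("decode config: %w", err)
	}
	return &cfg, nil
}

// probeGap returns span / rate_in_span, the average spacing between probes.
// Missing values fall back to the defaults the article states: 5 000 probes
// per 1-second window.
func (r Role) probeGap() (time.Duration, error) {
	spanText, rate := r.Span, r.RateInSpan
	if spanText == "" {
		spanText = "1s"
	}
	if rate == 0 {
		rate = 5000
	}
	span, err := time.ParseDuration(spanText)
	if err != nil {
		return 0, fmt.Errorf("invalid span %q: %w", spanText, err)
	}
	if rate < 0 {
		return 0, fmt.Errorf("rate_in_span must be positive, got %d", rate)
	}
	return span / time.Duration(rate), nil
}

func main() {
	raw := []byte(`{"client": {"client_addr": "10.0.0.1", "server_addrs": "10.0.0.2",
		"rate_in_span": 5000, "span": "1s", "delay": "3s", "msg_len": 1024}}`)
	cfg, err := parse(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	gap, err := cfg.Client.probeGap()
	fmt.Println("gap between probes:", gap, "error:", err)
}
```

With the values from the example, the spacing works out to 200 µs per probe, which is where the millisecond-level timestamp resolution comes from.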
baize vs. bitflip: How to Choose
Both tools share the same underlying engine. The differences are:
Configuration: bitflip uses command‑line arguments; baize uses a JSON configuration file.
Run mode: bitflip is single‑role; baize is single‑process dual‑role.
Use case: bitflip for temporary troubleshooting; baize for long‑term continuous monitoring.
Log handling: bitflip writes to stdout/stderr; baize rotates logs daily with automatic cleanup.
Ops integration: bitflip is manual; baize integrates with configuration‑management systems.
Typical Scenarios
High‑frequency inter‑cluster probing: full‑mesh monitoring across clusters within a data center.
Cross‑data‑center hybrid‑cloud links: probing at 5 000 pps detects anomalies within seconds.
Dedicated‑line SLA monitoring: continuous quality measurement for carrier‑grade links.
Network upgrade verification: compare pre‑ and post‑change metrics to ensure no regression.
Failover validation: confirm loss‑free and bit‑flip‑free paths after disaster‑recovery cut‑over before traffic migration.
Open‑Source and Community
Baize is the second tool in Baidu’s nettools suite, released under the MIT license.
GitHub: https://github.com/baidu/nettools
Documentation: https://nettools.rpcx.io/baize.html
Language: Go 1.26+
Platforms: Linux/macOS, AMD64/ARM64
The open‑source variant provides a pluggable Sender interface so users can forward data to ClickHouse, Prometheus, or any backend.
BirdNest Tech Talk
Author of the rpcx microservice framework, original book author, and chair of Baidu's Go CMC committee.
