How BigBrother Enables Scalable Full‑Link Network Connectivity Checks in Hybrid Cloud
BigBrother is a large-scale internal network connectivity detection system. It colors TCP packets to separate probe traffic from user traffic, supports physical and hybrid cloud scenarios, and provides a complete detection framework for rapid fault localization. It has been used to verify thousands of migrated hosts, raising alerts in near real time.
1. Limitations of First‑Generation Connectivity Tools
The original tool logged in to the host over SSH, used OVS packet-out to send crafted packets, and ran tcpdump on the remote host to verify connectivity. This approach was inefficient, scaled poorly, and could not handle DPDK or P4 gateway products.
2. BigBrother Design Overview
BigBrother (the name comes from the novel “1984”) monitors full-link network connectivity by coloring TCP packets so that probe traffic can be told apart from user traffic. It supports both physical and cross-region cloud scenarios. It consists of three components: mafia (the console for creating tasks and displaying results), minitrue (which converts user parameters into packet injection ranges), and telescreen (which constructs and sends packets).
2.1 Key Concepts: Entrypoint and Endpoint
In the virtual network, each instance accesses the network via an Entrypoint (inbound/outbound packet gateway) and an Endpoint (the nearest network element to the instance). Different cloud scenarios map these to OVS, physical gateways (vpcgw, hybridgw), or ToR switches.
| Scenario | Entrypoint | Endpoint |
| --- | --- | --- |
| Public Cloud | ovs | ovs |
| Physical Cloud | vpcgw, hybridgw | ToR |
| Hosted Cloud | hcgw, cloudgw | PE |
| Cross-Domain Gateway | sdngw | ovs |
BigBrother injects GRE probe packets at the Entrypoint and mirrors them at the Endpoint for analysis.
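As a rough illustration, the scenario mapping can be kept as a lookup table that tells a prober where to inject and where to mirror. This is only a sketch: the dictionary keys and the `probe_points` helper are hypothetical names chosen for this example, not BigBrother identifiers; only the element names (ovs, vpcgw, hybridgw, ToR, and so on) come from the article.

```python
# Scenario -> (Entrypoint elements, Endpoint element), copied from the scenario table.
# Keys and function name are illustrative assumptions, not BigBrother's own names.
SCENARIO_ELEMENTS = {
    "public_cloud": (("ovs",), "ovs"),
    "physical_cloud": (("vpcgw", "hybridgw"), "ToR"),
    "hosted_cloud": (("hcgw", "cloudgw"), "PE"),
    "cross_domain_gateway": (("sdngw",), "ovs"),
}


def probe_points(scenario: str) -> dict:
    """Return where a probe would be injected and where it would be mirrored."""
    entrypoints, endpoint = SCENARIO_ELEMENTS[scenario]
    return {"inject_at": list(entrypoints), "mirror_at": endpoint}


if __name__ == "__main__":
    print(probe_points("physical_cloud"))
```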
3. Detection Process
The detection flow consists of two parts (orange and purple streams). For the orange stream (SRC→DST):
BigBrother simulates a probe packet from DST to the Endpoint.
The SRC Entrypoint forwards the packet to the Endpoint.
The Endpoint mirrors the packet to BigBrother.
The Endpoint forwards the packet to the instance.
The instance replies to the Endpoint.
The Endpoint encapsulates the reply in GRE and mirrors it to BigBrother.
The reply travels back through the Entrypoint chain to DST.
After both directions complete, BigBrother expects six mirrored probe packets; receiving all six indicates normal connectivity.
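The six-packet rule lends itself to a simple counting check. The sketch below assumes the collector has already normalized every mirrored packet, in either direction, to the (src, dst) key of the pair under test; that normalization and the function name are assumptions made for illustration, not details given in the article.

```python
from collections import Counter

EXPECTED_MIRRORS = 6  # from the text: six mirrored probe packets per checked pair


def classify_pairs(mirrored, pairs):
    """Classify each pair by how many mirrored probe packets were received.

    mirrored: iterable of (src, dst) keys, one entry per mirrored packet.
    pairs:    the (src, dst) pairs that were probed.
    """
    seen = Counter(mirrored)
    report = {}
    for pair in pairs:
        count = seen[pair]  # Counter returns 0 for pairs that were never mirrored
        status = "ok" if count >= EXPECTED_MIRRORS else "fail"
        report[pair] = {"mirrored": count, "status": status}
    return report


if __name__ == "__main__":
    probed = [("10.0.0.1", "10.0.0.2"), ("10.0.0.1", "10.0.0.3")]
    received = [probed[0]] * 6 + [probed[1]] * 4
    for pair, result in classify_pairs(received, probed).items():
        print(pair, result)
```

For a failing pair, the number of mirrors that did arrive gives a hint about how far along the path the probe got before it was lost.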
4. Probe Packet Design
Two candidate designs were evaluated:
4.1 icmp + TOS Scheme
Uses ICMP packets colored via the TOS field. Flow example:
cookie=0x20008,table=1,priority=40000,metadata=0x1,icmp,icmp_type=8,icmp_code=0,nw_tos=0x40 actions=Send_BB(),Learn(),Back_0()
This approach requires complex flow hooks and cannot learn reverse flows in hybrid cloud, making it unsuitable.
4.2 tcp Scheme (Chosen)
Uses TCP packets with a dedicated source/destination port (port 11) for coloring. Flow example:
cookie=0x20008,table=1,priority=40000,tcp,metadata=0x1,tp_src=[port],tp_dst=[port] actions=Send_BB(),Back_0()
The TCP scheme is simpler and works across all scenarios.
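The coloring rule for the TCP scheme amounts to a match on protocol and ports. A minimal sketch follows, assuming that a probe is any TCP packet whose source and destination ports both equal the dedicated port; `PacketHeader` and `is_probe` are hypothetical names for this example, and real matching happens in the flow tables rather than in Python.

```python
from dataclasses import dataclass

PROBE_PORT = 11  # the dedicated port mentioned in the text


@dataclass(frozen=True)
class PacketHeader:
    protocol: str  # "tcp", "udp", "icmp", ...
    src_port: int
    dst_port: int


def is_probe(pkt: PacketHeader) -> bool:
    """True if the packet is colored as probe traffic.

    Assumption: both ports must match, mirroring the tp_src/tp_dst match
    in the flow example.
    """
    return (
        pkt.protocol == "tcp"
        and pkt.src_port == PROBE_PORT
        and pkt.dst_port == PROBE_PORT
    )


if __name__ == "__main__":
    print(is_probe(PacketHeader("tcp", 11, 11)))     # probe traffic
    print(is_probe(PacketHeader("tcp", 443, 51000)))  # user traffic
```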
5. Task Execution and Concurrency
BigBrother can handle up to 32 concurrent tasks (5‑bit Task_id) and each task can test up to 2^27 pairs, allowing full‑mesh verification for VPCs with up to 10,000 hosts.
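The capacity figures follow from the bit widths: 5 bits give 2^5 = 32 task IDs and 27 bits give 2^27 = 134,217,728 pair IDs, while a full mesh of 10,000 hosts needs 10,000 × 9,999 = 99,990,000 ordered pairs. The sketch below packs both fields into one 32-bit value. That single-word layout, the field order, and the function names are assumptions for illustration; the article does not say where the identifiers are carried in the packet.

```python
TASK_BITS = 5    # from the text: 5-bit Task_id
PAIR_BITS = 27   # from the text: up to 2^27 pairs per task
MAX_TASKS = 1 << TASK_BITS   # 32
MAX_PAIRS = 1 << PAIR_BITS   # 134,217,728


def pack(task_id: int, pair_id: int) -> int:
    """Pack (task_id, pair_id) into one 32-bit integer (assumed layout)."""
    if not 0 <= task_id < MAX_TASKS:
        raise ValueError(f"task_id must be in [0, {MAX_TASKS})")
    if not 0 <= pair_id < MAX_PAIRS:
        raise ValueError(f"pair_id must be in [0, {MAX_PAIRS})")
    return (task_id << PAIR_BITS) | pair_id


def unpack(value: int) -> tuple:
    """Inverse of pack(): return (task_id, pair_id)."""
    return value >> PAIR_BITS, value & (MAX_PAIRS - 1)


def full_mesh_pairs(hosts: int) -> int:
    """Number of ordered (src, dst) pairs with src != dst."""
    return hosts * (hosts - 1)


if __name__ == "__main__":
    print(MAX_TASKS, MAX_PAIRS)
    print(full_mesh_pairs(10_000), full_mesh_pairs(10_000) <= MAX_PAIRS)
    print(unpack(pack(3, 42)))
```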
When an operator creates a task via the mafia console, the workflow is:
mafia sends a request to minitrue, which determines the probe range.
minitrue passes source/destination lists to telescreen.
telescreen builds GRE packets, enqueues them, and listens for mirrored replies.
minitrue periodically fetches results for analysis.
The final report is displayed in mafia, showing total pairs, successes, failures, and a bitmap of per‑pair status.
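A per-pair status bitmap keeps the report compact even for very large tasks: at one bit per pair, 2^27 pairs fit in 16 MiB. The following sketch shows how such a bitmap could be built and summarized. The bit order, function names, and report keys are assumptions for illustration, not the format mafia actually uses.

```python
def build_bitmap(total_pairs: int, ok_pair_ids) -> bytearray:
    """Set bit i when pair i passed; bit order within a byte is an assumption."""
    bitmap = bytearray((total_pairs + 7) // 8)
    for pid in ok_pair_ids:
        if not 0 <= pid < total_pairs:
            raise ValueError(f"pair id {pid} out of range")
        bitmap[pid // 8] |= 1 << (pid % 8)
    return bitmap


def summarize(total_pairs: int, bitmap: bytearray) -> dict:
    """Derive totals and the list of failed pair ids from the bitmap."""
    failed = [
        pid
        for pid in range(total_pairs)
        if ((bitmap[pid // 8] >> (pid % 8)) & 1) == 0
    ]
    return {
        "total": total_pairs,
        "successes": total_pairs - len(failed),
        "failures": len(failed),
        "failed_pair_ids": failed,
    }


if __name__ == "__main__":
    bm = build_bitmap(10, [0, 1, 2, 9])
    print(summarize(10, bm))
```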
6. Active‑Flow Based Connectivity Checks
For scenarios where full‑mesh testing is unnecessary, BigBrother integrates with the river service, which provides a list of active flows. minitrue retrieves this list and reuses the standard detection pipeline, reducing overhead.
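Turning an active-flow list into probe pairs is mostly a deduplication step. The sketch below assumes each flow record is a dictionary with `src_ip` and `dst_ip` keys; the article does not describe the format river returns, so both the record shape and the function name are hypothetical.

```python
def pairs_from_flows(flows):
    """Return unique (src, dst) pairs in first-seen order.

    flows: iterable of dicts with "src_ip" and "dst_ip" keys (assumed format).
    """
    seen = set()
    pairs = []
    for flow in flows:
        key = (flow["src_ip"], flow["dst_ip"])
        if key not in seen:
            seen.add(key)
            pairs.append(key)
    return pairs


if __name__ == "__main__":
    active = [
        {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2"},
        {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2"},  # duplicate flow
        {"src_ip": "10.0.0.3", "dst_ip": "10.0.0.1"},
    ]
    print(pairs_from_flows(active))
```

The resulting pair list can be far smaller than a full mesh, which is where the reduction in overhead comes from.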
7. Production Use and Future Plans
Since its launch, BigBrother has been used in a host migration project, verifying connectivity for over 2,000 migrated VMs and detecting nearly ten anomalies. Future enhancements include measuring average and maximum latency, packet loss, and building continuous internal network monitoring for specific customers.
UCloud Tech
UCloud is a leading neutral cloud provider in China, developing its own IaaS, PaaS, AI service platform, and big data exchange platform, and delivering comprehensive industry solutions for public, private, hybrid, and dedicated clouds.
