
How BigBrother Enables Scalable Full‑Link Network Connectivity Checks in Hybrid Cloud

BigBrother is a large‑scale connectivity detection system for internal networks. It colors TCP packets to separate probe traffic from user traffic, supports physical and hybrid cloud scenarios, and provides a complete detection framework for rapid fault localization. It has been proven in production on the migration of thousands of hosts, with near‑real‑time alerts.

UCloud Tech

1. Limitations of First‑Generation Connectivity Tools

The original tool relied on SSH jumps to each host. It used OVS packet‑out to send crafted packets and ran tcpdump on the remote host to verify that they arrived. This approach was inefficient, scaled poorly, and could not handle DPDK or P4 gateway products.

2. BigBrother Design Overview

BigBrother, named after the figure in the novel “1984”, monitors full‑link network connectivity. It colors TCP packets to distinguish probe traffic from user traffic and supports both physical and cross‑region cloud scenarios. Its components include mafia (the console for creating tasks and displaying results), minitrue (which converts user parameters into packet injection ranges), and telescreen (which constructs and sends packets).

2.1 Key Concepts: Entrypoint and Endpoint

In the virtual network, each instance accesses the network via an Entrypoint (inbound/outbound packet gateway) and an Endpoint (the nearest network element to the instance). Different cloud scenarios map these to OVS, physical gateways (vpcgw, hybridgw), or ToR switches.

| Scenario | Entrypoint | Endpoint |
| --- | --- | --- |
| Public Cloud | ovs | ovs |
| Physical Cloud | vpcgw, hybridgw | ToR |
| Hosted Cloud | hcgw, cloudgw | PE |
| Cross‑Domain Gateway | sdngw | ovs |

BigBrother injects GRE probe packets at the Entrypoint and mirrors them at the Endpoint for analysis.
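The scenario mapping above can be captured as a small lookup table. The following is only a sketch: the dictionary keys, the `ProbePoints` type, and the `probe_points` helper are hypothetical names chosen for illustration, and only the gateway and switch names come from the article.

```python
from typing import NamedTuple, Tuple


class ProbePoints(NamedTuple):
    entrypoints: Tuple[str, ...]  # where probe packets are injected
    endpoint: str                 # where probe packets are mirrored


# Values copied from the scenario table; the key strings are illustrative.
SCENARIOS = {
    "public_cloud": ProbePoints(("ovs",), "ovs"),
    "physical_cloud": ProbePoints(("vpcgw", "hybridgw"), "ToR"),
    "hosted_cloud": ProbePoints(("hcgw", "cloudgw"), "PE"),
    "cross_domain_gateway": ProbePoints(("sdngw",), "ovs"),
}


def probe_points(scenario: str) -> ProbePoints:
    """Return the inject and mirror locations for a scenario (KeyError if unknown)."""
    return SCENARIOS[scenario]
```

Keeping the mapping as data rather than branching logic means a new scenario only needs one new row.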

3. Detection Process

The detection flow consists of two parts (orange and purple streams). For the orange stream (SRC→DST):

(1) BigBrother simulates a probe packet from DST to the Endpoint.
(2) The SRC Entrypoint forwards the packet to the Endpoint.
(3) The Endpoint mirrors the packet to BigBrother.
(4) The Endpoint forwards the packet to the instance.
(5) The instance replies to the Endpoint.
(6) The Endpoint encapsulates the reply in GRE and mirrors it to BigBrother.
(7) The reply travels back through the Entrypoint chain to DST.

After both directions complete, BigBrother expects six mirrored probe packets; receiving all six indicates normal connectivity.
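The six-mirror check could be tracked per host pair roughly as follows. This is a sketch under stated assumptions: the article gives only the total of six, so the even split into three mirror points per direction, the direction labels, and the `PairResult` class are all hypothetical.

```python
from dataclasses import dataclass, field

# Assumption: the six mirrors split evenly across the two directions.
STAGES_PER_DIRECTION = 3
DIRECTIONS = ("src_to_dst", "dst_to_src")


@dataclass
class PairResult:
    """Tracks which mirrored probe packets have arrived for one host pair."""
    seen: set = field(default_factory=set)

    def record(self, direction: str, stage: int) -> None:
        if direction not in DIRECTIONS or not 0 <= stage < STAGES_PER_DIRECTION:
            raise ValueError(f"unknown mirror point: {direction!r}, {stage!r}")
        # A set makes duplicate mirrors harmless.
        self.seen.add((direction, stage))

    def connected(self) -> bool:
        # Connectivity is declared only when every expected mirror arrived.
        return len(self.seen) == len(DIRECTIONS) * STAGES_PER_DIRECTION

    def missing(self) -> list:
        # The absent mirror points hint at where the path breaks.
        return sorted(
            (d, s)
            for d in DIRECTIONS
            for s in range(STAGES_PER_DIRECTION)
            if (d, s) not in self.seen
        )
```

Recording distinct mirror points instead of a bare counter means a duplicated mirror cannot mask a missing one, and `missing()` narrows down the segment to inspect.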

4. Probe Packet Design

Two candidate designs were evaluated:

4.1 icmp + TOS Scheme

Uses ICMP packets colored via the TOS field. Flow example:

cookie=0x20008,table=1,priority=40000,metadata=0x1,icmp,icmp_type=8,icmp_code=0,nw_tos=0x40 actions=Send_BB(),Learn(),Back_0()

This approach requires complex flow hooks and cannot learn reverse flows in hybrid cloud, making it unsuitable.

4.2 tcp Scheme (Chosen)

Uses TCP packets with a dedicated source/destination port (port 11) for coloring. Flow example:

cookie=0x20008,table=1,priority=40000,tcp,metadata=0x1,tp_src=[port],tp_dst=[port] actions=Send_BB(),Back_0()

The TCP scheme is simpler and works across all scenarios.
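To illustrate the coloring idea outside of OVS, the same match can be expressed as a check on the first four bytes of a TCP header. This is a minimal sketch, not the production data path: it assumes that both the source and destination port carry the marker port, as the `tp_src`/`tp_dst` match suggests, and the function names are hypothetical.

```python
import struct

PROBE_PORT = 11  # the port the article names as the coloring marker


def tcp_ports(tcp_header: bytes) -> tuple:
    """Return (src_port, dst_port) from the first 4 bytes of a TCP header."""
    if len(tcp_header) < 4:
        raise ValueError("TCP header too short")
    # Ports are two unsigned 16-bit fields in network byte order.
    return struct.unpack("!HH", tcp_header[:4])


def is_probe(tcp_header: bytes, probe_port: int = PROBE_PORT) -> bool:
    """True when both ports equal the marker port (assumed matching rule)."""
    src, dst = tcp_ports(tcp_header)
    return src == probe_port and dst == probe_port
```

Because the decision depends only on the port pair, user traffic on other ports never matches the probe rule.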

5. Task Execution and Concurrency

BigBrother can handle up to 32 concurrent tasks (5‑bit Task_id) and each task can test up to 2^27 pairs, allowing full‑mesh verification for VPCs with up to 10,000 hosts.
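The capacity figures can be checked with a little arithmetic. A full mesh of 10,000 hosts has 10,000 × 9,999 = 99,990,000 ordered pairs, which fits under 2^27 = 134,217,728. The sketch below also packs a task ID and a pair index into one 32-bit value. The bit widths come from the text, but the layout (task ID in the high bits) and the function names are assumptions for illustration only.

```python
TASK_ID_BITS = 5   # from the text: 5-bit Task_id
PAIR_BITS = 27     # from the text: up to 2^27 pairs per task

MAX_TASKS = 1 << TASK_ID_BITS   # 32
MAX_PAIRS = 1 << PAIR_BITS      # 134_217_728


def full_mesh_pairs(hosts: int) -> int:
    """Ordered (src, dst) pairs among `hosts` hosts, excluding self-pairs."""
    return hosts * (hosts - 1)


def pack_probe_id(task_id: int, pair_index: int) -> int:
    """Pack both fields into 32 bits (assumed layout: task ID in the high bits)."""
    if not 0 <= task_id < MAX_TASKS:
        raise ValueError("task_id out of range")
    if not 0 <= pair_index < MAX_PAIRS:
        raise ValueError("pair_index out of range")
    return (task_id << PAIR_BITS) | pair_index


def unpack_probe_id(probe_id: int) -> tuple:
    """Inverse of pack_probe_id: return (task_id, pair_index)."""
    return probe_id >> PAIR_BITS, probe_id & (MAX_PAIRS - 1)
```

With this layout, a mirrored packet that carries the identifier can be attributed to its task and pair without any per-packet lookup table.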

When an operator creates a task via the mafia console, the workflow is:

(1) mafia sends a request to minitrue, which determines the probe range.
(2) minitrue passes source/destination lists to telescreen.
(3) telescreen builds GRE packets, enqueues them, and listens for mirrored replies.
(4) minitrue periodically fetches results for analysis.
(5) The final report is displayed in mafia, showing total pairs, successes, failures, and a bitmap of per‑pair status.
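A per-pair status bitmap with summary counts, like the one in the final report, might look like the sketch below. The `PairBitmap` class and its report keys are hypothetical. The article does not describe the actual storage format, so one bit per pair is an assumption.

```python
class PairBitmap:
    """One bit per probed pair: 1 means the pair passed, 0 means it failed."""

    def __init__(self, total_pairs: int):
        if total_pairs < 0:
            raise ValueError("total_pairs must be non-negative")
        self.total = total_pairs
        self._bits = bytearray((total_pairs + 7) // 8)

    def _check(self, index: int) -> None:
        if not 0 <= index < self.total:
            raise IndexError("pair index out of range")

    def mark_success(self, index: int) -> None:
        self._check(index)
        self._bits[index >> 3] |= 1 << (index & 7)

    def ok(self, index: int) -> bool:
        self._check(index)
        return bool(self._bits[index >> 3] & (1 << (index & 7)))

    def report(self) -> dict:
        # Count set bits across the whole bitmap.
        successes = sum(bin(b).count("1") for b in self._bits)
        return {"total": self.total, "success": successes,
                "failure": self.total - successes}
```

At one bit per pair, even the 2^27-pair maximum fits in 16 MiB, which keeps a full-mesh result cheap to store and ship to the console.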

6. Active‑Flow Based Connectivity Checks

For scenarios where full‑mesh testing is unnecessary, BigBrother integrates with the river service, which provides a list of active flows. minitrue retrieves this list and reuses the standard detection pipeline, reducing overhead.
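The saving comes from probing only pairs that actually exchange traffic. The article does not document river's interface, so the sketch below assumes a hypothetical input of `(src, dst)` address tuples and simply deduplicates them into a probe list.

```python
from typing import Iterable, List, Tuple


def pairs_from_active_flows(flows: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Deduplicate (src, dst) flow records, keeping first-seen order."""
    seen = set()
    pairs = []
    for src, dst in flows:
        # Skip self-pairs and pairs that were already queued.
        if src == dst or (src, dst) in seen:
            continue
        seen.add((src, dst))
        pairs.append((src, dst))
    return pairs
```

For example, in a VPC with 10,000 hosts, a flow list with a few thousand distinct pairs replaces a full mesh of 99,990,000 ordered pairs.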

7. Production Use and Future Plans

Since its launch, BigBrother has been used in a host migration project, verifying connectivity for over 2,000 migrated VMs and detecting nearly ten anomalies. Future enhancements include measuring average and maximum latency, packet loss, and building continuous internal network monitoring for specific customers.
