
Unlock Hidden Savings: Optimizing Multi‑Data Center Bandwidth Costs

This article examines the characteristics and billing models of multi‑data‑center networks, analyzes external traffic patterns, identifies challenges in optimizing Internet‑facing bandwidth, and proposes practical scheduling strategies to better utilize idle bandwidth and reduce carrier costs.


1. Characteristics and Billing Models of Multi‑Data Centers

Background

As internet services grow rapidly, the number of data centers and the complexity of their networks increase dramatically.

Multi‑Data Center Networks

Similar to Google’s network, large internet companies split their network into:

Data Center Internal Networks (DCNs)

WAN (Wide Area Networks)

According to traffic direction, WAN can be divided into two backbone networks:

Inter‑DC WAN (e.g., Google B4), connecting geographically distributed data centers.

Internet‑Facing WAN serving end‑users (search, video, download).

Bandwidth Billing Models

Internet‑Facing WAN incurs high fees from carriers. With rapid service growth, capacity has expanded from 10 Gbps to 1 Tbps or more.

Common external bandwidth billing models include:

Peak billing: sample bandwidth over a period (e.g., 5 min) and charge based on the maximum sample in a month.

95th percentile billing: discard the top 5 % of samples and charge on the highest remaining sample.

Daily peak average billing: sample daily peaks and charge on the average of those daily peaks over a month.
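As a concrete illustration, the three models above can be sketched as a small billing function. The name `billed_bandwidth` and the per-day sample layout are illustrative assumptions, not any carrier's actual API:

```python
from statistics import mean

def billed_bandwidth(samples, model):
    """Compute billable bandwidth for one month of 5-minute samples.
    `samples` is a list of per-day lists of readings (e.g., in Mbps).
    Illustrative sketch of the three common billing models."""
    flat = sorted(s for day in samples for s in day)
    if model == "peak":
        # Charge on the single highest sample of the month.
        return flat[-1]
    if model == "p95":
        # Discard the top 5% of samples; charge on the highest remaining one.
        cutoff = int(len(flat) * 0.95) - 1
        return flat[cutoff]
    if model == "daily_peak_avg":
        # Charge on the average of each day's peak sample.
        return mean(max(day) for day in samples)
    raise ValueError(model)
```

For example, with two days of samples `[[10, 20], [30, 40]]`, peak billing charges on 40 while daily-peak-average billing charges on 30.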

2. Characteristics of Data Center External Traffic

A Simple Peak‑Billing Example

Peak billing charges based on the monthly outbound bandwidth peak.

The figure shows a traffic graph where the peak occurs at 22:00 on day 1; billing is based on that peak.

Outside that peak, the data center can carry additional traffic at no extra cost.

The green area in the figure represents this free idle bandwidth.

Thus, each data center has considerable idle bandwidth that is not fully utilized.

Special days (e.g., JD 618, Double 11, popular series releases) cause very high peaks.

During the rest of the month, bandwidth values are far below the peak, leaving abundant idle bandwidth.
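Under peak billing, the idle ("free") headroom below the billed peak can be estimated with a short sketch; the function name and sample units are illustrative:

```python
def free_headroom(samples, billed_peak):
    """Total free capacity under peak billing: any traffic below the
    monthly billed peak costs nothing extra. Returns the summed gap
    between each sample and the billed peak (illustrative sketch)."""
    return sum(billed_peak - s for s in samples if s < billed_peak)
```

For samples `[10, 50, 30]` with a billed peak of 50, the free headroom is 40 + 0 + 20 = 60 sample-units.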

A Daily‑Peak Billing Example

Some data centers use daily‑peak billing.

Similar to peak billing, daily‑peak data centers also have large amounts of idle bandwidth.

The figure shows that non‑peak moments have traffic far below the daily peak, leaving substantial idle bandwidth.

3. Challenges of Optimizing External Traffic

Because of peak billing, each data center often has significant idle bandwidth that could be used to optimize external traffic.

Challenges include:

1. Real‑time nature of Internet‑Facing Traffic

Traffic patterns show low usage around 6 am and peaks in the evening.

Shifting peak traffic to off‑peak times could reduce billed traffic, but user‑facing traffic is real‑time and cannot be delayed.

Highly latency‑sensitive services such as search, social networking, e‑commerce, and gaming therefore cannot be shifted.

2. Uncertainty of Traffic Peaks

In practice, it is hard to know in advance when the monthly peak will occur or its magnitude, making it difficult to calculate available free bandwidth.

Typical external traffic scheduling works by changing DNS records, and DNS propagation delays mean the shift does not take effect immediately.

However, user behavior shows regular patterns; machine‑learning models can predict traffic, allowing estimation of short‑term idle bandwidth.
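As a minimal stand-in for such a prediction model, a seasonal-naive forecast exploits the daily periodicity of user traffic: tomorrow's curve is approximated by averaging the same time slot across past days. A real system would use richer ML features; `predict_next_day` is a hypothetical sketch:

```python
def predict_next_day(history, season=288):
    """Seasonal-naive forecast over 5-minute slots (288 per day).
    Splits `history` into whole days and averages each slot across
    days to estimate the next day's traffic curve."""
    days = [history[i:i + season] for i in range(0, len(history), season)]
    days = [d for d in days if len(d) == season]  # drop a partial trailing day
    return [sum(d[t] for d in days) / len(days) for t in range(season)]
```

Subtracting this forecast from the billed peak gives a short-term estimate of schedulable idle bandwidth.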

4. How to Better Utilize Paid Bandwidth

We consider ways to use idle bandwidth across data centers to more fully utilize paid bandwidth.

1. Reduce Paid Bandwidth of Other Peak‑Billing DCs

Suppose two DCs have peak times at 16:00 and 23:00 respectively; each has idle bandwidth while the other is at its peak, allowing traffic to be shifted between them.

Simple traffic scheduling can lower both DCs’ peaks and reduce carrier fees.

With many DCs and diverse user behavior, appropriate scheduling algorithms can significantly cut external paid bandwidth.

Limitations arise when peaks occur close together or simultaneously, leaving neither DC with idle bandwidth to absorb the other's traffic.
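The two-DC shifting idea can be sketched as a greedy per-slot rule: whenever one DC exceeds a target cap, move the excess into the other DC's headroom. This assumes the entire excess is schedulable, which real user traffic is not; all names and caps are illustrative:

```python
def shift_traffic(dc_a, dc_b, cap_a, cap_b):
    """For each time slot, move traffic from whichever DC exceeds its
    target peak (cap) into the other DC's remaining headroom, lowering
    both billed peaks. Greedy illustrative sketch."""
    out_a, out_b = [], []
    for a, b in zip(dc_a, dc_b):
        # Move A's excess over cap_a into B's headroom below cap_b.
        move_ab = min(max(a - cap_a, 0), max(cap_b - b, 0))
        a, b = a - move_ab, b + move_ab
        # Then move B's excess over cap_b into A's headroom.
        move_ba = min(max(b - cap_b, 0), max(cap_a - a, 0))
        a, b = a + move_ba, b - move_ba
        out_a.append(a)
        out_b.append(b)
    return out_a, out_b
```

With opposing peaks, e.g. A = [100, 40] and B = [40, 100] and caps of 70 each, both billed peaks drop from 100 to 70.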

2. Reduce Paid Bandwidth of Daily‑Peak Billing DCs

Using idle bandwidth from peak‑billing DCs to absorb traffic from daily‑peak DCs yields the following optimized traffic graph:

Daily‑peak DCs see reduced billed bandwidth each day, lowering monthly carrier costs.

Even if all DCs hit the monthly peak on one day, other days still have capacity to accept daily‑peak traffic, reducing costs.

This approach relaxes the limitation of peak‑billing scheduling.

3. Use External Idle Bandwidth for Internal Traffic

Idle bandwidth is mostly available during off‑peak hours (midnight to 10 am), when user‑facing traffic is low.

Background flows from distributed storage, cloud services, search, etc., can use this idle external bandwidth for delay‑tolerant internal jobs, alleviating congestion on internal links.

However, external networks have higher latency and loss; unified scheduling must consider external performance to avoid harming internal jobs.
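One way to express that quality consideration is a simple admission check before offloading a background flow onto external paths. All thresholds, parameter names, and units here are illustrative assumptions:

```python
def can_offload(idle_mbps, demand_mbps, rtt_ms, loss_pct,
                max_rtt_ms=80, max_loss_pct=0.5):
    """Decide whether a delay-tolerant background flow may be routed
    over idle external bandwidth. External paths have higher latency
    and loss, so gate on measured path quality as well as headroom."""
    return (idle_mbps >= demand_mbps
            and rtt_ms <= max_rtt_ms
            and loss_pct <= max_loss_pct)
```

A scheduler would re-evaluate this check periodically and fall back to internal links when external path quality degrades.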

5. Future Outlook

Multi‑data‑center external traffic scheduling can reduce billed bandwidth, but it has limitations and cannot by itself fully exploit the abundant idle bandwidth of off‑peak periods.

Using external idle bandwidth for internal traffic requires coordinated intra‑ and inter‑network scheduling and attention to external transmission quality.

Overall, the high cost of external bandwidth warrants addressing these challenges.

Tags: operations, traffic scheduling, bandwidth optimization, multi-data center, peak billing
Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
