Unraveling Data Center Congestion: Incast, ECN, and PFC Explained

This article examines why data‑center networks experience congestion, detailing many‑to‑one and all‑to‑all traffic patterns, the role of incast, and how mechanisms such as ECN and PFC can be tuned to achieve loss‑free, low‑latency communication.

Open Source Linux

Jun 5, 2024

Congestion in data center networks stems mainly from two traffic patterns: many-to-one and all-to-all.

Modern data centers often use a Clos (spine-leaf) architecture, in which every leaf switch connects to every spine switch. This full mesh between the two tiers provides non-blocking, scalable, and fault-tolerant connectivity.

In many‑to‑one scenarios, multiple servers send data simultaneously to a single receiver, creating an incast pattern that quickly overwhelms the receiving leaf’s buffer, leading to packet loss.

Increasing buffer size can mitigate loss but does not solve the problem, especially as network scale and link speeds grow; large buffers become costly and impractical.
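The arithmetic behind this limit can be sketched with a simple fluid model. The function below is illustrative only: `simulate_incast`, its parameters, and the numbers in the usage lines are assumptions, not measurements from a real switch. It treats the burst as a constant-rate flood into a single egress port and reports how much data the buffer ends up holding and how much has to be dropped.

```python
def simulate_incast(num_senders, sender_gbps, egress_gbps, buffer_kb, burst_us):
    """Fluid-model estimate for one synchronized burst (hypothetical helper).

    Returns (peak_queue_kb, dropped_kb).
    Gbit/s multiplied by microseconds gives kilobits; dividing by 8 gives kB.
    """
    arrived_kb = num_senders * sender_gbps * burst_us / 8
    drained_kb = egress_gbps * burst_us / 8
    # Data that arrived during the burst but could not leave the port.
    backlog_kb = max(0.0, arrived_kb - drained_kb)
    # The buffer absorbs what it can; the remainder is lost.
    peak_queue_kb = min(backlog_kb, buffer_kb)
    dropped_kb = backlog_kb - peak_queue_kb
    return peak_queue_kb, dropped_kb


# 16 senders at 25 Gbit/s into one 25 Gbit/s port for 100 microseconds.
print(simulate_incast(16, 25, 25, buffer_kb=1000, burst_us=100))  # (1000, 3687.5)
# Doubling the buffer reduces the loss but does not remove it.
print(simulate_incast(16, 25, 25, buffer_kb=2000, burst_us=100))  # (2000, 2687.5)
```

In this model the backlog grows with the number of senders, the link speed, and the burst length, so any fixed buffer is eventually exceeded.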

The only way to achieve a loss‑free network under many‑to‑one traffic is to employ congestion‑control mechanisms that limit the traffic before it exceeds the receiver’s capacity, often by having the leaf switch signal the sources to throttle their sending rates.

All‑to‑all traffic, where multiple one‑to‑one flows coexist, can also produce incast at spine switches when flows intersect, again causing buffer overflow and packet loss.

To prevent loss in all‑to‑all scenarios, load‑balancing is required so that flows are routed over separate paths, avoiding cross‑traffic at any single switch.
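One common way to spread flows over spine paths is to hash each flow's header fields and use the result to pick a path. The sketch below assumes that approach; `pick_spine`, `spine_load`, and the example addresses and ports are made up for illustration and do not describe any particular switch. It also shows why hashing alone does not guarantee separate paths: two flows can land on the same spine.

```python
import zlib
from collections import Counter


def pick_spine(flow, num_spines):
    """Map a flow tuple to a spine index with a deterministic hash."""
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % num_spines


def spine_load(flows, num_spines):
    """Count how many flows each spine would carry."""
    return Counter(pick_spine(flow, num_spines) for flow in flows)


# Eight hypothetical flows: (src IP, dst IP, protocol, src port, dst port).
flows = [(f"10.0.0.{i}", f"10.0.1.{i}", 6, 49152 + i, 5001) for i in range(8)]
load = spine_load(flows, num_spines=4)

for spine in range(4):
    print(f"spine {spine}: {load[spine]} flow(s)")

# Any spine carrying more than its even share is a potential hot spot.
even_share = len(flows) / 4
print("uneven:", any(count > even_share for count in load.values()))
```

Because the hash is computed per flow, all packets of a flow follow the same path and stay in order, but the distribution across spines is only statistically even.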

Relying solely on large buffers is economically inefficient; a combination of small buffers and explicit flow‑control signals provides a more scalable solution.

Congestion control is a global process aimed at keeping the network stable under its current load. The congestion it addresses falls into two types: incast congestion caused by many-to-one traffic, and congestion caused by uneven traffic scheduling under all-to-all traffic.

Explicit Congestion Notification (ECN) lets a network device mark packets, instead of dropping them, when its queue exceeds a threshold. The mark is carried downstream to the receiver, which tells the sender to reduce its transmission rate. This reduces packet loss, lowers latency for delay-sensitive applications, and increases overall link utilization.

The main benefits of ECN are:

Early detection of congestion and proactive rate reduction.

Marking packets in overloaded queues without dropping them.

Fewer retransmission timeouts and a better user experience for latency-sensitive traffic.

Higher network utilization compared to networks without ECN.

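When a lossless queue on a device exceeds the ECN threshold, the device marks packets by setting the ECN field in the IP header to 11 (Congestion Experienced). On receiving a marked packet, the receiver sends a Congestion Notification Packet (CNP) back to the source, which then lowers its sending rate.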
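The marking and notification steps can be sketched as three small functions. The ECN codepoint values are the standard two-bit encodings; everything else (the function names, the single fixed threshold, and the halving of the rate on each CNP) is an assumption chosen to keep the example short, not the behaviour of a specific switch or NIC.

```python
# Two-bit ECN field values from the IP header.
ECN_NOT_ECT = 0b00  # sender does not support ECN
ECN_ECT0 = 0b10     # ECN-capable transport
ECN_CE = 0b11       # Congestion Experienced


def mark_on_enqueue(ecn_bits, queue_depth_kb, ecn_threshold_kb):
    """Switch side: mark ECN-capable packets once the queue is too deep."""
    if ecn_bits != ECN_NOT_ECT and queue_depth_kb > ecn_threshold_kb:
        return ECN_CE
    return ecn_bits


def receiver_should_send_cnp(ecn_bits):
    """Receiver side: a CE-marked packet triggers a CNP to the source."""
    return ecn_bits == ECN_CE


def sender_on_cnp(rate_gbps, cut_factor=0.5):
    """Source side: reduce the rate (cut_factor is a made-up constant)."""
    return rate_gbps * cut_factor


rate = 100.0
bits = mark_on_enqueue(ECN_ECT0, queue_depth_kb=300, ecn_threshold_kb=200)
if receiver_should_send_cnp(bits):
    rate = sender_on_cnp(rate)
print(f"ECN bits: {bits:02b}, new rate: {rate} Gbit/s")  # ECN bits: 11, new rate: 50.0 Gbit/s
```

Real deployments typically mark probabilistically between a low and a high threshold and use a more gradual rate-adjustment algorithm at the source.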

If congestion worsens and the queue exceeds the PFC (Priority Flow Control) threshold, the device sends a PFC pause frame to its upstream neighbor on the congested link (the source itself when it is directly attached). The neighbor stops transmitting traffic of the affected priority until the queue drains below the PFC release threshold.
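The pause and release thresholds form a hysteresis loop, which can be sketched as a small state machine. The class name `PfcQueue`, the `xoff_kb` and `xon_kb` parameter names, and the sample queue depths are assumptions for illustration; the sketch tracks a single priority queue and ignores pause timers.

```python
class PfcQueue:
    """Tracks whether one priority queue has asked its peer to pause."""

    def __init__(self, xoff_kb, xon_kb):
        if xon_kb >= xoff_kb:
            raise ValueError("release threshold must be below pause threshold")
        self.xoff_kb = xoff_kb  # PFC (pause) threshold
        self.xon_kb = xon_kb    # PFC release threshold
        self.paused = False

    def update(self, depth_kb):
        """Return 'PAUSE', 'RESUME', or None for the new queue depth."""
        if not self.paused and depth_kb > self.xoff_kb:
            self.paused = True
            return "PAUSE"
        if self.paused and depth_kb < self.xon_kb:
            self.paused = False
            return "RESUME"
        return None


queue = PfcQueue(xoff_kb=800, xon_kb=400)
for depth in [300, 900, 600, 350, 500]:
    print(depth, queue.update(depth))
```

Keeping the release threshold well below the pause threshold avoids rapid toggling between paused and unpaused states when the queue hovers near one value.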

A properly chosen ECN threshold leaves enough buffer headroom between the ECN marking point and the PFC threshold to absorb the traffic that keeps arriving until the source's rate reduction takes effect, minimizing the chance that PFC is triggered.
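A rough way to reason about this headroom is to estimate how far the queue grows between the moment marking starts and the moment the sources actually slow down. The helper below is a back-of-the-envelope sketch under a constant-rate assumption; `max_ecn_threshold_kb`, its parameters, and the example numbers are hypothetical, and real tuning should follow measurements and vendor guidance.

```python
def max_ecn_threshold_kb(pfc_threshold_kb, excess_gbps, response_us):
    """Highest ECN threshold that still leaves room for in-flight growth.

    excess_gbps: arrival rate minus drain rate at the congested queue.
    response_us: time from the first mark until the sources slow down.
    Gbit/s multiplied by microseconds gives kilobits; dividing by 8 gives kB.
    """
    growth_kb = excess_gbps * response_us / 8
    return max(0.0, pfc_threshold_kb - growth_kb)


# Queue grows at 75 Gbit/s for 50 microseconds before the rate cut lands.
print(max_ecn_threshold_kb(pfc_threshold_kb=800, excess_gbps=75, response_us=50))  # 331.25
```

If the result is zero, the assumed response time is too long for ECN alone to keep the queue below the PFC threshold at that level of overload.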
