From Physical Cables to AI-Ready Cloud Networks: A Modern Networking Journey
This article explains the fundamentals of physical networking layers, introduces cloud networking concepts such as VPC, overlay and underlay, and discusses how high‑performance technologies like RDMA and ACCL are evolving to support AI large‑model training in modern cloud environments.
Introduction
The author, a former architecture engineer now working as a Technical Account Manager (TAM) in Alibaba Cloud Technical Services, shares why understanding network fundamentals and cloud network features is essential for solving customer problems.
Physical Network Basics
Physical Layer – The Foundation
The physical layer converts data into electrical, optical, or radio signals that travel over cables, fiber, or wireless media, much like a highway for data packets.
In one sentence: the physical layer connects devices and transmits raw bit streams, providing the medium on which higher-level protocols build; it does not itself guarantee reliable delivery.
Data Link Layer – Ensuring Integrity
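The data link layer encapsulates data into frames, addresses them with MAC addresses, and detects transmission errors with a checksum. In Ethernet, a corrupted frame is simply dropped; recovery is left to higher layers.
One-sentence summary: it wraps higher-layer packets into frames (typically Ethernet) and delivers them on the local link based on MAC addresses.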
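As a rough illustration of framing and error detection, the sketch below builds a simplified Ethernet II frame and verifies it with a CRC-32 frame check sequence. The helper names (`build_frame`, `frame_is_intact`) and the sample MAC addresses are made up for this example. Preamble, VLAN tags, and the maximum frame size are ignored, and the example relies on `zlib.crc32` using the same CRC-32 polynomial as Ethernet.

```python
import struct
import zlib

MIN_PAYLOAD = 46  # Ethernet pads short payloads up to this many bytes


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def build_frame(dst_mac: str, src_mac: str, ethertype: int, payload: bytes) -> bytes:
    """Build destination MAC + source MAC + EtherType + payload + FCS."""
    if len(payload) < MIN_PAYLOAD:
        payload = payload + b"\x00" * (MIN_PAYLOAD - len(payload))
    header = mac_to_bytes(dst_mac) + mac_to_bytes(src_mac) + struct.pack("!H", ethertype)
    fcs = zlib.crc32(header + payload) & 0xFFFFFFFF
    # The checksum is appended least-significant byte first in this sketch.
    return header + payload + struct.pack("<I", fcs)


def frame_is_intact(frame: bytes) -> bool:
    """Recompute the checksum over everything except the trailing 4 bytes."""
    body, fcs = frame[:-4], frame[-4:]
    return struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF) == fcs


frame = build_frame("ff:ff:ff:ff:ff:ff", "02:00:00:00:00:01", 0x0800, b"hello")
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]  # flip one bit in transit
print(len(frame), frame_is_intact(frame), frame_is_intact(corrupted))
```

The receiver can only tell that the corrupted frame is bad; it has no way to repair it from the checksum alone, so the frame would be discarded.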
Network Layer – Routing Data
Using IP addresses, the network layer selects optimal paths to route packets across different networks.
In short: IP abstracts away physical differences, creating a virtual network where devices can communicate without worrying about underlying hardware.
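Path selection at this layer usually comes down to a longest-prefix match against a routing table. The following is a minimal sketch using Python's standard `ipaddress` module; the routes and next-hop names are hypothetical and chosen only for illustration.

```python
import ipaddress
from typing import List, Optional, Tuple

# Hypothetical routing table: (destination prefix, next hop).
ROUTES: List[Tuple[str, str]] = [
    ("0.0.0.0/0", "internet-gateway"),
    ("10.0.0.0/8", "core-router"),
    ("10.1.2.0/24", "rack-switch"),
]


def next_hop(dst: str, routes: List[Tuple[str, str]] = ROUTES) -> Optional[str]:
    """Return the next hop of the most specific route that contains dst."""
    addr = ipaddress.ip_address(dst)
    best_net = None
    best_hop = None
    for prefix, hop in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_hop = net, hop
    return best_hop


for dst in ("10.1.2.7", "10.9.9.9", "8.8.8.8"):
    print(dst, "->", next_hop(dst))
```

The sender never needs to know whether the next hop is reached over copper, fiber, or radio; it only needs a matching prefix.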
Transport Layer – Reliable Delivery
TCP provides reliable, connection-oriented delivery with ordering and retransmission, while UDP offers lightweight, connectionless transmission with no delivery guarantees.
Application Layer – Network Services
This top layer hosts applications such as web browsers, email clients, and messaging tools that ultimately present data to users.
Cloud Network Overview
Why Cloud Networks Are Needed
Growing demands for scalability, security, reliability, privacy, and performance have driven the evolution from flat networks shared by physical and virtual machines to isolated virtual networks (e.g., VLANs, VPCs).
Cloud Network Definition
Cloud‑native is shifting from “microservices + containers + continuous delivery + DevOps” to “software, hardware, and architecture born of the cloud.” Cloud networking is the core IaaS product that embodies this definition.
Core Components of Cloud Networks
The most important component is the Virtual Private Cloud (VPC), which isolates virtual networks using tunnel technology.
VPC Principle
Each VPC has a unique tunnel ID; traffic between ECS instances in the same VPC is encapsulated with this tunnel ID.
Different VPCs have different tunnel IDs, preventing direct communication and providing isolation.
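The isolation rule can be sketched in a few lines. This is a toy model under stated assumptions, not Alibaba Cloud's implementation: the `Instance` and `EncapsulatedPacket` types, the `encapsulate` and `deliver` helpers, and the tunnel ID values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    name: str
    private_ip: str
    tunnel_id: int  # ID of the VPC this instance belongs to


@dataclass(frozen=True)
class EncapsulatedPacket:
    tunnel_id: int
    src_ip: str
    dst_ip: str
    payload: bytes


def encapsulate(src: Instance, dst_ip: str, payload: bytes) -> EncapsulatedPacket:
    """Tag outgoing traffic with the sender's VPC tunnel ID."""
    return EncapsulatedPacket(src.tunnel_id, src.private_ip, dst_ip, payload)


def deliver(packet: EncapsulatedPacket, dst: Instance) -> bool:
    """Accept the packet only if both the tunnel ID and the private IP match."""
    return packet.tunnel_id == dst.tunnel_id and packet.dst_ip == dst.private_ip


ecs_a = Instance("ecs-a", "10.0.0.1", tunnel_id=1001)
ecs_b = Instance("ecs-b", "10.0.0.2", tunnel_id=1001)  # same VPC as ecs-a
ecs_c = Instance("ecs-c", "10.0.0.2", tunnel_id=2002)  # other VPC, same private IP

packet = encapsulate(ecs_a, "10.0.0.2", b"ping")
print(deliver(packet, ecs_b), deliver(packet, ecs_c))
```

Because the tunnel ID is part of the match, two VPCs can reuse the same private address range without their traffic ever mixing.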
VPC Logical Architecture
VPC consists of switches, gateways, and controllers. Switches and gateways form the data path, while controllers distribute forwarding tables via a proprietary protocol. All components are deployed in clusters with redundancy.
Overlay and Underlay in Cloud Networks
Underlay – The Physical Foundation
The underlay is composed of routers, switches, fiber, and other physical equipment that provide the basic data-forwarding paths.
It ensures that packets can travel between physical nodes.
Overlay – Virtual Networks on Top
Built on the underlay, the overlay creates isolated logical networks using software-defined techniques.
It enables multi-tenant isolation and flexible topology changes without altering the physical layout.
Collaboration
Underlay supplies the physical channels; overlay leverages them to deliver flexible, secure, high‑performance virtual networks.
From Physical to Cloud Networks
Overlay networks (e.g., VXLAN, NVGRE) are constructed atop the underlay, while SDN separates control and data planes, and NFV virtualizes functions like firewalls and load balancers.
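To make the overlay encapsulation concrete, here is a minimal sketch of the 8-byte VXLAN header as laid out in RFC 7348: one flags byte with the "VNI valid" bit set, three reserved bytes, a 24-bit VXLAN Network Identifier (VNI), and one more reserved byte. The function names are illustrative, and the outer UDP, IP, and Ethernet headers that a real tunnel endpoint would add are left out.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag in the first header byte
VXLAN_HEADER_LEN = 8


def vxlan_header(vni: int) -> bytes:
    """Pack a VXLAN header carrying the given 24-bit network identifier."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(packet: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN payload."""
    if len(packet) < VXLAN_HEADER_LEN:
        raise ValueError("packet shorter than a VXLAN header")
    flags_word, vni_word = struct.unpack("!II", packet[:VXLAN_HEADER_LEN])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return vni_word >> 8


inner_frame = b"\x00" * 64  # placeholder for the tenant's original Ethernet frame
encapsulated = vxlan_header(5000) + inner_frame
print(encapsulated[:8].hex(), parse_vni(encapsulated))
```

The 24-bit identifier allows roughly 16 million isolated segments, compared with 4,094 usable IDs for a traditional VLAN tag.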
AI Large‑Model Era Network Evolution
High‑Performance RDMA Architecture
Since 2016, Alibaba has built a large‑scale RDMA‑optimized network that reduces latency by 90% and supports AI workloads.
High‑Performance Collective Communication Library ACCL
ACCL provides congestion‑free, high‑throughput communication for massive AI clusters, achieving >80% linear scalability on thousands of GPUs.
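For readers unfamiliar with collective communication, the sketch below simulates ring all-reduce, a widely used collective in distributed training, entirely in memory. It illustrates the kind of operation such a library provides; the source does not describe ACCL's internal algorithms, so this should not be read as how ACCL works. Each worker's buffer is assumed to hold exactly one value per worker.

```python
from typing import List


def ring_all_reduce(buffers: List[List[float]]) -> List[List[float]]:
    """Sum the buffers element-wise so that every worker ends with the total.

    buffers[i] is worker i's data, split into n chunks (one value each),
    where n is the number of workers.
    """
    n = len(buffers)
    if any(len(b) != n for b in buffers):
        raise ValueError("each worker needs exactly one chunk per worker")
    data = [list(b) for b in buffers]

    # Phase 1, reduce-scatter: after n-1 steps, worker i holds the
    # complete sum for chunk (i + 1) % n.
    for step in range(n - 1):
        sends = [(i, (i - step) % n) for i in range(n)]
        values = [data[i][chunk] for i, chunk in sends]  # snapshot before updating
        for (i, chunk), value in zip(sends, values):
            data[(i + 1) % n][chunk] += value

    # Phase 2, all-gather: pass the completed chunks around the ring.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n) for i in range(n)]
        values = [data[i][chunk] for i, chunk in sends]
        for (i, chunk), value in zip(sends, values):
            data[(i + 1) % n][chunk] = value

    return data


gradients = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
print(ring_all_reduce(gradients))
```

Each worker only ever talks to its neighbor and sends one chunk per step, which is why the pattern depends so heavily on low-latency, congestion-free links between nodes.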
High‑Performance Data Loading Accelerator KSpeed
KSpeed leverages RDMA and ACCL to accelerate data I/O, reducing data‑loading time from >60% of training duration to under 10%.
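A quick back-of-the-envelope calculation shows what such a shift could mean for total training time. The helper below is a hypothetical sketch that assumes compute time stays constant and that data loading does not overlap with compute; neither assumption is stated in the source, and the 60% and 10% inputs are the boundary values of the figures quoted above.

```python
def implied_speedup(load_fraction_before: float, load_fraction_after: float) -> float:
    """Overall speedup when only the data-loading share of each run changes.

    If compute takes C, the total time is C / (1 - load_fraction), so the
    ratio of old total to new total is (1 - after) / (1 - before).
    """
    for fraction in (load_fraction_before, load_fraction_after):
        if not 0.0 <= fraction < 1.0:
            raise ValueError("fractions must be in [0, 1)")
    return (1.0 - load_fraction_after) / (1.0 - load_fraction_before)


print(round(implied_speedup(0.60, 0.10), 2))
```

Under these assumptions, moving from 60% to 10% loading time corresponds to an end-to-end speedup of about 2.25x; with a larger initial share or a smaller final share the gain would be higher.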
Conclusion
From physical hosts to virtual machines and cloud‑native containers, the underlying principles remain rooted in computer architecture. Mastering these fundamentals helps professionals keep pace with rapid technological evolution.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
