Infrastructure Architecture Evolution in the DT Era: Heterogeneous Computing, Immersion Liquid Cooling, and Next‑Generation Server Design
The article examines how the shift from DT 1.0 to DT 2.0 drives new infrastructure challenges in computing, storage, and cooling, and describes Alibaba's heterogeneous‑computing platform, immersion liquid‑cooling technology, and the Xuanwu next‑generation server architecture as solutions.
With the rapid progress of artificial intelligence in recent years, the DT (data technology) 1.0 era, built on big data, cloud computing, and IoT, is giving way to a fully intelligent DT 2.0 era. The key question is how to adapt traditional IT infrastructure to a data-centric era and deepen the integration of the Internet with traditional industries.
Infrastructure Architecture Evolution in the DT Era
Using Alibaba as an example, DT 1.0 infrastructure can be divided into three layers: the foundation layer, the architecture layer, and the business layer. This infrastructure must handle two major growth scenarios: the massive increase in compute-intensive tasks driven by big data and AI, and the surge in online machines during large-scale events such as Double 11.
For infrastructure, data‑processing challenges are twofold: real‑time analytics demand millisecond‑level latency and rely heavily on cache or memory performance, while complex backend data processing requires high‑throughput capability. In the DT 2.0 era, data analysis becomes more complex, CPU and memory utilization rise, and scale continues to grow, necessitating more efficient compute and storage resources.
On the compute side, meeting massive compute demand through incremental architectural changes and process-node improvements alone is running out of steam, which has prompted heavy investment in heterogeneous computing. On the memory side, rapidly growing CPU core counts have not been matched by proportional growth in memory bandwidth, so bandwidth per core has not kept pace, and emerging storage media are still in their infancy.
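The per-core bandwidth point is easy to see with back-of-the-envelope arithmetic. This sketch uses the standard formula for theoretical peak DRAM bandwidth (channels × transfer rate × 8-byte channel width); the six-channel DDR4-2666 memory subsystem and the core counts are illustrative assumptions, not Alibaba server specifications.

```python
def peak_dram_bandwidth_gbs(channels: int, mt_per_s: float, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 10**9 bytes).

    Each channel moves `bus_bytes` bytes per transfer (8 bytes for a
    64-bit DDR4 channel, excluding ECC bits) at `mt_per_s` million transfers/s.
    """
    return channels * mt_per_s * 1e6 * bus_bytes / 1e9


def per_core_bandwidth_gbs(channels: int, mt_per_s: float, cores: int) -> float:
    """Peak bandwidth split evenly across cores (an upper bound per core)."""
    return peak_dram_bandwidth_gbs(channels, mt_per_s) / cores


if __name__ == "__main__":
    # Illustrative only: the same memory subsystem feeding more and more cores.
    for cores in (14, 28, 56):
        bw = per_core_bandwidth_gbs(channels=6, mt_per_s=2666, cores=cores)
        print(f"{cores:>2} cores: {bw:.2f} GB/s per core")
```

With channel count and memory speed held fixed, doubling the core count halves each core's share of peak bandwidth, which is the imbalance described above.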
Heterogeneous computing faces ecosystem adaptation and execution‑efficiency challenges. Alibaba is moving from siloed, independent developments toward a platform‑based approach, abstracting hardware accelerator features to provide unified, transparent acceleration for upper‑layer applications. This platform enables businesses to focus on innovation rather than low‑level resource optimization.
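The platform idea is that applications call one interface and the platform transparently picks an accelerator. A minimal sketch is a registry with priority-ordered backends and a CPU fallback. Everything here (`AccelerationPlatform`, `register`, `run`, the backend names, and the priority scheme) is hypothetical and only illustrates the pattern. It is not Alibaba's API.

```python
from typing import Callable, Dict, List, Tuple

# A kernel implementation takes and returns a list of floats (kept simple on purpose).
Impl = Callable[[List[float]], List[float]]
# (priority, backend name, availability probe, implementation)
Entry = Tuple[int, str, Callable[[], bool], Impl]


class AccelerationPlatform:
    """Hypothetical dispatcher: callers name an operation, not a device."""

    def __init__(self) -> None:
        self._backends: Dict[str, List[Entry]] = {}

    def register(self, op: str, backend: str, impl: Impl, priority: int,
                 available: Callable[[], bool] = lambda: True) -> None:
        entries = self._backends.setdefault(op, [])
        entries.append((priority, backend, available, impl))
        # Keep the highest-priority backend first.
        entries.sort(key=lambda entry: entry[0], reverse=True)

    def run(self, op: str, data: List[float]) -> Tuple[str, List[float]]:
        """Run `op` on the best available backend, falling back in priority order."""
        for _priority, backend, available, impl in self._backends.get(op, []):
            if available():
                return backend, impl(data)
        raise LookupError(f"no available backend for operation {op!r}")


if __name__ == "__main__":
    platform = AccelerationPlatform()
    # Portable CPU implementation, always available.
    platform.register("scale", "cpu", lambda xs: [2.0 * x for x in xs], priority=0)
    # Hypothetical accelerator that is preferred but absent on this host.
    platform.register("scale", "fpga", lambda xs: [2.0 * x for x in xs],
                      priority=10, available=lambda: False)
    print(platform.run("scale", [1.0, 2.0, 3.0]))
```

Because callers only name the operation, a new accelerator can be added by registering another backend, and application code does not change.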
On the storage side, two new tiers are entering the memory hierarchy: HBM (high-bandwidth memory) between cache and main memory, and SCM (storage-class memory) between main memory and flash. HBM offers strong performance at high cost, while SCM is cheaper but slower. Matching each technology to the workloads that benefit from it is critical for large-scale adoption.
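One simple way to frame "selecting the appropriate technology for specific workloads" is to choose the cheapest tier that still meets a workload's bandwidth and latency requirements. The tier values below are placeholders in units relative to DRAM. They encode only the ordering the text describes (HBM fast and expensive, SCM cheaper but slower), not measured figures, and the selection rule is an illustrative assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Tier:
    name: str
    bandwidth: float  # relative to DRAM = 1.0 (higher is better)
    latency: float    # relative to DRAM = 1.0 (lower is better)
    cost: float       # relative cost per GB, DRAM = 1.0


@dataclass(frozen=True)
class Workload:
    name: str
    min_bandwidth: float
    max_latency: float


def cheapest_fit(workload: Workload, tiers: Iterable[Tier]) -> Optional[Tier]:
    """Return the cheapest tier meeting the workload's bandwidth and latency needs."""
    eligible = [t for t in tiers
                if t.bandwidth >= workload.min_bandwidth and t.latency <= workload.max_latency]
    return min(eligible, key=lambda t: t.cost, default=None)


# Placeholder relative values that only encode the ordering described in the text.
TIERS = [
    Tier("HBM", bandwidth=4.0, latency=1.0, cost=3.0),
    Tier("DRAM", bandwidth=1.0, latency=1.0, cost=1.0),
    Tier("SCM", bandwidth=0.3, latency=3.0, cost=0.4),
]

if __name__ == "__main__":
    for w in (Workload("bandwidth-bound analytics", 2.0, 2.0),
              Workload("latency-sensitive cache", 0.5, 1.0),
              Workload("large warm index", 0.2, 5.0)):
        tier = cheapest_fit(w, TIERS)
        print(w.name, "->", tier.name if tier else "no fit")
```

A bandwidth-bound job lands on HBM, a latency-sensitive cache on DRAM, and a large but tolerant index on SCM. In practice, profiling data such as that described next would supply the real requirements.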
Alibaba's Lingjing performance-data platform is used to pinpoint which applications would benefit from these new technologies. The approach is expected to cover more business scenarios in the near future, and Alibaba plans to offer a wider range of hardware form factors.
Next‑Generation Cooling Technology – Immersion Liquid Cooling
Cloud computing centralizes compute resources, raising the demand for server performance and consequently increasing heat density, which poses challenges for data‑center cooling.
The first challenge is cooling high-power-density racks. Most data centers provide about 8 kW per rack (some up to 15 kW). At higher densities, air cooling can no longer remove the heat cost-effectively, so new cooling methods are needed.
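The steady-state heat balance P = ρ·c_p·Q·ΔT shows why air struggles: the airflow Q a rack needs grows linearly with its power P. This sketch assumes typical air properties and a 12 K inlet-to-outlet temperature rise. The 30 kW case is a hypothetical higher-density rack, not a figure from the text.

```python
AIR_DENSITY_KG_M3 = 1.2   # dry air near 20 °C at sea level (approximate)
AIR_CP_J_KG_K = 1005.0    # specific heat of air at constant pressure
CFM_PER_M3_S = 2118.88    # 1 m^3/s expressed in cubic feet per minute


def required_airflow_m3s(heat_w: float, delta_t_k: float,
                         rho: float = AIR_DENSITY_KG_M3,
                         cp: float = AIR_CP_J_KG_K) -> float:
    """Volumetric airflow needed to carry `heat_w` watts at a `delta_t_k` air temperature rise."""
    return heat_w / (rho * cp * delta_t_k)


if __name__ == "__main__":
    delta_t = 12.0  # assumed inlet-to-outlet temperature rise, in kelvin
    for rack_kw in (8, 15, 30):
        flow = required_airflow_m3s(rack_kw * 1000, delta_t)
        print(f"{rack_kw:>2} kW rack: {flow:.2f} m^3/s (~{flow * CFM_PER_M3_S:,.0f} CFM)")
```

Doubling rack power doubles the required airflow, and fan power rises faster still. This is why dense racks push toward liquid cooling.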
The second challenge is the rapid rise in power and cooling costs. Cooling accounts for a large share of data-center energy use, so lowering PUE (power usage effectiveness) and optimizing TCO (total cost of ownership) are priorities.
Alibaba addresses these challenges by adopting liquid‑cooling technology.
Immersion liquid cooling submerges heat‑generating components directly in a non‑conductive liquid, enabling direct heat exchange. Compared with air cooling, it eliminates the need for chillers, terminal air‑conditioning units, and server fans, simplifying the overall cooling architecture.
Immersion cooling can achieve a PUE below 1.09, far better than the industry average of 1.9, and it eliminates server fans. Overall, it saves roughly 48.4% of power consumption and significantly reduces OPEX.
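PUE is total facility power divided by IT equipment power, so the PUE figures above translate directly into facility power for a given IT load. This sketch uses a hypothetical 1 MW IT load and the two PUE values from the text.

```python
def facility_power_kw(it_power_kw: float, pue: float) -> float:
    """PUE = total facility power / IT equipment power, so total = IT x PUE."""
    return it_power_kw * pue


def overhead_kw(it_power_kw: float, pue: float) -> float:
    """Non-IT power (cooling, power-distribution losses, lighting, ...)."""
    return it_power_kw * (pue - 1.0)


def facility_saving_fraction(pue_before: float, pue_after: float) -> float:
    """Fractional drop in total facility power for the same IT load."""
    return 1.0 - pue_after / pue_before


if __name__ == "__main__":
    it_kw = 1000.0  # hypothetical 1 MW IT load
    for pue in (1.9, 1.09):
        print(f"PUE {pue}: total {facility_power_kw(it_kw, pue):.0f} kW, "
              f"overhead {overhead_kw(it_kw, pue):.0f} kW")
    print(f"saving from PUE alone: {facility_saving_fraction(1.9, 1.09):.1%}")
```

For the same IT load, the PUE improvement alone yields about a 42.6% drop in total facility power. The 48.4% figure quoted above is higher, so it presumably also reflects other effects or a different baseline. For example, removing server fans shrinks the IT load itself. The source does not break this down.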
Immersion cooling fully isolates IT equipment from the air, protecting it from harsh environmental conditions. Unlike air cooling, it eliminates the risks posed by humidity, vibration, and dust, improving reliability. Alibaba independently developed and deployed the world's first large-scale immersion-cooled server cluster in the internet industry.
Next‑Generation Server Architecture – Xuanwu
As customers demand more storage capacity, compute performance, and network speed, and as Alibaba faces global delivery and deployments at the scale of millions of servers, it has become essential to reduce server R&D complexity, shorten verification cycles, and improve development efficiency.
The Xuanwu server family, designed around Alibaba's future business needs, comprises seven models spanning compute, storage, and heterogeneous-computing types.
Compared with previous generations, Xuanwu has six major advantages:
1. Front-maintenance architecture: an improved maintenance environment, support for 300 W CPUs, and 30% lower fan power consumption at the same configuration.
2. Modular design: modular components can be shared across models, reducing part count and boosting development efficiency.
3. Separation of compute and storage hardware: decoupling the two lets each follow its own technology iteration cycle, improving resource utilization.
4. Energy saving and emission reduction: power efficiency is optimized for Alibaba's actual workloads.
5. Integrated cabinet design: enables consolidated, rack-level delivery.
6. Global certification: supports worldwide delivery.
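The 30% fan-power figure in item 1 can be put in perspective with the fan affinity laws. For a given fan in a given system, these laws say airflow scales with fan speed and fan power with speed cubed. Under that idealization, a 30% fan-power reduction corresponds to only about 11% less fan speed or airflow. This is a general physics sketch, not a description of how Xuanwu achieves its savings.

```python
def speed_ratio_for_power_ratio(power_ratio: float) -> float:
    """Fan affinity law: power scales with speed cubed, so speed = power^(1/3)."""
    return power_ratio ** (1.0 / 3.0)


def power_ratio_for_speed_ratio(speed_ratio: float) -> float:
    """Inverse relation: relative fan power for a relative fan speed."""
    return speed_ratio ** 3


if __name__ == "__main__":
    # 30% lower fan power -> power ratio 0.7 (figure from item 1 above).
    s = speed_ratio_for_power_ratio(0.7)
    print(f"speed/airflow ratio: {s:.3f} ({1 - s:.1%} lower)")
```

The cubic relation is why modest improvements to the airflow path or heat-sink design can yield outsized fan-power savings.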
The Xuanwu compute model, based on Intel's latest processors, raises UPI interconnect bandwidth by 30%, memory capacity by 1.5×, and effective I/O bandwidth by 2×, providing the hardware headroom needed to move to 100G networking.
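Link arithmetic shows why I/O bandwidth matters for 100G networking. This sketch assumes PCIe 3.0 (8 GT/s per lane, 128b/130b encoding). The text does not say which PCIe generation Xuanwu uses.

```python
def pcie_link_gbps(lanes: int, gt_per_s: float = 8.0,
                   payload_bits: int = 128, line_bits: int = 130) -> float:
    """Per-direction PCIe data rate after line encoding, in Gb/s.

    Defaults are PCIe 3.0: 8 GT/s per lane with 128b/130b encoding.
    Packet (TLP/DLLP) overhead is ignored, so real throughput is somewhat lower.
    """
    return lanes * gt_per_s * payload_bits / line_bits


if __name__ == "__main__":
    nic_gbps = 100.0  # one 100 Gb Ethernet port, per direction
    for lanes in (8, 16):
        rate = pcie_link_gbps(lanes)
        verdict = "enough" if rate >= nic_gbps else "not enough"
        print(f"PCIe 3.0 x{lanes}: {rate:.1f} Gb/s -> {verdict} for 100GbE")
```

An x8 link tops out around 63 Gb/s per direction, so under this assumption each 100GbE port needs a full x16 link. That makes the platform's total I/O bandwidth a direct limit on how much 100G networking a server can host.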
The high-density storage JBOD is a 4U enclosure for standard 19-inch racks. Its dense layout is paired with improved heat dissipation and vibration control. Disk temperatures stay below 52 °C (3–8 °C lower than generic models), and disk performance remains above 90% of rated specifications in all scenarios.
Overall, Xuanwu represents an architecture optimized end to end, from data center to server to application. It strengthens Alibaba's platform competitiveness and extends that advantage through Alibaba Cloud.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
