Tagged articles
356 articles
Architects' Tech Alliance
May 9, 2026 · Industry Insights

PCIe 8.0 Draft Unveiled: Toward a 1 TB/s Ultra‑Fast Era

The PCI‑SIG has released version 0.5 of the PCIe 8.0 draft, which targets 256 GT/s (up to 1 TB/s bidirectional over an x16 link), double the rate of PCIe 7.0. The draft remains backward‑compatible and aims to eliminate the bandwidth bottleneck for AI, GPUs, SSDs and CXL, with the final spec expected in 2028 and market rollout around 2029‑30.

AI compute · Data center · High-speed interconnect
0 likes · 6 min read
Java Tech Enthusiast
Apr 27, 2026 · Operations

Earn 30K CNY/month Guarding DeepSeek’s Data Center on the Mongolian Grasslands

DeepSeek is hiring senior data‑center operations and delivery managers to run its new facility in Ulanqab, Inner Mongolia, offering a monthly salary of 30K CNY. The move signals a strategy that shifts from algorithmic innovation to low‑cost, high‑efficiency physical infrastructure to support its upcoming V4 trillion‑parameter model.

AI Infrastructure · Data center · DeepSeek
0 likes · 5 min read
Machine Heart
Apr 25, 2026 · Artificial Intelligence

Jensen Huang Explains Why the Token Factory Is AI’s Ultimate Form

In a 150‑minute interview with Lex Fridman, Nvidia founder Jensen Huang argues that generative AI is turning data centers from storage warehouses into token factories, redefining compute as a production system and outlining the four‑stage Agent Scaling Law that drives this shift.

AI Scaling Law · Data center · Jensen Huang
0 likes · 5 min read
Architects' Tech Alliance
Apr 21, 2026 · Industry Insights

Why CXL Is the Only Interconnect That Can Solve the Memory Wall, Resource Islands, and Cache Inconsistency

The article dissects how CXL emerged to address three fundamental data‑center bottlenecks—memory wall, resource islands, and cache‑incoherence—traces its technical evolution, compares the divergent strategies of Intel, AMD, Nvidia, Google, Alibaba Cloud, and Huawei, and evaluates CXL’s challenges, opportunities, and future ecosystem.

AI hardware · CXL · Data center
0 likes · 29 min read
IT Services Circle
Apr 19, 2026 · Industry Insights

Why DeepSeek Is Moving Its AI Heart to the Mongolian Grasslands

DeepSeek’s latest hiring push reveals a strategic shift from algorithmic research to building and operating a high‑efficiency data center in Ulanqab, Inner Mongolia, where the cool climate and existing cloud infrastructure help cut TCO, as the company gears up for the upcoming V4 trillion‑parameter model.

AI Infrastructure · Data center · DeepSeek
0 likes · 5 min read
Architects' Tech Alliance
Apr 19, 2026 · Industry Insights

Why NPO Beats CPO for AI Data Center Scale‑up: Alibaba Cloud’s Dual‑Network Blueprint

Alibaba Cloud argues that while CPO (Co‑Packaged Optics) looks perfect on paper, its closed ecosystem and production delays make it impractical for today’s 100k‑GPU AI clusters, and proposes an open, dual‑network architecture—HPN for scale‑out and UPN for ultra‑low‑latency scale‑up—driving a realistic near‑term roadmap for optical interconnect.

AI · Alibaba Cloud · Data center
0 likes · 8 min read
SuanNi
Mar 18, 2026 · Industry Insights

Inside Nvidia GTC 2026: New AI Supercomputers, Open Agents and the Future of the Industry

Nvidia's GTC 2026 unveiled a suite of next‑generation AI rack systems, groundbreaking chips, open‑source agent frameworks like OpenClaw, and a roadmap that links massive compute power to real‑world applications such as autonomous driving, robotics and space‑based data centers, reshaping the AI ecosystem.

AI hardware · Data center · GTC 2026
0 likes · 15 min read
IT Services Circle
Feb 25, 2026 · Operations

How to Tame Chaotic Data Center Cabling: 5 Proven Strategies

Managing data‑center cabling can quickly become a nightmare, but by applying five practical approaches—from manual sorting with labels to structured cabling, DCIM automation, zone‑based layouts, and minimalist designs—you can dramatically improve organization, cooling, and fault‑resolution speed while keeping costs under control.

DCIM · Data center · best practices
0 likes · 10 min read
IT Services Circle
Feb 8, 2026 · Fundamentals

Can 100TB Mechanical HDDs Finally Challenge SSDs?

Western Digital unveiled a roadmap featuring a 100TB mechanical HDD and a high‑bandwidth HDD design that promises up to eight‑fold speed gains, targeting AI data‑center workloads while acknowledging that consumer availability will lag behind enterprise adoption.

AI · Data center · HDD
0 likes · 7 min read
Architects' Tech Alliance
Jan 13, 2026 · Artificial Intelligence

Inside Google’s Massive TPU SuperPod: How Scale‑Up and Scale‑Out Build a 9,216‑Chip AI Engine

The article explains Google’s TPU data‑center architecture, detailing the vertical Scale‑Up strategy within a SuperPod, the horizontal Scale‑Out across SuperPods, the 3D Torus topology with Twisted variants, and the multi‑layer network design that enables petabyte‑scale AI training and inference.

AI hardware · Data center · Scale‑Up
0 likes · 8 min read
Architects' Tech Alliance
Nov 30, 2025 · Cloud Computing

How DPU Redefines Data Center Storage for AI and Cloud Workloads

This article analyzes the technical principles, architectural innovations, and real‑world scenarios of Data Processing Units (DPUs), showing how they resolve storage‑CPU mismatches, eliminate excessive east‑west traffic, and accelerate failure recovery, making them a core piece of infrastructure for AI and cloud computing.

AI · DPU · Data center
0 likes · 15 min read
Architects' Tech Alliance
Nov 9, 2025 · Artificial Intelligence

Why Optical Interconnects Are the Next Bottleneck‑Breaker for Massive AI Clusters

This article systematically examines the demand, technology stack, and industry landscape of large‑scale AI compute clusters, highlighting the limitations of traditional copper interconnects and presenting device‑level and chip‑level optical interconnect solutions—including OCS, pluggable modules, silicon photonics, VCSEL, and micro‑LED—while outlining current challenges and future directions.

AI clusters · Data center · High‑performance computing
0 likes · 15 min read
Su San Talks Tech
Oct 31, 2025 · Operations

How Douyin Powers Hundreds of Millions: Inside Its Bandwidth & Server Fleet

Douyin (ByteDance) operates over 170,000 servers across multiple self‑built data centers with total outbound bandwidth estimated between 7 TB and 10 TB, leveraging dual‑link designs, CDN acceleration, and multi‑node load balancing to support hundreds of millions of concurrent users worldwide.

ByteDance · CDN · Data center
0 likes · 9 min read
ITPUB
Oct 7, 2025 · Operations

100+ Essential IT Operations Checklist to Keep Your Infrastructure Running Smoothly

This comprehensive guide presents a standardized operations manual covering over one hundred core maintenance checkpoints across server hardware, network devices, storage systems, operating systems, databases, virtualization platforms, backup solutions, security appliances, and data‑center facilities, helping IT teams ensure stable and reliable service delivery.

Data center · Database Administration · IT Operations
0 likes · 34 min read
Data Party THU
Oct 5, 2025 · Industry Insights

How Google Cuts Gemini’s AI Energy Use to Microwatt Levels

Google reveals that a single Gemini query now consumes only 0.24 Wh of electricity, emits 0.03 g CO₂e and uses about five drops of water, thanks to a comprehensive measurement framework and aggressive optimizations across model architecture, quantization, hardware design, and data‑center operations.

AI energy · AI sustainability · Data center
0 likes · 8 min read
Architects' Tech Alliance
Sep 28, 2025 · Artificial Intelligence

How AI Workloads Are Redefining Network Architecture: Key Requirements and Topologies

The article examines how the rapid growth of AI models and workloads is reshaping network design, highlighting the need for ultra‑high bandwidth, sub‑millisecond latency, reliability, scalable topologies like Fat‑Tree and Dragonfly, and robust security and QoS mechanisms across data‑center, cloud, and edge environments.

AI networking · Data center · Distributed Training
0 likes · 11 min read
Architects' Tech Alliance
Sep 19, 2025 · Artificial Intelligence

Why Nvidia’s Rubin CPX GPU Could Revolutionize Long-Context AI Inference

Nvidia's Rubin CPX GPU, unveiled in September 2025, uses GDDR7 memory and a split‑stage architecture to dramatically boost token‑per‑second rates for long‑context inference, while its integration into third‑generation Oberon servers promises higher power density, improved ROI, and scalable data‑center deployments.

AI inference · Data center · GPU architecture
0 likes · 9 min read
Architects' Tech Alliance
Sep 10, 2025 · Operations

Why Data Centers Are the Power Bottleneck for AI – Trends, Costs & Green Solutions

The article examines the soaring electricity demand of data centers worldwide, especially in China, highlights regional distribution and PUE improvements, explores AI's massive power consumption, and outlines green computing strategies such as efficiency upgrades, waste‑heat reuse, and renewable energy integration.

AI power demand · Data center · Energy Consumption
0 likes · 11 min read
ITPUB
Sep 4, 2025 · Operations

100‑Point IT Operations Checklist: From Server Health to Data Center Safety

A comprehensive 100‑item checklist guides IT operations engineers through daily inspections of servers, network gear, storage, operating systems, databases, virtualization, backup, security devices, and data‑center infrastructure, ensuring reliable performance, proactive issue detection, and adherence to best‑practice standards.

Backup · Data center · IT Operations
0 likes · 27 min read
Architects' Tech Alliance
Sep 2, 2025 · Artificial Intelligence

Designing High‑Performance Networks for Massive AI Model Training

This article examines how AI large‑model training demands massive GPU clusters and low‑latency, high‑throughput networks, compares Clos/Fat‑Tree, Spine‑Leaf, Dragonfly, Group‑wise Dragonfly+ and Torus topologies, and discusses design choices for scaling to hundreds of thousands of GPUs while noting related data‑center resources.

AI · Data center · Large-Scale Training
0 likes · 8 min read
Efficient Ops
Sep 1, 2025 · Operations

Inside ByteDance’s Massive Server Fleet and TB‑Level Bandwidth

This article examines ByteDance’s enormous server inventory and data‑center export bandwidth, explaining how T‑level (terabit) connections, dual‑link designs, CDN acceleration, and global data‑center deployments enable billions of users to stream content simultaneously.

ByteDance · CDN · Data center
0 likes · 8 min read
Architects' Tech Alliance
Aug 20, 2025 · Artificial Intelligence

Dual ToR and Dual‑Plane Designs: Boosting AI Training Performance in Large‑Scale Data Centers

The article explains how non‑stacked dual‑ToR and dual‑plane network architectures, combined with single‑chip high‑performance switches and multi‑rail host networking, dramatically improve reliability, load balance, and end‑to‑end training speed for massive AI models such as GPT‑3 175B.

AI networking · Data center · GPU training
0 likes · 11 min read
Architects' Tech Alliance
Aug 3, 2025 · Fundamentals

Why CXL Interconnect Chips Are the Next Big Leap for Data Centers

The article examines CXL interconnect chips—high‑speed, low‑latency devices built on the Compute Express Link protocol—covering their technical fundamentals, supportive policies, industry chain, booming Chinese server market demand, global market forecasts, competitive landscape, and future trends driven by AI and data‑center workloads.

AI · CXL · Data center
0 likes · 8 min read
Architects' Tech Alliance
Jul 30, 2025 · Fundamentals

PCIe Interconnect Chip Market: Trends, Policies, and Future Outlook

This article provides a comprehensive overview of PCIe interconnect chips, covering their definition, industry classification, development history, policy support, supply‑chain structure, Chinese server market growth, global market size and forecasts, product composition, competitive landscape, and future trends toward higher speeds and lower latency.

Data center · Hardware · Market analysis
0 likes · 8 min read
dbaplus Community
Jul 24, 2025 · Operations

How Bilibili Scales Server Fault Management with Automated Detection and Repair

This article details Bilibili's approach to handling explosive growth in server count by classifying faults, identifying shortcomings of manual processes, and implementing an automated, end‑to‑end detection, rule‑based alerting, and repair workflow that combines in‑band and out‑of‑band data collection to achieve near‑perfect coverage and accuracy.

Data center · fault detection · in‑band
0 likes · 17 min read
Architects' Tech Alliance
Jul 23, 2025 · Artificial Intelligence

Why Do AI Large‑Model Training Clusters Need Specialized Network Topologies?

The article explains how AI large‑model training demands massive GPU resources and how carefully designed network architectures—such as Clos/Fat‑Tree, Spine‑Leaf, multi‑rail versus single‑rail connections, Dragonfly, and Torus—impact performance, scalability, cost, and reliability, guiding the selection of optimal data‑center networks.

AI · Data center · GPU clusters
0 likes · 9 min read
Architects' Tech Alliance
Jul 19, 2025 · Artificial Intelligence

Best GPU Cluster Network for Large‑Scale AI: NVLink, InfiniBand, RoCE & DDC

This article compares the main networking technologies used in large‑scale AI GPU clusters—NVLink, InfiniBand, RoCE Ethernet, and the emerging DDC full‑schedule fabric—examining latency, lossless transmission, congestion control, cost, power and scalability to help engineers choose the optimal solution for training massive language models.

AI training · DDC · Data center
0 likes · 15 min read
Architects' Tech Alliance
Jul 7, 2025 · Operations

Choosing the Right AI Data Center Network: InfiniBand vs RoCE

This article outlines the high‑performance networking requirements for AI data center training, compares InfiniBand and RoCE solutions, discusses their advantages in bandwidth, latency, scalability and cost, and provides design guidelines for building scalable, low‑latency, non‑blocking AI‑centric network architectures.

AI · Data center · High‑performance computing
0 likes · 10 min read
Architects' Tech Alliance
Jul 6, 2025 · Fundamentals

Mastering Data Center Essentials: 100 Core Concepts You Must Know

This comprehensive guide walks you through 100 essential data‑center concepts—from basic definitions, tier standards, and modular design to networking layers, storage architectures, compute resources, security measures, operational practices, energy efficiency, emerging technologies, and industry ecosystem—providing a complete knowledge framework for modern digital infrastructure.

Compute · Data center · Infrastructure
0 likes · 21 min read
Bilibili Tech
Jul 4, 2025 · Operations

Solving CPU Performance Layering in Heterogeneous Data Centers: A Practical Guide

This article explains why heterogeneous servers cause CPU performance layering, describes how to detect the issue using metrics such as NUMA hit/miss rates, cache miss ratios and frequency states, and provides step‑by‑step remediation techniques—including NUMA binding, cache isolation, recompilation and frequency locking—to improve resource pooling efficiency in modern data centers.

CPU performance · Data center · NUMA
0 likes · 24 min read
Architects' Tech Alliance
Jun 22, 2025 · Fundamentals

Mastering Data Center Networks: 100 Essential Concepts Explained

This comprehensive guide covers 100 fundamental concepts of data center networking, including architecture, protocols, security, virtualization, performance, interconnects, hardware standards, emerging technologies, and industry ecosystems, providing readers with a complete technical foundation for modern digital infrastructure.

Data center · cloud networking · network architecture
0 likes · 19 min read
Architects' Tech Alliance
May 26, 2025 · Artificial Intelligence

NVLink Fusion: NVIDIA’s High‑Bandwidth Interconnect for Heterogeneous AI Computing

NVLink Fusion, unveiled at Computex 2025, extends NVIDIA’s NVLink technology to enable high‑bandwidth, low‑latency connections between CPUs and GPUs or third‑party accelerators, offering up to 900 GB/s bandwidth, flexible heterogeneous configurations, ecosystem expansion, performance gains for AI training and inference, and potential cost reductions.

AI · CPU · Data center
0 likes · 12 min read
Top Architecture Tech Stack
May 22, 2025 · Operations

Understanding the Bandwidth and Server Scale of Douyin (TikTok) Data Centers

This article explains how Douyin (TikTok) and other major Chinese platforms achieve massive concurrent usage by operating data centers with hundreds of thousands of servers, employing terabit-level outbound bandwidth, dual‑link designs, CDN acceleration, and multi‑node load balancing, and provides estimates of server counts and bandwidth capacities.

CDN · Data center · Douyin
0 likes · 8 min read
Architects' Tech Alliance
Apr 29, 2025 · Industry Insights

Next-Gen Server Architecture: CPUs, GPUs, Memory, and Certification Insights

This article provides a comprehensive analysis of modern server architecture, covering the evolution from CISC to RISC, the rise of heterogeneous computing with GPUs and accelerators, diverse form factors, core component technologies, reliability mechanisms, performance benchmarking, certification standards, and emerging trends such as liquid cooling and AI‑native designs.

CPU · Data center · GPU
0 likes · 11 min read
Architects' Tech Alliance
Apr 21, 2025 · Artificial Intelligence

UALink 1.0: An Open High‑Speed Interconnect Challenging Nvidia’s AI Dominance

The UALink 1.0 specification, driven by AMD, Intel, Broadcom and other industry leaders, introduces an open, low‑latency, high‑bandwidth interconnect that can link up to 1,024 AI accelerators, offering a cost‑effective alternative to Nvidia’s NVLink and reshaping the AI‑HPC market.

AI interconnect · Data center · Nvidia competition
0 likes · 11 min read
Architects' Tech Alliance
Apr 20, 2025 · Fundamentals

What Makes Server CPUs Tick? A Deep Dive into Architecture, Performance, and Future Trends

This article provides a comprehensive overview of server CPUs, covering their core functions, major architectures such as x86, ARM, POWER and SPARC, key performance metrics, leading manufacturers, typical application scenarios, power‑management techniques, and emerging trends like quantum, photonic, and AI acceleration.

CPU architecture · Data center · Future Trends
0 likes · 16 min read
Alibaba Cloud Infrastructure
Apr 18, 2025 · Artificial Intelligence

Alibaba Cloud Showcases Optical Interconnect Innovations at OFC 2025 50th Anniversary

At the OFC 2025 50th anniversary in San Francisco, Alibaba Cloud presented cutting‑edge optical interconnect research and solutions for AI computing and modern data‑center networks, highlighted by invited talks, breakthrough demos, and two data‑driven QoT estimation papers co‑authored with Hong Kong Polytechnic University.

AI computing · Data center · Photonic Integration
0 likes · 6 min read
ByteDance SYS Tech
Apr 11, 2025 · Operations

How User‑Space MPTCP with DPDK Doubles Throughput in Data Centers

This article details the design, implementation, and performance evaluation of a user‑space MPTCP stack built on DPDK, showing how a layered, zero‑copy architecture and same‑core lock‑free forwarding can boost data‑center throughput by up to 100% while reducing latency by about 10%, all while remaining compatible with existing TCP applications.

DPDK · Data center · MPTCP
0 likes · 12 min read
Architects' Tech Alliance
Mar 29, 2025 · Industry Insights

Why Network Becomes the New Bottleneck for AI Training and How InfiniBand vs RoCE Compare

AI large‑model training relies on GPU clusters, generating massive inter‑node traffic that turns network performance into the primary bottleneck, prompting a detailed comparison of InfiniBand and RoCE protocols, their histories, strengths, limitations, and the need for next‑generation network chip architectures.

AI · Data center · HPC
0 likes · 5 min read
Architects' Tech Alliance
Mar 26, 2025 · Industry Insights

How DPU‑Based Architectures Revolutionize High‑Performance Storage Networks

This article examines the role of Data Processing Units (DPUs) in modern data‑center storage networking, detailing their architecture, core offload technologies, three offload modes, and the performance advantages they bring to both bare‑metal and virtualized environments while highlighting trade‑offs and implementation considerations.

DPU · Data center · Offload
0 likes · 12 min read
Code Mala Tang
Mar 21, 2025 · Artificial Intelligence

What Are the Four Waves of AI and How NVIDIA Is Shaping the Future?

NVIDIA’s GTC 2025 keynote outlines the four AI waves—from perception to physical AI—while highlighting the company’s latest Blackwell chips, DGX Spark/Station computers, Dynamo inference accelerator, robotics collaborations, GM autonomous‑driving partnership, and AI‑native 6G efforts, underscoring massive data‑center investment and future challenges.

AI hardware · Data center · Nvidia
0 likes · 11 min read
ByteDance SYS Tech
Feb 18, 2025 · Operations

How Can Data Center Planning Cut Costs and Boost Efficiency?

This article explains how a mixed‑integer programming tool developed by ByteDance's SYS‑DCD team integrates cost, reliability, delivery speed, and environmental metrics to optimize data‑center planning, reduce power waste, and accelerate deployment across multiple regional scenarios.

Data center · Linear Programming · Operations
0 likes · 15 min read
Architects' Tech Alliance
Jan 14, 2025 · Industry Insights

AI Server Market 2024: Growth Trends, Types, and Key Challenges

The AI server market is booming: global shipments surpassed 1.2 million units in 2023 and are projected to reach 1.67 million in 2024, driven by rapid growth in China’s AI compute capacity. The article covers the distinct designs of training and inference servers and the challenges the market faces in GPU quality, high‑speed interconnects, and cooling solutions.

2024 · AI hardware · AI servers
0 likes · 5 min read
Deepin Linux
Dec 25, 2024 · Fundamentals

An Introduction to RDMA: Principles, Programming, and Applications

This article explains RDMA technology, covering its core principles, programming model with Verbs API, various communication modes, and its impact on data‑center networking, high‑performance computing, and distributed storage, highlighting its low‑latency, zero‑copy advantages over traditional TCP/IP.

Data center · High‑performance computing · Network programming
0 likes · 30 min read
Architects' Tech Alliance
Dec 19, 2024 · Industry Insights

Inside Fujitsu’s Monaka: 144‑Core Armv9 AI Chip Unveiled

Fujitsu’s new Monaka processor, a 144‑core Armv9‑based AI and data‑center CPU built on a 2 nm 3.5D CoWoS platform, promises double the energy efficiency of competing EPYC and Xeon chips by 2027, leveraging DDR5 memory, SVE2 extensions, and advanced security features.

AI processor · Armv9 · CPU architecture
0 likes · 10 min read
Architects' Tech Alliance
Sep 14, 2024 · Industry Insights

What Sets Core Switches Apart from Regular Switches? A Deep Dive

This article explains what distinguishes core switches from ordinary switches, covering their placement in the network hierarchy, port and performance differences, advanced features such as large buffers, virtualization, TRILL and FCoE, and essential functions like link aggregation, stacking, and hot standby protocols.

Data center · FCoE · HSRP
0 likes · 12 min read
Architects' Tech Alliance
Sep 8, 2024 · Artificial Intelligence

Design and Architecture of Multi‑Million GPU Clusters for Large‑Scale AI Model Training

The article surveys the network architectures and congestion‑control techniques used in massive GPU clusters, such as ByteDance’s MegaScale, Baidu HPN, Alibaba HPN7, and Tencent Xingmai 2.0, highlighting how high‑bandwidth, low‑latency designs and advanced RDMA technologies enable training of trillion‑parameter multimodal AI models.

Data center · GPU clusters · HPN
0 likes · 11 min read
Architects' Tech Alliance
Sep 1, 2024 · Fundamentals

Full Liquid‑Cooled Cold Plate Server Design and Performance Testing (2024)

This article presents a comprehensive reference design and performance evaluation of a 2U four‑node high‑density server employing full liquid‑cooled cold plates for CPUs, memory, storage, NICs, and power supplies, detailing system architecture, flow design, CFD validation, and future optimization directions.

CFD simulation · Data center · high density
0 likes · 11 min read
Architects' Tech Alliance
Aug 30, 2024 · Cloud Native

AmpereOne A192-32X: A 192‑Core ARM Server CPU and Its LGA5964 Socket

The article provides an in‑depth technical overview of Ampere’s custom‑core AmpereOne A192‑32X 192‑core ARM server processor, covering its architecture, cloud‑native features, performance comparisons with AMD EPYC and Intel Xeon, cooling design, LGA5964 socket details, and benchmark results from real‑world stress testing.

ARM server CPU · AmpereOne · Cloud Native
0 likes · 10 min read
Architects' Tech Alliance
Aug 29, 2024 · Industry Insights

How NVIDIA Builds 256‑GPU and 576‑GPU SuperPods with H100, GH200, and GB200 Interconnects

The article analyzes NVIDIA's DGX SuperPOD architectures across three GPU generations—H100, GH200, and GB200—detailing their NVLink/NVSwitch topologies, bandwidth calculations, scalability limits, and the practical challenges of constructing 256‑GPU and 576‑GPU supercomputing clusters.

Data center · GPU · High‑performance computing
0 likes · 11 min read
Architects' Tech Alliance
Aug 21, 2024 · Fundamentals

Comprehensive Liquid‑Cooling Reference Design for Server Components (2024)

This white‑paper presents a 2024 reference design and performance evaluation of full‑liquid‑cooling solutions for CPUs, memory, SSDs, PCIe/OCP cards, power supplies and IO boards, detailing architecture, advantages, implementation methods and deployment scenarios for data‑center and telecom applications.

Data center · hardware engineering · liquid cooling
0 likes · 12 min read
Architects' Tech Alliance
Aug 10, 2024 · Industry Insights

What’s Next for Data Center Network Architecture? Trends and Future Directions

The article explains how data‑center networking, once built from a handful of simple devices, now demands sophisticated layer‑2 and layer‑3 designs to meet higher performance and reliability requirements, and it outlines the evolving architectural patterns that will shape future data‑center networks.

Data center · Future Trends · Networking
0 likes · 6 min read
Architects' Tech Alliance
Aug 7, 2024 · Industry Insights

How Full Liquid‑Cooling Servers Achieve Near‑100% Heat Capture: Design, Flow, and Component Insights

This article details the architecture of a 2U four‑node liquid‑cooled server, explaining its component layout, serial flow design, heat‑capture calculations, and the engineering of CPU, memory, SSD, PCIe/OCP, IO, and PSU cold‑plate solutions that together remove about 95% of heat directly via liquid and the remaining 5% through a rear‑mounted air‑liquid heat exchanger.

Data center · high density computing · liquid cooling
0 likes · 17 min read
Architects' Tech Alliance
Aug 1, 2024 · Industry Insights

Why RDMA and RoCE Are Becoming Critical Enablers for AI/ML Deployments

The article analyzes how the rapid shift of data‑center spending toward AI/ML has accelerated RDMA and RoCE adoption, outlines market forecasts through 2028, explains the technical advantages of direct memory access, and examines the evolving server, NIC, and backend‑network landscapes that will shape future AI workloads.

AI/ML · Data center · RDMA
0 likes · 12 min read
Architects' Tech Alliance
Jul 22, 2024 · Fundamentals

Comprehensive Overview of Data Center Architecture and Its Core Components

This article provides a detailed overview of modern data center architecture, covering physical and IT infrastructure, network topologies such as three‑tier and spine‑leaf, storage solutions like DAS, NAS and SAN, server designs, cloud data‑center components, physical site considerations, and various data‑center deployment models.

Data center · Infrastructure · Storage Systems
0 likes · 20 min read
Architects' Tech Alliance
Jul 15, 2024 · Industry Insights

Why Ethernet Is Overtaking InfiniBand in AI and Data Center Networks

The article analyzes the 2022 global and Chinese switch markets, explains how distributed computing and generative AI workloads rely on high‑performance switches, compares Ethernet and InfiniBand technologies—including bandwidth, latency, and cost factors—and outlines major vendor strategies and future trends in the networking industry.

AI · Data center · InfiniBand
0 likes · 14 min read
Architects' Tech Alliance
Jun 19, 2024 · Industry Insights

China's Computing Power Network Market 2024: Trends, Scale, and Future Outlook

The 2024 white paper on China's computing power network outlines the evolution of data centers and IDC services, quantifies the digital economy’s 50.2 trillion RMB size, details a 624.75 billion RMB market in 2022 with a projected 1.06 trillion RMB valuation by 2025, and examines technology, application sectors, and emerging standards driving the industry.

AI · Data center · Digital Economy
0 likes · 8 min read
Practical DevOps Architecture
Jun 13, 2024 · Operations

Comprehensive Data Center Operations Training Course Overview

This extensive training program covers everything a data center operations engineer needs—from foundational infrastructure management and server hardware maintenance to advanced network configuration, security hardening, monitoring, fault handling, and practical hands‑on skills for real‑world challenges.

Data center · Infrastructure · Operations
0 likes · 6 min read
IT Architects Alliance
Jun 12, 2024 · Cloud Computing

Emerging Disaggregated Compute‑Storage Architecture for Cloud and Internet Scenarios

The article examines challenges of traditional server‑based distributed storage in cloud and internet workloads and proposes a new disaggregated compute‑storage architecture leveraging emerging hardware such as EBOF, DPU, CXL, and NVMe to improve resource utilization, performance, reliability, and efficiency.

Data center · Hardware trends · cloud computing
0 likes · 13 min read
Architects' Tech Alliance
Jun 11, 2024 · Industry Insights

Why Traditional Distributed Storage Struggles and How New Compute‑Storage Separation Can Transform Cloud Data Centers

The article analyzes the limitations of current server‑based distributed storage—such as data‑lifecycle mismatches, performance‑resource trade‑offs, serverless workload demands, and the costly "datacenter tax"—and presents emerging hardware trends and a novel compute‑storage separation architecture that promises higher efficiency, reliability, and scalability for cloud and internet data centers.

CXL · Compute-Storage Separation · DPU
0 likes · 13 min read
Open Source Linux
Jun 3, 2024 · Cloud Computing

Design Principles and Future Trends of Data Center Networks

This article outlines key design principles—scalability, availability, flexibility, and security—for modern data center networks, compares fabric, overlay, spine‑leaf, and BGP EVPN architectures, and discusses emerging trends such as high‑bandwidth, heterogeneous compute clusters and intelligent, cost‑effective operations.

BGP EVPN · Data center · Scalability
0 likes · 10 min read
Open Source Linux
May 17, 2024 · Cloud Computing

Why Hyper-Converged Infrastructure Is Revolutionizing Modern Data Centers

This article explains what hyper‑converged infrastructure (HCI) is, outlines its core components and key technologies such as virtualization, software‑defined storage and networking, and details the benefits, scalability, high‑availability features, and typical use cases for enterprises adopting HCI.

Data center · HCI · Hyper-Converged Infrastructure
0 likes · 24 min read
Architects' Tech Alliance
May 13, 2024 · Operations

What Are the Core Principles Behind Modern Data Center Network Architecture?

This article outlines the fundamental design principles for data center networks—scalability, availability, flexibility, and security—and examines key architectures such as Fabric, Overlay, Spine‑Leaf, and BGP EVPN, while also highlighting emerging trends toward higher bandwidth, dense compute, cost efficiency, and intelligent operations.

BGP EVPN · Data center · Overlay
0 likes · 13 min read
Architects' Tech Alliance
May 11, 2024 · Industry Insights

Why Network Interconnects Are the New Bottleneck for Large‑Model AI Training

The rapid growth of AI large‑model training and inference is driving unprecedented demand for compute and high‑speed networking, prompting a shift from traditional GPU clusters to super‑pooled intelligent computing centers that must balance multiple intra‑ and inter‑node interconnect solutions such as NVLink, OAM/UBB, InfiniBand and RoCEv2.

AI · Data center · InfiniBand
0 likes · 6 min read
Open Source Linux
May 7, 2024 · Operations

Why Spine‑Leaf IP Fabric Beats Traditional Data Center Networks

This article compares traditional three‑tier data‑center networking with modern spine‑leaf IP Fabric architectures, highlighting differences in bandwidth, availability, scalability, security, convergence time, multi‑tenant support, ECMP routing, configuration complexity, automation, and cost to help engineers choose the optimal design.

BGP EVPN · Data center · Scalability
0 likes · 14 min read
Architects' Tech Alliance
May 1, 2024 · Industry Insights

How CXL Can Break the AI Memory Wall and Boost Data‑Center Performance

The rapid growth of AI models is widening the gap between compute power and memory bandwidth, but the emerging Compute Express Link (CXL) interconnect offers lower latency, memory sharing, and flexible device topologies that can alleviate the memory‑wall bottleneck and reshape future data‑center architectures.

AI compute · CXL · Data center
0 likes · 10 min read
Bilibili Tech
Apr 30, 2024 · Industry Insights

How Bilibili’s Smart Cabling Platform Boosts Data Center Efficiency

This article examines Bilibili's data‑center cabling challenges and presents a smart management platform that digitizes design, automates routing with scenario‑based and shortest‑path algorithms, streamlines task creation and operation, ultimately reducing installation time and improving maintenance efficiency.

Automation · Cabling · Data center
0 likes · 12 min read
Architects' Tech Alliance
Apr 24, 2024 · Industry Insights

What Is Hyper‑Converged Infrastructure and Why It’s Transforming Data Centers

Hyper‑converged infrastructure (HCI) integrates compute, storage, and networking into a single software‑defined platform, offering simplified management, improved efficiency, seamless scalability, and lower total cost of ownership, making it a preferred architecture for modern data centers and cloud‑native workloads.

Data center · HCI · Infrastructure
0 likes · 28 min read
Architects' Tech Alliance
Apr 18, 2024 · Industry Insights

Why InfiniBand Dominates Modern HPC: Speed, Latency, and Scalability Explained

This article provides a comprehensive technical overview of InfiniBand, covering its rapid adoption in top supercomputers, detailed performance advantages such as ultra‑high bandwidth, CPU offload, sub‑microsecond latency, flexible scalability, QoS, SHARP acceleration, and a comparison with Ethernet, Fibre Channel, and Omni‑Path, while also outlining HDR switch and NIC product families.

Data center · HDR · HPC
0 likes · 20 min read
Baidu Geek Talk
Apr 17, 2024 · Industry Insights

Mastering Multi-CPU Performance: Challenges and One-Click Tuning with Btune

The talk outlines how modern data centers host diverse CPUs (Intel, AMD, Ampere, ARM), the multi‑layer performance‑tuning challenges this creates, and how Baidu Cloud’s one‑click Btune suite automates metric collection, bottleneck identification, and optimization across hardware, kernel, runtime, and application layers.

Btune · CPU · Data center
0 likes · 19 min read
Open Source Linux
Mar 29, 2024 · Fundamentals

Why Spine‑Leaf IP Fabric Beats Traditional Data Center Networks

This article compares traditional three‑tier data‑center networking with modern leaf‑spine IP‑Fabric architectures, detailing bandwidth, availability, scalability, security, and the advantages of BGP EVPN and VXLAN in terms of scalability, convergence, multi‑tenant support, automation, and overall cost.

BGP EVPN · Data center · IP fabric
0 likes · 14 min read
Architects' Tech Alliance
Mar 25, 2024 · Industry Insights

Why Fat-Tree, Dragonfly, and Torus Topologies Matter in HPC Networks

The article examines the challenges of ultra‑large‑scale HPC networking, compares traditional CLOS with Fat‑Tree, Dragonfly, and Torus topologies, explains their bandwidth and latency characteristics, presents scalability formulas, and evaluates routing algorithms and practical trade‑offs for each design.

Data center · Dragonfly · High‑performance computing
0 likes · 14 min read
Architects' Tech Alliance
Mar 23, 2024 · Fundamentals

What Makes a Server Tick? A Deep Dive into Server Architecture and Components

This article provides a comprehensive overview of servers, covering their definition, various classification schemes (form factor, CPU architecture, scale, usage, and x86 vs non‑x86), and a detailed breakdown of hardware and software components such as CPUs, memory, storage, I/O cards, and management modules.

CPU parameters · Data center · Memory Technology
0 likes · 15 min read
dbaplus Community
Mar 19, 2024 · Big Data

How JD’s Mini‑Program Data Center Powers Real‑Time Analytics and Monitoring

JD’s Mini‑Program Data Center integrates data collection, storage, and real‑time analysis using Flink, ClickHouse, and Elasticsearch to provide comprehensive monitoring, user behavior insights, and scalable analytics for mini‑programs across JD’s ecosystem, enabling precise operations and future AI‑driven enhancements.

ClickHouse · Data center · Elasticsearch
0 likes · 19 min read
Architects' Tech Alliance
Mar 17, 2024 · Industry Insights

Why Hyper‑Converged Infrastructure Beats Traditional VMware + FC SAN: 4 Key Differences

The article compares hyper‑converged infrastructure with the traditional VMware + FC SAN stack, highlighting four architectural differences and showing how hyper‑convergence improves reliability, concurrency performance, scalability, operational simplicity, and total cost of ownership for modern data‑center workloads.

Cost · Data center · Hyper-Converged
0 likes · 8 min read
Architecture Digest
Feb 27, 2024 · Operations

How Large Is ByteDance’s Server Infrastructure and Bandwidth?

The article explains ByteDance’s massive data‑center fleet, detailing server counts from 2017 to 2020, estimating total outbound bandwidth in the multi‑terabit range, and describing how CDN, load‑balancing, and multi‑link designs enable billions of users to stream videos simultaneously.

ByteDance · Data center · bandwidth
0 likes · 7 min read
Architects' Tech Alliance
Feb 22, 2024 · Industry Insights

How DPU Technology is Transforming Cloud Data Centers: From NICs to SoC

From traditional NICs to smart NICs, FPGA‑based DPUs and single‑chip DPU SoCs, this article analyzes the evolution of network adapters, their hardware capabilities, design challenges, and real‑world deployments by cloud providers such as AWS, Nvidia, Intel, Alibaba Cloud and Volcano Engine.

DPU · Data center · Network Acceleration
0 likes · 16 min read