Future Development Paths of Computing Power Technology (2023): Chip Architecture, Near‑Memory Computing, and Distributed xPU Systems

The article outlines the accelerating demand for high‑performance computing driven by AI, AR/VR, biotech and other workloads, examines the limits of Moore's law, and presents emerging solutions such as advanced chip architectures, chiplet integration, near‑memory/in‑memory computing, and distributed xPU‑based systems for scalable, efficient compute.

The rapid growth of AI models like ChatGPT, along with emerging high‑performance workloads in AR/VR, genomics, and biopharma, is pushing the demand for compute far beyond the pace of traditional Moore's law scaling.

Classic scaling by increasing transistor density faces rising cost and power challenges; at the device level, the article identifies two paths forward:

Continuing transistor density improvements by moving from FinFET to gate‑all‑around (GAA) nanosheet transistors (More Moore).

Exploring beyond‑CMOS materials such as carbon nanotubes and MoS₂ for new transistor mechanisms (Beyond CMOS).

Domain‑specific architectures (DSA) and chiplet‑based designs are highlighted as key strategies. DSA chips tailor memory and compute resources to specific applications, offering performance comparable to ASICs with greater flexibility. Chiplet technology modularizes large chips, enables heterogeneous process integration, reduces cost, and improves yield.
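
To make the DSA point concrete, here is a minimal, illustrative model of why a domain‑tailored dataflow pays off: a weight‑stationary design holds a layer's weights in on‑chip scratchpad and streams activations past them, while a generic design re‑fetches weights from DRAM for every input. The sizes and fp16 width are assumptions for illustration, not figures from the article.

```python
# Toy model of DSA-style data movement: a weight-stationary dataflow loads
# weights into an on-chip scratchpad once, instead of re-fetching them from
# DRAM for every input vector. All sizes are illustrative assumptions.

BYTES_PER_ELEM = 2  # fp16 weights and activations (assumption)

def dram_traffic(n_inputs: int, dim: int, weight_stationary: bool) -> int:
    """Total DRAM bytes moved for n_inputs matrix-vector products (dim x dim)."""
    act_bytes = n_inputs * dim * BYTES_PER_ELEM          # activations stream in every time
    if weight_stationary:
        w_bytes = dim * dim * BYTES_PER_ELEM             # weights loaded once into scratchpad
    else:
        w_bytes = n_inputs * dim * dim * BYTES_PER_ELEM  # weights re-fetched per input
    return act_bytes + w_bytes

n_inputs, dim = 10_000, 1_000  # e.g. a 1k x 1k layer applied to 10k inputs
generic = dram_traffic(n_inputs, dim, weight_stationary=False)
dsa = dram_traffic(n_inputs, dim, weight_stationary=True)
print(f"generic: {generic / 1e9:.1f} GB, DSA-style: {dsa / 1e9:.3f} GB "
      f"({generic / dsa:.0f}x less DRAM traffic)")
```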

Three‑dimensional (3D) stacking further enhances integration density, helping to overcome memory‑wall limitations.
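
A quick roofline‑style calculation sketches the memory wall that 3D stacking attacks: a kernel whose arithmetic intensity (FLOPs per byte moved) falls below the ratio of peak compute to memory bandwidth is bandwidth‑bound, so raising bandwidth by stacking memory on logic raises attainable performance directly. All numbers below are illustrative assumptions, not figures from the article.

```python
# Roofline model: attainable performance is capped either by peak compute
# or by arithmetic intensity times memory bandwidth, whichever is lower.

def attainable_gflops(ai_flops_per_byte: float, peak_gflops: float,
                      peak_gbps: float) -> float:
    """Roofline cap: min(compute roof, bandwidth roof)."""
    return min(peak_gflops, ai_flops_per_byte * peak_gbps)

# Matrix-vector multiply in fp16: ~2 FLOPs per 2-byte weight fetched,
# i.e. roughly 1 FLOP/byte, so it sits deep in the memory-bound regime.
ai = 1.0
for name, bw_gbps in [("off-chip DRAM", 100), ("3D-stacked memory", 1000)]:
    perf = attainable_gflops(ai, peak_gflops=10_000, peak_gbps=bw_gbps)
    print(f"{name}: {perf:.0f} attainable GFLOP/s")  # bandwidth sets the ceiling
```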

In‑memory computing, spanning processing‑near‑memory, processing‑in‑memory, and processing‑within‑memory approaches, integrates compute engines directly with storage, cutting data movement, raising effective bandwidth, and improving energy efficiency for the matrix‑vector operations that dominate deep learning.
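
As a numerical sketch of the in‑memory idea, the toy model below treats a resistive crossbar: weights are stored as cell conductances, the input vector is applied as row voltages, and each column current sums the products by Kirchhoff's current law, so the matrix‑vector multiply happens where the data lives. The quantization step models limited conductance precision; the array size and level count are assumptions for illustration.

```python
# Numerical sketch of an analog crossbar matrix-vector multiply: the
# multiply-accumulate is performed in place by the memory array itself.

import numpy as np

def crossbar_mvm(weights: np.ndarray, x: np.ndarray, levels: int = 16) -> np.ndarray:
    """Simulate an in-memory MVM with weights quantized to discrete conductances."""
    w_max = np.abs(weights).max()
    # Program each weight into one of `levels` conductance states (precision limit).
    g = np.round(weights / w_max * (levels - 1)) / (levels - 1) * w_max
    # Column currents: I = G^T V, i.e. the MVM happens where the weights are stored.
    return g.T @ x

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
x = rng.standard_normal(64)
approx, exact = crossbar_mvm(W, x), W.T @ x
print("relative error:", np.linalg.norm(approx - exact) / np.linalg.norm(exact))
```

The point of the sketch is the trade visible in `levels`: analog cells buy enormous bandwidth and energy savings at the cost of bounded precision, which is acceptable for the error‑tolerant inner products of deep learning.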

Distributed computing architectures built from peer‑to‑peer xPU nodes replace the traditional CPU‑centric model. Each xPU node contains heterogeneous compute resources (CPU, GPU, accelerators) and handles task scheduling internally, while low‑latency memory‑semantic interconnects (e.g., UCIe and fabric links) let nodes exchange data directly instead of funneling it through a CPU bottleneck.
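
A minimal sketch of what peer‑to‑peer scheduling across xPU nodes could look like, using hypothetical names (`XPUNode`, `dispatch`) rather than any API from the article: each node bundles heterogeneous engines and manages its own queue, and any peer can hand a task to whichever node currently scores best for that task type, with no central CPU in the loop.

```python
# Peer-to-peer xPU scheduling sketch: nodes advertise heterogeneous engines
# and any peer can dispatch work; there is no central CPU scheduler.

from dataclasses import dataclass, field

@dataclass
class XPUNode:
    name: str
    engines: dict                       # engine type -> relative throughput
    queue: list = field(default_factory=list)

    def score(self, task_type: str) -> float:
        """How well this node serves a task, discounted by current queue depth."""
        return self.engines.get(task_type, 0.0) / (1 + len(self.queue))

def dispatch(task_type: str, peers: list) -> XPUNode:
    """Any peer can dispatch: pick the best-scoring node and enqueue the task."""
    best = max(peers, key=lambda node: node.score(task_type))
    best.queue.append(task_type)
    return best

peers = [
    XPUNode("node-a", {"cpu": 1.0, "gpu": 8.0}),
    XPUNode("node-b", {"cpu": 1.0, "npu": 12.0}),
]
for task in ["gpu", "npu", "gpu", "gpu"]:
    print(task, "->", dispatch(task, peers).name)  # load-aware, decentralized choice
```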

Compute‑network convergence (算网融合) relies on IP‑based networking to coordinate compute and network resources, sustaining low‑latency, high‑throughput services such as cloud gaming and VR. Service‑aware networks (SAN) are proposed to schedule compute and network resources jointly, improving QoS and energy efficiency.
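
The joint scheduling a service‑aware network performs can be sketched in a few lines: rather than choosing the nearest site (network‑only view) or the fastest site (compute‑only view), pick the site minimizing end‑to‑end latency, i.e. network RTT plus load‑scaled compute time. The sites and numbers below are illustrative assumptions.

```python
# Joint compute/network placement sketch: minimize end-to-end latency
# instead of optimizing the network or the compute side in isolation.

def end_to_end_ms(site: dict) -> float:
    """Network RTT plus compute time scaled by current load (simple queue model)."""
    return site["rtt_ms"] + site["compute_ms"] * (1 + site["load"])

sites = [
    {"name": "edge",   "rtt_ms": 5,  "compute_ms": 40, "load": 3.0},  # close but overloaded
    {"name": "metro",  "rtt_ms": 15, "compute_ms": 25, "load": 0.5},
    {"name": "region", "rtt_ms": 45, "compute_ms": 10, "load": 0.1},  # fast but far
]
best = min(sites, key=end_to_end_ms)
print(best["name"], f"{end_to_end_ms(best):.0f} ms")  # "metro" balances both here
```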

Overall, the article emphasizes that overcoming the compute wall will rely on a combination of advanced chip architectures, near‑memory/in‑memory computing, chiplet integration, and distributed xPU systems, all coordinated through intelligent networking technologies.

Tags: distributed computing, AI acceleration, computing power, chip architecture, Chiplet, near-memory computing
Written by

Architects' Tech Alliance

Sharing project experience and insights into cutting-edge architectures, with a focus on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, and industry practices and solutions.
