Tagged articles
43 articles
Liangxu Linux
May 10, 2026 · Fundamentals

SOC vs MCU in Embedded Devices: Key Differences Explained

The article compares SOC and MCU for embedded systems, using analogies, performance and power benchmarks, development ecosystem contrasts, and cost considerations to show how each fits different application requirements and why choosing the right one matters.

MCU · SOC · cost analysis
0 likes · 6 min read
Architects' Tech Alliance
Apr 21, 2026 · Industry Insights

Why CXL Is the Only Interconnect That Can Solve the Memory Wall, Resource Islands, and Cache Inconsistency

The article dissects how CXL emerged to address three fundamental data‑center bottlenecks—memory wall, resource islands, and cache‑incoherence—traces its technical evolution, compares the divergent strategies of Intel, AMD, Nvidia, Google, Alibaba Cloud, and Huawei, and evaluates CXL’s challenges, opportunities, and future ecosystem.

AI hardware · CXL · Data center
0 likes · 29 min read
AI Frontier Lectures
Jan 12, 2026 · Industry Insights

Why LLM Inference Hits a Memory Wall – Four Hardware Research Directions

The article analyses the challenges of large‑language‑model inference, highlighting memory bandwidth and interconnect as the primary bottlenecks, and presents four research opportunities—high‑bandwidth flash, processing‑near‑memory, 3D memory‑logic stacking, and low‑latency interconnect—while evaluating current Nvidia solutions and proposing integrated architectural approaches.

3D stacking · AI hardware research · LLM inference
0 likes · 22 min read
Architects' Tech Alliance
Jul 9, 2025 · Fundamentals

How HBM5’s 3D Near‑Memory Architecture Revolutionizes AI and HPC Performance

HBM5 introduces a 3D near‑memory computing architecture that vertically stacks DRAM dies and integrates compute units within the memory stack, dramatically boosting bandwidth, reducing data‑movement power, and delivering significant performance and energy‑efficiency gains for AI, high‑performance computing, and data‑center workloads.

AI acceleration · HBM5 · Near-Memory Computing
0 likes · 8 min read
Tencent Cloud Developer
Jul 8, 2025 · Artificial Intelligence

How GPUs Power AI: From Graphics to GPGPU Explained

This article explores how GPUs evolved from graphics accelerators to general‑purpose processors for AI, detailing the CPU‑GPU heterogeneous architecture, the CUDA programming workflow, compilation into fat binaries, kernel launch mechanics, hardware components, and the differences between SIMD and SIMT models, with performance comparisons and code examples.

AI · CUDA · GPGPU
0 likes · 31 min read
AI Cyberspace
May 20, 2025 · Artificial Intelligence

Why SuperNode and SuperPOD Are Critical for Scaling AI Models

This article explains the scaling laws behind large language models, the explosive growth of model sizes and compute demands, and why modern AI infrastructure must adopt SuperNode and SuperPOD architectures that combine high‑bandwidth Scale‑Up networks with flexible Scale‑Out networking to overcome bandwidth, latency, and power challenges.

AI scaling · Distributed Training · SuperPoD
0 likes · 42 min read
Architects' Tech Alliance
Apr 23, 2025 · Artificial Intelligence

What Makes Huawei’s Ascend 920 AI Chip a Game-Changer? Deep Technical Breakdown

An in‑depth analysis of Huawei’s third‑generation Ascend 920 AI processor reveals its 6 nm process, 64 Da Vinci cores, advanced Cube Unit matrix engine, HBM‑PIM memory‑compute integration, high‑speed interconnects, performance benchmarks versus Nvidia H20, and the challenges and future directions for AI hardware.

AI chip industry · AI processor · Huawei Ascend 920
0 likes · 11 min read
Architects' Tech Alliance
Mar 30, 2025 · Industry Insights

Why Memory, Not Compute, Is the Bottleneck for Next‑Gen AI Chips

The article contrasts the rapid growth of AI model memory and compute demands with the slow increase of chip memory capacity, and argues that memory bandwidth and energy consumption, rather than raw compute, will dominate AI chip design, emphasizing multi‑tenancy, DSA flexibility, and data‑flow optimization.

AI chips · DSA · Memory Bandwidth
0 likes · 7 min read
Cognitive Technology Team
Mar 25, 2025 · Fundamentals

Understanding the Java Memory Model and Its Interaction with Hardware Memory Architecture

This article explains how the Java Memory Model defines the interaction between threads, thread stacks, and the heap, illustrates these concepts with diagrams and example code, and discusses how modern hardware memory architecture, caches, and CPU registers affect visibility and race conditions in concurrent Java programs.

Heap · Java · Memory Model
0 likes · 11 min read
Tencent Technical Engineering
Mar 21, 2025 · Fundamentals

Fundamentals of GPU Architecture and Programming

The article explains GPU fundamentals—from the end of Dennard scaling and why GPUs excel in parallel throughput, through CUDA programming basics like the SAXPY kernel and SIMT versus SIMD execution, to the evolution of the SIMT stack, modern scheduling, and a three‑step core architecture design.

CUDA · GPU · GPU programming
0 likes · 42 min read
Baidu Geek Talk
Mar 5, 2025 · Cloud Computing

Inside GPU Cloud Servers: Architecture, Interconnects, and Performance Secrets

This article provides a comprehensive technical overview of GPU cloud server design, covering data‑processing pipelines, hardware topology, NUMA considerations, PCIe and proprietary interconnects, multi‑GPU communication strategies, virtualization approaches (BCC and BBC), DPU acceleration, and future trends for scaling up and out.

GPU · Performance Optimization · Virtualization
0 likes · 27 min read
Python Programming Learning Circle
Jan 6, 2025 · Fundamentals

Beyond Moore's Law: Software, Algorithms, and Architecture as New Performance Drivers

The article examines how, as Moore's Law ends, performance gains will increasingly rely on software optimization, algorithmic advances, and hardware architecture innovations, illustrated by matrix multiplication benchmarks and discussions of Dennard scaling, parallelism, and emerging technologies.

Moore's Law · hardware architecture · performance engineering
0 likes · 10 min read
Architects' Tech Alliance
Jan 6, 2025 · Industry Insights

How Nvidia’s GB300 GPU Is Shaping AI Inference and Cloud Supply Chains

The article provides a detailed technical analysis of Nvidia’s new GB300 and B300 GPUs, comparing their performance, memory architecture, and power consumption to previous generations, and examines how these changes affect AI inference workloads, NVL72 accelerator systems, and the supply‑chain strategies of major cloud providers.

AI inference · GPU · Nvidia
0 likes · 12 min read
Architects' Tech Alliance
Oct 19, 2024 · Industry Insights

What Is an NPU and Why It’s Shaping the Future of AI PCs

The article explains what Neural Processing Units (NPUs) are, how they differ from CPUs and GPUs, their parallel architecture, the workloads they accelerate, their role in edge AI and AI‑enabled PCs, and why industry analysts expect NPU‑enabled devices to dominate the market by 2026.

AI PC · AI accelerator · Edge Computing
0 likes · 8 min read
Python Programming Learning Circle
May 28, 2024 · Fundamentals

Beyond Moore's Law: Leveraging Software, Algorithms, and Architecture for Future Performance Gains

With Moore's Law reaching its limits, a recent Science paper by MIT, Nvidia, and Microsoft researchers argues that future computing performance will rely on improvements in the software stack, algorithmic innovations, and hardware architecture, as demonstrated by performance engineering benchmarks and evolving hardware trends.

Algorithms · Moore's Law · Post-Moore Era
0 likes · 9 min read
Architects' Tech Alliance
May 1, 2024 · Industry Insights

How CXL Can Break the AI Memory Wall and Boost Data‑Center Performance

The rapid growth of AI models is widening the gap between compute power and memory bandwidth, but the emerging Compute Express Link (CXL) interconnect offers lower latency, memory sharing, and flexible device topologies that can alleviate the memory‑wall bottleneck and reshape future data‑center architectures.

AI compute · CXL · Data center
0 likes · 10 min read
Architects' Tech Alliance
Mar 18, 2024 · Industry Insights

Why Nvidia’s NVLink C2C Is Redefining GPU‑CPU Interconnects

The article provides an in‑depth technical analysis of Nvidia’s NVLink C2C interconnect, comparing its latency, bandwidth, power efficiency, density and cost against traditional SerDes solutions and examining its role in building SuperChip architectures with Grace CPUs and Hopper GPUs.

GPU · NVLink · cost analysis
0 likes · 12 min read
Architects' Tech Alliance
Feb 22, 2024 · Industry Insights

How DPU Technology is Transforming Cloud Data Centers: From NICs to SoC

From traditional NICs to smart NICs, FPGA‑based DPUs and single‑chip DPU SoCs, this article analyzes the evolution of network adapters, their hardware capabilities, design challenges, and real‑world deployments by cloud providers such as AWS, Nvidia, Intel, Alibaba Cloud and Volcano Engine.

DPU · Data center · Network Acceleration
0 likes · 16 min read
Architects' Tech Alliance
Jan 7, 2024 · Industry Insights

Why Integrated Chiplet Architecture Is Shaping the Future of Semiconductors

The article explains the concept of integrated chips and chiplets, describes their architecture and the role of silicon interposers, outlines three main performance‑boosting pathways (scaling, new device materials, and chiplet integration), and highlights recent industry examples and standards that illustrate the emerging paradigm.

Chiplet · Integrated Chip · hardware architecture
0 likes · 13 min read
Architects' Tech Alliance
Sep 11, 2023 · Artificial Intelligence

Open Acceleration Specification AI Server Design Guide (2023): Architecture, OAM Modules, UBB Board, and System Design

The 2023 Open Acceleration Specification AI Server Design Guide details the hardware architecture, OAM module and UBB board specifications, cooling, management, fault diagnosis, and software platform needed to build high‑performance, scalable AI compute clusters for large‑model training.

AI acceleration · OAM · UBB board
0 likes · 10 min read
Architects' Tech Alliance
Jul 29, 2023 · Artificial Intelligence

AI Server Market Overview and Technical Architecture

The article provides a comprehensive analysis of the AI server market, detailing server hardware components, cost distribution, logical architecture, firmware, rapid market growth, competitive landscape, AI-driven heterogeneous computing, and future industry trends, while highlighting key vendors and deployment configurations.

AI servers · Cloud providers · GPU
0 likes · 10 min read
Liangxu Linux
Jul 5, 2023 · Fundamentals

Why CPUs Need Cache Memory and How the MESI Protocol Keeps It Consistent

Modern CPUs use multi‑level cache memory to bridge the speed gap with main memory, relying on temporal and spatial locality, and employ the MESI protocol with states M, E, S, I to maintain coherence across cores, while techniques like store buffers and memory barriers mitigate latency and ordering issues.

CPU · Cache Memory · MESI
0 likes · 15 min read
Baidu Tech Salon
Jul 4, 2022 · Artificial Intelligence

Kunlun Chip XPU Architecture, Software Stack, and Programming Model Overview

Kunlun Chip’s XPU‑R architecture combines high‑performance SDNN and Cluster compute units, 512 GB/s GDDR6 memory, and PCIe 4.0 interconnect, supported by an LLVM‑based software stack, CUDA‑like programming model, and seamless PaddlePaddle integration, enabling efficient AI training and inference with significant cost and performance gains.

AI Chip · PaddlePaddle · Programming Model
0 likes · 16 min read
Architects' Tech Alliance
Aug 4, 2021 · Cloud Computing

Edge Computing Hardware Architecture and Emerging Trends

The article examines edge computing hardware architecture, discussing diverse use cases, evolving server and processor trends—including ARM, Intel, Nvidia, AMD, FPGA, and DPU—open hardware standards, reliability, virtual networking, and storage innovations, highlighting how these developments shape the future of cloud and edge infrastructures.

ARM · DPU · Edge Computing
0 likes · 16 min read
Liangxu Linux
Nov 4, 2020 · Fundamentals

How Much Faster Is CPU L1 Cache Compared to RAM, SSD, and HDD?

This article explains the storage hierarchy from CPU registers and caches to RAM, SSD, and HDD, quantifies their speed differences (L1 cache vs. memory, SSD, HDD) and cost ratios, and provides Linux commands to inspect cache sizes, helping readers understand why each level exists and how they interact.

CPU cache · HDD · SSD
0 likes · 14 min read
Architects' Tech Alliance
Oct 23, 2020 · Industry Insights

What Makes a SmartNIC Different from Traditional NICs? A Deep Dive into Leading Products

The article defines SmartNICs, outlines their key capabilities such as off‑loading processing to programmable hardware, compares major vendor implementations—including Broadcom, Nvidia/Mellanox, Intel, Xilinx, Netronome, and Pensando—and discusses market trends that position SmartNICs as the next wave of FPGA‑based acceleration for data‑center workloads.

Data center · FPGA · Industry analysis
0 likes · 14 min read
Qunar Tech Salon
Mar 31, 2015 · Fundamentals

Understanding CPU Caches, Coherency Protocols, and Memory Models

This article provides a concise introduction to CPU cache architecture, explains read/write policies, describes cache coherency protocols such as MESI and its variants, and discusses how different memory models affect multi‑core consistency and performance.

CPU cache · Cache Coherency · MESI Protocol
0 likes · 19 min read