Tagged articles

NPU

25 articles · Page 1 of 1

Jun 2, 2026 · Fundamentals

Decoding Chip Concepts: CPU, GPU, NPU, APU, SoC, HBM & Chiplet (2026)

This article breaks down the core chip concepts—CPU, GPU, NPU, APU, SoC, HBM and Chiplet—explaining their functions, key characteristics, historical evolution, and how they relate to each other, and provides a 2026 mainstream‑chip comparison and selection guide.

CPU · Chiplet · GPU

0 likes · 18 min read

Architects' Tech Alliance

May 26, 2026 · Artificial Intelligence

Huawei Ascend 950 NPU Architecture Deep Dive – Full Whitepaper Inside

The article provides a detailed technical analysis of Huawei's Ascend 950 NPU series, covering its one‑chip dual‑structure for training and inference, SIMD/SIMT dual‑mode compute, ultra‑fine memory granularity, PD separation, native FP4 support, a high‑bandwidth 2.0 interconnect, and a fully self‑developed yet CUDA‑compatible ecosystem.

AI accelerator · Ascend 950 · FP4

0 likes · 10 min read

Tencent Technical Engineering

May 24, 2026 · Artificial Intelligence

How Tsinghua & Tencent Mixed‑X Won the MLSys 2026 MoE Inference Challenge with a 4.1× Speedup

The Tsinghua‑Tencent Mixed‑X team captured the MLSys 2026 MoE inference optimization championship by analyzing NPU bottlenecks, redesigning data movement, applying expert‑level sharding, continuous DMA, PSUM batching, and an Agent‑based optimizer, achieving a 4.1× end‑to‑end speedup while preserving bit‑level output fidelity.

Agent optimizer · Inference Optimization · MLSys 2026

0 likes · 14 min read

Architects' Tech Alliance

May 6, 2026 · Artificial Intelligence

Which AI Chip Leads the Pack? A Deep Dive into CPU, GPU, NPU, TPU, LPU, DPU, and VPU

The article breaks down the seven major AI‑focused processors—CPU, GPU, NPU, TPU, LPU, DPU, and VPU—explaining each one's architectural strengths, typical workloads, representative vendors, and trade‑offs, then summarizes which role each chip excels at in modern AI systems.

CPU · DPU · GPU

0 likes · 9 min read

Lao Guo's Learning Space

May 5, 2026 · Artificial Intelligence

AMD Ryzen AI MAX+ PRO 495 Review: The Most Powerful Mobile APU Yet

The AMD Ryzen AI MAX+ PRO 495 (code‑named Gorgon Halo) boosts memory bandwidth, expands unified memory to up to 256 GB, and delivers 55–60 TOPS of NPU performance. It gains roughly 4% multi‑core and 3% single‑core performance over its predecessor while targeting demanding AI workloads on thin‑and‑light laptops.

AMD · Mobile APU · NPU

0 likes · 9 min read

Architects' Tech Alliance

Jan 16, 2026 · Artificial Intelligence

Why Do GPUs and NPUs Produce Different FP16 Results? Uncovering AI Chip Precision Secrets

Engineers training large AI models often see noticeable FP16/BF16 result differences between GPUs and NPUs, and even between generations of the same chip, due to floating‑point representation limits, hardware design choices, software library implementations, compiler optimizations, and parallel execution nondeterminism.

AI · GPU · Hardware Design

0 likes · 10 min read

Architects' Tech Alliance

Sep 13, 2025 · Artificial Intelligence

How Huawei’s Da Vinci Architecture Powers Next‑Gen AI on the Kirin 810

Huawei’s Da Vinci AI architecture, introduced with the Kirin 810 SoC, combines a 3D Cube matrix‑multiply engine, vector and scalar units, and flexible scaling to deliver high‑performance, energy‑efficient AI compute across devices from low‑power IoT to high‑end cloud servers.

3D Cube · AI · Da Vinci architecture

0 likes · 11 min read

Architects' Tech Alliance

Aug 23, 2025 · Artificial Intelligence

How Huawei’s Ascend Architecture Redefines AI Acceleration

This article examines Huawei's Ascend AI accelerator architecture, detailing its heterogeneous compute units, memory hierarchy, task scheduling, programming model, and chip variants, while also discussing future challenges and the ecosystem needed for widespread AI deployment.

AI accelerator · AI hardware · DaVinci architecture

0 likes · 14 min read

Architects' Tech Alliance

Jul 3, 2025 · Artificial Intelligence

What Makes ASIC Chips the Powerhouse Behind AI? A Deep Dive

This article explains what ASIC chips are, how they differ from CPUs, GPUs and FPGAs, classifies them by customization level and function, outlines their performance and cost advantages, discusses their drawbacks, and reviews current products and market trends driving AI hardware adoption.

AI hardware · ASIC · FPGA

0 likes · 11 min read

Architects' Tech Alliance

Mar 27, 2025 · Artificial Intelligence

What Makes AI Chips Different? A Deep Dive into Training and Inference Processors

This article explains the rise of AI‑specific processors, defines AI chips, compares their architectures, and examines the distinct requirements of training versus inference chips while outlining the main technology routes (GPU, FPGA, ASIC) and future outlook.

AI chips · ASIC · DSA

0 likes · 9 min read

JD Tech

Mar 19, 2025 · Artificial Intelligence

JD Retail's End‑to‑End AI Engine Compatible with GPU and Domestic NPU: Architecture, Optimization, and Real‑World Applications

This article details JD Retail's AI engine that seamlessly supports both GPU and domestic NPU hardware, describing its heterogeneous cluster architecture, unified training and inference APIs, performance optimizations, extensive model coverage, and multiple production use cases across e‑commerce, logistics, and intelligent assistance.

AI Engine · GPU · JD Retail

0 likes · 20 min read

JD Retail Technology

Mar 4, 2025 · Artificial Intelligence

JD Retail End-to-End AI Engine Compatible with GPU and Domestic NPU: Architecture, Optimization, and Applications

JD Retail’s Nine‑Number Algorithm Platform delivers an end‑to‑end AI engine that unifies GPU and domestic NPU resources across a thousand‑card cluster, offering zero‑cost model migration, optimized training and inference pipelines, support for over 40 LLM and multimodal models, and proven business‑level performance that reduces dependence on overseas chips.

AI · GPU · Model Optimization

0 likes · 19 min read

Bilibili Tech

Mar 4, 2025 · Artificial Intelligence

Engineering Practices and Optimizations for Text‑to‑Video Generation Models (OpenSora, CogVideoX) on Bilibili TTV Team

The Bilibili TTV team optimized the OpenSora and CogVideoX text‑to‑video models by redesigning data storage with Alluxio, parallelizing VAE encoding, applying dynamic sequence parallelism and DeepSpeed‑Ulysses attention, adapting GPU code for NPU execution, and using profiling‑driven kernel fusion, FlashAttention, and expandable memory. The changes increased training efficiency and frame throughput, and the article also outlines future pipeline‑parallel and ZeRO‑3 scaling plans.

FlashAttention · NPU · data pipeline

0 likes · 26 min read

JD Tech Talk

Mar 3, 2025 · Artificial Intelligence

AI Engine Technology Based on Domestic Chips for JD Retail

This article describes JD Retail's AI engine built on domestic NPU chips, covering challenges, heterogeneous GPU‑NPU scheduling, high‑performance training and inference engines, extensive model support, real‑world deployment cases, and future plans for large‑scale chip clusters and ecosystem development.

AI · GPU · NPU

0 likes · 20 min read

JD Cloud Developers

Mar 3, 2025 · Artificial Intelligence

How JD.com Leverages Domestic NPU Chips to Power Large‑Scale AI Models

This article details JD.com's challenges and solutions for deploying domestic NPU chips across heterogeneous GPU‑NPU clusters, covering architecture, scheduling, high‑performance training and inference engines, real‑world case studies, and future plans to scale AI workloads securely and efficiently.

AI · JD.com · NPU

0 likes · 19 min read

Infra Learning Club

Feb 6, 2025 · Artificial Intelligence

Getting Started with Huawei Ascend AI Accelerators

This guide walks through the fundamentals of Huawei Ascend NPU hardware, the CANN software stack, driver and firmware installation, Kubernetes integration via Docker runtime and device plugin, and a complete ResNet‑50 inference demo on Ascend 310P.

AI inference · CANN · Docker Runtime

0 likes · 12 min read

Architects' Tech Alliance

Nov 26, 2024 · Artificial Intelligence

Get Ready for a Shakeout in Edge NPUs

The article examines the rapid growth and increasing complexity of edge AI NPUs, discussing challenges in software and hardware acceleration, supply‑chain constraints, and the need for integrated engine solutions to sustain performance and power efficiency.

NPU · edge AI · software complexity

0 likes · 9 min read

Architects' Tech Alliance

Oct 19, 2024 · Industry Insights

What Is an NPU and Why It’s Shaping the Future of AI PCs

The article explains what Neural Processing Units (NPUs) are, how they differ from CPUs and GPUs, their parallel architecture, the workloads they accelerate, their role in edge AI and AI‑enabled PCs, and why industry analysts expect NPU‑enabled devices to dominate the market by 2026.

AI PC · AI accelerator · Industry Trends

0 likes · 8 min read

Architects' Tech Alliance

Jul 3, 2024 · Industry Insights

Why ARM Is Poised to Overtake x86 in the AI PC Era

The report analyzes the accelerating shift from x86 to ARM in AI‑enabled devices, covering architectural differences, market share dynamics, Apple’s successful ARM transition, Microsoft’s ARM ecosystem, Intel’s heterogeneous AI processors, rising memory demands, and future industry forecasts for 2024‑2027.

AI PC · Arm · NPU

0 likes · 17 min read

21CTO

May 29, 2024 · Artificial Intelligence

How AI PCs Are Redefining the Desktop: Inside Microsoft’s Copilot+ Vision

Microsoft’s vision of AI PCs, highlighted by the Copilot+ concept, details how integrated NPU hardware, local large‑language models, and the Windows Copilot Runtime enable on‑device AI inference, reducing data‑center load and offering developers a unified platform for building next‑generation AI applications.

AI PC · Copilot+ · LLM

0 likes · 11 min read

IT Services Circle

Feb 1, 2024 · Fundamentals

The Rise of NPU and Integrated Memory in AI PCs and Intel's Lunar Lake Architecture

The article examines how CPUs, GPUs, and memory have long formed the core of PC hardware, discusses the emerging role of NPUs for AI processing, and describes Intel's Lunar Lake strategy of integrating memory with the processor to deliver faster, lower‑latency performance in upcoming AI‑focused PCs.

AI PC · CPU · GPU

0 likes · 5 min read

Architects' Tech Alliance

Sep 4, 2023 · Artificial Intelligence

Overview of AI Chip Types, Architectures, and Market Trends

The article explains the various AI‑capable chips such as CPUs, GPUs, FPGAs, NPUs, and TPUs, compares their performance and efficiency, describes heterogeneous CPU+xPU solutions, and provides market share data while highlighting the growing adoption of specialized AI accelerators.

AI acceleration · AI chips · CPU

0 likes · 7 min read

Alimama Tech

Dec 22, 2021 · Artificial Intelligence

Performance Optimization of Advertising Deep Learning Systems: Algorithm, System, and Hardware Co‑Design

The paper presents a holistic algorithm‑system‑hardware co‑design for advertising deep‑learning inference, combining model pruning, approximate computing, kernel fusion, scheduling and PCIe transfer optimizations with GPU and NPU upgrades, achieving up to five‑fold speed‑up and significantly higher latency‑bounded QPS for large‑scale ad services.

Algorithmic Optimization · GPU · NPU

0 likes · 24 min read

Tencent Music Tech Team

Apr 30, 2020 · Mobile Development

Edge Deep Learning Inference on Mobile Devices: Challenges, Hardware Diversity, and Optimization Strategies

Edge deep learning inference on mobile devices faces hardware and software fragmentation across diverse CPUs, GPUs, DSPs, and NPUs, along with limited programmability. Optimization techniques such as model selection, quantization, and architecture‑specific tuning enable real‑time performance. Most inference runs on CPUs, GPUs offer 5–10× speedups, and co‑processor support varies across Android and iOS.

DSP · GPU programming · NPU

0 likes · 17 min read

Architects' Tech Alliance

Mar 28, 2020 · Artificial Intelligence

Heterogeneous Computing: Overview of CPU, GPU, FPGA, ASIC, and NPU

This article explains heterogeneous computing and compares major processing units—CPU, GPU, FPGA, ASIC, and NPU—highlighting their architectures, strengths, and typical use cases, especially in deep‑learning and AI workloads.

ASIC · CPU · Deep Learning

0 likes · 10 min read
