Tagged articles

536 articles

Architects' Tech Alliance

Nov 25, 2024 · Industry Insights

What Makes HPE Cray’s New EX Supercomputers a Game‑Changer for AI and HPC?

The article provides an in‑depth analysis of HPE’s latest Cray EX supercomputing platforms, detailing their GPU density, performance benchmarks, liquid‑cooling architecture, Slingshot 400 interconnect, upcoming storage solutions, and alternative ProLiant Compute XD servers for AI workloads.

AI, Cray, GPU

12 min read

Baidu Tech Salon

Nov 22, 2024 · Artificial Intelligence

How GPU‑Accelerated ANN Search Cuts Costs and Boosts Throughput in High‑Volume Retrieval

This article analyzes a GPU‑based approximate nearest neighbor (ANN) retrieval solution built on NVIDIA's RAFT library, detailing algorithm selection, offline indexing tricks, batch online search design, performance results on a 25‑million‑vector workload, and cost‑saving implications for large‑scale search services.

ANN, GPU, IVF_INT8

21 min read

Architects' Tech Alliance

Nov 21, 2024 · Fundamentals

El Capitan Supercomputer and the Rise of AMD GPU‑Driven HPC: Architecture, Performance, and Market Impact

The article examines the El Capitan supercomputer unveiled at SC24, detailing its AMD CPU‑GPU hybrid architecture, benchmark results, its dominance in the November 2024 Top500 list, and the broader implications for high‑performance computing, AI workloads, and the competitive landscape between AMD and NVIDIA.

AI, AMD, CPU

20 min read

Baidu Geek Talk

Nov 20, 2024 · Artificial Intelligence

Boosting ANN Search with GPU: Inside RAFT’s IVF_INT8 Implementation

This article examines how Baidu and NVIDIA leveraged the open‑source RAFT library to build a GPU‑accelerated approximate nearest neighbor (ANN) retrieval system, detailing algorithm choices, offline indexing, online batch processing, performance results, and practical guidelines for deploying ANN on GPUs.
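To make the IVF_INT8 idea concrete (coarse clustering into inverted lists, vectors stored as 8-bit codes, and a search that probes only the closest lists), here is a minimal pure-Python sketch. It is not RAFT's API or the Baidu/NVIDIA implementation. The `IVFInt8` class, the single global quantization scale, and the brute-force k-means are assumptions chosen for readability, and a GPU version would batch queries and scan lists in parallel.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v, centroids):
    """Index of the centroid closest to v (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda i: sq_dist(v, centroids[i]))


def quantize_int8(v, scale):
    """Symmetric INT8 quantization: map [-scale, scale] onto [-127, 127]."""
    return [max(-127, min(127, round(x / scale * 127))) for x in v]


def dequantize_int8(q, scale):
    return [x * scale / 127 for x in q]


class IVFInt8:
    """Toy inverted-file index (hypothetical) that stores INT8 codes in each list."""

    def __init__(self, vectors, n_lists, iters=10, seed=0):
        rng = random.Random(seed)
        # Offline step 1: coarse k-means (Lloyd's algorithm) picks the list centroids.
        self.centroids = [list(v) for v in rng.sample(vectors, n_lists)]
        for _ in range(iters):
            buckets = [[] for _ in range(n_lists)]
            for v in vectors:
                buckets[nearest(v, self.centroids)].append(v)
            for i, bucket in enumerate(buckets):
                if bucket:  # keep the old centroid if a list ends up empty
                    self.centroids[i] = [sum(col) / len(bucket) for col in zip(*bucket)]
        # Offline step 2: assign every vector to a list and store it as INT8.
        self.scale = max(abs(x) for v in vectors for x in v) or 1.0
        self.lists = [[] for _ in range(n_lists)]
        for idx, v in enumerate(vectors):
            code = quantize_int8(v, self.scale)
            self.lists[nearest(v, self.centroids)].append((idx, code))

    def search(self, query, k, n_probes):
        """Scan only the n_probes lists whose centroids are closest to the query."""
        order = sorted(range(len(self.centroids)),
                       key=lambda i: sq_dist(query, self.centroids[i]))
        candidates = []
        for list_id in order[:n_probes]:
            for idx, code in self.lists[list_id]:
                d = sq_dist(query, dequantize_int8(code, self.scale))
                candidates.append((d, idx))
        candidates.sort()
        return [idx for _, idx in candidates[:k]]


vectors = [[float(i), float(i % 3)] for i in range(30)]
index = IVFInt8(vectors, n_lists=4)
print(index.search(vectors[7], k=3, n_probes=2))
```

Raising `n_probes` trades speed for recall; with `n_probes` equal to the number of lists the search degenerates into a full scan of the quantized data.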

ANN, GPU, IVF_INT8

20 min read

Nov 16, 2024 · Information Security

WarpDrive: GPU-Based Fully Homomorphic Encryption Acceleration Leveraging Tensor and CUDA Cores Accepted at HPCA 2025

Ant Group’s Computing Systems Lab announced that its GPU‑accelerated fully homomorphic encryption framework WarpDrive, which exploits Tensor and CUDA cores for high‑throughput NTT operations and parallel kernel designs, has been accepted as a paper at the IEEE HPCA 2025 conference.
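The NTT that WarpDrive accelerates is the finite-field analogue of the FFT, and it is what makes polynomial multiplication in lattice-based FHE cheap. The toy sketch below uses a naive O(n²) transform over the prime 17 to show the convolution property. The parameters and function names are illustrative assumptions: real schemes typically use much larger moduli and a negacyclic (X^N + 1) variant, and nothing here reflects WarpDrive's Tensor Core or CUDA kernel design.

```python
P = 17      # toy NTT-friendly prime; real FHE moduli are far larger
N = 8       # transform length; N must divide P - 1 = 16
OMEGA = 2   # 2 has order 8 mod 17, so it is a primitive 8th root of unity


def ntt(a, omega=OMEGA, p=P):
    """Naive O(n^2) number theoretic transform: A[i] = sum_j a[j] * omega^(i*j) mod p."""
    n = len(a)
    return [sum(a[j] * pow(omega, i * j, p) for j in range(n)) % p for i in range(n)]


def intt(a, omega=OMEGA, p=P):
    """Inverse NTT: forward transform with omega^-1, then scale by n^-1 (mod p)."""
    n = len(a)
    n_inv = pow(n, -1, p)          # modular inverse (Python 3.8+)
    omega_inv = pow(omega, -1, p)
    return [(x * n_inv) % p for x in ntt(a, omega_inv, p)]


def cyclic_mul_ntt(a, b):
    """Multiply polynomials modulo (X^N - 1, P) via pointwise products in the NTT domain."""
    fa, fb = ntt(a), ntt(b)
    return intt([(x * y) % P for x, y in zip(fa, fb)])


def cyclic_mul_naive(a, b):
    """Reference schoolbook cyclic convolution, used to check the NTT path."""
    out = [0] * N
    for i in range(N):
        for j in range(N):
            out[(i + j) % N] = (out[(i + j) % N] + a[i] * b[j]) % P
    return out


a = [1, 2, 3, 4, 0, 0, 0, 0]
b = [5, 6, 7, 0, 0, 0, 0, 0]
print(cyclic_mul_ntt(a, b) == cyclic_mul_naive(a, b))  # True
```

Production NTTs replace the O(n²) loop with an O(n log n) butterfly network, which is the part that GPU work such as WarpDrive parallelizes.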

CUDA, Fully Homomorphic Encryption, GPU

4 min read

Alibaba Cloud Infrastructure

Nov 13, 2024 · Industry Insights

Why GPU Scale‑Up Interconnects Need a New Protocol – Inside UALink and Alibaba’s Alink

The article analyzes the growing demand for high‑bandwidth, low‑latency GPU Scale‑Up interconnects in AI clusters, explains why existing Ethernet and RDMA solutions fall short, and examines the industry‑wide UALink alliance and Alibaba's Alink System as a new open‑ecosystem solution.

AI Infrastructure, Alink System, GPU

12 min read

Xiaohongshu Tech REDtech

Nov 7, 2024 · Artificial Intelligence

RTAMS-GANNS: A Real-Time Adaptive Multi-Stream GPU System for Online Approximate Nearest Neighbor Search

RTAMS‑GANNS, the award‑winning real‑time adaptive multi‑stream GPU system for online approximate nearest neighbor search, eliminates costly memory allocations and serial execution by using a dynamic memory‑block insertion algorithm and separate CUDA streams, cutting latency by 40‑80% and reliably serving over 100 million daily users in production.

GPU, Performance Evaluation, Vector Insertion

19 min read

Linux Kernel Journey

Nov 5, 2024 · Artificial Intelligence

Understanding AI Flame Graphs: Insights from Brendan Gregg

The article introduces Intel's AI Flame Graph, a low‑overhead profiling tool that visualizes AI accelerator and GPU workloads across the full software stack, explains its design, demonstrates SYCL matrix‑multiply benchmarks, discusses challenges of AI instruction analysis, and outlines future adoption and impact.
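Flame graphs are conventionally built from Brendan Gregg's folded-stack format: one `frame;frame;frame count` line per unique stack, where a frame's width is its inclusive sample count. The sketch below shows that aggregation step in Python. The sample data and the `gpu:` prefix marking an accelerator frame on top of the CPU stack are hypothetical, and they only approximate how the article describes Intel's tool combining accelerator samples with the software stack.

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse sampled call stacks (root frame first) into folded-stack lines."""
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


def frame_widths(samples):
    """Inclusive sample count per stack prefix; this is what sets a frame's width."""
    widths = Counter()
    for stack in samples:
        for depth in range(1, len(stack) + 1):
            widths[tuple(stack[:depth])] += 1
    return widths


# Hypothetical samples: CPU frames with an accelerator frame appended on top.
samples = [
    ["main", "run_model", "matmul", "gpu:gemm_kernel"],
    ["main", "run_model", "matmul", "gpu:gemm_kernel"],
    ["main", "load_data"],
]
for line in fold_stacks(samples):
    print(line)
```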

AI profiling, GPU, Intel

16 min read

Linux Code Review Hub

Nov 2, 2024 · Artificial Intelligence

Inside Intel’s AI Flame Graph: Low‑Overhead Profiling for Faster, Greener AI

The article introduces Intel’s AI Flame Graph, a low‑overhead profiling tool that visualizes AI accelerator and GPU execution alongside the full software stack, explains its design, shows SYCL matrix‑multiply examples, discusses challenges of AI workload analysis, and outlines future adoption and impact on performance and energy savings.

AI profiling, GPU, Intel

16 min read

Architecture and Beyond

Nov 2, 2024 · Artificial Intelligence

Step-by-Step Guide to Training a LoRA Model with Flux1_dev on ComfyUI

This tutorial walks programmers through preparing a GPU cloud environment, installing ComfyUI, downloading Flux1_dev models, integrating a custom LoRA, labeling generated images, and finally training the LoRA using ai‑toolkit, providing detailed commands, configuration tips, and practical cost estimates.

AI image generation, ComfyUI, Flux

12 min read

Architects' Tech Alliance

Oct 26, 2024 · Industry Insights

Why NVIDIA’s Blackwell GB200 Outpaces H100: 5 Key Technical Advantages

The Blackwell GB200 series delivers a major leap in AI compute: 20 petaFLOPS of FP4 performance, a dual‑die design on TSMC's 4NP process, 192 GB of HBM3E memory, modular MGX servers, and copper DAC and liquid‑cooling solutions that together deliver, by NVIDIA's figures, up to 30× the H100's LLM inference performance.

Blackwell, GB200, GPU

6 min read

System Architect Go

Oct 20, 2024 · Cloud Native

Kubernetes GPU Scheduling: Device Plugin, CDI, NFD, and GPU Operator Overview

This article explains how Kubernetes manages and schedules GPU resources by introducing the Device Plugin framework, the Container Device Interface (CDI), Node Feature Discovery (NFD), and the GPU Operator, detailing their workflows, APIs, and practical usage with NVIDIA GPUs.

CDI, Device Plugin, GPU

9 min read

Architects' Tech Alliance

Oct 16, 2024 · Fundamentals

Unveiling GPU Architecture: From Compute Units to Rendering Pipelines

This article provides a comprehensive technical overview of modern GPU architecture, covering memory hierarchy, compute units, shader execution, rendering pipelines, and performance‑optimisation techniques such as unified shaders, SIMT, warp scheduling, and tile‑based rendering strategies.

GPU, Memory, Rendering

32 min read

Oct 16, 2024 · Frontend Development

How Kola2d’s WebGL Engine Achieves 50+ FPS for Million‑Cell Spreadsheets

This article details the design and optimization of Kola2d, a custom WebGL rendering engine for Docs online spreadsheets, explaining why WebGL was chosen, how the system separates business and rendering layers, and the many performance tricks that enable smooth 50+ FPS rendering of tables with up to a million cells.
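A common trick for drawing very large grids at interactive frame rates is viewport culling: draw only the rows and columns that intersect the visible area. Whether Kola2d does exactly this is an assumption here. The sketch finds the visible range with prefix sums and binary search, so variable row heights still cost O(log n) per lookup. `build_offsets` and `visible_range` are hypothetical names.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def build_offsets(sizes):
    """Prefix sums: row i spans [offsets[i], offsets[i + 1]) in pixels."""
    return [0] + list(accumulate(sizes))


def visible_range(offsets, start, length):
    """Indices of rows (or columns) that intersect the viewport [start, start + length)."""
    n = len(offsets) - 1
    if n == 0 or length <= 0 or start >= offsets[-1]:
        return range(0)
    first = max(bisect_right(offsets, start) - 1, 0)
    last = min(bisect_left(offsets, start + length) - 1, n - 1)
    return range(first, last + 1)


# 100,000 rows x 10 columns = one million cells; only the visible slice is drawn.
row_offsets = build_offsets([20] * 100_000)   # 20 px rows
col_offsets = build_offsets([80] * 10)        # 80 px columns
rows = visible_range(row_offsets, start=100, length=60)
cols = visible_range(col_offsets, start=0, length=800)
print(rows, cols, len(rows) * len(cols), "cells to draw")
```

The work per frame then scales with the viewport size rather than the sheet size.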

GPU, Kola2d, Online Spreadsheet

19 min read

Architects' Tech Alliance

Oct 15, 2024 · Artificial Intelligence

What Are the Core Metrics Behind AI Chips? A Deep Dive into GPU, ASIC, and TPU

This article explains the fundamental performance indicators of AI chips—TOPS, TFLOPS, and precision formats like FP16, FP32, and INT8—while comparing GPU, ASIC, and TPU architectures, highlighting Tensor Core advantages and TPU's superior efficiency over CPUs and GPUs.
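Headline TOPS and TFLOPS figures are theoretical peaks: parallel units × clock rate × operations per unit per cycle, with a fused multiply-add usually counted as two operations. The chip below is entirely hypothetical. It only shows why the same silicon quotes a larger number at lower precision when more INT8 operations fit into each lane per cycle.

```python
def peak_ops_per_second(units, clock_hz, ops_per_unit_per_cycle):
    """Theoretical peak = parallel units x clock x operations per unit per cycle."""
    return units * clock_hz * ops_per_unit_per_cycle


# Hypothetical chip: 10,000 FP32 lanes at 1.5 GHz, each retiring one fused
# multiply-add (counted as 2 floating-point operations) per cycle.
fp32 = peak_ops_per_second(10_000, 1.5e9, 2)
print(f"FP32: {fp32 / 1e12:.1f} TFLOPS")   # FP32: 30.0 TFLOPS

# If the same lanes packed four INT8 multiply-adds per cycle (also hypothetical),
# the headline number would be quoted in TOPS instead.
int8 = peak_ops_per_second(10_000, 1.5e9, 2 * 4)
print(f"INT8: {int8 / 1e12:.1f} TOPS")      # INT8: 120.0 TOPS
```

Sustained throughput on real workloads is usually well below these peaks, since memory bandwidth and utilization limit how often every lane is busy.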

AI Chip, ASIC, FP16

4 min read

Oct 15, 2024 · Artificial Intelligence

Why Mojo Could Redefine AI Programming: Insights from Chris Lattner

The article explores Chris Lattner’s vision for Mojo—a Python‑compatible language designed for AI, GPU, and accelerator workloads—detailing its performance claims, SIMD support, complex‑number handling, and the growing developer community behind it.

AI, GPU, Mojo

9 min read

Architects' Tech Alliance

Oct 7, 2024 · Industry Insights

What AMD Unveiled at Computex 2024: Zen 5, XDNA NPU, Ryzen 9000 and AI‑Focused Innovations

At Computex 2024, AMD showcased its latest CPU, GPU, and AI‑accelerated technologies, including the high‑performance Zen 5 core, the second‑generation XDNA NPU with 50 TOPS, the Ryzen 9000 consumer processors, the Strix Point AI‑PC platform, Versal AI Edge Gen 2, the upcoming MI‑series AI GPUs, and the new UALink interconnect, highlighting the company's roadmap for next‑generation computing and AI workloads.

AI, AMD, CPU

5 min read

Java Tech Enthusiast

Sep 30, 2024 · Artificial Intelligence

The AI Smile Curve: Profit Distribution and Future Outlook

The AI industry’s profit landscape mirrors a smile curve, with upstream GPU manufacturers and downstream application developers capturing most returns while costly large‑model R&D yields low margins, prompting predictions of GPU valuation corrections, a push for consumer‑facing killer apps, and massive application turnover through creative destruction.

AI, GPU, Industry analysis

11 min read

Architects' Tech Alliance

Sep 29, 2024 · Industry Insights

Why Super‑Heterogeneous Computing Is the Next Frontier in Computing Architecture

The article analyzes the limits of the von Neumann model and Moore's law, explains how instruction set complexity defines processor categories, and argues that integrating CPUs, GPUs, FPGAs, DPUs and ASICs into a super‑heterogeneous ecosystem—driven by Intel, NVIDIA, ARM and emerging trends—will shape the future of computing through diverse workloads, AI demand, green efficiency and a global compute network by 2030.

AI, ARM, CPU

12 min read

Architects' Tech Alliance

Sep 25, 2024 · Fundamentals

NVIDIA Quantum‑2 InfiniBand Platform: Technical Overview, Q&A, and Deployment Guidance

This article explains the growing demand for high‑performance computing, introduces NVIDIA's Quantum‑2 InfiniBand platform with its high‑speed, low‑latency capabilities, provides a curated list of related technical articles, and offers an extensive Q&A covering compatibility, cabling, UFM, PCIe limits, and best‑practice deployment for AI and HPC workloads.

AI, GPU, InfiniBand

11 min read

Huawei Cloud Developer Alliance

Sep 18, 2024 · Artificial Intelligence

How Distributed Training Powers Massive Language Models: Concepts, Strategies, and Code

This article explains why single‑machine resources are insufficient for training ever‑larger language models, introduces the fundamentals of distributed training systems, details various parallel strategies such as data, model, pipeline, and hybrid parallelism, and provides practical PyTorch code and memory‑optimization techniques to accelerate large‑scale model training.
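Of the parallel strategies listed here, tensor (intra-layer model) parallelism is the least intuitive. The pure-Python sketch below simulates a Megatron-style column-parallel linear layer: each simulated device holds a column slice of the weight and computes its slice of the output, and an all-gather concatenates the slices. It is an illustration under those assumptions, not the article's PyTorch code; real implementations run the gather as a collective such as `torch.distributed.all_gather`.

```python
def matmul(a, b):
    """Plain row-major matrix product of a (m x k) and b (k x n)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def split_columns(w, parts):
    """Shard weight matrix w column-wise into `parts` equal slices."""
    cols = len(w[0])
    if cols % parts:
        raise ValueError("column count must divide evenly across devices")
    step = cols // parts
    return [[row[p * step:(p + 1) * step] for row in w] for p in range(parts)]


def column_parallel_linear(x, w, world_size):
    """Simulate a column-parallel linear layer across `world_size` devices."""
    shards = split_columns(w, world_size)
    # Each simulated device holds one shard and computes its slice of the output.
    partials = [matmul(x, shard) for shard in shards]
    # All-gather: concatenate the output slices back along the column dimension.
    return [sum((part[i] for part in partials), []) for i in range(len(x))]


x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]
print(column_parallel_linear(x, w, world_size=2) == matmul(x, w))  # True
```

Each device stores only 1/world_size of the weight, which is the memory saving that makes this strategy worthwhile for very wide layers.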

Deep Learning, GPU, Parallelism

29 min read

Infra Learning Club

Sep 16, 2024 · Cloud Native

Survey of GPU Sharing and Virtualization Solutions for Kubernetes

The article surveys open‑source GPU sharing and virtualization approaches for AI workloads, comparing soft isolation, CUDA‑level isolation, NVIDIA MPS, driver‑level isolation, GPU pooling and deep‑learning memory sharing, and highlights their architectures, isolation guarantees, and performance trade‑offs.

Device Plugin, GPU, Kubernetes

5 min read

Architects' Tech Alliance

Sep 8, 2024 · Industry Insights

How Nvidia’s Rapid GPU Cycle Is Shaping the Future of AI Super‑Scale Networking

The article analyzes Nvidia’s accelerated GPU rollout, highlighting the Blackwell series’ massive performance and energy gains, the company’s AI‑focused Ethernet Spectrum‑X roadmap, and the broader impact on NVLink, InfiniBand, and Ethernet interconnects for upcoming massive AI clusters.

AI Ethernet, GPU, Nvidia

6 min read

Architects' Tech Alliance

Aug 29, 2024 · Industry Insights

How NVIDIA Builds 256‑GPU and 576‑GPU SuperPods with H100, GH200, and GB200 Interconnects

The article analyzes NVIDIA's DGX SuperPOD architectures across three GPU generations—H100, GH200, and GB200—detailing their NVLink/NVSwitch topologies, bandwidth calculations, scalability limits, and the practical challenges of constructing 256‑GPU and 576‑GPU supercomputing clusters.

Data center, GPU, High‑performance computing

11 min read

Architects' Tech Alliance

Aug 25, 2024 · Industry Insights

Why GPUs May Lose the AI Race: TPU, FPGA, and Future Hardware Trends

While GPUs have driven AI acceleration for years, this article analyzes their architectural constraints, compares emerging alternatives such as Google's TPU and high‑end FPGAs, and explores future application niches like VR/AR, cloud gaming, and military systems where GPUs may still thrive or be replaced.

AI hardware, Deep Learning, FPGA

15 min read

OPPO Kernel Craftsman

Aug 23, 2024 · Mobile Development

GPU Command and Syncpoint Analysis on SM8650 Platform

On the SM8650 platform, GLES issues synchronization and draw commands that the kernel‑mode driver translates into kgsl_drawobj structures and queues in per‑context dispatch lists. Dedicated kernel threads process fence, timestamp, and timeline syncpoints before the draw objects are submitted to the GPU firmware; a single eglSwapBuffers call triggers a fence syncpoint, a draw command, and the creation of a GPU fence.

Android, GPU, Graphics

12 min read

MaGe Linux Operations

Aug 23, 2024 · Fundamentals

What PC Specs Do You Need to Run Black Myth: Wukong Smoothly?

This article breaks down the official and tested PC hardware requirements for Black Myth: Wukong, covering CPU, GPU, RAM, and ray‑tracing needs at 1080p, 2K, and 4K resolutions, and offers practical build recommendations for each performance tier.

Black Myth, CPU, GPU

13 min read

Architects' Tech Alliance

Aug 21, 2024 · Fundamentals

Inside NVIDIA's Streaming Multiprocessor: How GPUs Execute Parallel Workloads

This article provides a detailed technical overview of the Streaming Multiprocessor (SM) in modern GPUs, explaining its microarchitecture, instruction fetch‑decode pipeline, warp scheduling, SIMT stack handling, scoreboard mechanisms, and strategies for hiding memory latency to maximize parallel execution efficiency.
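Warp scheduling and SIMT stack handling are easiest to see on a divergent branch. The toy model below runs one warp through an if/else: both paths are issued in turn under an active mask, and the lanes reconverge afterwards. It is a simplification under stated assumptions (one issue per path, no scoreboard or latency modeling), not a description of how any specific NVIDIA SM implements reconvergence.

```python
def simt_if_else(values, threshold):
    """Toy SIMT model of one warp running: y = x * 2 if x > threshold else x + 100.

    All lanes share one instruction stream, so a divergent branch is executed
    path by path under an active mask; lanes outside the mask sit idle.
    Returns the per-lane results and how many path issues the warp needed.
    """
    lanes = range(len(values))
    taken = [values[i] > threshold for i in lanes]   # mask for the "if" path
    not_taken = [not t for t in taken]                # mask for the "else" path
    out = [None] * len(values)
    issues = 0
    for mask, op in ((taken, lambda v: v * 2), (not_taken, lambda v: v + 100)):
        if any(mask):          # a path with no active lanes is skipped entirely
            issues += 1
            for i in lanes:
                if mask[i]:
                    out[i] = op(values[i])
    # Both paths are done: lanes reconverge here with the full mask restored.
    return out, issues


print(simt_if_else([1, 5, 2, 8], threshold=3))   # divergent warp: 2 issues
print(simt_if_else([4, 5, 6, 7], threshold=3))   # uniform warp: 1 issue
```

A divergent warp pays for both paths while each lane does useful work on only one, which is why divergence lowers SIMT efficiency.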

GPU, SIMT, Scoreboard

17 min read

Baidu Geek Talk

Aug 19, 2024 · Artificial Intelligence

PaddlePaddle Neural Network Compiler (CINN): Architecture, Optimization Techniques, and Performance Gains

The PaddlePaddle Neural Network Compiler (CINN) combines a PIR‑based frontend that performs graph‑level optimizations such as constant folding, dead‑code elimination and operator fusion with a backend that applies schedule transformations and auto‑tuning, delivering up to 4× faster RMSNorm kernels and 30‑60% overall speed‑ups for generative AI and scientific‑computing workloads.
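RMSNorm computes y_i = g_i · x_i / sqrt(mean(x²) + ε). The sketch contrasts an op-by-op version, where on a GPU each step would be a separate kernel writing an intermediate to memory, with a fused version that does one reduction pass and one output pass. It only illustrates why operator fusion saves memory traffic. It is not code generated by CINN, and the 4× figure above is the article's measurement.

```python
import math


def rmsnorm_unfused(x, weight, eps=1e-6):
    """Op-by-op RMSNorm: every step materializes an intermediate list."""
    squares = [v * v for v in x]                     # elementwise square
    mean_sq = sum(squares) / len(x)                  # reduction
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)         # scalar reciprocal square root
    normed = [v * inv_rms for v in x]                # elementwise scale
    return [n * g for n, g in zip(normed, weight)]   # elementwise weight


def rmsnorm_fused(x, weight, eps=1e-6):
    """Same math in one reduction pass plus one output pass, with no stored intermediates."""
    acc = 0.0
    for v in x:
        acc += v * v
    inv_rms = 1.0 / math.sqrt(acc / len(x) + eps)
    return [v * inv_rms * g for v, g in zip(x, weight)]


x = [1.0, 2.0, 3.0, 4.0]
weight = [1.0, 0.5, 2.0, 1.0]
print(rmsnorm_fused(x, weight))
```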

CINN, Deep Learning, GPU

18 min read

ByteDance Cloud Native

Aug 12, 2024 · Cloud Native

How to Deploy NVIDIA NIM AI Models on Volcengine VKE in Minutes

This guide walks you through deploying large language models with NVIDIA NIM on Volcengine's Kubernetes Engine (VKE), covering environment setup, model optimization, Helm chart deployment, monitoring integration, and the key advantages of using NIM as a cloud‑native AI micro‑service.

AI deployment, GPU, Kubernetes

12 min read

Python Programming Learning Circle

Aug 9, 2024 · Big Data

Introduction to cuDF: GPU‑Accelerated DataFrames and Dask Integration

This article introduces cuDF, a Python GPU DataFrame library with a pandas‑like API, compares it to pandas, explains when to use cuDF versus Dask‑cuDF for single‑GPU or multi‑GPU workloads, and provides practical code examples for common data operations.

Big Data, DataFrames, GPU

7 min read

Architects' Tech Alliance

Aug 8, 2024 · Artificial Intelligence

Fundamental Key Parameters of AI Chips: Compute Power, Precision Formats, and Architecture

This article explains the essential metrics of AI chips—including TOPS and TFLOPS compute, precision formats like FP16, FP32 and INT8, and the roles of GPUs, ASICs and TPUs—while highlighting how Tensor Cores boost deep‑learning performance and comparing TPU efficiency to CPUs and GPUs.

AI chips, ASIC, FP16

4 min read

Architects' Tech Alliance

Aug 5, 2024 · Industry Insights

What Drives the AI Compute Chip Market? GPUs, ASICs, and the Rise of Chinese Players

This article examines the AI compute chip ecosystem, covering GPU, FPGA, and ASIC technologies, market share trends, key performance metrics such as TOPS, power and die area, and provides a detailed overview of major global and Chinese vendors and their flagship products.

AI compute, ASIC, Chinese AI chips

12 min read

Architects' Tech Alliance

Jul 25, 2024 · Artificial Intelligence

NVIDIA H20 AI Chip Launch and the Rapid Growth of China's AI Chip Market

The article reviews NVIDIA's newly released H20 AI accelerator for China, compares its performance and pricing with domestic chips, outlines the expanding Chinese AI chip ecosystem—including Huawei, Cambricon, HaiGuang, Alibaba, ByteDance, and Baidu—while highlighting market size growth, multi‑chip heterogeneity strategies, and the strong demand forecast through 2024.

AI chips, AI compute, China

8 min read

Architects' Tech Alliance

Jul 23, 2024 · Industry Insights

Inside Los Alamos’ Venado Supercomputer: Architecture, Performance, and HPC Trends

The Venado supercomputer, unveiled at Los Alamos, combines Nvidia Grace CPUs, Hopper GPUs, HPE Slingshot interconnects, and massive memory bandwidth to achieve a 15.6‑petaflop FP64 peak, illustrating the evolving balance between CPU and GPU workloads in modern high‑performance computing.

CPU, GPU, Grace

14 min read

Ops Development Stories

Jul 19, 2024 · Cloud Native

Deploy Ollama and Open-WebUI on Kubernetes: A Step‑by‑Step Guide

This article walks through deploying the open‑source LLM serving tool Ollama and the Open‑WebUI interface on a Kubernetes cluster using Helm, covering GPU considerations, configuration files, service exposure, and model management with practical code examples.

Deployment, GPU, Kubernetes

9 min read

360 Smart Cloud

Jul 17, 2024 · Artificial Intelligence

Parallelism and Memory‑Optimization Techniques for Distributed Large‑Scale Transformer Training

This article reviews the principles and practical implementations of data, pipeline, tensor, sequence, and context parallelism together with memory‑saving strategies such as recomputation and ZeRO, and demonstrates how the QLM framework leverages these techniques to accelerate large‑model training and fine‑tuning on multi‑GPU clusters.
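ZeRO's savings can be estimated with the accounting from the original ZeRO paper for mixed-precision Adam: 2 bytes of FP16 weights, 2 bytes of FP16 gradients, and 12 bytes of FP32 optimizer state per parameter, with stages 1 to 3 partitioning optimizer states, then gradients, then the weights themselves across N data-parallel GPUs. The helper below is a back-of-envelope sketch that ignores activations and temporary buffers; it does not describe QLM's actual memory layout.

```python
def zero_memory_gb(params_billion, n_gpus, stage, k=12):
    """Per-GPU model-state memory (GB) for mixed-precision Adam under ZeRO stages 0-3.

    Per parameter: 2 bytes FP16 weights, 2 bytes FP16 gradients, and k bytes of
    optimizer state (FP32 master weights, momentum, variance, so k = 12).
    Activations and temporary buffers are ignored.
    """
    weights = 2 * params_billion
    grads = 2 * params_billion
    optim = k * params_billion
    if stage == 0:                 # plain data parallelism: everything replicated
        return weights + grads + optim
    if stage == 1:                 # partition optimizer states
        return weights + grads + optim / n_gpus
    if stage == 2:                 # also partition gradients
        return weights + (grads + optim) / n_gpus
    if stage == 3:                 # also partition the weights themselves
        return (weights + grads + optim) / n_gpus
    raise ValueError("stage must be 0, 1, 2, or 3")


for stage in range(4):
    print(stage, round(zero_memory_gb(7.5, 64, stage), 2))
```

For a 7.5B-parameter model on 64 GPUs this gives 120 GB per GPU without ZeRO and under 2 GB with stage 3, in line with the example in the ZeRO paper; recomputation then attacks the activation memory this helper leaves out.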

GPU, Megatron-LM, Memory Optimization

18 min read

Cloud Native Technology Community

Jul 15, 2024 · Cloud Native

Deploying Ollama and Open‑WebUI on Kubernetes with Helm

This guide explains how to deploy the open‑source LLM serving tool Ollama and the Open‑WebUI front‑end on a Kubernetes cluster using Helm charts, covering GPU configuration, persistent storage, service exposure, and model selection for large language models.

GPU, Kubernetes, LLM

8 min read

Architects' Tech Alliance

Jul 9, 2024 · Industry Insights

How Nvidia’s Accelerated GPU Roadmap Is Shaping AI‑Scale Networking

Nvidia plans to shorten its GPU generation cycle to one year, launching Blackwell Ultra in 2025, Rubin in 2026, and Rubin Ultra in 2027, while boosting token‑generation efficiency and introducing AI‑optimized Ethernet solutions like Spectrum‑X800, aiming to dominate large‑scale AI clusters and reshape the high‑performance networking market.

AI, GPU, Nvidia

6 min read

Open Source Linux

Jul 2, 2024 · Fundamentals

Why GPUs Power AI and Gaming: A Beginner’s Guide to Their Architecture

This article explains what a GPU is, how it differs from a CPU, its internal architecture, and why its massive parallel processing makes it essential for graphics rendering, scientific computation, and AI inference, illustrated with examples such as NVIDIA RTX 3090.

AI inference, GPU, Graphics Rendering

8 min read

Architects' Tech Alliance

Jun 23, 2024 · Industry Insights

What AMD Unveiled at Computex 2024: Zen 5 CPUs, XDNA NPU, AI GPUs and UA‑Link

At Computex 2024, AMD showcased its latest Zen 5 CPU core, the second‑generation XDNA NPU with 50 TOPS of AI performance, the consumer Ryzen 9000 series, AI‑focused Strix Point processors, Versal AI Edge Gen 2, upcoming AI GPUs such as the MI325X, MI350, and MI400, and the new UALink 1.0 interconnect standard.

AI, AMD, CPU

5 min read

Architects' Tech Alliance

Jun 22, 2024 · Artificial Intelligence

Rising Compute Demand of Generative AI Models and GPU Accelerator Trends in 2024

The article analyzes how generative AI models from GPT‑1 to the upcoming GPT‑5 are driving exponential growth in compute requirements, prompting massive cloud capital expenditures and intense competition among GPU vendors such as NVIDIA, AMD, Google, and emerging domestic chip makers, while also highlighting interconnect innovations and cost‑effective solutions.

AI, Accelerators, Compute

12 min read

Architects' Tech Alliance

Jun 16, 2024 · Industry Insights

How Nvidia’s Blackwell GPUs Aim to Slash AI Training Costs and Power

The article analyzes Nvidia’s historic advantage, the massive performance and energy efficiency gains from Pascal to Blackwell GPUs, the economics of training large language models like GPT‑4, and the detailed roadmap of upcoming GPU, memory, and interconnect technologies shaping the future of data‑center AI.

AI, GPU, Nvidia

14 min read

Architects' Tech Alliance

Jun 13, 2024 · Industry Insights

How Nvidia’s New Blackwell GPUs and NVLink Redefine AI Acceleration in 2024

The article analyzes Nvidia's latest AI‑focused hardware and software breakthroughs showcased at Computex 2024, detailing how GPU‑CPU hybrid architectures, new libraries, and high‑speed interconnects like NVLink dramatically boost performance while keeping power and cost growth modest.

AI acceleration, Blackwell, DGX

12 min read

Architects' Tech Alliance

Jun 10, 2024 · Artificial Intelligence

NVLink vs PCIe GPUs: Which Nvidia AI Server Fits Your Workload?

This article compares Nvidia's NVLink (SXM) and PCIe GPU versions for AI servers, detailing their architectures, bandwidth, power consumption, and ideal use cases, helping readers choose the optimal configuration based on performance needs and budget constraints.

AI servers, GPU, NVLink

8 min read

Java Tech Enthusiast

Jun 7, 2024 · Fundamentals

Engineer Builds GPU from Scratch in Two Weeks

In just two weeks, engineer Adam Majmudar designed and implemented a minimalist GPU called tiny‑gpu—complete with a custom 11‑instruction ISA, Verilog RTL, and verified via OpenLane—sharing the open‑source project on GitHub, earning thousands of stars, and preparing it for fabrication through Tiny Tapeout 7, showcasing how modern tools make DIY chip design increasingly accessible.

Chip Design, EDA, GPU

8 min read

Python Programming Learning Circle

Jun 6, 2024 · Fundamentals

Accelerating Python with Numba: JIT Compilation, Decorators, and GPU Support

This article introduces Numba, a Python just‑in‑time compiler, explains why it is advantageous over alternatives, demonstrates how to apply its @jit, @njit, @vectorize and other decorators, and shows how to run accelerated code on CPUs and GPUs using CUDA.

CUDA, GPU, Python

9 min read

IT Services Circle

Jun 6, 2024 · Artificial Intelligence

Nvidia Unveils Blackwell GPU and AI Supercomputing Roadmap

Nvidia’s latest Blackwell GPU, presented by Jensen Huang, promises unprecedented performance and energy efficiency for large‑scale AI models, while the company also showcases accelerated computing, NVLink interconnects, AI‑optimized DGX servers, the NIM platform for rapid LLM deployment, and ambitious projects such as Earth‑2 digital twins and next‑generation embodied AI robots.

AI, Blackwell, GPU

18 min read

Architects' Tech Alliance

Jun 5, 2024 · Industry Insights

How HBM Is Transforming GPU Power and Driving the AI Memory Boom

HBM's near‑memory architecture, stacked design, and TSV integration dramatically cut latency and board space while boosting bandwidth, leading NVIDIA and AMD to adopt it across multiple GPU generations, spurring fierce competition among SK Hynix, Samsung, and Micron, and projecting a roughly four‑fold market surge to $16.9 billion in 2024.
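HBM's bandwidth advantage comes straight from interface width: peak bandwidth is bus width × per-pin data rate / 8. A JEDEC HBM3 stack pairs a 1024-bit interface with 6.4 Gb/s pins, whereas a GDDR6 chip has a 32-bit interface. The helper below is a back-of-envelope sketch; the five-stack configuration is illustrative, and shipping products (HBM3E, for example) run at different pin rates.

```python
def peak_bandwidth_gb_s(bus_width_bits, pin_rate_gbps, devices=1):
    """Peak bandwidth in GB/s = interface width (bits) x per-pin rate (Gb/s) / 8 x devices."""
    return bus_width_bits * pin_rate_gbps / 8 * devices


print(peak_bandwidth_gb_s(1024, 6.4))             # one HBM3 stack, about 819 GB/s
print(peak_bandwidth_gb_s(32, 16.0))              # one GDDR6 chip at 16 Gb/s, 64 GB/s
print(peak_bandwidth_gb_s(1024, 6.4, devices=5))  # five HBM3 stacks, about 4 TB/s
```

The wide-but-slow pins are only practical because the stack sits next to the GPU on an interposer, which is the packaging constraint the article's TSV discussion refers to.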

AI, GPU, HBM

11 min read

Alibaba Cloud Infrastructure

May 31, 2024 · Cloud Native

Best Practices for Deploying AI Model Inference on Knative

This guide explains how to efficiently deploy AI model inference services on Knative by externalizing model data, using Fluid for accelerated loading, configuring secrets, ImageCache, graceful shutdown, probes, autoscaling parameters, mixed ECS/ECI resources, shared GPU scheduling, and observability features to achieve fast scaling, low cost, and high elasticity.

AI Model Inference, Cloud Native, GPU

19 min read

May 24, 2024 · Cloud Computing

Understanding and Optimizing NCCL Collective Communication Libraries for Large‑Scale Model Training

The article explains how the NCCL collective communication library enables efficient large‑scale model training: it parses the GPU‑to‑NIC topology, builds flat rings and trees, and executes Ring AllReduce primitives. The article also describes improved logging and bandwidth metrics and proposes solutions for missing topology, metric, and mapping information to guide future optimization.
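Ring AllReduce runs in two phases over n ranks arranged in a ring: a reduce-scatter leaves each rank with one fully summed chunk, then an all-gather circulates those chunks. Each rank sends 2(n-1) chunks of size S/n, which is why the ring is bandwidth-efficient. The simulation below only models the data movement in plain Python. It is not NCCL's implementation, which adds pipelining and multiple channels on top of this pattern.

```python
def ring_allreduce(buffers):
    """Simulate ring AllReduce (sum) over n ranks: reduce-scatter, then all-gather."""
    n = len(buffers)
    data = [list(b) for b in buffers]          # each rank's local copy
    size = len(data[0])
    bounds = [(c * size // n, (c + 1) * size // n) for c in range(n)]  # n chunks

    # Phase 1, reduce-scatter: in step s, rank r sends chunk (r - s) % n to rank r + 1,
    # which adds it to its own copy. Payloads are sliced (copied) before any update.
    for s in range(n - 1):
        sends = [(r, (r - s) % n) for r in range(n)]
        payloads = [data[r][bounds[c][0]:bounds[c][1]] for r, c in sends]
        for (r, c), payload in zip(sends, payloads):
            lo = bounds[c][0]
            dst = (r + 1) % n
            for i, v in enumerate(payload):
                data[dst][lo + i] += v
    # Now rank r holds the fully reduced chunk (r + 1) % n.

    # Phase 2, all-gather: in step s, rank r forwards chunk (r + 1 - s) % n,
    # and the receiver overwrites its stale copy.
    for s in range(n - 1):
        sends = [(r, (r + 1 - s) % n) for r in range(n)]
        payloads = [data[r][bounds[c][0]:bounds[c][1]] for r, c in sends]
        for (r, c), payload in zip(sends, payloads):
            lo, hi = bounds[c]
            data[(r + 1) % n][lo:hi] = payload
    return data


ranks = [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50], [100, 200, 300, 400, 500]]
for rank, buf in enumerate(ring_allreduce(ranks)):
    print(rank, buf)
```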

Distributed Training, GPU, NCCL

23 min read

Open Source Linux

May 22, 2024 · Artificial Intelligence

Why GPUs Are the Powerhouse Behind Modern AI: A Deep Dive

This article explains how GPUs, with their parallel architecture and extensive software ecosystem, have become essential for accelerating AI training and inference, outperforming CPUs and shaping the future of artificial intelligence across various industries.

Deep Learning, GPU, Hardware acceleration

10 min read

Architects' Tech Alliance

May 15, 2024 · Artificial Intelligence

Detailed Overview of GPU Server Architectures: A100/A800 and H100/H800 Nodes

This article provides a comprehensive technical overview of large‑scale GPU server architectures, detailing the component topology of 8‑GPU A100/A800 and H100/H800 nodes, explaining storage network cards, NVSwitch interconnects, bandwidth calculations, and the trade‑offs between RoCEv2 and InfiniBand for AI workloads.
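The NVLink vs PCIe gap in these nodes can be checked with simple arithmetic: A100 has 12 third-generation NVLink links and H100 has 18 fourth-generation links, each carrying 50 GB/s in total, while PCIe 5.0 runs at 32 GT/s per lane with 128b/130b encoding. The sketch reproduces those published per-GPU figures. It does not model NVSwitch topology and does not cover the A800/H800 variants.

```python
def nvlink_total_gb_s(links, per_link_gb_s=50.0):
    """Aggregate NVLink bandwidth per GPU (both directions combined)."""
    return links * per_link_gb_s


def pcie_gb_s_per_direction(gt_per_s, lanes=16, encoding=128 / 130):
    """PCIe payload bandwidth per direction: transfer rate x lanes x encoding efficiency / 8."""
    return gt_per_s * lanes * encoding / 8


a100 = nvlink_total_gb_s(12)                # third-generation NVLink: 600 GB/s
h100 = nvlink_total_gb_s(18)                # fourth-generation NVLink: 900 GB/s
gen5 = 2 * pcie_gb_s_per_direction(32.0)    # PCIe 5.0 x16, both directions: ~126 GB/s
print(a100, h100, round(gen5, 1), round(h100 / gen5, 1))
```

The roughly 7× ratio between H100 NVLink and a PCIe 5.0 x16 slot is why GPU-to-GPU traffic inside a node goes over NVSwitch, while the PCIe lanes serve the storage and network cards.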

GPU, High‑performance computing, NVLink

13 min read

Architects' Tech Alliance

May 14, 2024 · Artificial Intelligence

Why GPUs Are Essential for Modern Artificial Intelligence and How They Compare with CPUs, ASICs, and FPGAs

This article explains the pivotal role of GPUs in today’s generative AI era, describes their architecture and applications, compares them with CPUs, ASICs, and FPGAs, and offers guidance on selecting the right processor for AI workloads while also noting related reference resources.

Deep Learning, GPU, Hardware

12 min read

May 10, 2024 · Artificial Intelligence

GPU Performance Optimization Practices for Tencent PCG Recommendation Model Training Framework

This article presents a comprehensive overview of Tencent PCG's GPU‑based recommendation model training framework, detailing why GPU adoption is essential, the hardware and software challenges faced, the multi‑level data architecture, pipeline design, and a series of network, storage, and compute optimizations, followed by future directions.

Distributed Training, GPU, Model Training

13 min read

Architects' Tech Alliance

May 9, 2024 · Artificial Intelligence

AI Servers: Market Opportunities, Architecture, and Future Demand Driven by Generative AI

The article examines how the surge of generative AI (AIGC) is fueling rapid growth in AI server demand, detailing the emerging AIGC ecosystem, server hardware composition, model scaling, heterogeneous computing, training vs. inference workloads, market size forecasts, and the competitive landscape of AI server manufacturers.

AI Infrastructure · AI servers · GPU

0 likes · 15 min read

NetEase Cloud Music Tech Team

May 8, 2024 · Frontend Development

How We Halved Cloud Music Desktop Startup Time and Fixed UI Lag with a React Refactor

This article details the migration of the Cloud Music desktop client from a legacy NEJ‑CEF hybrid to a React‑based architecture, outlines four major performance challenges, and explains the step‑by‑step optimizations—including API preloading, render memoization, virtual‑list replacement, and resource‑usage reductions—that cut startup latency by 48%, eliminated interaction stutter, and dramatically lowered CPU, GPU, and memory consumption.

CPU · GPU · Hybrid App

0 likes · 30 min read

Architects' Tech Alliance

May 7, 2024 · Industry Insights

Why GPUs Remain the Dominant AI Training Hardware: Trends and Challenges

The article analyzes why GPUs continue to dominate AI model training, comparing them with ASICs, CPUs, and other chips, and discusses ecosystem advantages, domestic development gaps, emerging edge‑AI demands, high‑bandwidth needs, and chiplet technology as future enablers.

AI hardware · Chiplet · GPU

0 likes · 5 min read

DevOps Operations Practice

Apr 29, 2024 · Fundamentals

Introduction to CPUs and GPUs: Functions, Advanced Features, and Key Differences

This article explains the basic functions of CPUs and GPUs, their advanced capabilities and real‑world applications, and compares their architectures, processing models, and roles in environments such as IoT, mobile devices, Kubernetes, and AI workloads.

AI acceleration · CPU · GPU

0 likes · 7 min read

Architects' Tech Alliance

Apr 27, 2024 · Industry Insights

What We Know About Nvidia’s Upcoming Blackwell GPUs and Their Power Surge

Nvidia’s next‑generation GeForce RTX 50 (Blackwell) GPUs are rumored to retain a 384‑bit memory bus, possibly adopt GDDR7 for up to 1.5 TB/s bandwidth, and push power consumption toward 1 kW, while Dell’s COO hints at new AI accelerators without liquid cooling.

AI accelerator · Blackwell · GDDR7

0 likes · 9 min read

Rare Earth Juejin Tech Community

Apr 24, 2024 · Artificial Intelligence

Training MNIST with Burn on wgpu: From PyTorch to Rust Backend

This tutorial demonstrates how to train a MNIST digit‑recognition model using the Rust‑based Burn framework on top of the cross‑platform wgpu API, covering model export from PyTorch to ONNX, code generation, data loading, training loops, and performance comparison across CPU, GPU, and other backends.

Burn · Deep Learning · GPU

0 likes · 13 min read

Architects' Tech Alliance

Apr 21, 2024 · Fundamentals

Understanding RDMA: InfiniBand, RoCE, and Their Role in High‑Performance AI Model Training

This article explains how Remote Direct Memory Access (RDMA) technologies such as InfiniBand and RoCE bypass OS kernels to achieve ultra‑low latency and high bandwidth, discusses their hardware implementations, cost considerations, and their critical impact on large‑scale AI model training and HPC network design.

AI · GPU · High‑Performance Computing

0 likes · 11 min read

Architects' Tech Alliance

Apr 20, 2024 · Industry Insights

How Many Optical Modules Do AI GPU SuperPODs Really Need? A Detailed Calculation

This article analyzes the factors influencing optical‑module requirements for AI GPU clusters, compares four typical network configurations for A100 and H100 SuperPODs, and provides step‑by‑step calculations that reveal the projected market demand for 200G, 400G, and 800G modules in 2023‑2024.

AI · GPU · SuperPoD

0 likes · 14 min read

Architects' Tech Alliance

Apr 15, 2024 · Artificial Intelligence

Decoding GPU Server Topologies: From PCIe to NVLink for Large‑Model Training

This article provides a detailed technical overview of modern multi‑GPU server architectures—including PCIe switches, NVLink, NVSwitch, and HBM—explaining their hardware topologies, bandwidth characteristics, monitoring methods, and network choices to help engineers design efficient AI training clusters.

AI training · GPU · HBM

0 likes · 18 min read

Architects' Tech Alliance

Apr 12, 2024 · Industry Insights

Why AI Server Demand Is Set to Explode by 2025 – Key Trends and Market Drivers

The article analyzes the rapid evolution of AI servers, detailing the shift from general‑purpose to GPU‑enhanced AI hardware, the split between training and inference workloads, cost structures, forecasted compute needs for large models like GPT‑4, and the impact of US export restrictions and domestic competition on the global market.

AI servers · GPU · Market analysis

0 likes · 6 min read

Meituan Technology Team

Apr 11, 2024 · Artificial Intelligence

GPU-Accelerated Mixed Vector-Scalar Retrieval System for Meituan Takeaway Search

Meituan Waimai’s search team built a GPU‑accelerated, mixed vector‑and‑scalar retrieval engine that supports billions of items, achieving over 99% recall and up to 89% latency reduction by combining pre‑filtering, optimized data layouts, multi‑GPU parallelism, and FP16 precision.

ANN · FAISS · GPU

0 likes · 20 min read

Architects' Tech Alliance

Apr 10, 2024 · Industry Insights

Inside the GPU Server: Architecture of A100/A800 and H100/H800 Nodes

This article provides a detailed technical breakdown of modern multi‑GPU server nodes, covering component composition, storage network cards, NVSwitch interconnects, bandwidth calculations, and the architectural differences between NVIDIA A100/A800 and H100/H800 configurations for AI training workloads.

A100 · AI training · GPU

0 likes · 12 min read

Apr 10, 2024 · Artificial Intelligence

Large Language Model Inference Overview and Performance Optimizations

This article presents a comprehensive overview of large language model inference, describing the prefill and decoding stages, key performance metrics such as throughput, latency and QPS, and detailing a series of system-level optimizations—including pipeline parallelism, dynamic batching, KV‑cache quantization, and hardware considerations—to significantly improve inference efficiency on modern GPUs.

GPU · Inference · Latency

0 likes · 23 min read

Architects' Tech Alliance

Apr 6, 2024 · Industry Insights

How NVIDIA’s Blackwell GB200 NVL72 Redefines AI Compute with 10 TB/s Interconnect

The article analyses NVIDIA’s new Blackwell platform, focusing on the GB200 NVL72 rack‑scale system and its 10 TB/s chip‑to‑chip interconnect, detailing massive training and inference speedups, rack‑level DGX SuperPOD architecture, copper‑cable trends, and the broader impact on AI‑driven data‑center workloads.

AI · Blackwell · GPU

0 likes · 13 min read

Python Programming Learning Circle

Apr 3, 2024 · Fundamentals

Accelerating Python Code with Taichi: Up to 100× Speed Boosts

This article introduces Taichi, a Python‑embedded DSL that compiles kernel functions for CPU and GPU execution, and demonstrates through three practical examples how importing the library and adding decorators can accelerate Python code by up to a hundredfold, with detailed performance numbers and installation instructions.

DSL · GPU · Python

0 likes · 7 min read

360 Smart Cloud

Apr 3, 2024 · Backend Development

Understanding FFmpeg Hardware Acceleration Architecture and Implementation

FFmpeg provides a comprehensive, cross‑platform hardware acceleration framework that abstracts diverse GPU and dedicated video codec interfaces, defines HWContext types, device and frame contexts, and various codec configuration methods, enabling efficient video encoding, decoding, and filtering while addressing performance, compatibility, and pipeline complexity challenges.

GPU · Hardware acceleration · Multimedia

0 likes · 10 min read

Architects' Tech Alliance

Mar 30, 2024 · Industry Insights

How NVIDIA’s B200 GPU Redefines AI Compute and What It Means for the Chip Market

The article analyzes the latest AI‑compute announcements from NVIDIA, AMD and Intel—including NVIDIA’s B200 GPU with 20 petaFLOPS FP4 performance, AMD’s MI300/MI400 roadmap, and Intel’s Gaudi 3 and Falcon Shores—while examining pricing, launch timelines, supply‑chain capacity, and the shifting market share among major cloud providers.

AI compute · AMD · GPU

0 likes · 10 min read

Architects' Tech Alliance

Mar 28, 2024 · Industry Insights

How AI and “East‑West Computing” Are Reviving the Server Market

The article analyzes how the surge in AI workloads and the “East‑West Computing” strategy are reshaping the server industry, detailing the AI server component chain, the role of HBM memory, SSD evolution, and the stark cost‑structure differences between traditional and AI‑optimized servers.

AI servers · GPU · HBM

0 likes · 5 min read

Architects' Tech Alliance

Mar 24, 2024 · Artificial Intelligence

NVLink vs PCIe GPUs: Which NVIDIA Server GPU Wins for Your AI Workload?

This article compares NVIDIA's NVLink (SXM) and PCIe GPU versions for AI servers, detailing their architectures, bandwidth, power consumption, and ideal use cases, and provides guidance on selecting the right GPU based on workload size, flexibility, and cost considerations.

AI servers · GPU · NVLink

0 likes · 9 min read

Architects' Tech Alliance

Mar 20, 2024 · Industry Insights

What Nvidia’s B100 and GB200 Reveal About the Future of AI GPUs

The GTC 2024 recap highlights Nvidia’s upcoming B100 GPU and GB200 superchip, their Blackwell architecture, performance breakthroughs, embodied‑intelligence initiatives, and the expanding AI application ecosystem across industries, offering a clear view of the next wave in accelerated computing.

AI · B100 · Embodied Intelligence

0 likes · 7 min read

Mar 20, 2024 · Artificial Intelligence

Nvidia Unveils Blackwell GPU: A Quantum Leap for Generative AI

Nvidia introduced the Blackwell GPU architecture at GTC, highlighting six breakthrough technologies, a 4nm process, massive performance gains, and its integration into DGX SuperPOD systems that promise to accelerate generative AI, data processing, and high‑performance computing across industries.

AI · Blackwell · GPU

0 likes · 14 min read

Architects' Tech Alliance

Mar 18, 2024 · Industry Insights

Why Nvidia’s NVLink C2C Is Redefining GPU‑CPU Interconnects

The article provides an in‑depth technical analysis of Nvidia’s NVLink C2C interconnect, comparing its latency, bandwidth, power efficiency, density and cost against traditional SerDes solutions and examining its role in building SuperChip architectures with Grace CPUs and Hopper GPUs.

GPU · NVLink · cost analysis

0 likes · 12 min read

Architects' Tech Alliance

Mar 17, 2024 · Industry Insights

Why GPUs Remain the Dominant AI Compute Engine: Trends, Risks, and Future Outlook

The article analyzes current AI hardware options, explains why GPUs continue to dominate model training due to architectural compatibility, ecosystem support, and market maturity, and outlines emerging trends such as model miniaturization, optical interconnects, and chiplet technology that will shape the next generation of AI compute.

AI compute · Chiplet · GPU

0 likes · 6 min read

iQIYI Technical Product Team

Mar 15, 2024 · Artificial Intelligence

Optimizing GPU Inference for CTR Models: Kernel Fusion, Multi‑Stream Execution, and Batch Merging

By fusing sparse‑feature operators, enabling multi‑stream execution, consolidating data copies, and merging inference batches, iQIYI brought GPU CTR model latency down to CPU levels, boosted throughput more than sixfold, and cut operational costs by over 40%, overcoming kernel‑launch overhead bottlenecks.

CTR · GPU · Inference Optimization

0 likes · 10 min read

Tencent Cloud Developer

Mar 14, 2024 · Mobile Development

Aurora Animation and 3D Penguin Effects in Mobile QQ: Noise Algorithms, Color Mapping, Performance Optimization, and Rendering Techniques

The new QQ 9.0 introduces aurora‑style animations generated by continuous, smoothed noise algorithms with uniform‑probability color mapping, and a spring‑driven 3D penguin rendered via Filament’s PBR materials and GPU compute shaders, achieving sub‑2 ms performance on most Android and iOS devices.

3D · GPU · Mobile

0 likes · 17 min read

Alibaba Cloud Native

Mar 9, 2024 · Cloud Computing

Deploy Google Gemma LLM on Alibaba Cloud Function Compute GPU with Low‑Cost Idle Mode

This guide shows how to quickly and cheaply deploy the open‑source Google Gemma large language model on Alibaba Cloud Function Compute GPU using the new idle‑billing mode, covering prerequisites, Docker image creation, function setup, idle reservation, testing, monitoring, and cost estimation.

Function Compute · GPU · Gemma

0 likes · 10 min read

Mar 8, 2024 · Industry Insights

Why Building LLMs Is Like Buying a Hardware Lottery – Lessons from a Startup

The article recounts Yi Tay’s experience founding Reka and building large language models from scratch, highlighting the unpredictable quality of GPU clusters, the challenges of multi‑cluster orchestration, code‑base choices, and how startups must rely on fast, intuition‑driven experimentation to succeed.

Cluster Management · GPU · Hardware

0 likes · 12 min read

MaGe Linux Operations

Mar 5, 2024 · Cloud Native

How to Run GPU‑Accelerated AI Workloads on Kubernetes

This article explains how Kubernetes supports GPU workloads for AI and machine learning, covering device plugins, pod GPU requests, oversubscription, security isolation, cloud‑provider node setup, and protecting GPU nodes from non‑GPU pods.

AI workloads · Cloud Native · Device Plugin

0 likes · 8 min read

OPPO Kernel Craftsman

Mar 1, 2024 · Mobile Development

GPU Frequency Scaling on Qualcomm Adreno Using the Linux devfreq Framework

Using Qualcomm’s Adreno GPU as a case study, the article explains how the Linux devfreq framework enables GPU frequency scaling by creating a kgsl devfreq device and an msm‑adreno‑tz governor, detailing their initialization, event handling, target‑frequency computation, and the kernel callbacks that apply the new rates.

Adreno · GPU · Linux kernel

0 likes · 5 min read

Architects' Tech Alliance

Feb 26, 2024 · Game Development

How Do GPUs Power Modern Rendering? A Deep Dive into Architecture and Optimization

This article provides a comprehensive technical overview of GPU architecture, from memory hierarchy and compute units to rendering pipelines and optimization techniques, explaining how modern graphics hardware processes shaders, manages resources, and balances performance across different rendering strategies.

GPU · Game Development · Graphics

0 likes · 33 min read

Feb 19, 2024 · Artificial Intelligence

Large Language Model Inference Overview and Performance Optimizations

This article presents a comprehensive overview of large language model inference, detailing the prefill and decoding stages, key performance metrics such as throughput, latency and QPS, and a series of system-level optimizations—including pipeline parallelism, dynamic batching, specialized attention kernels, virtual memory allocation, KV‑cache quantization, and mixed‑precision strategies—to improve GPU utilization and overall inference efficiency.

GPU · LLM · Latency

0 likes · 24 min read

Feb 11, 2024 · Artificial Intelligence

GPU-Accelerated Model Service and Optimization Practices at Xiaohongshu

This article details Xiaohongshu's end‑to‑end GPU‑based transformation of its recommendation and search models, covering background, model characteristics, training and inference frameworks, system‑level and GPU‑level optimizations, compilation tricks, hardware upgrades, and future directions for large‑scale machine‑learning infrastructure.

GPU · Model Serving · Training

0 likes · 18 min read

Architecture and Beyond

Feb 8, 2024 · Artificial Intelligence

Mastering AIGC: 15 Essential AI Terms and Key Technologies Explained

This article provides a comprehensive overview of core AI concepts, from basic definitions of AI, AGI, and AIGC to detailed explanations of GPUs, major generative models, leading AI products, and influential companies, helping readers quickly grasp the landscape of AI-generated content.

AI · AIGC · CLIP

0 likes · 24 min read

IT Services Circle

Feb 1, 2024 · Fundamentals

The Rise of NPU and Integrated Memory in AI PCs and Intel's Lunar Lake Architecture

The article examines how CPUs, GPUs, and memory have long formed the core of PC hardware, discusses the emerging role of NPUs for AI processing, and describes Intel's Lunar Lake strategy of integrating memory with the processor to deliver faster, lower‑latency performance in upcoming AI‑focused PCs.

AI PC · CPU · GPU

0 likes · 5 min read

Architects' Tech Alliance

Jan 30, 2024 · Industry Insights

Why Computing Power Leasing Is Booming: 2024 Industry Framework & Trends

The article outlines the 2024 computing‑power leasing industry framework, explains three main rental models, highlights the surge in demand driven by generative AI, the shortage of high‑end GPUs, and provides an extensive collection of links to reports and analyses on GPU technology, market dynamics, and future development paths.

AI · GPU · Industry analysis

0 likes · 5 min read

Jan 26, 2024 · Artificial Intelligence

Efficient Deployment of Speech AI Models on GPUs

This article explains how to efficiently deploy speech AI models—including ASR and TTS—on GPUs using NVIDIA's Triton Inference Server and TensorRT, covering background challenges, GPU‑based solutions, decoding optimizations, Whisper acceleration with TensorRT‑LLM, streaming TTS improvements, voice‑cloning pipelines, future plans, and a Q&A session.

ASR · GPU · Inference

0 likes · 20 min read

Architects' Tech Alliance

Jan 23, 2024 · Industry Insights

Why Intel and AMD Dominate CPUs and What Opportunities Exist for Chinese Chipmakers

The article analyzes the global CPU and GPU markets, showing Intel and AMD's overwhelming share, the rise of new data‑center players, key performance metrics for CPUs, the constraints of instruction‑set ecosystems, and emerging AI‑chip design trends that could open space for domestic Chinese manufacturers.

CPU · China · GPU

0 likes · 15 min read

Architects' Tech Alliance

Jan 21, 2024 · Industry Insights

What Nvidia GH200 and AMD MI300 Reveal About the Future of AI Compute

The article examines Nvidia's GH200 superchip and AMD's Instinct MI300, compares CPU, GPU, FPGA, and ASIC architectures, analyzes market share trends, and discusses opportunities for domestic chip makers in the rapidly evolving AI compute landscape.

AI chips · AMD · ASIC

0 likes · 13 min read

Architects' Tech Alliance

Jan 14, 2024 · Fundamentals

Overview of CPU, GPU, and Storage Fundamentals in the Xinchuang Industry

This article introduces the Xinchuang (information technology innovation) industry, outlines its hardware components, and provides concise explanations of CPU concepts, instruction sets, GPU architecture and operation, as well as storage classifications, while also linking to related research reports and promotional resources.

CPU · GPU · Information Technology

0 likes · 8 min read

Architects' Tech Alliance

Jan 14, 2024 · Industry Insights

Can Chinese GPUs Close the Gap with NVIDIA? 2023 GPGPU Landscape Analysis

A 2023 GPGPU research framework analysis finds that while Chinese GPUs such as the BR100 and TianGai100 can match or exceed the NVIDIA A100 in FP32, they still lag in FP64 and INT8 performance, and the domestic OpenCL‑based software ecosystem trails far behind NVIDIA's CUDA, shaping short‑ and long‑term market dynamics.

AI computing · CUDA · China

0 likes · 6 min read

Architects' Tech Alliance

Jan 11, 2024 · Industry Insights

What Makes Nvidia’s RTX 5880 Ada Stand Out? Specs, Performance, and Market Position

Nvidia's RTX 5880 Ada, a China‑specific GPU built on a trimmed AD102 chip, offers 14,080 CUDA cores, 48 GB of ECC GDDR6 memory, and an estimated 69.3 TFLOPS of FP32 performance, positioning it between the RTX 6000 Ada and RTX 5000 Ada while complying with U.S. export limits.

CUDA cores · GPU · Nvidia

0 likes · 8 min read

Architects' Tech Alliance

Jan 4, 2024 · Industry Insights

China’s 2023 Xinchuang Boom: Key Trends in CPUs, GPUs, DPU & Cloud

The 2023 Xinchuang industry report outlines how China's information‑technology innovation sector entered a rapid growth phase, highlighting market expansion, dominant keywords, the evolving hardware ecosystem—including CPUs, GPUs, AI chips, DPU and cloud databases—and the strategic shift toward full‑industry adoption across eight critical sectors.

CPU · China · DPU

0 likes · 14 min read

IT Services Circle

Jan 2, 2024 · Fundamentals

NVIDIA Introduces RTX 4090 D: China‑Specific GPU with Reduced CUDA and Tensor Cores

Due to U.S. export restrictions, NVIDIA released a China‑specific RTX 4090 D GPU that meets the TPP limit by reducing CUDA and Tensor cores while keeping most other specifications unchanged, and it is priced the same as the standard RTX 4090.

Export controls · GPU · Hardware Specs

0 likes · 4 min read

Architects' Tech Alliance

Dec 28, 2023 · Industry Insights

Why HBM Is Redefining GPU Memory: Performance, Architecture, and Market Trends

The article examines High Bandwidth Memory (HBM) technology—its 3D‑stacked architecture, superior bandwidth and power efficiency over GDDR, adoption in AI GPUs, generational performance gains, TSV manufacturing processes, and the evolving market share among major vendors.

AI servers · GPU · HBM

0 likes · 10 min read

Architects' Tech Alliance

Dec 27, 2023 · Industry Insights

Nvidia H100 vs Huawei Ascend 910B: In‑Depth GPU Performance and Bandwidth Comparison

This article compiles official specifications and benchmark data to compare Nvidia’s mainstream GPUs (L2, T4, A10, A10G, V100, A100, A800, H100) and its China‑specific H20/L20 with Huawei’s Ascend 910B, highlighting performance differences, inter‑GPU bandwidth via NVLink versus HCCS, and key takeaways for AI workloads.

AI hardware · GPU · Huawei

0 likes · 5 min read