Tag: GPU

Architects' Tech Alliance
Jun 15, 2025 · Fundamentals

Master GPU Fundamentals: Architecture, Performance, and Programming Insights

This comprehensive guide covers GPU definitions, evolution, core components, architectural designs, performance metrics, programming models, deep‑learning applications, comparisons with other processors, practical use cases, optimization techniques, and future trends, providing a solid foundation for anyone interested in modern graphics and compute acceleration.

Computer Architecture · Deep Learning · GPU
43 min read

Architects' Tech Alliance
Jun 9, 2025 · Artificial Intelligence

What Makes Nvidia’s Blackwell GPUs a Game-Changer for AI Performance?

In March 2024 Nvidia unveiled the Blackwell GPU family and the GB200 NVL72 architecture, featuring 3‑4 nm processes, redesigned CUDA cores, next‑gen ray‑tracing, upgraded DLSS, massive FP16/FP8 compute gains, 8 TB/s memory bandwidth, and NVLink Gen5, while also presenting complex power, cooling, and packaging challenges for large‑scale AI deployments.
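
The 8 TB/s figure matters most for memory-bound work such as LLM token generation, where each decode step must stream the model weights from HBM at least once. Below is a back-of-envelope sketch. It assumes a hypothetical 70B-parameter model stored in FP8 (1 byte per parameter) and ignores KV-cache traffic, on-chip caching, and kernel efficiency, so it gives only a lower bound on step time.

```python
def min_decode_step_ms(num_params: float, bytes_per_param: float, bandwidth_tb_s: float) -> float:
    """Lower bound on one decode step if every weight is read from HBM exactly once.

    Real kernels never sustain peak bandwidth, so measured time will be higher.
    """
    weight_bytes = num_params * bytes_per_param
    bandwidth_bytes_per_s = bandwidth_tb_s * 1e12  # decimal TB/s, as vendors quote it
    return weight_bytes / bandwidth_bytes_per_s * 1e3


# Hypothetical example: 70e9 parameters in FP8 (1 byte each) at 8 TB/s.
step_ms = min_decode_step_ms(70e9, 1.0, 8.0)
tokens_per_s = 1000 / step_ms  # upper bound at batch size 1
print(f"{step_ms:.2f} ms per token, at most {tokens_per_s:.0f} tokens/s at batch size 1")
```

Batching helps because the same weight read is shared by every sequence in the batch, which is why aggregate throughput can far exceed this single-sequence bound.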

AI acceleration · Blackwell · GPU
6 min read

Architects' Tech Alliance
Jun 6, 2025 · Artificial Intelligence

B30 vs H20: Which NVIDIA GPU Wins for AI Workloads and Budgets?

This article compares NVIDIA's China-specific B30 with the higher-end H20, detailing their GPU architectures, memory technologies, performance metrics, power and cooling characteristics, and price positioning, to help enterprises and developers choose the most suitable accelerator for AI and deep-learning tasks.

AI acceleration · B30 · GPU
13 min read

Architects' Tech Alliance
Jun 5, 2025 · Artificial Intelligence

Why the AI Server Market Is Shifting: Key Trends and Winners in 2024

The Chinese AI server market is booming: GPU servers still dominate while non-GPU accelerators surge, IDC forecasts a compound annual growth rate above 20% through 2028, and leading vendors such as Inspur and H3C, along with emerging Ascend-based manufacturers, are reshaping the competitive landscape.

AI Servers · ASIC · China
10 min read

DataFunTalk
Jun 4, 2025 · Artificial Intelligence

Coupang’s Distributed Cache Architecture Accelerates AI/ML Model Training

Coupang’s AI platform replaces costly data‑copy steps with a distributed cache that automatically pulls data from a central lake, boosts GPU utilization across regions, cuts storage and operational expenses, and speeds up model training by up to 40% while simplifying deployment via Kubernetes.
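
The core idea, a cache in front of the data lake that fetches a missing object on first access and serves later reads locally, is the classic read-through pattern. Here is a minimal single-process sketch with LRU eviction. `fetch_from_lake` is a hypothetical stand-in for the central storage client, and this is not Coupang's implementation, which is distributed across nodes and regions.

```python
from collections import OrderedDict
from typing import Callable, Dict


class ReadThroughCache:
    """LRU read-through cache: misses are loaded from the backing store."""

    def __init__(self, loader: Callable[[str], bytes], capacity_bytes: int):
        self._loader = loader                # backing-store client (hypothetical here)
        self._capacity = capacity_bytes
        self._used = 0
        self._items: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._items:
            self._items.move_to_end(key)     # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        data = self._loader(key)             # pull from the central store on a miss
        self._items[key] = data
        self._used += len(data)
        # Evict least recently used entries until we fit the budget;
        # the entry just loaded is last in order, so it is never evicted here.
        while self._used > self._capacity and len(self._items) > 1:
            _, evicted = self._items.popitem(last=False)
            self._used -= len(evicted)
        return data


# Usage with a fake data lake that records every remote read.
lake_reads = []


def fetch_from_lake(key: str) -> bytes:      # hypothetical backing-store client
    lake_reads.append(key)
    return b"x" * 100


cache = ReadThroughCache(fetch_from_lake, capacity_bytes=250)
for shard in ["a", "b", "a", "c", "a"]:
    cache.get(shard)
print(cache.hits, cache.misses, lake_reads)
```

Repeated epochs over the same shards hit the local copy, which is where GPU-side wait time and repeated remote reads are saved.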

AI · Data Lake · GPU
9 min read

Python Programming Learning Circle
Jun 2, 2025 · Artificial Intelligence

NVIDIA Adds Native Python Support to CUDA – What It Means for Developers

NVIDIA announced at GTC 2025 that CUDA now natively supports Python, allowing developers to write GPU-accelerated code directly in Python without needing C/C++, and introduced new APIs, libraries, JIT compilation, performance tools, and a tile-based programming model that aligns with Python's array-centric workflow.

AI · Accelerated Computing · CUDA
7 min read

Architects' Tech Alliance
Jun 1, 2025 · Artificial Intelligence

Evolution, Industry Landscape, and Standards of Graphics GPUs

This article traces the history of graphics GPUs from their 1980s origins to modern AI and high‑performance computing roles, examines China's emerging GPU market and its challenges, and reviews the key graphics and compute standards shaping the industry today.

Artificial Intelligence · GPU · Hardware
10 min read

Architects' Tech Alliance
May 31, 2025 · Artificial Intelligence

GPU Cluster Scaling: Understanding Scale‑Up and Scale‑Out for AI Pods

This article explains the concepts of AI Pods and GPU clusters, compares vertical (scale‑up) and horizontal (scale‑out) expansion, describes XPU types, discusses internal and inter‑pod communication, and evaluates the benefits and drawbacks of each scaling approach along with relevant networking technologies.
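
One way to see why the scale-up versus scale-out distinction matters is a simple bandwidth model for a ring all-reduce, a collective commonly used to synchronize gradients. For an S-byte buffer, each of n GPUs sends and receives about 2(n-1)/n × S bytes. The sketch ignores latency and congestion, and the bandwidth figures are hypothetical placeholders, not numbers from the article.

```python
def ring_allreduce_seconds(buffer_bytes: float, n_gpus: int, link_gb_s: float) -> float:
    """Bandwidth-only estimate for ring all-reduce (reduce-scatter + all-gather).

    Each GPU transfers 2 * (n - 1) / n * buffer_bytes over its link.
    Per-hop latency and network contention are ignored.
    """
    if n_gpus < 2:
        return 0.0
    traffic = 2 * (n_gpus - 1) / n_gpus * buffer_bytes
    return traffic / (link_gb_s * 1e9)


grads = 10e9  # hypothetical 10 GB of gradients per step
# Hypothetical per-GPU bandwidths: a scale-up fabric inside a pod
# versus a scale-out NIC between pods.
scale_up = ring_allreduce_seconds(grads, 8, 900)   # 900 GB/s
scale_out = ring_allreduce_seconds(grads, 8, 50)   # 50 GB/s (a 400 Gb/s NIC)
print(f"scale-up: {scale_up * 1e3:.1f} ms, scale-out: {scale_out * 1e3:.1f} ms")
```

The per-GPU traffic barely changes with n, so in this model the link bandwidth, not the GPU count, sets the synchronization time. That is the main reason bandwidth-hungry traffic is kept on the scale-up fabric where possible.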

AI Pods · GPU · InfiniBand
10 min read

Architects' Tech Alliance
May 26, 2025 · Artificial Intelligence

NVLink Fusion: NVIDIA’s High‑Bandwidth Interconnect for Heterogeneous AI Computing

NVLink Fusion, unveiled at Computex 2025, extends NVIDIA’s NVLink technology to enable high‑bandwidth, low‑latency connections between CPUs and GPUs or third‑party accelerators, offering up to 900 GB/s bandwidth, flexible heterogeneous configurations, ecosystem expansion, performance gains for AI training and inference, and potential cost reductions.

AI · CPU · Data Center
12 min read

Architects' Tech Alliance
May 23, 2025 · Artificial Intelligence

Analysis of Nvidia’s China‑Specific Cut‑Down GPUs: H20, B20, and B40

This article examines the impact of U.S. export restrictions on Nvidia’s China‑specific GPU lineup, detailing the specifications and architectural changes of the H20, B20, and B40 chips, while also discussing domestic alternatives and the broader implications for AI compute in China.

AI chips · B20 · B40
10 min read

Alibaba Cloud Infrastructure
Apr 30, 2025 · Cloud Native

Deploying Qwen3-8B Large Language Model on Alibaba Cloud ACK with ACS GPU Acceleration

This guide explains how to prepare, deploy, and verify the Qwen3‑8B large language model on an Alibaba Cloud Container Service for Kubernetes (ACK) cluster using ACS GPU resources, covering prerequisites, model download, storage setup, Kubernetes manifests, and testing the inference service.

ACS · ACK · Cloud Native
8 min read

Alibaba Cloud Infrastructure
Apr 9, 2025 · Cloud Computing

Multi-Region Serverless Compute Scheduling with Alibaba Cloud ACK One Registered Cluster

This guide explains how Alibaba Cloud's ACK One registered cluster provides multi‑region serverless GPU compute scheduling, addressing AI workload elasticity by using region‑specific labels, ResourcePolicy, and the ack‑co‑scheduler to automatically balance resources across regions.

ACK One · Alibaba Cloud · GPU
10 min read

Python Programming Learning Circle
Apr 3, 2025 · Artificial Intelligence

Accelerating PyTorch Model Training: Techniques, Benchmarks, and Code

This article explains how to dramatically speed up PyTorch model training using code optimizations, mixed‑precision, torch.compile, distributed data parallelism, and DeepSpeed, presenting benchmark results that show up to 11.5× acceleration on multiple GPUs while maintaining high accuracy.

Deep Learning · DeepSpeed · Distributed Training
6 min read

360 Zhihui Cloud Developer
Apr 1, 2025 · Artificial Intelligence

DeepGEMM vs Cutlass vs Triton: Which GPU GEMM Library Delivers the Best FP8 Performance?

This article presents a comprehensive benchmark of DeepGEMM, Cutlass, and Triton on NVIDIA H20 and H800 GPUs, analyzing TFLOPS, bandwidth, latency, and speedup across various matrix sizes, and concludes which library is optimal for different workload scenarios.
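
TFLOPS figures in GEMM benchmarks like these are normally derived from the problem shape and the measured kernel time. A dense M×N×K GEMM performs about 2·M·N·K floating-point operations, one multiply and one add per inner-product term. The sketch below performs that conversion and also computes arithmetic intensity, which indicates whether a shape is compute-bound or bandwidth-bound. The FP8-input/BF16-output byte sizes and the example timing are assumptions for illustration only.

```python
def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved TFLOPS for C[m,n] = A[m,k] @ B[k,n]."""
    flops = 2 * m * n * k            # one multiply + one add per term
    return flops / seconds / 1e12


def gemm_min_bytes(m: int, n: int, k: int, in_bytes: int = 1, out_bytes: int = 2) -> int:
    """Compulsory traffic: read A and B once, write C once (FP8 in, BF16 out assumed)."""
    return (m * k + k * n) * in_bytes + m * n * out_bytes


def arithmetic_intensity(m: int, n: int, k: int) -> float:
    """FLOPs per byte of compulsory memory traffic."""
    return 2 * m * n * k / gemm_min_bytes(m, n, k)


# Hypothetical measurement: a 4096^3 FP8 GEMM that took 1.0 ms.
print(f"{gemm_tflops(4096, 4096, 4096, 1.0e-3):.1f} TFLOPS")
# Small-M shapes (e.g. decode-time GEMMs) have far lower intensity,
# so they tend to be limited by memory bandwidth rather than compute.
print(f"intensity, square: {arithmetic_intensity(4096, 4096, 4096):.0f} FLOP/B; "
      f"small-M: {arithmetic_intensity(16, 4096, 4096):.1f} FLOP/B")
```

Comparing a shape's intensity with the GPU's ratio of peak FLOPS to memory bandwidth is a quick way to predict which regime a benchmark point falls in.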

Benchmark · CUDA · DeepGEMM
15 min read

Architects' Tech Alliance
Mar 28, 2025 · Artificial Intelligence

Evolution of NVIDIA GPU Architectures for Deep Learning: From Volta to Blackwell and Rubin

The article traces NVIDIA’s GPU architecture evolution from the Volta era’s pioneering Tensor Cores through Turing, Ampere, Hopper, and the latest Blackwell and Rubin designs, highlighting key innovations such as mixed‑precision support, sparsity, NVLink, and their impact on deep‑learning performance.

AI hardware · Deep Learning · GPU
10 min read

Tencent Technical Engineering
Mar 21, 2025 · Fundamentals

Fundamentals of GPU Architecture and Programming

The article explains GPU fundamentals—from the end of Dennard scaling and why GPUs excel in parallel throughput, through CUDA programming basics like the SAXPY kernel and SIMT versus SIMD execution, to the evolution of the SIMT stack, modern scheduling, and a three‑step core architecture design.
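
For readers new to the SAXPY example: SAXPY computes y = a·x + y. A CUDA kernel assigns one element per thread using the global index blockIdx.x * blockDim.x + threadIdx.x, and it needs a bounds check because the grid is rounded up to whole blocks. The Python sketch below only emulates that indexing on the CPU. Plain loops stand in for the threads, which on a GPU run in parallel, so this illustrates the programming model and does not use a GPU.

```python
def saxpy_kernel(block_idx: int, block_dim: int, thread_idx: int,
                 n: int, a: float, x: list, y: list) -> None:
    """Body of one CUDA 'thread': same logic as the usual C++ saxpy kernel."""
    i = block_idx * block_dim + thread_idx   # global thread index
    if i < n:                                 # guard: the last block may be partial
        y[i] = a * x[i] + y[i]


def launch_saxpy(n: int, a: float, x: list, y: list, block_dim: int = 256) -> None:
    """Emulate saxpy<<<grid_dim, block_dim>>>(n, a, x, y) sequentially."""
    grid_dim = (n + block_dim - 1) // block_dim   # ceil(n / block_dim) blocks
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):       # on a GPU these run concurrently
            saxpy_kernel(block_idx, block_dim, thread_idx, n, a, x, y)


x = [float(i) for i in range(1000)]
y = [1.0] * 1000
launch_saxpy(len(x), 2.0, x, y)
print(y[:4], y[-1])
```

With 1000 elements and 256 threads per block, the launch creates 4 blocks (1024 threads). The last 24 threads fail the `i < n` check and do nothing, which is exactly the role of the guard in the real kernel.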

CUDA · GPU · GPU programming
42 min read

JD Tech
Mar 19, 2025 · Artificial Intelligence

JD Retail's End‑to‑End AI Engine Compatible with GPU and Domestic NPU: Architecture, Optimization, and Real‑World Applications

This article details JD Retail's AI engine that seamlessly supports both GPU and domestic NPU hardware, describing its heterogeneous cluster architecture, unified training and inference APIs, performance optimizations, extensive model coverage, and multiple production use cases across e‑commerce, logistics, and intelligent assistance.

AI Engine · GPU · JD Retail
20 min read

AntTech
Mar 19, 2025 · Artificial Intelligence

Award-Winning HPCA 2025 Papers on Near‑DRAM Processing (UniNDP) and GPU‑Accelerated Fully Homomorphic Encryption (WarpDrive)

At HPCA 2025, two standout papers—UniNDP, a unified compilation and simulation tool for near‑DRAM processing architectures, and WarpDrive, a GPU‑based fully homomorphic encryption accelerator leveraging Tensor and CUDA cores—demonstrate significant performance gains for AI workloads and privacy‑preserving computation.

AI acceleration · Fully Homomorphic Encryption · GPU
5 min read

DataFunSummit
Mar 14, 2025 · Artificial Intelligence

Insights from Zhihu's ZhiLight Large‑Model Inference Framework: Architecture, Parallelism, and Performance Optimizations

The article summarizes Zhihu's machine‑learning platform lead Wang Xin's presentation on the ZhiLight large‑model inference framework, covering model execution mechanisms, GPU workload analysis, pipeline and tensor parallelism, GPU architecture evolution, open‑source engine comparisons, ZhiLight's compute‑communication overlap and quantization optimizations, benchmark results, supported models, and future directions.
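
Tensor parallelism, one of the techniques the talk covers, splits a single layer's weight matrix across GPUs. In the column-parallel form often used for the first linear layer of a transformer MLP, each GPU holds a slice of the weight columns, computes its part of the output independently, and the slices are gathered afterwards. The following pure-Python sketch simulates "devices" as list entries. It illustrates the math only and is not ZhiLight's implementation.

```python
def matmul(a, b):
    """Plain matrix multiply: a is [m][k], b is [k][n]."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def split_columns(w, parts):
    """Split weight w [k][n] into `parts` column blocks (n must divide evenly)."""
    n = len(w[0])
    assert n % parts == 0, "columns must divide evenly across devices"
    step = n // parts
    return [[row[d * step:(d + 1) * step] for row in w] for d in range(parts)]


def column_parallel_linear(x, w, world_size):
    """Each simulated device multiplies x by its column shard; outputs are concatenated."""
    shards = split_columns(w, world_size)
    # One matmul per device; no communication is needed during this step.
    partial_outputs = [matmul(x, shard) for shard in shards]
    # All-gather along the column dimension: concatenate each row's slices.
    return [sum((out[i] for out in partial_outputs), []) for i in range(len(x))]


x = [[1.0, 2.0], [3.0, 4.0]]                 # activations [batch=2][hidden=2]
w = [[1.0, 0.0, 2.0, 1.0],
     [0.0, 1.0, 1.0, 3.0]]                   # weight [hidden=2][out=4]
assert column_parallel_linear(x, w, world_size=2) == matmul(x, w)
```

The communication step is what inference frameworks try to overlap with computation. In practice the gather is often deferred by pairing this layer with a row-parallel layer, so that only one reduction per MLP block is needed.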

GPU · Inference · LLM
13 min read

Cognitive Technology Team
Mar 11, 2025 · Artificial Intelligence

Deploying DeepSeek R1:7b Model Locally with Ollama and Building AI Applications Using Dify

This tutorial explains how to set up Ollama for CPU or GPU environments, run the DeepSeek R1:7b large language model, and use the open‑source Dify platform to create and deploy a custom AI application, providing step‑by‑step commands and configuration details.
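
Once Ollama is serving the model, applications such as Dify talk to it over its local HTTP API (port 11434 by default). Below is a minimal standard-library sketch of a non-streaming call to the `/api/generate` endpoint. It assumes the model was pulled under the Ollama tag `deepseek-r1:7b` and that the server is running on the same machine.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default address


def build_generate_request(prompt: str, model: str = "deepseek-r1:7b") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "deepseek-r1:7b", timeout: float = 300.0) -> str:
    """Send the request and return the generated text (the 'response' field)."""
    req = build_generate_request(prompt, model)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

Calling `generate("Explain SAXPY in one sentence.")` requires a running Ollama server. With `"stream": False`, Ollama returns a single JSON object rather than a sequence of partial chunks, which keeps the client simple at the cost of waiting for the full answer.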

AI · DeepSeek · Dify
8 min read