Tagged articles
536 articles
Architects' Tech Alliance
Jun 19, 2025 · Fundamentals

Unlock the Secrets of GPUs: 100 Essential Fundamentals Explained

This comprehensive guide covers 100 essential GPU fundamentals, from basic definitions and architecture to core technologies, performance optimization, emerging trends, and industry developments, providing a complete technical foundation for graphics, AI, and high‑performance computing applications.

Deep Learning · GPU · Graphics Processing Unit
0 likes · 19 min read
360 Zhihui Cloud Developer
Jun 18, 2025 · Cloud Native

Unifying GPU Management Across Kubernetes Clusters with RBAC & Virtual Control Planes

This article examines how to centrally manage GPU resources across heterogeneous Kubernetes clusters using namespace‑based RBAC isolation, virtual control‑plane solutions like vcluster, and multi‑cluster tools such as Karmada, comparing their architectures, use cases, advantages, and limitations to guide enterprise‑level deployment decisions.

Cloud Native · GPU · Kubernetes
0 likes · 14 min read
Architects' Tech Alliance
Jun 15, 2025 · Fundamentals

Master GPU Fundamentals: Architecture, Performance, and Programming Insights

This comprehensive guide covers GPU definitions, evolution, core components, architectural designs, performance metrics, programming models, deep‑learning applications, comparisons with other processors, practical use cases, optimization techniques, and future trends, providing a solid foundation for anyone interested in modern graphics and compute acceleration.

Deep Learning · GPU · Hardware
0 likes · 43 min read
Ops Development Stories
Jun 12, 2025 · Cloud Native

One-Click GPU-Enabled Kind Cluster Setup for Running Large AI Models

This tutorial walks you through using a one‑click script to create a GPU‑enabled Kind Kubernetes cluster, evenly distribute GPU resources across nodes with nvkind, install necessary drivers and toolkits, deploy a vLLM‑served large language model, and verify its operation, all on a local or cloud environment.

AI Model Deployment · Docker · GPU
0 likes · 23 min read
Architects' Tech Alliance
Jun 9, 2025 · Artificial Intelligence

What Makes Nvidia’s Blackwell GPUs a Game-Changer for AI Performance?

In March 2024 Nvidia unveiled the Blackwell GPU family and the GB200 NVL72 architecture, featuring 3‑4 nm processes, redesigned CUDA cores, next‑gen ray‑tracing, upgraded DLSS, massive FP16/FP8 compute gains, 8 TB/s memory bandwidth, and NVLink Gen5, while also presenting complex power, cooling, and packaging challenges for large‑scale AI deployments.

AI acceleration · Blackwell · GPU
0 likes · 6 min read
Network Intelligence Research Center (NIRC)
Jun 9, 2025 · Artificial Intelligence

How to Build High‑Performance GEMM with NVIDIA CUTLASS

The article explains why standard GEMM libraries may fall short for special matrix shapes, introduces NVIDIA’s open‑source CUTLASS library, details its hierarchical tiling architecture, and walks through a complete device‑API example that customizes tile sizes and data layouts to achieve near‑hand‑written kernel performance on modern GPUs.

CUDA · CUTLASS · GEMM
0 likes · 6 min read
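For readers new to the tiling idea mentioned in the summary above, here is a minimal pure-Python sketch of a blocked GEMM. It is not CUTLASS code and does not use the CUTLASS API; the function name `tiled_gemm` and the `tile` parameter are hypothetical, chosen only for illustration. On a GPU, each output tile would typically map to a thread block, with further sub-tiling per warp and thread.

```python
def tiled_gemm(a, b, tile=2):
    """Compute C = A @ B one tile at a time (illustrative sketch, not CUTLASS)."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    # Outer loops walk over output tiles of size tile x tile.
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            # March along the shared K dimension one tile at a time.
            for k0 in range(0, k, tile):
                # Inner loops accumulate the partial products for this tile.
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for p in range(k0, min(k0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(tiled_gemm(a, b))
```

The `min(...)` bounds handle matrix dimensions that are not multiples of the tile size, which is the "special matrix shapes" case where tile-size choice tends to matter.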
Architects' Tech Alliance
Jun 6, 2025 · Artificial Intelligence

B30 vs H20: Which NVIDIA GPU Wins for AI Workloads and Budgets?

This article compares NVIDIA’s China‑specific B30 and high‑end H20 GPUs, detailing their CPU/GPU architecture updates, memory technologies, architectural differences, performance metrics, power and cooling characteristics, and price positioning, to help enterprises and developers choose the most suitable accelerator for AI and deep‑learning tasks.

AI acceleration · B30 · GPU
0 likes · 13 min read
Architects' Tech Alliance
Jun 5, 2025 · Artificial Intelligence

Why the AI Server Market Is Shifting: Key Trends and Winners in 2024

The Chinese AI server market is booming, with GPU servers still dominant while non‑GPU accelerators surge, IDC forecasts a compound annual growth above 20% through 2028, and leading vendors such as Inspur, H3C, and emerging Ascend‑based manufacturers reshaping the competitive landscape.

AI servers · ASIC · China
0 likes · 10 min read
DataFunTalk
Jun 4, 2025 · Artificial Intelligence

Coupang’s Distributed Cache Architecture Accelerates AI/ML Model Training

Coupang’s AI platform replaces costly data‑copy steps with a distributed cache that automatically pulls data from a central lake, boosts GPU utilization across regions, cuts storage and operational expenses, and speeds up model training by up to 40% while simplifying deployment via Kubernetes.

AI · Data Lake · GPU
0 likes · 9 min read
Python Programming Learning Circle
Jun 2, 2025 · Artificial Intelligence

NVIDIA Adds Native Python Support to CUDA – What It Means for Developers

NVIDIA announced at GTC 2025 that CUDA will now natively support Python, allowing developers to write GPU‑accelerated code directly in Python without C/C++ knowledge, introducing new APIs, libraries, JIT compilation, performance tools, and a tile‑based programming model that aligns with Python’s array‑centric workflow.

AI · CUDA · GPU
0 likes · 7 min read
Architects' Tech Alliance
May 31, 2025 · Artificial Intelligence

GPU Cluster Scaling: Understanding Scale‑Up and Scale‑Out for AI Pods

This article explains the concepts of AI Pods and GPU clusters, compares vertical (scale‑up) and horizontal (scale‑out) expansion, describes XPU types, discusses internal and inter‑pod communication, and evaluates the benefits and drawbacks of each scaling approach along with relevant networking technologies.

AI Pods · GPU · InfiniBand
0 likes · 10 min read
Architects' Tech Alliance
May 26, 2025 · Artificial Intelligence

NVLink Fusion: NVIDIA’s High‑Bandwidth Interconnect for Heterogeneous AI Computing

NVLink Fusion, unveiled at Computex 2025, extends NVIDIA’s NVLink technology to enable high‑bandwidth, low‑latency connections between CPUs and GPUs or third‑party accelerators, offering up to 900 GB/s bandwidth, flexible heterogeneous configurations, ecosystem expansion, performance gains for AI training and inference, and potential cost reductions.

AI · CPU · Data center
0 likes · 12 min read
Architects' Tech Alliance
May 20, 2025 · Industry Insights

What Do GPU Core Specs Really Mean? A Deep Dive into Modern GPU Performance

This article provides a comprehensive analysis of GPU core parameters—including compute units, memory systems, floating‑point performance, power consumption, and manufacturing process—while comparing leading international and domestic GPU products to help readers choose the right accelerator for AI, HPC, or graphics workloads.

AI · Benchmarking · GPU
0 likes · 19 min read
AI Frontier Lectures
May 20, 2025 · Industry Insights

How New US Geo‑Tracking Laws Could Reshape the High‑End GPU Market

A US Senate bill introduced by Senator Tom Cotton requires Nvidia, AMD, Intel and other high‑end GPU and AI processor makers to embed geolocation tracking, imposing six‑month compliance deadlines, new reporting obligations, and potentially billions of dollars in added R&D and export‑control costs.

Export Control · GPU · Geo-tracking
0 likes · 8 min read
21CTO
May 15, 2025 · Artificial Intelligence

AI Updates: Tencent GPUs, Alibaba Qwen3, Anaconda Platform, Google Apigee

This roundup highlights Tencent's GPU capacity for future models, Alibaba's fully disclosed Qwen3 technical report, Anaconda's unified AI platform, Parasoft's AI‑enhanced SOAtest, and Google Cloud's GA of the Apigee API Management Operator, offering a snapshot of current AI advancements.

AI · API Management · GPU
0 likes · 5 min read
Meituan Technology Team
May 8, 2025 · Artificial Intelligence

Building a Mixed OR+ML Inference Framework with TritonServer: Architecture, Challenges, and Solutions

The article describes how a large‑scale dispatch system was re‑engineered with NVIDIA TritonServer to unify GPU‑accelerated operations‑research kernels and deep‑learning models, detailing a three‑stage architecture (in‑process, cross‑process, cross‑node), the performance, stability and memory challenges addressed, and future plans for heterogeneous GPU scaling.

GPU · Inference · Performance Optimization
0 likes · 11 min read
Architects' Tech Alliance
Apr 29, 2025 · Industry Insights

Next-Gen Server Architecture: CPUs, GPUs, Memory, and Certification Insights

This article provides a comprehensive analysis of modern server architecture, covering the evolution from CISC to RISC, the rise of heterogeneous computing with GPUs and accelerators, diverse form factors, core component technologies, reliability mechanisms, performance benchmarking, certification standards, and emerging trends such as liquid cooling and AI‑native designs.

CPU · Data center · GPU
0 likes · 11 min read
Architects' Tech Alliance
Apr 13, 2025 · Industry Insights

Which NVIDIA GPU Wins for AI? Deep Dive into RTX & A‑Series Performance and Power

This article presents a detailed comparison of major NVIDIA GPUs—including RTX 4090, RTX 4090 D, RTX 3090, A10, A40, A100, and H100—covering memory size, bandwidth, Tensor BF16/FP16/FP32 throughput, FP16/FP32 performance, power draw and release dates, and explains how these specs affect AI workload efficiency.

AI workloads · GPU · Industry analysis
0 likes · 9 min read
AI Frontier Lectures
Apr 8, 2025 · Industry Insights

Nvidia’s GPU Names Explained: Ampere, Hopper, Blackwell, Rubin, Feynman

At the recent GTC conference Nvidia unveiled its roadmap of AI‑focused GPUs—Ampere, Hopper, Blackwell, Rubin and the upcoming Feynman—each named after a pioneering scientist, and this article explores the historical contributions of André‑Marie Ampère, Grace Hopper, David Blackwell, Vera Rubin and Richard Feynman, linking their legacies to the architectures’ innovations.

AI · GPU · Nvidia
0 likes · 10 min read
Architects' Tech Alliance
Apr 6, 2025 · Fundamentals

PCIe vs NVLink: How Modern GPU Interconnects Power AI Training

As AI models grow to trillion‑parameter scales, training them demands massive GPU clusters whose performance is increasingly limited by network bandwidth; this article examines why traditional PCIe interconnects become bottlenecks and how NVIDIA's NVLink and NVSwitch technologies dramatically improve multi‑GPU communication and overall system efficiency.

AI training · GPU · High‑performance computing
0 likes · 12 min read
Architects' Tech Alliance
Apr 4, 2025 · Industry Insights

What Drives the AI Compute Chip Market? GPUs, ASICs, and the Rise of Chinese Players

This article analyzes the AI compute chip ecosystem, covering GPU, FPGA, and ASIC categories, market share projections, key performance metrics such as TOPS, power and area, and provides a detailed overview of leading global vendors and emerging Chinese companies with their technical specifications and competitive positioning.

AI chips · ASIC · Chinese semiconductor
0 likes · 11 min read
Architects' Tech Alliance
Apr 3, 2025 · Artificial Intelligence

Why NVLink and NVSwitch Are Essential for Training Massive AI Models

Training today's massive AI foundation models demands extensive GPU resources and sophisticated multi‑GPU communication, making technologies like NVLink and NVSwitch crucial for efficient distributed training, while data‑parallel and model‑parallel strategies together optimize performance across large‑scale hardware clusters.

AI · Distributed Training · GPU
0 likes · 8 min read
AI Cyberspace
Mar 29, 2025 · Fundamentals

Why FP32 Remains the Benchmark for Measuring AI Compute Power

This article explains scientific notation, the IEEE‑754 floating‑point standard, the structure of FP32 and FP64 numbers, and how computational power is measured using FLOPS, illustrating CPU and GPU FP32 performance calculations and why FP32 is the common benchmark for AI workloads.

CPU · FP32 · GPU
0 likes · 17 min read
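As a rough illustration of the kind of peak-FP32 calculation the summary above refers to, the sketch below multiplies core count, clock frequency, and FLOPs issued per core per cycle. The function name `peak_fp32_tflops` and the example figures are hypothetical and do not come from the article; the default of 2 FLOPs per cycle assumes one fused multiply-add per core per clock, which is the usual convention for theoretical peak numbers.

```python
def peak_fp32_tflops(cores, clock_ghz, flops_per_cycle=2):
    """Theoretical peak FP32 throughput in TFLOPS.

    cores           -- number of FP32 lanes (e.g. CUDA cores), assumed value
    clock_ghz       -- sustained clock in GHz, assumed value
    flops_per_cycle -- 2 if one fused multiply-add counts as two FLOPs
    """
    gflops = cores * clock_ghz * flops_per_cycle  # cores x GHz gives GFLOPS
    return gflops / 1000.0


if __name__ == "__main__":
    # Hypothetical device: 10,000 FP32 lanes at 1.5 GHz.
    print(f"{peak_fp32_tflops(10_000, 1.5):.1f} TFLOPS")
```

Measured throughput is normally below this figure, since the formula ignores memory bandwidth, occupancy, and clock throttling.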
Architects' Tech Alliance
Mar 28, 2025 · Artificial Intelligence

Evolution of NVIDIA GPU Architectures for Deep Learning: From Volta to Blackwell and Rubin

The article traces NVIDIA’s GPU architecture evolution from the Volta era’s pioneering Tensor Cores through Turing, Ampere, Hopper, and the latest Blackwell and Rubin designs, highlighting key innovations such as mixed‑precision support, sparsity, NVLink, and their impact on deep‑learning performance.

AI hardware · GPU · Nvidia
0 likes · 10 min read
Architects' Tech Alliance
Mar 27, 2025 · Industry Insights

GPU Industry Deep Dive: Market Trends, Competitive Landscape, and Future Outlook

This article provides a comprehensive analysis of the GPU industry, covering product classifications, key characteristics, market size evolution, competitive dynamics among major players such as NVIDIA, AMD, and Huawei, policy influences, and future growth projections driven by AI and high‑performance computing demands.

AI compute · GPU · Industry analysis
0 likes · 14 min read
Infra Learning Club
Mar 23, 2025 · Artificial Intelligence

Getting Started with cuda‑python and an Introduction to cuTile

This article explains the cuda‑python ecosystem, including its core packages, installation via pip or conda, the experimental cuda.core API, a full Python‑to‑CUDA workflow with NVRTC compilation, a performance comparison with C++, and the covered APIs, and gives an overview of NVIDIA's new cuTile programming model.

CUDA · GPU · NVRTC
0 likes · 11 min read
Infra Learning Club
Mar 22, 2025 · Artificial Intelligence

How to Write CUDA Kernels in Python – Insights from Nvidia GTC 2025

The article reviews Nvidia GTC 2025’s session on writing CUDA kernels with Python, compares tools such as Numba, CuPy, PyTorch extensions and cuda‑python, demonstrates a segmented reduction example with C++ and Python code, explains the underlying CUDA concepts, and shows how to install and use cuda‑python to simplify kernel development.

CUDA · CuPy · GPU
0 likes · 10 min read
Tencent Technical Engineering
Mar 21, 2025 · Fundamentals

Fundamentals of GPU Architecture and Programming

The article explains GPU fundamentals—from the end of Dennard scaling and why GPUs excel in parallel throughput, through CUDA programming basics like the SAXPY kernel and SIMT versus SIMD execution, to the evolution of the SIMT stack, modern scheduling, and a three‑step core architecture design.

CUDA · GPU · GPU programming
0 likes · 42 min read
Infra Learning Club
Mar 20, 2025 · Artificial Intelligence

How GPU Frequency, Power Consumption, and FLOPS Interrelate

The article explains the theoretical and practical relationships between GPU clock frequencies, power consumption, and FLOPS, describes key hardware metrics such as SM, memory, and video clocks, shows how to query and set these values with nvidia‑smi, and presents experiments on a Tesla P4 that reveal the non‑linear trade‑offs between performance, power, and temperature.

Clock Speed · DVFS · FLOPS
0 likes · 15 min read
JD Tech
Mar 19, 2025 · Artificial Intelligence

JD Retail's End‑to‑End AI Engine Compatible with GPU and Domestic NPU: Architecture, Optimization, and Real‑World Applications

This article details JD Retail's AI engine that seamlessly supports both GPU and domestic NPU hardware, describing its heterogeneous cluster architecture, unified training and inference APIs, performance optimizations, extensive model coverage, and multiple production use cases across e‑commerce, logistics, and intelligent assistance.

AI Engine · GPU · JD Retail
0 likes · 20 min read
AntTech
Mar 19, 2025 · Artificial Intelligence

Award-Winning HPCA 2025 Papers on Near‑DRAM Processing (UniNDP) and GPU‑Accelerated Fully Homomorphic Encryption (WarpDrive)

At HPCA 2025, two standout papers—UniNDP, a unified compilation and simulation tool for near‑DRAM processing architectures, and WarpDrive, a GPU‑based fully homomorphic encryption accelerator leveraging Tensor and CUDA cores—demonstrate significant performance gains for AI workloads and privacy‑preserving computation.

AI acceleration · Fully Homomorphic Encryption · GPU
0 likes · 5 min read
DataFunSummit
Mar 14, 2025 · Artificial Intelligence

Insights from Zhihu's ZhiLight Large‑Model Inference Framework: Architecture, Parallelism, and Performance Optimizations

The article summarizes Zhihu's machine‑learning platform lead Wang Xin's presentation on the ZhiLight large‑model inference framework, covering model execution mechanisms, GPU workload analysis, pipeline and tensor parallelism, GPU architecture evolution, open‑source engine comparisons, ZhiLight's compute‑communication overlap and quantization optimizations, benchmark results, supported models, and future directions.

GPU · Inference · LLM
0 likes · 13 min read
Alibaba Cloud Infrastructure
Mar 9, 2025 · Cloud Computing

Deploy QwQ-32B LLM Inference on Alibaba Cloud ACS with vLLM: Step‑by‑Step Guide

This guide walks you through using Alibaba Cloud Container Compute Service (ACS) to provision GPU resources, prepare the QwQ-32B model, configure persistent storage, deploy the model with vLLM, set up OpenWebUI, verify the service, and optionally benchmark its performance, all with detailed commands and YAML examples.

ACS · Alibaba Cloud · Benchmark
0 likes · 17 min read
Infra Learning Club
Mar 9, 2025 · Cloud Native

How to Fix nvidia-smi Missing GPU Process Info Inside Containers

The article explains why nvidia-smi cannot display GPU processes when run inside a container, analyzes the underlying pid‑namespace isolation and kernel‑level restrictions, and provides three practical solutions—including using hostPid, custom kernel interception modules, and the nvitop tool—plus a workaround for gpu‑operator deployments.

GPU · Kernel Module · Kubernetes
0 likes · 8 min read
Infra Learning Club
Mar 6, 2025 · Fundamentals

How GPU DVFS Boosts Efficiency: Concepts, Modeling, and Future Directions

This article explains how GPU Dynamic Voltage and Frequency Scaling (DVFS) reduces power consumption while preserving performance, describes NVIDIA GPU Boost 4.0 features, outlines a hardware‑counter‑based GPGPU power‑estimation model built with a BP‑ANN, reports sub‑5% error on benchmarks, and discusses intelligent and multi‑GPU extensions.

BP-ANN · DVFS · GPGPU
0 likes · 5 min read
Baidu Geek Talk
Mar 5, 2025 · Cloud Computing

Inside GPU Cloud Servers: Architecture, Interconnects, and Performance Secrets

This article provides a comprehensive technical overview of GPU cloud server design, covering data‑processing pipelines, hardware topology, NUMA considerations, PCIe and proprietary interconnects, multi‑GPU communication strategies, virtualization approaches (BCC and BBC), DPU acceleration, and future trends for scaling up and out.

GPU · Performance Optimization · Virtualization
0 likes · 27 min read
JD Retail Technology
Mar 4, 2025 · Artificial Intelligence

JD Retail End-to-End AI Engine Compatible with GPU and Domestic NPU: Architecture, Optimization, and Applications

JD Retail’s Nine‑Number Algorithm Platform delivers an end‑to‑end AI engine that unifies GPU and domestic NPU resources across a thousand‑card cluster, offering zero‑cost model migration, optimized training and inference pipelines, support for over 40 LLM and multimodal models, and proven business‑level performance that reduces dependence on overseas chips.

AI · Distributed Training · GPU
0 likes · 19 min read
Baidu Intelligent Cloud Tech Hub
Mar 3, 2025 · Cloud Computing

How Baidu Cloud Optimizes GPU Servers for AI Workloads

This article explains the design and implementation of GPU cloud servers, covering data processing pipelines, hardware selection, topology, interconnect technologies, virtualization, multi‑GPU communication methods, and Baidu's practical solutions for both virtualized and bare‑metal instances to boost AI inference and training performance.

AI · GPU · NVLink
0 likes · 29 min read
JD Tech Talk
Mar 3, 2025 · Artificial Intelligence

AI Engine Technology Based on Domestic Chips for JD Retail

This article describes JD Retail's AI engine built on domestic NPU chips, covering challenges, heterogeneous GPU‑NPU scheduling, high‑performance training and inference engines, extensive model support, real‑world deployment cases, and future plans for large‑scale chip clusters and ecosystem development.

AI · Distributed Training · GPU
0 likes · 20 min read
IT Services Circle
Feb 27, 2025 · Artificial Intelligence

DeepSeek Announces FlashMLA: An Efficient Multi‑Head Latent Attention Decoding Kernel for Hopper GPUs

DeepSeek’s OpenSourceWeek introduced FlashMLA, a GPU‑optimized MLA decoding kernel for Hopper GPUs that leverages FlashAttention and CUTLASS to dramatically improve large‑model inference performance, with early adoption showing up to 30% higher compute utilization and doubled speed in some scenarios.

DeepSeek · FlashMLA · GPU
0 likes · 3 min read
JavaEdge
Feb 24, 2025 · Artificial Intelligence

Build a CIFAR‑10 Image Classifier with PyTorch – A Java Developer’s Guide

This tutorial walks Java developers through building, training, evaluating, and deploying a CIFAR‑10 image classifier using PyTorch, covering data loading, preprocessing, network definition, loss and optimizer setup, GPU acceleration, model saving, and per‑class accuracy analysis.

CIFAR-10 · Deep Learning · GPU
0 likes · 18 min read
Alibaba Cloud Big Data AI Platform
Feb 24, 2025 · Artificial Intelligence

Unlock Data+AI Fusion: Fine‑Tune Multimodal Models on DataWorks with GPU‑Ready Notebooks

This tutorial shows how to use Alibaba Cloud DataWorks' serverless GPU resource groups together with the open‑source LLaMA‑Factory framework to fine‑tune the Qwen2‑VL‑2B multimodal model for tourism‑domain Q&A, covering environment setup, dataset preparation, parameter configuration, training, and interactive inference.

DataWorks · GPU · LLaMA-Factory
0 likes · 10 min read
Infra Learning Club
Feb 23, 2025 · Fundamentals

How to Dynamically Decompress CUDA Fatbin Files Compressed by NVCC

This article explains why enabling NVCC's --fatbin-options -compress-all breaks remote GPU calls, describes the fatbin file layout, shows how to extract and analyze the binary with objcopy, and provides a step‑by‑step implementation of a decompression routine for both ELF and PTX sections.

Binary Format · CUDA · GPU
0 likes · 9 min read
Infra Learning Club
Feb 22, 2025 · Fundamentals

Understanding NVCC Compilation: A Step‑by‑Step Technical Guide

This article walks through the NVCC compilation pipeline, explaining how CUDA source files are transformed into host and device binaries, detailing file extensions, compilation stages, command‑line options, intermediate artifacts, and the role of registration functions such as __nv_cudaEntityRegisterCallback and __sti____cudaRegisterAll.

CUDA · Compilation · GPU
0 likes · 12 min read
Alibaba Cloud Infrastructure
Feb 21, 2025 · Artificial Intelligence

Deploying DeepSeek R1 Model Inference on ACK Edge with Virtual Nodes and Serverless GPU

This article explains how to use Alibaba Cloud ACK Edge to manage on‑premise GPU resources and seamlessly fall back to cloud‑based ACS Serverless GPU via virtual nodes for deploying DeepSeek R1 inference, covering environment preparation, model download, storage setup, custom scheduling, and scaling strategies.

ACK@Edge · DeepSeek · GPU
0 likes · 16 min read
Python Programming Learning Circle
Feb 18, 2025 · Artificial Intelligence

Getting Started with PyTorch: Installation, Core Operations, and Practical Deep Learning Projects

This article introduces PyTorch, covering installation on CPU/GPU, basic tensor operations, automatic differentiation, building and training neural networks, data loading with DataLoader, image classification on MNIST, model deployment, and useful tips for accelerating deep‑learning workflows.

Deep Learning · GPU · Neural Networks
0 likes · 9 min read
Infra Learning Club
Feb 15, 2025 · Cloud Native

Advanced Guide: Real‑Time GPU Process Migration in Kubernetes with CRIU

This article explains how os‑criu provides transparent, OS‑level GPU checkpoint/restore, compares its performance with NVIDIA's cuda‑checkpoint, walks through building and installing the PhOS framework, demonstrates migration of a Llama2‑13b‑chat workload in Docker, and discusses current limitations and future Kubernetes integration plans.

CRIU · Checkpoint · Docker
0 likes · 9 min read
Ops Development & AI Practice
Feb 15, 2025 · Artificial Intelligence

How to Efficiently Fine‑Tune Llama 3 on a Free Colab T4 GPU with Unsloth

This article provides a step‑by‑step, code‑rich tutorial for fine‑tuning the open‑source Llama 3 1B and 3B models on Google Colab using the Unsloth library and LoRA, covering environment setup, model loading, adapter insertion, dataset preparation, training configuration, inference, and model saving, all while keeping GPU memory usage low.

AI · Colab · Fine-tuning
0 likes · 13 min read
Code Mala Tang
Feb 10, 2025 · Artificial Intelligence

How Much Does It Really Cost to Run a Full‑Scale DeepSeek AI Locally?

This article breaks down the hardware and software expenses required to deploy a complete DeepSeek large‑language model on‑premises, revealing a total cost of roughly $110,000 and explaining why such an investment is prohibitive for most individual developers but may be justified for well‑funded research or corporate projects.

DeepSeek · Deployment · GPU
0 likes · 4 min read
21CTO
Feb 8, 2025 · Artificial Intelligence

Can Java Overtake Python in AI? Insights from the 2025 Azul Report

A recent Azul Systems study suggests that Java may surpass Python in enterprise AI development within the next 18‑36 months, highlighting Java's scalability, performance, and emerging GPU projects while acknowledging cultural and tooling advantages that still favor Python.

AI · DevOps · Enterprise
0 likes · 9 min read
Open Source Linux
Feb 7, 2025 · Operations

China's Xinchuang Server Ecosystem: Market Trends, Key Players, and Future Risks

This article provides a comprehensive analysis of China's Xinchuang server industry, covering the upstream component supply chain, mid‑stream manufacturers, downstream users, shipment statistics, market share evolution, competitive tiers, application demands, and the technical and ecological challenges facing domestic CPU and GPU development.

CPU · Chinese hardware · GPU
0 likes · 10 min read
Top Architect
Feb 6, 2025 · Artificial Intelligence

Deploying DeepSeek R1 671B Model Locally with Ollama: Quantization, Hardware Requirements, and Step‑by‑Step Guide

This article provides a comprehensive tutorial on locally deploying the full‑size DeepSeek R1 671B model using Ollama, covering dynamic quantization options, hardware specifications, detailed installation commands, configuration files, performance observations, and practical recommendations for consumer‑grade systems.

AI · DeepSeek · GPU
0 likes · 14 min read
AI Cyberspace
Feb 5, 2025 · Fundamentals

From 2D Cards to AI Powerhouses: The Evolution of GPUs

This article traces the GPU's journey from early 2D graphics cards to modern GPGPUs powering AI and HPC, explains core hardware components, compares GPU and CPU architectures, and details the 3D rendering pipeline that underlies graphics and parallel computation.

GPU · Graphics Processing Unit · Rendering Pipeline
0 likes · 10 min read
Code Mala Tang
Feb 2, 2025 · Artificial Intelligence

How to Deploy DeepSeek AI Coding Assistant Locally: A Step‑by‑Step Guide

This guide walks you through the hardware and software prerequisites, Docker-based installation, environment configuration, model fine‑tuning, IDE integration, maintenance, and troubleshooting for running the DeepSeek AI programming assistant entirely on your own machine.

AI coding assistant · DeepSeek · Docker
0 likes · 12 min read
Infra Learning Club
Jan 24, 2025 · Fundamentals

Inside NVCC: How CUDA Code Is Compiled and Linked

The article dissects NVCC’s compilation pipeline, showing how internal registration functions from host_runtime.h are injected into the host binary, how a simple CUDA demo is processed with --dryrun, and how the generated fatbin, PTX, and cubin files are linked and registered for GPU execution.

CUDA · Compilation · FatBinary
0 likes · 10 min read
Architects' Tech Alliance
Jan 23, 2025 · Game Development

GPU Architecture and Rendering Pipeline Overview

This article provides a comprehensive overview of modern GPU architecture, covering SMs, GPCs, the memory hierarchy, unified shader architecture, SIMT execution, and warp scheduling. It also compares the IMR, TBR, and TBDR rendering pipelines and offers practical optimization techniques for developers.

GPU · Graphics · Rendering
0 likes · 27 min read
Python Programming Learning Circle
Jan 15, 2025 · Fundamentals

Python Performance Optimization Tools and Libraries

This article introduces a comprehensive set of Python performance‑enhancing tools and libraries—including NumPy, SciPy, PyPy, Cython, Numba, GPU‑based solutions, and various wrappers—explaining how they accelerate code execution, reduce memory usage, and enable efficient single‑ and multi‑processor programming.

Compilation · GPU · JIT
0 likes · 8 min read
Architects' Tech Alliance
Jan 14, 2025 · Industry Insights

AI Server Market 2024: Growth Trends, Types, and Key Challenges

The AI server market is booming: global shipments surpassed 1.2 million units in 2023 and are projected to reach 1.67 million in 2024, driven by rapid growth in China’s AI compute capacity. The article also describes the distinct designs of training and inference servers and the challenges the market faces in GPU quality, high‑speed interconnects, and cooling solutions.

2024 · AI hardware · AI servers
0 likes · 5 min read
Alibaba Cloud Infrastructure
Jan 14, 2025 · Cloud Native

Managing Distributed ECS Resources with ACK Edge and Kubernetes

This guide explains how to use Alibaba Cloud's ACK Edge to create a secure, high‑availability Kubernetes cluster that unifies management and scheduling of ECS instances across multiple VPCs, regions, and accounts, with detailed scenarios, advantages, step‑by‑step procedures, and sample YAML deployments.

ACK@Edge · DaemonSet · Distributed Resources
0 likes · 8 min read
Java Tech Enthusiast
Jan 9, 2025 · Cloud Native

Configuring NVIDIA Docker Plugin and GPU Access in Kubernetes

This guide walks through installing the NVIDIA container toolkit, configuring Docker to use the NVIDIA runtime, verifying GPU access, deploying the NVIDIA device plugin in Kubernetes, labeling GPU nodes, and running a GPU‑accelerated FFmpeg pod to confirm successful GPU integration.

Container Toolkit · Docker · GPU
0 likes · 12 min read
Liangxu Linux
Jan 8, 2025 · Cloud Native

Enable NVIDIA GPU Access in Docker and Kubernetes with the NVIDIA Container Toolkit

This guide walks through checking system and software environments, installing and configuring the NVIDIA Docker plugin, verifying GPU access in Docker containers, deploying the NVIDIA device plugin on a Kubernetes cluster, creating GPU‑enabled pods, and troubleshooting common issues, all with concrete commands and configuration examples.

Container Toolkit · GPU · Kubernetes
0 likes · 12 min read
21CTO
Jan 7, 2025 · Artificial Intelligence

Nvidia Reveals RTX 50 GPUs, Thor Auto Chip, and AI Supercomputer at CES 2025

At CES 2025, Nvidia CEO Jensen Huang announced the RTX 50 series GPUs built on the Blackwell architecture, the Thor automotive processor, the Project Digits personal AI supercomputer, new AI agents and robotics initiatives, detailing pricing, performance specs, and partnerships across automotive and AI ecosystems.

CES 2025 · GPU · Nvidia
0 likes · 10 min read
Architects' Tech Alliance
Jan 6, 2025 · Industry Insights

How Nvidia’s GB300 GPU Is Shaping AI Inference and Cloud Supply Chains

The article provides a detailed technical analysis of Nvidia’s new GB300 and B300 GPUs, comparing their performance, memory architecture, and power consumption to previous generations, and examines how these changes affect AI inference workloads, NVL72 accelerator systems, and the supply‑chain strategies of major cloud providers.

AI inference · GPU · Nvidia
0 likes · 12 min read
Infra Learning Club
Jan 4, 2025 · Cloud Native

How GPU Devices Are Dynamically Mounted to Kubernetes Pods

This article dissects the GPUMounter project's implementation of dynamic GPU device mounting to a pod, detailing the roles of cgroups (v1 and v2) and Linux namespaces, and provides step‑by‑step command‑line examples and a CLI tool for practical use.

GPU · Kubernetes · Namespace
0 likes · 13 min read
Architects' Tech Alliance
Dec 29, 2024 · Industry Insights

Why Broadcom’s $1T Valuation Signals a New Era for AI ASICs

Broadcom’s market‑cap breakthrough past $1 trillion highlights its strategic push into AI ASICs, revealing how ASIC‑FPGA trade‑offs, collaborations with Google, and competition with Nvidia’s GPU ecosystem are reshaping the high‑performance computing landscape.

AI ASIC · Broadcom · Chip Design
0 likes · 13 min read
DataFunSummit
Dec 28, 2024 · Artificial Intelligence

Memory Optimization for Large Model Inference: Virtual Tensor and LayerKV Techniques

This talk presents the Ant Group team's recent work on large‑model inference memory optimization, covering GPU memory challenges, virtual memory management (VMM), the Virtual Tensor framework, LayerKV techniques, performance comparisons with PagedAttention and FlashAttention, and extensive experimental results demonstrating reduced latency and higher QPS.

GPU · Virtual Memory · attention
0 likes · 25 min read
Architects' Tech Alliance
Dec 6, 2024 · Industry Insights

How GPU Virtualization Works: Layers, Techniques, and Real-World Use Cases

This article explains the fundamentals of GPU architecture, the need for GPU virtualization, and walks through user‑level, kernel‑level, hardware‑level, and full GPU virtualization techniques, illustrating each layer with diagrams and code examples while highlighting practical deployment scenarios.

GPU · Hardware acceleration · System Architecture
0 likes · 10 min read
DataFunSummit
Dec 4, 2024 · Artificial Intelligence

Accelerating Large Language Model Inference with the YiNian LLM Framework

This article presents the YiNian LLM framework, detailing how the KV cache, prefill/decoding separation, continuous batching, PagedAttention, and multi‑hardware scheduling are used to speed up large language model inference while managing GPU memory and latency.

AI acceleration · Continuous Batching · GPU
0 likes · 20 min read