Tagged articles

536 articles

Jun 19, 2025 · Fundamentals

Unlock the Secrets of GPUs: 100 Essential Fundamentals Explained

This comprehensive guide covers 100 essential GPU fundamentals, from basic definitions and architecture to core technologies, performance optimization, emerging trends, and industry developments, providing a complete technical foundation for graphics, AI, and high‑performance computing applications.

Deep Learning · GPU · Graphics Processing Unit

0 likes · 19 min read

360 Zhihui Cloud Developer

Jun 18, 2025 · Cloud Native

Unifying GPU Management Across Kubernetes Clusters with RBAC & Virtual Control Planes

This article examines how to centrally manage GPU resources across heterogeneous Kubernetes clusters using namespace‑based RBAC isolation, virtual control‑plane solutions like vcluster, and multi‑cluster tools such as Karmada, comparing their architectures, use cases, advantages, and limitations to guide enterprise‑level deployment decisions.

Cloud Native · GPU · Kubernetes

0 likes · 14 min read

Architects' Tech Alliance

Jun 15, 2025 · Fundamentals

Master GPU Fundamentals: Architecture, Performance, and Programming Insights

This comprehensive guide covers GPU definitions, evolution, core components, architectural designs, performance metrics, programming models, deep‑learning applications, comparisons with other processors, practical use cases, optimization techniques, and future trends, providing a solid foundation for anyone interested in modern graphics and compute acceleration.

Deep Learning · GPU · Hardware

0 likes · 43 min read

Ops Development Stories

Jun 12, 2025 · Cloud Native

One-Click GPU-Enabled Kind Cluster Setup for Running Large AI Models

This tutorial walks you through using a one‑click script to create a GPU‑enabled Kind Kubernetes cluster, evenly distribute GPU resources across nodes with nvkind, install the necessary drivers and toolkits, deploy a vLLM‑served large language model, and verify its operation, all in a local or cloud environment.

AI Model Deployment · Docker · GPU

0 likes · 23 min read

Architects' Tech Alliance

Jun 9, 2025 · Artificial Intelligence

What Makes Nvidia’s Blackwell GPUs a Game-Changer for AI Performance?

In March 2024 Nvidia unveiled the Blackwell GPU family and the GB200 NVL72 architecture, featuring 3‑4 nm processes, redesigned CUDA cores, next‑gen ray‑tracing, upgraded DLSS, massive FP16/FP8 compute gains, 8 TB/s memory bandwidth, and NVLink Gen5, while also presenting complex power, cooling, and packaging challenges for large‑scale AI deployments.

AI acceleration · Blackwell · GPU

0 likes · 6 min read

Network Intelligence Research Center (NIRC)

Jun 9, 2025 · Artificial Intelligence

How to Build High‑Performance GEMM with NVIDIA CUTLASS

The article explains why standard GEMM libraries may fall short for special matrix shapes, introduces NVIDIA’s open‑source CUTLASS library, details its hierarchical tiling architecture, and walks through a complete device‑API example that customizes tile sizes and data layouts to achieve near‑hand‑written kernel performance on modern GPUs.

CUDA · CUTLASS · GEMM

0 likes · 6 min read

Architects' Tech Alliance

Jun 6, 2025 · Artificial Intelligence

B30 vs H20: Which NVIDIA GPU Wins for AI Workloads and Budgets?

This article compares NVIDIA’s China‑specific B30 and high‑end H20 GPUs, detailing their CPU/GPU architecture updates, memory technologies, architectural differences, performance metrics, power and cooling characteristics, and price positioning, to help enterprises and developers choose the most suitable accelerator for AI and deep‑learning tasks.

AI acceleration · B30 · GPU

0 likes · 13 min read

Architects' Tech Alliance

Jun 5, 2025 · Artificial Intelligence

Why the AI Server Market Is Shifting: Key Trends and Winners in 2024

The Chinese AI server market is booming: GPU servers remain dominant while non‑GPU accelerators surge, IDC forecasts a compound annual growth rate above 20% through 2028, and leading vendors such as Inspur, H3C, and emerging Ascend‑based manufacturers are reshaping the competitive landscape.

AI servers · ASIC · China

0 likes · 10 min read

DataFunTalk

Jun 4, 2025 · Artificial Intelligence

Coupang’s Distributed Cache Architecture Accelerates AI/ML Model Training

Coupang’s AI platform replaces costly data‑copy steps with a distributed cache that automatically pulls data from a central lake, boosts GPU utilization across regions, cuts storage and operational expenses, and speeds up model training by up to 40% while simplifying deployment via Kubernetes.

AI · Data Lake · GPU

0 likes · 9 min read

Python Programming Learning Circle

Jun 2, 2025 · Artificial Intelligence

NVIDIA Adds Native Python Support to CUDA – What It Means for Developers

NVIDIA announced at GTC 2025 that CUDA will now natively support Python, allowing developers to write GPU‑accelerated code directly in Python without C/C++ knowledge, introducing new APIs, libraries, JIT compilation, performance tools, and a tile‑based programming model that aligns with Python’s array‑centric workflow.

AI · CUDA · GPU

0 likes · 7 min read

Architects' Tech Alliance

Jun 1, 2025 · Artificial Intelligence

Evolution, Industry Landscape, and Standards of Graphics GPUs

This article traces the history of graphics GPUs from their 1980s origins to modern AI and high‑performance computing roles, examines China's emerging GPU market and its challenges, and reviews the key graphics and compute standards shaping the industry today.

GPU · Graphics · Hardware

0 likes · 10 min read

Architects' Tech Alliance

May 31, 2025 · Artificial Intelligence

GPU Cluster Scaling: Understanding Scale‑Up and Scale‑Out for AI Pods

This article explains the concepts of AI Pods and GPU clusters, compares vertical (scale‑up) and horizontal (scale‑out) expansion, describes XPU types, discusses internal and inter‑pod communication, and evaluates the benefits and drawbacks of each scaling approach along with relevant networking technologies.

AI Pods · GPU · InfiniBand

0 likes · 10 min read

Architects' Tech Alliance

May 26, 2025 · Artificial Intelligence

NVLink Fusion: NVIDIA’s High‑Bandwidth Interconnect for Heterogeneous AI Computing

NVLink Fusion, unveiled at Computex 2025, extends NVIDIA’s NVLink technology to enable high‑bandwidth, low‑latency connections between CPUs and GPUs or third‑party accelerators, offering up to 900 GB/s bandwidth, flexible heterogeneous configurations, ecosystem expansion, performance gains for AI training and inference, and potential cost reductions.

AI · CPU · Data center

0 likes · 12 min read

Architects' Tech Alliance

May 23, 2025 · Artificial Intelligence

Analysis of Nvidia’s China‑Specific Cut‑Down GPUs: H20, B20, and B40

This article examines the impact of U.S. export restrictions on Nvidia’s China‑specific GPU lineup, detailing the specifications and architectural changes of the H20, B20, and B40 chips, while also discussing domestic alternatives and the broader implications for AI compute in China.

AI chips · B20 · B40

0 likes · 10 min read

Architects' Tech Alliance

May 20, 2025 · Industry Insights

What Do GPU Core Specs Really Mean? A Deep Dive into Modern GPU Performance

This article provides a comprehensive analysis of GPU core parameters—including compute units, memory systems, floating‑point performance, power consumption, and manufacturing process—while comparing leading international and domestic GPU products to help readers choose the right accelerator for AI, HPC, or graphics workloads.

AI · Benchmarking · GPU

0 likes · 19 min read

AI Frontier Lectures

May 20, 2025 · Industry Insights

How New US Geo‑Tracking Laws Could Reshape the High‑End GPU Market

A US Senate bill introduced by Senator Tom Cotton requires Nvidia, AMD, Intel and other high‑end GPU and AI processor makers to embed geolocation tracking, imposing six‑month compliance deadlines, new reporting obligations, and potentially billions of dollars in added R&D and export‑control costs.

Export Control · GPU · Geo-tracking

0 likes · 8 min read

21CTO

May 15, 2025 · Artificial Intelligence

AI Updates: Tencent GPUs, Alibaba Qwen3, Anaconda Platform, Google Apigee

This roundup highlights Tencent's GPU capacity for future models, Alibaba's fully disclosed Qwen3 technical report, Anaconda's unified AI platform, Parasoft's AI‑enhanced SOAtest, and Google Cloud's GA of the Apigee API Management Operator, offering a snapshot of current AI advancements.

AI · API Management · GPU

0 likes · 5 min read

Architects' Tech Alliance

May 13, 2025 · Industry Insights

How NVIDIA Builds AI Supercomputers: From H100 to GH200 and GB200 SuperPods

This article analyzes NVIDIA's evolving AI supercomputer architectures—detailing the H100‑based 256‑GPU SuperPod, the GH200‑based 256‑GPU SuperPod with integrated Grace CPU, and the GB200‑based 576‑GPU SuperPod—examining their NVLink and InfiniBand topologies, bandwidth limits, and scalability challenges.

AI · GPU · HPC

0 likes · 11 min read

Java Tech Enthusiast

May 9, 2025 · Industry Insights

Why NVIDIA’s Native Python Support in CUDA Could Revolutionize GPU Computing

NVIDIA announced native Python support in its CUDA toolkit, enabling developers to write GPU‑accelerated code directly in Python, detailing the new programming model, JIT‑based architecture, performance benefits, and the broader impact on AI development and the developer ecosystem.

AI · CUDA · GPU

0 likes · 15 min read

Meituan Technology Team

May 8, 2025 · Artificial Intelligence

Building a Mixed OR+ML Inference Framework with TritonServer: Architecture, Challenges, and Solutions

The article describes how a large‑scale dispatch system was re‑engineered with NVIDIA TritonServer to unify GPU‑accelerated operations‑research kernels and deep‑learning models, detailing a three‑stage architecture (in‑process, cross‑process, cross‑node), the performance, stability and memory challenges addressed, and future plans for heterogeneous GPU scaling.

GPU · Inference · Performance Optimization

0 likes · 11 min read

Architect's Alchemy Furnace

May 6, 2025 · Operations

Master Ollama Deployment: Optimize Environment Variables for Peak Performance

This guide walks you through cross‑platform environment variable configuration, Docker containerization, GPU resource strategies, concurrency tuning, and security hardening for Ollama, providing practical code snippets and best‑practice tables to unleash its full potential in development and production.

Deployment · Environment Variables · GPU

0 likes · 14 min read

Alibaba Cloud Infrastructure

Apr 30, 2025 · Cloud Native

Deploying Qwen3-8B Large Language Model on Alibaba Cloud ACK with ACS GPU Acceleration

This guide explains how to prepare, deploy, and verify the Qwen3‑8B large language model on an Alibaba Cloud Container Service for Kubernetes (ACK) cluster using ACS GPU resources, covering prerequisites, model download, storage setup, Kubernetes manifests, and testing the inference service.

ACK · ACS · Cloud Native

0 likes · 8 min read

Architects' Tech Alliance

Apr 29, 2025 · Industry Insights

Next-Gen Server Architecture: CPUs, GPUs, Memory, and Certification Insights

This article provides a comprehensive analysis of modern server architecture, covering the evolution from CISC to RISC, the rise of heterogeneous computing with GPUs and accelerators, diverse form factors, core component technologies, reliability mechanisms, performance benchmarking, certification standards, and emerging trends such as liquid cooling and AI‑native designs.

CPU · Data center · GPU

0 likes · 11 min read

Liangxu Linux

Apr 23, 2025 · Fundamentals

Which GPU Wins on Linux: AMD’s Plug‑and‑Play Simplicity vs NVIDIA’s Performance Edge

This article objectively compares AMD and NVIDIA graphics cards for Linux users, covering out‑of‑the‑box driver support, Wayland compatibility, gaming performance, machine‑learning capabilities, and cost‑effectiveness to help readers choose the best GPU for their needs.

AMD · Driver Support · GPU

0 likes · 9 min read

Architects' Tech Alliance

Apr 13, 2025 · Industry Insights

Which NVIDIA GPU Wins for AI? Deep Dive into RTX & A‑Series Performance and Power

This article presents a detailed comparison of major NVIDIA GPUs—including RTX 4090, RTX 4090 D, RTX 3090, A10, A40, A100, and H100—covering memory size, bandwidth, Tensor BF16/FP16/FP32 throughput, FP16/FP32 performance, power draw and release dates, and explains how these specs affect AI workload efficiency.

AI workloads · GPU · Industry analysis

0 likes · 9 min read

Architects' Tech Alliance

Apr 10, 2025 · Artificial Intelligence

Which NVIDIA GPU Is Right for Your AI Compute Center? A Deep Dive into A100, H100, A800, H800, and H20

This article analyzes NVIDIA's A100, H100, A800, H800, and H20 GPUs, compares their architectures, performance, and pricing, and provides a step‑by‑step guide for building a private AI compute center tailored to training, inference, and high‑performance computing workloads.

A100 · AI training · GPU

0 likes · 11 min read

Alibaba Cloud Infrastructure

Apr 9, 2025 · Cloud Computing

Multi-Region Serverless Compute Scheduling with Alibaba Cloud ACK One Registered Cluster

This guide explains how Alibaba Cloud's ACK One registered cluster provides multi‑region serverless GPU compute scheduling, addressing AI workload elasticity by using region‑specific labels, ResourcePolicy, and the ack‑co‑scheduler to automatically balance resources across regions.

ACK One · GPU · Kubernetes

0 likes · 10 min read

Architects' Tech Alliance

Apr 8, 2025 · Industry Insights

What Drives China’s Xinchuang Server Market? 2024‑2026 Trends, Risks, and Competitive Landscape

This article analyzes the Xinchuang hardware ecosystem and server industry from 2024 to 2026, covering supply‑chain structure, shipment growth, market share, competitive tiers, downstream application demands, and the technical and ecological challenges that hinder full domestic substitution.

CPU · China · GPU

0 likes · 12 min read

AI Frontier Lectures

Apr 8, 2025 · Industry Insights

Nvidia’s GPU Names Explained: Ampere, Hopper, Blackwell, Rubin, Feynman

At the recent GTC conference Nvidia unveiled its roadmap of AI‑focused GPUs—Ampere, Hopper, Blackwell, Rubin and the upcoming Feynman—each named after a pioneering scientist, and this article explores the historical contributions of André‑Marie Ampère, Grace Hopper, David Blackwell, Vera Rubin and Richard Feynman, linking their legacies to the architectures’ innovations.

AI · GPU · Nvidia

0 likes · 10 min read

Architects' Tech Alliance

Apr 6, 2025 · Fundamentals

PCIe vs NVLink: How Modern GPU Interconnects Power AI Training

As AI models grow to trillion‑parameter scales, training them demands massive GPU clusters whose performance is increasingly limited by network bandwidth; this article examines why traditional PCIe interconnects become bottlenecks and how NVIDIA's NVLink and NVSwitch technologies dramatically improve multi‑GPU communication and overall system efficiency.

AI training · GPU · High‑performance computing

0 likes · 12 min read

Architects' Tech Alliance

Apr 4, 2025 · Industry Insights

What Drives the AI Compute Chip Market? GPUs, ASICs, and the Rise of Chinese Players

This article analyzes the AI compute chip ecosystem, covering GPU, FPGA, and ASIC categories, market share projections, key performance metrics such as TOPS, power and area, and provides a detailed overview of leading global vendors and emerging Chinese companies with their technical specifications and competitive positioning.

AI chips · ASIC · Chinese semiconductor

0 likes · 11 min read

Python Programming Learning Circle

Apr 3, 2025 · Artificial Intelligence

Accelerating PyTorch Model Training: Techniques, Benchmarks, and Code

This article explains how to dramatically speed up PyTorch model training using code optimizations, mixed‑precision, torch.compile, distributed data parallelism, and DeepSpeed, presenting benchmark results that show up to 11.5× acceleration on multiple GPUs while maintaining high accuracy.

Deep Learning · DeepSpeed · Distributed Training

0 likes · 6 min read

Architects' Tech Alliance

Apr 3, 2025 · Artificial Intelligence

Why NVLink and NVSwitch Are Essential for Training Massive AI Models

Training today's massive AI foundation models demands extensive GPU resources and sophisticated multi‑GPU communication, making technologies like NVLink and NVSwitch crucial for efficient distributed training, while data‑parallel and model‑parallel strategies together optimize performance across large‑scale hardware clusters.

AI · Distributed Training · GPU

0 likes · 8 min read

360 Zhihui Cloud Developer

Apr 1, 2025 · Artificial Intelligence

DeepGEMM vs Cutlass vs Triton: Which GPU GEMM Library Delivers the Best FP8 Performance?

This article presents a comprehensive benchmark of DeepGEMM, Cutlass, and Triton on NVIDIA H20 and H800 GPUs, analyzing TFLOPS, bandwidth, latency, and speedup across various matrix sizes, and concludes which library is optimal for different workload scenarios.

Benchmark · CUDA · DeepGEMM

0 likes · 15 min read

AI Cyberspace

Mar 29, 2025 · Fundamentals

Why FP32 Remains the Benchmark for Measuring AI Compute Power

This article explains scientific notation, the IEEE‑754 floating‑point standard, the structure of FP32 and FP64 numbers, and how computational power is measured using FLOPS, illustrating CPU and GPU FP32 performance calculations and why FP32 is the common benchmark for AI workloads.

CPU · FP32 · GPU

0 likes · 17 min read

Architects' Tech Alliance

Mar 28, 2025 · Artificial Intelligence

Evolution of NVIDIA GPU Architectures for Deep Learning: From Volta to Blackwell and Rubin

The article traces NVIDIA’s GPU architecture evolution from the Volta era’s pioneering Tensor Cores through Turing, Ampere, Hopper, and the latest Blackwell and Rubin designs, highlighting key innovations such as mixed‑precision support, sparsity, NVLink, and their impact on deep‑learning performance.

AI hardware · GPU · Nvidia

0 likes · 10 min read

Architects' Tech Alliance

Mar 27, 2025 · Industry Insights

GPU Industry Deep Dive: Market Trends, Competitive Landscape, and Future Outlook

This article provides a comprehensive analysis of the GPU industry, covering product classifications, key characteristics, market size evolution, competitive dynamics among major players such as NVIDIA, AMD, and Huawei, policy influences, and future growth projections driven by AI and high‑performance computing demands.

AI compute · GPU · Industry analysis

0 likes · 14 min read

Architects' Tech Alliance

Mar 27, 2025 · Artificial Intelligence

What Makes AI Chips Different? A Deep Dive into Training and Inference Processors

This article explains the rise of AI‑specific processors, defines AI chips, compares their architectures, and examines the distinct requirements of training versus inference chips while outlining the main technology routes (GPU, FPGA, ASIC) and future outlook.

AI chips · ASIC · DSA

0 likes · 9 min read

Infra Learning Club

Mar 23, 2025 · Artificial Intelligence

Getting Started with cuda‑python and an Introduction to cuTile

This article explains the cuda‑python ecosystem, including its core packages, installation via pip or conda, the experimental cuda.core API, a full Python‑to‑CUDA workflow with NVRTC compilation, a performance comparison with C++, the APIs it covers, and an overview of NVIDIA's new cuTile programming model.

CUDA · GPU · NVRTC

0 likes · 11 min read

Infra Learning Club

Mar 22, 2025 · Artificial Intelligence

How to Write CUDA Kernels in Python – Insights from Nvidia GTC 2025

The article reviews Nvidia GTC 2025’s session on writing CUDA kernels with Python, compares tools such as Numba, CuPy, PyTorch extensions and cuda‑python, demonstrates a segmented reduction example with C++ and Python code, explains the underlying CUDA concepts, and shows how to install and use cuda‑python to simplify kernel development.

CUDA · CuPy · GPU

0 likes · 10 min read

Tencent Technical Engineering

Mar 21, 2025 · Fundamentals

Fundamentals of GPU Architecture and Programming

The article explains GPU fundamentals—from the end of Dennard scaling and why GPUs excel in parallel throughput, through CUDA programming basics like the SAXPY kernel and SIMT versus SIMD execution, to the evolution of the SIMT stack, modern scheduling, and a three‑step core architecture design.

CUDA · GPU · GPU programming

0 likes · 42 min read

Infra Learning Club

Mar 20, 2025 · Artificial Intelligence

How GPU Frequency, Power Consumption, and FLOPS Interrelate

The article explains the theoretical and practical relationships between GPU clock frequencies, power consumption, and FLOPS, describes key hardware metrics such as SM, memory, and video clocks, shows how to query and set these values with nvidia‑smi, and presents experiments on a Tesla P4 that reveal the non‑linear trade‑offs between performance, power, and temperature.

Clock Speed · DVFS · FLOPS

0 likes · 15 min read

JD Tech

Mar 19, 2025 · Artificial Intelligence

JD Retail's End‑to‑End AI Engine Compatible with GPU and Domestic NPU: Architecture, Optimization, and Real‑World Applications

This article details JD Retail's AI engine that seamlessly supports both GPU and domestic NPU hardware, describing its heterogeneous cluster architecture, unified training and inference APIs, performance optimizations, extensive model coverage, and multiple production use cases across e‑commerce, logistics, and intelligent assistance.

AI Engine · GPU · JD Retail

0 likes · 20 min read

AntTech

Mar 19, 2025 · Artificial Intelligence

Award-Winning HPCA 2025 Papers on Near‑DRAM Processing (UniNDP) and GPU‑Accelerated Fully Homomorphic Encryption (WarpDrive)

At HPCA 2025, two standout papers—UniNDP, a unified compilation and simulation tool for near‑DRAM processing architectures, and WarpDrive, a GPU‑based fully homomorphic encryption accelerator leveraging Tensor and CUDA cores—demonstrate significant performance gains for AI workloads and privacy‑preserving computation.

AI acceleration · Fully Homomorphic Encryption · GPU

0 likes · 5 min read

Architects' Tech Alliance

Mar 19, 2025 · Industry Insights

What Drives Nvidia’s AI Dominance and How Huawei’s Ascend Chips Compete

This article analyzes Nvidia’s evolution from a graphics pioneer to an AI hardware leader and examines Huawei’s Ascend AI processor roadmap, detailing technical specifications, ecosystem strategies, recent product releases, and the potential impact on related technology stocks.

AI chips · AI hardware · Ascend

0 likes · 6 min read

Architects' Tech Alliance

Mar 17, 2025 · Industry Insights

DeepSeek Integrated Machines: 52 Models, Specs, Prices & Use Cases

This article compiles a market overview of 52 DeepSeek integrated machines, detailing GPU chips, price ranges from tens of thousands to millions, major Chinese cloud vendors, and diverse application scenarios such as intelligent customer service, data processing, and smart governance.

AI hardware · DeepSeek · GPU

0 likes · 3 min read

MaGe Linux Operations

Mar 16, 2025 · Cloud Native

How to Install NVIDIA Docker Plugin and Enable GPU Access in Kubernetes

This guide walks through checking the system environment, installing the NVIDIA Docker plugin, configuring Docker to use the NVIDIA runtime, verifying GPU access with Docker, deploying the NVIDIA device plugin on a Kubernetes cluster, and running GPU‑accelerated workloads in pods.

Container Toolkit · Docker · GPU

0 likes · 14 min read

DataFunSummit

Mar 14, 2025 · Artificial Intelligence

Insights from Zhihu's ZhiLight Large‑Model Inference Framework: Architecture, Parallelism, and Performance Optimizations

The article summarizes Zhihu's machine‑learning platform lead Wang Xin's presentation on the ZhiLight large‑model inference framework, covering model execution mechanisms, GPU workload analysis, pipeline and tensor parallelism, GPU architecture evolution, open‑source engine comparisons, ZhiLight's compute‑communication overlap and quantization optimizations, benchmark results, supported models, and future directions.

GPU · Inference · LLM

0 likes · 13 min read

Cognitive Technology Team

Mar 11, 2025 · Artificial Intelligence

Deploying DeepSeek R1:7b Model Locally with Ollama and Building AI Applications Using Dify

This tutorial explains how to set up Ollama for CPU or GPU environments, run the DeepSeek R1:7b large language model, and use the open‑source Dify platform to create and deploy a custom AI application, providing step‑by‑step commands and configuration details.

AI · DeepSeek · Dify

0 likes · 8 min read

Alibaba Cloud Infrastructure

Mar 9, 2025 · Cloud Computing

Deploy QwQ-32B LLM Inference on Alibaba Cloud ACS with vLLM: Step‑by‑Step Guide

This guide walks you through using Alibaba Cloud Container Compute Service (ACS) to provision GPU resources, prepare the QwQ-32B model, configure persistent storage, deploy the model with vLLM, set up OpenWebUI, verify the service, and optionally benchmark its performance, all with detailed commands and YAML examples.

ACS · Alibaba Cloud · Benchmark

0 likes · 17 min read

Infra Learning Club

Mar 9, 2025 · Cloud Native

How to Fix nvidia-smi Missing GPU Process Info Inside Containers

The article explains why nvidia-smi cannot display GPU processes when run inside a container, analyzes the underlying pid‑namespace isolation and kernel‑level restrictions, and provides three practical solutions—including using hostPid, custom kernel interception modules, and the nvitop tool—plus a workaround for gpu‑operator deployments.

GPU · Kernel Module · Kubernetes

0 likes · 8 min read

Infra Learning Club

Mar 6, 2025 · Fundamentals

How GPU DVFS Boosts Efficiency: Concepts, Modeling, and Future Directions

This article explains how GPU Dynamic Voltage and Frequency Scaling (DVFS) reduces power consumption while preserving performance, describes NVIDIA GPU Boost 4.0 features, outlines a hardware‑counter‑based GPGPU power‑estimation model built with a BP‑ANN, reports sub‑5% error on benchmarks, and discusses intelligent and multi‑GPU extensions.

BP-ANN · DVFS · GPGPU

0 likes · 5 min read

Baidu Geek Talk

Mar 5, 2025 · Cloud Computing

Inside GPU Cloud Servers: Architecture, Interconnects, and Performance Secrets

This article provides a comprehensive technical overview of GPU cloud server design, covering data‑processing pipelines, hardware topology, NUMA considerations, PCIe and proprietary interconnects, multi‑GPU communication strategies, virtualization approaches (BCC and BBC), DPU acceleration, and future trends for scaling up and out.

GPU · Performance Optimization · Virtualization

0 likes · 27 min read

JD Retail Technology

Mar 4, 2025 · Artificial Intelligence

JD Retail End-to-End AI Engine Compatible with GPU and Domestic NPU: Architecture, Optimization, and Applications

JD Retail’s Nine‑Number Algorithm Platform delivers an end‑to‑end AI engine that unifies GPU and domestic NPU resources across a thousand‑card cluster, offering zero‑cost model migration, optimized training and inference pipelines, support for over 40 LLM and multimodal models, and proven business‑level performance that reduces dependence on overseas chips.

AI · Distributed Training · GPU

0 likes · 19 min read

Baidu Intelligent Cloud Tech Hub

Mar 3, 2025 · Cloud Computing

How Baidu Cloud Optimizes GPU Servers for AI Workloads

This article explains the design and implementation of GPU cloud servers, covering data processing pipelines, hardware selection, topology, interconnect technologies, virtualization, multi‑GPU communication methods, and Baidu's practical solutions for both virtualized and bare‑metal instances to boost AI inference and training performance.

AI · GPU · NVLink

0 likes · 29 min read

IT Services Circle

Mar 3, 2025 · Fundamentals

AMD RX 9070 and RX 9070 XT: Specifications, Performance Benchmarks, AI Capabilities, and Pricing

The article reviews AMD's newly announced RX 9070 and RX 9070 XT graphics cards, detailing their 4 nm RDNA 4 architecture, core specifications, gaming performance gains over the RX 7900 GRE, AI workload improvements, FSR 4 enhancements, and launch pricing compared with NVIDIA's RTX 50 series.

AI · AMD · Benchmark

0 likes · 6 min read

JD Tech Talk

Mar 3, 2025 · Artificial Intelligence

AI Engine Technology Based on Domestic Chips for JD Retail

This article describes JD Retail's AI engine built on domestic NPU chips, covering challenges, heterogeneous GPU‑NPU scheduling, high‑performance training and inference engines, extensive model support, real‑world deployment cases, and future plans for large‑scale chip clusters and ecosystem development.

AI · Distributed Training · GPU

0 likes · 20 min read

Java Architect Essentials

Mar 2, 2025 · Artificial Intelligence

Zero‑Code Local Deployment of DeepSeek LLM on Consumer GPUs Using Ollama

This guide explains why DeepSeek is a compelling GPT‑4‑level alternative, provides hardware recommendations for various model sizes, and walks through a three‑step Windows deployment using Ollama, including installation, environment configuration, model download, performance tuning, and common troubleshooting tips.

AI · DeepSeek · GPU

0 likes · 8 min read

Architects' Tech Alliance

Feb 28, 2025 · Industry Insights

Why Rubin288’s Orthogonal CLOS Architecture Beats Traditional Designs

The article analyzes NVIDIA's Rubin288 high‑density GPU cabinet, comparing its orthogonal CLOS architecture with the older non‑orthogonal designs, and explains how the new layout improves reliability, bandwidth, scalability, and cooling for modern data‑center HPC deployments.

CLOS · DataCenter · GPU

0 likes · 10 min read

IT Services Circle

Feb 27, 2025 · Artificial Intelligence

DeepSeek Announces FlashMLA: An Efficient Multi‑Head Latent Attention Decoding Kernel for Hopper GPUs

DeepSeek’s OpenSourceWeek introduced FlashMLA, a GPU‑optimized MLA decoding kernel for Hopper GPUs that leverages FlashAttention and CUTLASS to dramatically improve large‑model inference performance, with early adoption showing up to 30% higher compute utilization and doubled speed in some scenarios.

DeepSeek · FlashMLA · GPU

0 likes · 3 min read

JavaEdge

Feb 24, 2025 · Artificial Intelligence

Build a CIFAR‑10 Image Classifier with PyTorch – A Java Developer’s Guide

This tutorial walks Java developers through building, training, evaluating, and deploying a CIFAR‑10 image classifier using PyTorch, covering data loading, preprocessing, network definition, loss and optimizer setup, GPU acceleration, model saving, and per‑class accuracy analysis.

CIFAR-10 · Deep Learning · GPU

0 likes · 18 min read

Alibaba Cloud Big Data AI Platform

Feb 24, 2025 · Artificial Intelligence

Unlock Data+AI Fusion: Fine‑Tune Multimodal Models on DataWorks with GPU‑Ready Notebooks

This tutorial shows how to use Alibaba Cloud DataWorks' serverless GPU resource groups together with the open‑source LLaMA‑Factory framework to fine‑tune the Qwen2‑VL‑2B multimodal model for tourism‑domain Q&A, covering environment setup, dataset preparation, parameter configuration, training, and interactive inference.

DataWorks · GPU · LLaMA-Factory

0 likes · 10 min read

Infra Learning Club

Feb 23, 2025 · Fundamentals

How to Dynamically Decompress CUDA Fatbin Files Compressed by NVCC

This article explains why enabling NVCC's --fatbin-options -compress-all breaks remote GPU calls, describes the fatbin file layout, shows how to extract and analyze the binary with objcopy, and provides a step‑by‑step implementation of a decompression routine for both ELF and PTX sections.

Binary Format · CUDA · GPU

0 likes · 9 min read

Infra Learning Club

Feb 22, 2025 · Fundamentals

Understanding NVCC Compilation: A Step‑by‑Step Technical Guide

This article walks through the NVCC compilation pipeline, explaining how CUDA source files are transformed into host and device binaries, detailing file extensions, compilation stages, command‑line options, intermediate artifacts, and the role of registration functions such as __nv_cudaEntityRegisterCallback and __sti____cudaRegisterAll.

CUDA · Compilation · GPU

0 likes · 12 min read

Alibaba Cloud Infrastructure

Feb 21, 2025 · Artificial Intelligence

Deploying DeepSeek R1 Model Inference on ACK Edge with Virtual Nodes and Serverless GPU

This article explains how to use Alibaba Cloud ACK Edge to manage on‑premise GPU resources and seamlessly fall back to cloud‑based ACS Serverless GPU via virtual nodes for deploying DeepSeek R1 inference, covering environment preparation, model download, storage setup, custom scheduling, and scaling strategies.

ACK@Edge · DeepSeek · GPU

0 likes · 16 min read

Alibaba Cloud Infrastructure

Feb 20, 2025 · Artificial Intelligence

Deploying DeepSeek‑R1 Large Language Model on Knative with GPU A10

This guide explains how to deploy the DeepSeek‑R1 large language model on a Knative platform using an A10 GPU, covering preparation, service creation with appropriate annotations, YAML configuration, verification via curl, custom domain setup, and optional personal AI assistant deployment.

AI · DeepSeek · Deployment

0 likes · 8 min read

Python Programming Learning Circle

Feb 18, 2025 · Artificial Intelligence

Getting Started with PyTorch: Installation, Core Operations, and Practical Deep Learning Projects

This article introduces PyTorch, covering installation on CPU/GPU, basic tensor operations, automatic differentiation, building and training neural networks, data loading with DataLoader, image classification on MNIST, model deployment, and useful tips for accelerating deep‑learning workflows.

Deep Learning · GPU · Neural Networks

0 likes · 9 min read

Infra Learning Club

Feb 15, 2025 · Cloud Native

Advanced Guide: Real‑Time GPU Process Migration in Kubernetes with CRIU

This article explains how os‑criu provides transparent, OS‑level GPU checkpoint/restore, compares its performance with NVIDIA's cuda‑checkpoint, walks through building and installing the PhOS framework, demonstrates migration of a Llama2‑13b‑chat workload in Docker, and discusses current limitations and future Kubernetes integration plans.

CRIU · Checkpoint · Docker

0 likes · 9 min read

Ops Development & AI Practice

Feb 15, 2025 · Artificial Intelligence

How to Efficiently Fine‑Tune Llama 3 on a Free Colab T4 GPU with Unsloth

This article provides a step‑by‑step, code‑rich tutorial for fine‑tuning the open‑source Llama 3.2 1B and 3B models on Google Colab using the Unsloth library and LoRA, covering environment setup, model loading, adapter insertion, dataset preparation, training configuration, inference, and model saving, all while keeping GPU memory usage low.

AI · Colab · Fine-tuning

0 likes · 13 min read

Architects' Tech Alliance

Feb 15, 2025 · Industry Insights

Choosing the Right NVIDIA GPU for AI: A100, H100, A800, H800 & H20 Explained

This article provides a detailed technical analysis of NVIDIA's A100, H100, A800, H800 and H20 GPUs, compares their architectures, performance and cost, and offers step‑by‑step guidance on building a private AI compute center, selecting hardware, software stacks and budgeting for different workloads.

AI training · GPU · Nvidia

0 likes · 11 min read

Alibaba Cloud Infrastructure

Feb 12, 2025 · Artificial Intelligence

Deploying DeepSeek‑R1 Distilled Qwen‑32B‑FP8 Model on Alibaba Cloud GPU Instances with Docker and OpenWebUI

This guide explains how to prepare an Alibaba Cloud GPU instance, install Docker and NVIDIA tools, pull or build a container image, and run the FP8‑quantized DeepSeek‑R1‑Distill‑Qwen‑32B model using vLLM and OpenWebUI for both offline and online inference.

DeepSeek · FP8 quantization · GPU

0 likes · 18 min read

Code Mala Tang

Feb 10, 2025 · Artificial Intelligence

How Much Does It Really Cost to Run a Full‑Scale DeepSeek AI Locally?

This article breaks down the hardware and software expenses required to deploy a complete DeepSeek large‑language model on‑premises, revealing a total cost of roughly $110,000 and explaining why such an investment is prohibitive for most individual developers but may be justified for well‑funded research or corporate projects.

DeepSeek · Deployment · GPU

0 likes · 4 min read

JD Cloud Developers

Feb 10, 2025 · Artificial Intelligence

How to Deploy DeepSeek LLM Locally on JD Cloud GPU with Ollama and Chatbox

Learn step‑by‑step how to prepare a JD Cloud GPU instance, install GPU drivers, deploy Ollama, run DeepSeek‑R1 models, configure graphical clients like Chatbox on Windows and macOS, and optionally feed local data using AnythingLLM to build an offline knowledge base.

AnythingLLM · Chatbox · DeepSeek

0 likes · 19 min read

21CTO

Feb 8, 2025 · Artificial Intelligence

Can Java Overtake Python in AI? Insights from the 2025 Azul Report

A recent Azul Systems study suggests that Java may surpass Python in enterprise AI development within the next 18‑36 months, highlighting Java's scalability, performance, and emerging GPU projects while acknowledging cultural and tooling advantages that still favor Python.

AI · DevOps · Enterprise

0 likes · 9 min read

Alibaba Cloud Infrastructure

Feb 8, 2025 · Artificial Intelligence

Deploying a Production‑Ready DeepSeek‑R1 Inference Service on Alibaba Cloud ACK with KServe

This guide explains how to deploy a production‑ready DeepSeek‑R1 inference service on Alibaba Cloud ACK using KServe, covering model preparation, storage configuration, service deployment, observability, autoscaling, model acceleration, gray‑release and GPU‑shared inference.

DeepSeek · GPU · Inference

0 likes · 13 min read

Full-Stack DevOps & Kubernetes

Feb 8, 2025 · Artificial Intelligence

Deploy DeepSeek‑R1 on Tencent Cloud with Ollama: A Complete Step‑by‑Step Guide

This guide walks you through preparing a Tencent Cloud account, creating a Cloud Studio workspace, installing Ollama, downloading and running the DeepSeek‑R1 large language model, interacting via terminal or API, and managing resources and model versions.

AI Model Deployment · API · DeepSeek

0 likes · 8 min read

Open Source Linux

Feb 7, 2025 · Operations

China's Xinchuang Server Ecosystem: Market Trends, Key Players, and Future Risks

This article provides a comprehensive analysis of China's Xinchuang server industry, covering the upstream component supply chain, mid‑stream manufacturers, downstream users, shipment statistics, market share evolution, competitive tiers, application demands, and the technical and ecological challenges facing domestic CPU and GPU development.

CPU · Chinese hardware · GPU

0 likes · 10 min read

Architecture Digest

Feb 6, 2025 · Artificial Intelligence

Deploying DeepSeek R1 671B Model Locally with Ollama and Dynamic Quantization

This guide explains how to deploy the full 671B DeepSeek R1 model on local hardware using Ollama, leveraging dynamic quantization to shrink model size, detailing hardware requirements, step‑by‑step installation, configuration, performance observations, and practical recommendations.

DeepSeek · Dynamic Quantization · GPU

0 likes · 12 min read

Top Architect

Feb 6, 2025 · Artificial Intelligence

Deploying DeepSeek R1 671B Model Locally with Ollama: Quantization, Hardware Requirements, and Step‑by‑Step Guide

This article provides a comprehensive tutorial on locally deploying the full‑size DeepSeek R1 671B model using Ollama, covering dynamic quantization options, hardware specifications, detailed installation commands, configuration files, performance observations, and practical recommendations for consumer‑grade systems.

AI · DeepSeek · GPU

0 likes · 14 min read

AI Cyberspace

Feb 5, 2025 · Fundamentals

From 2D Cards to AI Powerhouses: The Evolution of GPUs

This article traces the GPU's journey from early 2D graphics cards to modern GPGPUs powering AI and HPC, explains core hardware components, compares GPU and CPU architectures, and details the 3D rendering pipeline that underlies graphics and parallel computation.

GPU · Graphics Processing Unit · Rendering Pipeline

0 likes · 10 min read

Architects' Tech Alliance

Feb 3, 2025 · Industry Insights

What Drives the AI Chip Race? GPUs, ASICs, and China's Emerging Players

The article examines the AI compute chip ecosystem—covering GPUs, FPGAs, and ASICs like VPU/TPU—highlights market share trends, key performance metrics such as TOPS, power and area, and provides a detailed overview of leading global and Chinese manufacturers and their flagship products.

AI chips · AI hardware · ASIC

0 likes · 12 min read

Code Mala Tang

Feb 2, 2025 · Artificial Intelligence

How to Deploy DeepSeek AI Coding Assistant Locally: A Step‑by‑Step Guide

This guide walks you through the hardware and software prerequisites, Docker-based installation, environment configuration, model fine‑tuning, IDE integration, maintenance, and troubleshooting for running the DeepSeek AI programming assistant entirely on your own machine.

AI coding assistant · DeepSeek · Docker

0 likes · 12 min read

Infra Learning Club

Jan 24, 2025 · Fundamentals

Inside NVCC: How CUDA Code Is Compiled and Linked

The article dissects NVCC’s compilation pipeline, showing how internal registration functions from host_runtime.h are injected into the host binary, how a simple CUDA demo is processed with --dryrun, and how the generated fatbin, PTX, and cubin files are linked and registered for GPU execution.

CUDA · Compilation · FatBinary

0 likes · 10 min read

Architects' Tech Alliance

Jan 23, 2025 · Game Development

GPU Architecture and Rendering Pipeline Overview

This article provides a comprehensive overview of modern GPU architecture, covering components such as SMs, GPCs, memory hierarchy, unified shader architecture, SIMT execution, warp scheduling, and compares IMR, TBR, and TBDR rendering pipelines while offering practical optimization techniques for developers.

GPU · Graphics · Rendering

0 likes · 27 min read

Infra Learning Club

Jan 15, 2025 · Fundamentals

Getting Started with GPU Kernel Virtualization: Building a Simple Linux Module

This tutorial walks through the motivation for Nvidia GPU kernel interception, explains Linux kernel module basics and privilege rings, shows how to set up an Ubuntu environment, write and compile a minimal LKM, load and test it, then create a fake GPU character device and mount it into a Docker container for verification.

C · Docker · GPU

0 likes · 8 min read

Python Programming Learning Circle

Jan 15, 2025 · Fundamentals

Python Performance Optimization Tools and Libraries

This article introduces a comprehensive set of Python performance‑enhancing tools and libraries—including NumPy, SciPy, PyPy, Cython, Numba, GPU‑based solutions, and various wrappers—explaining how they accelerate code execution, reduce memory usage, and enable efficient single‑ and multi‑processor programming.

Compilation · GPU · JIT

0 likes · 8 min read

Architects' Tech Alliance

Jan 14, 2025 · Industry Insights

AI Server Market 2024: Growth Trends, Types, and Key Challenges

The AI server market is booming: global shipments surpassed 1.2 million units in 2023 and are projected to reach 1.67 million in 2024, driven by rapid growth in China’s AI compute capacity. The article distinguishes training and inference server designs and outlines the remaining challenges in GPU quality, high‑speed interconnects, and cooling solutions.

2024 · AI hardware · AI servers

0 likes · 5 min read

Alibaba Cloud Infrastructure

Jan 14, 2025 · Cloud Native

Managing Distributed ECS Resources with ACK Edge and Kubernetes

This guide explains how to use Alibaba Cloud's ACK Edge to create a secure, high‑availability Kubernetes cluster that unifies management and scheduling of ECS instances across multiple VPCs, regions, and accounts, with detailed scenarios, advantages, step‑by‑step procedures, and sample YAML deployments.

ACK@Edge · DaemonSet · Distributed Resources

0 likes · 8 min read

Architects' Tech Alliance

Jan 9, 2025 · Industry Insights

What Nvidia’s RTX 50 Series and Blackwell Architecture Mean for GPUs and Data Centers

The article details Nvidia’s upcoming RTX 50 consumer GPUs, the Blackwell‑based Grace NVLink72 data‑center super‑chip, and the pocket‑sized Project DIGITS AI system, highlighting specifications, performance claims, pricing expectations, and the broader impact on the GPU market.

Blackwell · Data center · GPU

0 likes · 6 min read

Java Tech Enthusiast

Jan 9, 2025 · Cloud Native

Configuring NVIDIA Docker Plugin and GPU Access in Kubernetes

This guide walks through installing the NVIDIA container toolkit, configuring Docker to use the NVIDIA runtime, verifying GPU access, deploying the NVIDIA device plugin in Kubernetes, labeling GPU nodes, and running a GPU‑accelerated FFmpeg pod to confirm successful GPU integration.

Container Toolkit · Docker · GPU

0 likes · 12 min read

Liangxu Linux

Jan 8, 2025 · Cloud Native

Enable NVIDIA GPU Access in Docker and Kubernetes with the NVIDIA Container Toolkit

This guide walks through checking system and software environments, installing and configuring the NVIDIA Docker plugin, verifying GPU access in Docker containers, deploying the NVIDIA device plugin on a Kubernetes cluster, creating GPU‑enabled pods, and troubleshooting common issues, all with concrete commands and configuration examples.

Container Toolkit · GPU · Kubernetes

0 likes · 12 min read

21CTO

Jan 7, 2025 · Artificial Intelligence

Nvidia Reveals RTX 50 GPUs, Thor Auto Chip, and AI Supercomputer at CES 2025

At CES 2025, Nvidia CEO Jensen Huang announced the RTX 50 series GPUs built on the Blackwell architecture, the Thor automotive processor, the Project Digits personal AI supercomputer, new AI agents and robotics initiatives, detailing pricing, performance specs, and partnerships across automotive and AI ecosystems.

CES 2025 · GPU · Nvidia

0 likes · 10 min read

Architects' Tech Alliance

Jan 6, 2025 · Industry Insights

How Nvidia’s GB300 GPU Is Shaping AI Inference and Cloud Supply Chains

The article provides a detailed technical analysis of Nvidia’s new GB300 and B300 GPUs, comparing their performance, memory architecture, and power consumption to previous generations, and examines how these changes affect AI inference workloads, NVL72 accelerator systems, and the supply‑chain strategies of major cloud providers.

AI inference · GPU · Nvidia

0 likes · 12 min read

Infra Learning Club

Jan 4, 2025 · Cloud Native

How GPU Devices Are Dynamically Mounted to Kubernetes Pods

This article dissects the GPUMounter project's implementation of dynamic GPU device mounting to a pod, detailing the roles of cgroups (v1 and v2) and Linux namespaces, and provides step‑by‑step command‑line examples and a CLI tool for practical use.

GPU · Kubernetes · Namespace

0 likes · 13 min read

Architects' Tech Alliance

Dec 29, 2024 · Industry Insights

Why Broadcom’s $1T Valuation Signals a New Era for AI ASICs

Broadcom’s market‑cap breakthrough past $1 trillion highlights its strategic push into AI ASICs, revealing how ASIC‑FPGA trade‑offs, collaborations with Google, and competition with Nvidia’s GPU ecosystem are reshaping the high‑performance computing landscape.

AI ASIC · Broadcom · Chip Design

0 likes · 13 min read

DataFunSummit

Dec 28, 2024 · Artificial Intelligence

Memory Optimization for Large Model Inference: Virtual Tensor and LayerKV Techniques

This talk presents the Ant Group team's recent work on large‑model inference memory optimization, covering GPU memory challenges, virtual memory management (VMM), the Virtual Tensor framework, LayerKV techniques, performance comparisons with PagedAttention and FlashAttention, and extensive experimental results demonstrating reduced latency and higher QPS.

GPU · Virtual Memory · attention

0 likes · 25 min read

Architects' Tech Alliance

Dec 25, 2024 · Artificial Intelligence

Performance Analysis of NVIDIA H20 and L20 AI Inference Chips

This article evaluates NVIDIA's China‑specific H20 and L20 inference chips, comparing their compute and memory‑bandwidth characteristics against A100, H100 and H200, and shows how they achieve superior throughput in large‑model inference despite reduced specifications.

AI · GPU · H20

0 likes · 6 min read

Architects' Tech Alliance

Dec 6, 2024 · Industry Insights

How GPU Virtualization Works: Layers, Techniques, and Real-World Use Cases

This article explains the fundamentals of GPU architecture, the need for GPU virtualization, and walks through user‑level, kernel‑level, hardware‑level, and full GPU virtualization techniques, illustrating each layer with diagrams and code examples while highlighting practical deployment scenarios.

GPU · Hardware acceleration · System Architecture

0 likes · 10 min read

DataFunSummit

Dec 4, 2024 · Artificial Intelligence

Accelerating Large Language Model Inference with the YiNian LLM Framework

This article presents the YiNian LLM framework, detailing how KVCache, prefill/decoding separation, continuous batching, PagedAttention, and multi‑hardware scheduling are used to speed up large language model inference while managing GPU memory and latency.

AI acceleration · Continuous Batching · GPU

0 likes · 20 min read

Architects' Tech Alliance

Nov 28, 2024 · Artificial Intelligence

Comprehensive Comparison of NVIDIA GPUs: A100, A800, H100, H200, H800, B100, B200, and L40S

This article provides an in‑depth overview of NVIDIA’s latest GPU families—including A100/A800, H100/H200/H800, B100/B200, and L40S—detailing their release backgrounds, key specifications, typical application scenarios, and pricing to help readers understand their performance and market positioning.

AI · Comparison · GPU

0 likes · 11 min read
