Tagged articles
Machine Heart
May 14, 2026 · Artificial Intelligence

How China’s MUSA GPU Backend Earned Native Support in SGLang’s Mainline

The recent SGLang × MUSA meetup revealed that MUSA’s GPU backend has been merged into SGLang’s official codebase, delivering zero‑learning‑cost integration, performance gains of up to 66% on DeepSeek‑V4, and a growing ecosystem of adapters, high‑performance kernels, and distributed inference support.

AI inference, DeepSeek, GPU
12 min read
Architects' Tech Alliance
May 14, 2026 · Artificial Intelligence

Jensen Huang’s China Visit: Could It Revive GPU Prospects? Inside Nvidia’s DGX H200 Cluster Design

The article reviews the US‑approved export of Nvidia's DGX H200, the lack of deliveries, Jensen Huang’s surprise China trip that may speed approvals, and then provides a detailed technical breakdown of the DGX H200 cluster’s compute and storage networking, topology, optical link choices, and cable count estimates.

AI Infrastructure, DGX H200, Data Center Networking
8 min read
Geek Labs
May 13, 2026 · Artificial Intelligence

Two LLM Inference Acceleration Projects: A Mac‑Local Engine vs a Data‑Center Engine

This article compares two recent GitHub LLM inference engines—ds4.c, a Metal‑optimized engine for DeepSeek V4 Flash on Apple Silicon Macs, and TokenSpeed, a Python/C++‑based, data‑center‑grade engine for GPU clusters—detailing their design choices, performance numbers, usage instructions, and suitable scenarios.

DeepSeek, GPU, Inference
8 min read
21CTO
May 11, 2026 · Artificial Intelligence

Mojo 1.0 Beta: A New Era of Python‑C++ Performance

Mojo 1.0 beta combines familiar Python syntax with C/Rust‑level speed, introduces API‑stabilizing language changes, expands cross‑vendor GPU support, and delivers measurable AI/ML performance gains, while offering a decision framework that weighs its early‑stage ecosystem against production needs.

AI, C, GPU
10 min read
Machine Heart
May 10, 2026 · Artificial Intelligence

Why SRAM Is Key to Overcoming GPU Limits in Inference as Demand Soars

As large‑model inference demand outpaces training, the decode stage hits a memory wall that GPUs cannot efficiently cross; SRAM’s on‑chip bandwidth and low‑energy access open a path forward, though capacity and process limits still pose challenges.

AI hardware, Compute Architecture, GPU
7 min read
SuanNi
May 7, 2026 · Industry Insights

Musk Gives 220k GPUs to Claude; Anthropic’s $1.2T Valuation Crowned AI King

Elon Musk redirected 220,000 GPUs to Anthropic’s Claude, fueling a dramatic 80‑fold Q1 usage surge and a $1.2 trillion valuation that now eclipses OpenAI, while the article dissects the compute‑capacity crunch, Colossus data‑center dynamics, and the broader AI market power shift.

AI valuation, Anthropic, Claude
8 min read
Old Zhang's AI Learning
May 1, 2026 · Artificial Intelligence

NVIDIA’s Open‑Source Multimodal Nemotron 3 Nano Omni: Run Locally on Consumer GPUs (English‑Only)

NVIDIA’s Nemotron 3 Nano Omni 30B‑A3B‑Reasoning model, an open‑source multimodal LLM with 30B parameters, a 256K context and video‑audio‑image‑text capabilities, outperforms comparable models by up to 9.2× in video throughput, runs on consumer GPUs via 4‑bit GGUF quantization, but currently supports only English input.

GGUF, GPU, Nemotron
17 min read
SuanNi
Apr 30, 2026 · Artificial Intelligence

Deploy a 24/7 Document Recognition Toolbox with the PaddleOCR Image on the Cloud

This guide explains how to use Baidu's open‑source PaddleOCR engine—its full OCR and layout analysis pipeline, multi‑language support, and output formats—to set up a continuously running document recognition service on the 算网 GPU cloud platform, including environment preparation, model configuration, and inference execution.

Document Processing, GPU, MagicMind
6 min read
Baidu Intelligent Cloud Tech Hub
Apr 24, 2026 · Artificial Intelligence

LoongForge: Open‑Source Multimodal Training Framework Runs on GPU and Kunlun XPU with 45% Speedup

LoongForge is an open‑source, Megatron‑based multimodal training framework that unifies LLM, VLM, VLA and diffusion models, runs seamlessly on NVIDIA GPUs and Baidu Kunlun XPU, and delivers 15%‑45% end‑to‑end training acceleration while scaling linearly on thousands of cards.

GPU, Kunlun XPU, LoongForge
23 min read
DataFunTalk
Apr 19, 2026 · Industry Insights

Why Nvidia Still Rules AI Hardware: Inside Jensen Huang’s Strategic Interview

In a candid two‑hour podcast, Nvidia CEO Jensen Huang explains how the company’s focus on accelerated computing, a massive CUDA ecosystem, strategic supply‑chain partnerships and a philosophy of doing only what’s essential have built a durable moat that outpaces rivals like TPU, while also revealing why Nvidia prefers to empower cloud providers rather than become one itself.

AI hardware, GPU, Industry analysis
36 min read
Machine Learning Algorithms & Natural Language Processing
Apr 17, 2026 · Artificial Intelligence

Can Table Modeling Scale? Rethinking Tree Models in the Age of Massive Compute

The article examines how the dramatic increase in GPU compute power—illustrated by a single H100 GPU equaling about 200 Hadoop instances—challenges the dominance of tree‑based models for structured data, presents scaling‑law experiments with KMLP and FOUND, and argues that pre‑training can redefine the balance between compute, data, and algorithms.

FOUND, GPU, KMLP
10 min read
Baidu Geek Talk
Apr 13, 2026 · Artificial Intelligence

How Baidu’s 7th‑Gen AI Confidential VM Delivers Full‑Stack Secure Compute

Baidu Cloud’s 7th‑generation AI confidential virtual machine combines Intel TDX‑based CPU trusted execution, GPU confidential computing, and DPU‑offloaded I/O to provide end‑to‑end encrypted data paths, multi‑GPU scaling, and near‑native performance for high‑sensitivity AI workloads, redefining secure cloud AI infrastructure.

AI, Confidential Computing, GPU
15 min read
Old Zhang's AI Learning
Apr 12, 2026 · Artificial Intelligence

Deploy the Open‑Source MiniMax‑M2.7 Model Locally: Step‑by‑Step Guide

MiniMax‑M2.7, the newly open‑sourced 230‑billion‑parameter MoE model, offers self‑evolution, professional software engineering and agent capabilities, and can be deployed locally using Ollama, vLLM, SGLang or Docker with 4‑8 H200 GPUs, while the article details hardware needs, performance gains and tool‑calling/Thinking features.

Deployment, GPU, LLM
11 min read
Old Zhang's AI Learning
Apr 10, 2026 · Artificial Intelligence

How a 9B‑parameter Qwen3.5 model achieves full‑auto data analysis on a consumer GPU

The open‑source CoPaw‑Flash‑9B‑DataAnalyst‑LoRA model, fine‑tuned via LoRA, can autonomously load, explore, statistically analyze, visualize, and generate structured reports for CSV/Excel/JSON datasets, achieving a 90% success rate with an average of 26 iteration rounds, and it runs on a single consumer‑grade GPU using vLLM and the Data Analyst framework.

Agent, Data Analyst, GPU
10 min read
Old Zhang's AI Learning
Apr 7, 2026 · Artificial Intelligence

vLLM 0.19.0: HuggingFace v5 Support, Multimodal Boosts, and CPU KV Cache Offload

The vLLM 0.19.0 release adds first‑day Gemma 4 support, merges zero‑bubble asynchronous scheduling with speculative decoding, matures Model Runner V2, introduces full‑CUDA‑graph acceleration for ViT, generalizes DBO, brings CPU KV cache offload, and expands hardware and Transformers compatibility, offering substantial performance and flexibility gains for production LLM inference.

CPU KV offload, GPU, Gemma 4
18 min read
AI Info Trend
Mar 24, 2026 · Industry Insights

NVIDIA’s DLSS 5 & CUDA Flywheel: Transforming AI in Gaming and Enterprise

The GTC 2026 keynote revealed NVIDIA’s latest DLSS 5 technology using 3‑D guided neural rendering to deliver cinematic‑quality graphics in real time, outlined a 20‑year CUDA ecosystem flywheel that fuels AI acceleration across structured and unstructured data, showcased enterprise case studies like Nestlé’s data‑refresh breakthrough, and highlighted a vast partner network, illustrating how AI is moving from experimental labs to everyday production.

AI, CUDA, DLSS
5 min read
Ops Community
Mar 13, 2026 · Backend Development

How to Diagnose and Fix Slow LLM Inference: A Full‑Stack Performance Guide

This article presents a comprehensive, step‑by‑step methodology for troubleshooting and optimizing large‑language‑model inference performance, covering GPU, CPU, memory, network, configuration, and application layers, with concrete benchmark scripts, diagnostic commands, and real‑world case studies.

CPU, Debugging, GPU
48 min read
MaGe Linux Operations
Mar 12, 2026 · Backend Development

How to Deploy vLLM Inference Service on Kubernetes with Ingress and Service Load Balancing

This guide walks through deploying a production‑grade vLLM inference service on Kubernetes, covering GPU resource scheduling, Service and Ingress configuration, session affinity, health checks, performance tuning, scaling, monitoring, fault‑tolerance, and best‑practice recommendations for high‑availability AI workloads.

GPU, Ingress, Kubernetes
47 min read
Old Zhang's AI Learning
Mar 7, 2026 · Artificial Intelligence

vLLM 0.17.0 Release: Full Qwen 3.5 Support and Anthropic API Compatibility

The vLLM 0.17.0 release brings FlashAttention 4 integration, a mature Model Runner V2, complete Qwen 3.5 series support, a one‑click performance‑mode flag, Anthropic API compatibility, advanced weight‑offloading, broader hardware support beyond NVIDIA, ASR model integration, and detailed upgrade and installation guidance.

ASR, Anthropic API, FlashAttention
12 min read
SpringMeng
Mar 2, 2026 · Backend Development

Deep Dive into an Asynchronous Spring Boot + Tesseract OCR Pipeline for Invoice Recognition

This article presents a complete design and implementation of a high‑throughput, asynchronous OCR pipeline built with Spring Boot and Tesseract, covering distributed architecture, thread‑pool tuning, image‑preprocessing, multi‑engine recognition, data extraction strategies, Kubernetes deployment, security compliance, chaos testing, and future AI‑driven enhancements.

Asynchronous, GPU, Java
10 min read
MaGe Linux Operations
Feb 27, 2026 · Artificial Intelligence

How to Deploy Scalable LLM Inference with vLLM on Kubernetes and GPU Scheduling

This guide explains how to deploy vLLM for large‑language‑model serving on Kubernetes, covering GPU resource management, tensor‑parallel configuration, continuous batching, quantization choices, autoscaling with HPA and KEDA, multi‑model routing, and best‑practice recommendations for performance, cost control, and high availability.

GPU, Kubernetes, LLM inference
48 min read
Data STUDIO
Feb 21, 2026 · Big Data

Boost Python Performance Up to 50× Without Changing Your Code

Python’s reputation for slowness can be overcome by selecting the right tools—Numba, PyPy, CuPy, JAX, Ray, Joblib, async I/O, memory profilers, and big‑data frameworks—delivering speedups from 6× to over 50× with minimal or no code modifications.

Async, GPU, Profiling
22 min read
Old Zhang's AI Learning
Feb 21, 2026 · Artificial Intelligence

Why Fine‑Tuning Large Models Is Now Ridiculously Easy

The article explains how Unsloth dramatically lowers the barrier to fine‑tuning large language models, offering one‑click installation, free Colab GPU support, extensive model coverage, impressive speed and memory gains, and detailed step‑by‑step guides that let anyone with basic Python skills train powerful models.

Colab, GPU, LoRA
14 min read
dbaplus Community
Feb 9, 2026 · Artificial Intelligence

How EffectiveGPU Cuts GPU Costs with Fine‑Grained Partitioning and Volcano Scheduling

This article details how SF Tech's EffectiveGPU (EGPU) platform redesigns GPU resource management on Kubernetes, introducing fine‑grained memory and compute partitioning, priority‑based scheduling, Volcano integration, and monitoring pipelines to dramatically improve utilization and reduce hardware costs for AI workloads.

AI Platform, GPU, GPU partitioning
23 min read
AI Waka
Feb 1, 2026 · Artificial Intelligence

Boost LLM Inference Speed: Precision Tricks, Quantization, and Multi‑GPU Strategies

This article reviews practical techniques for accelerating large language model inference—including reduced‑precision formats, post‑training quantization, adapter‑based fine‑tuning, pruning, continuous batch processing, and multi‑GPU deployment—while providing concrete code examples, benchmark results, and guidance on selecting the right approach for production workloads.

GPU, Inference, LLM
20 min read
Old Zhang's AI Learning
Jan 28, 2026 · Artificial Intelligence

How to Deploy DeepSeek‑OCR‑2 Locally: A Hands‑On Walkthrough

The article details a step‑by‑step local deployment of DeepSeek‑OCR‑2, covering GPU memory requirements, accuracy on complex tables, long inference times, dependency hurdles like GCC, GLIBC and flash‑attn, and provides concrete solutions using conda environments and symlinks.

Conda, DeepSeek-OCR 2, Deployment
7 min read
21CTO
Jan 26, 2026 · Artificial Intelligence

What’s New in PyTorch 2.10? Deep Dive into GPU and CUDA Enhancements

PyTorch 2.10 introduces extensive upgrades for AMD ROCm, Intel XPU, and NVIDIA CUDA, adds new Torch XPU APIs, expands Python 3.14 support, and brings performance‑focused improvements such as fused kernels and enhanced quantization, all available via the official GitHub release.

CUDA, Deep Learning, GPU
4 min read
MaGe Linux Operations
Jan 18, 2026 · Artificial Intelligence

How to Deploy Scalable LLM Inference on Kubernetes with GPU Autoscaling

This guide walks through building a production‑grade Kubernetes GPU cluster for large language model inference, covering hardware sizing, GPU resource scheduling, model storage options, automated scaling with HPA, health checks, monitoring, troubleshooting, and multi‑model deployment strategies.

Docker, GPU, Inference
49 min read
Architects' Tech Alliance
Jan 16, 2026 · Artificial Intelligence

Why Do GPUs and NPUs Produce Different FP16 Results? Uncovering AI Chip Precision Secrets

Engineers training large AI models often see noticeable FP16/BF16 result differences between GPUs and NPUs, and even between generations of the same chip, due to floating‑point representation limits, hardware design choices, software library implementations, compiler optimizations, and parallel execution nondeterminism.

AI, GPU, NPU
10 min read
Architects' Tech Alliance
Jan 1, 2026 · Artificial Intelligence

Why Nvidia’s Blackwell B200 Could Redefine AI GPU Performance

The article provides an in‑depth technical analysis of Nvidia’s Blackwell B200 GPU, detailing its multi‑chip architecture, cache hierarchy, memory bandwidth, atomic operation latency, compute throughput, and tensor memory features, and compares these metrics against Nvidia H100, A100 and AMD MI300X to assess its suitability for AI workloads.

AI, AMD, Benchmark
19 min read
Architects' Tech Alliance
Dec 31, 2025 · Artificial Intelligence

Why Google’s TPUv7 Is Outsmarting Nvidia GPUs: From Performance to System Efficiency

The article examines the shifting AI‑chip landscape, explaining how Google’s TPUv7, backed by massive pod architecture and optical circuit switching, challenges Nvidia’s GPU dominance by offering superior system‑level efficiency and lower total cost of ownership for large‑scale model training.

AI hardware, GPU, Large-scale AI training
12 min read
MaGe Linux Operations
Dec 27, 2025 · Artificial Intelligence

How to Deploy and Optimize Enterprise‑Scale LLM Inference Services: A Practical Guide

This guide walks you through deploying large language models such as ChatGLM and Llama in production, covering environment setup, model quantization, dynamic batching, service configuration, Nginx load balancing, monitoring, troubleshooting, and best‑practice recommendations for high‑performance, cost‑effective AI inference.

GPU, Inference, LLM
48 min read
MaGe Linux Operations
Dec 26, 2025 · Operations

Taming vLLM OOM: Real‑World Causes and Proven Fixes for Production

This article examines why vLLM experiences out‑of‑memory errors in production, explains memory fragmentation caused by PagedAttention, outlines four typical OOM scenarios with concrete command‑line solutions, and provides deep analysis, configuration scripts, dynamic tuning, troubleshooting flowcharts, monitoring alerts, and best‑practice recommendations.

Deployment, GPU, Memory Fragmentation
24 min read
MaGe Linux Operations
Dec 19, 2025 · Artificial Intelligence

Boost vLLM Inference Throughput by 40% with Three Simple Config Tweaks

After discovering that only a few vLLM settings truly impact performance, this guide details how adjusting gpu_memory_utilization, max_num_batched_tokens, and enabling chunked prefill can raise Qwen2.5‑72B‑Instruct throughput from ~1800 to over 2500 tokens/s, improve latency, and provides comprehensive deployment, monitoring, and troubleshooting instructions.

Docker, GPU, Inference Optimization
30 min read
Raymond Ops
Dec 16, 2025 · Artificial Intelligence

Master Multi‑GPU Load Balancing for OLLAMA: From Setup to Production

This guide walks you through configuring OLLAMA for multi‑GPU load balancing, covering hardware checks, CUDA and Docker setup, native and containerized deployment methods, core parameter tuning, advanced sharding, dynamic monitoring, troubleshooting, production best practices, and a real‑world RTX 4090 case study.

AI inference, CUDA, GPU
15 min read
Data STUDIO
Dec 9, 2025 · Artificial Intelligence

20 Core PyTorch Concepts to Accelerate Your AI Projects

This article walks through twenty essential PyTorch concepts—from basic Tensor creation and manipulation, through autograd and neural‑network construction, to data loading, GPU acceleration, model saving, and practical training tricks—providing concrete code examples and clear explanations for developers eager to build and deploy AI models.

Autograd, DataLoader, Deep Learning
16 min read
Sohu Tech Products
Dec 3, 2025 · Frontend Development

Recreating Stunning Strange Attractor, Fibonacci Sphere & Galaxy Animations in Flutter with Pure Dart

This article explains how to implement three complex visual effects—Strange Attractor, Fibonacci Sphere, and Galaxy animations—in Flutter using only Dart code, covering the underlying differential equations, Euler integration, 3D‑to‑2D projection, rotation, perspective, performance optimizations, and solutions to common GPU tile‑artifact issues.

DART, Flutter, GPU
16 min read
AntTech
Nov 27, 2025 · Artificial Intelligence

How AMem NCCL‑Plugin Cuts GPU Memory Overhead for Trillion‑Parameter RL Models

The article explains the design, implementation, and performance of the AMem NCCL‑Plugin, a lightweight extension to NVIDIA's NCCL that enables transparent offloading and rapid recovery of GPU memory during reinforcement‑learning training of trillion‑parameter models, detailing its architecture, APIs, benchmarks, installation steps, and integration guidelines.

ASystem, Distributed Training, GPU
18 min read
Network Intelligence Research Center (NIRC)
Nov 24, 2025 · Artificial Intelligence

Simplifying AI Operator Development with TileLang DSL

TileLang is a Python‑style DSL built on TVM that separates algorithm logic from hardware scheduling, offers beginner to expert interfaces, supports multiple GPU and CPU backends, and delivers performance on par with or better than existing AI kernels, as demonstrated with GEMM, FlashAttention and other benchmarks.

AI operators, DSL, GEMM
10 min read
Deepin Linux
Nov 10, 2025 · Fundamentals

How the Linux DRM GPU Driver Framework Powers Modern Graphics

An in‑depth look at Linux’s DRM GPU driver framework reveals how Direct Rendering Manager, libdrm, KMS, GEM and related components collaborate to manage GPU resources, render graphics, and support multi‑display setups, complete with illustrative code examples and practical debugging tips.

DRM, GPU, Graphics
47 min read
IT Services Circle
Nov 9, 2025 · Fundamentals

Why Nvidia’s GPUs Are the Secret Key to the Quantum Computing Era

Nvidia leverages its GPUs to solve quantum computers' fragile error‑correction problem, introducing ultra‑fast NVQLink interconnect and the CUDA‑Q programming platform, creating a feedback loop that secures its dominance in both traditional and emerging quantum markets.

CUDA-Q, GPU, NVQLink
6 min read
Linux Kernel Journey
Nov 4, 2025 · Operations

How to Use Kernel Tracepoints for Zero‑Overhead GPU Driver Monitoring

This tutorial explains how to leverage Linux kernel tracepoints with eBPF and bpftrace to capture real‑time GPU driver activity—including job scheduling, memory management, and command submission—across Intel, AMD, Nouveau, and NVIDIA GPUs, providing detailed examples, scripts, and analysis of the resulting data.

DRM, GPU, Performance Monitoring
20 min read
Open Source Linux
Nov 4, 2025 · Artificial Intelligence

Why NVIDIA Left China and How Domestic AI Chips Are Rising to Lead

After NVIDIA’s abrupt exit from the Chinese market, domestic AI chip makers such as Huawei Ascend, Cambricon, Moore Threads, and Muxi are rapidly filling the gap, with increasing market share, diverse architectures, and ambitious production goals that could soon surpass foreign competitors.

AI chips, China Market, Domestic semiconductor
6 min read
DataFunTalk
Oct 30, 2025 · Artificial Intelligence

Why Nvidia’s $5 Trillion Valuation Marks a New Era for AI Infrastructure

Nvidia just became the first company to break the $5 trillion market‑cap threshold, a milestone that underscores its rapid growth, ambitious AI‑factory vision, 6G edge‑AI plans, autonomous‑driving initiatives, digital‑twin manufacturing, and the strategic importance of its CUDA ecosystem.

AI, GPU, Market Cap
8 min read
Linux Kernel Journey
Oct 21, 2025 · Industry Insights

Bridging the GPU Observability Gap: Why eBPF on GPUs Matters

The article explains how bpftime extends eBPF to NVIDIA and AMD GPUs, exposing fine‑grained execution details that traditional CPU‑side tools miss, and demonstrates a unified, programmable observability stack that overcomes the limitations of existing GPU profilers in both synchronous and asynchronous workloads.

CUDA, GPU, Observability
23 min read
Programmer DD
Oct 13, 2025 · Artificial Intelligence

Running ONNX AI Inference Natively in Java Without Python

This article explains how enterprise architects can integrate ONNX‑based machine‑learning inference directly into Java applications, covering tokenizer integration, GPU acceleration, deployment patterns, and lifecycle management to achieve secure, scalable, and observable AI services without relying on Python runtimes.

AI inference, GPU, Java
16 min read
BirdNest Tech Talk
Oct 12, 2025 · Artificial Intelligence

What Happens When a Token Travels Through GPU Villages via RDMA and NVLink?

The article uses a whimsical journey to illustrate how token data is dispatched across GPU clusters—detailing functions like get_dispatch_layout, notify_dispatch, and combine_token, showing RDMA and NVLink pathways, performance experiments, and the final verification of token integrity.

AI, Distributed Systems, GPU
5 min read
Programmer DD
Oct 12, 2025 · Backend Development

Boost Java Performance: Integrate CUDA GPU Acceleration via JNI

This guide explains why Java struggles with high‑performance or data‑intensive workloads, introduces GPU acceleration with CUDA, compares integration options such as JCuda, JNI, and JNA, walks through a practical encryption use case with performance benchmarks, and provides production‑grade best practices for memory, threading, testing, security, and deployment.

CUDA, GPU, High‑performance computing
23 min read
DataFunTalk
Oct 10, 2025 · Artificial Intelligence

Is Oracle’s AI Cloud a Hidden Money‑Sink? Uncovering the Real Profit Margins

An in‑depth analysis reveals that Oracle’s AI‑focused cloud business, built on expensive Nvidia GPU rentals for OpenAI and other AI developers, generates massive revenue but suffers from alarmingly low profit margins, creating a systemic risk that could ripple through the entire AI infrastructure ecosystem.

AI cloud, GPU, OpenAI
14 min read
21CTO
Oct 7, 2025 · Artificial Intelligence

Why Microsoft Is Shifting AI Workloads from GPUs to Its Own Maia Accelerators

Microsoft, after buying massive GPU inventories from Nvidia and AMD, is accelerating its move to custom AI accelerators like Maia to improve cost‑performance in its data centers, even though its first‑generation chips still lag behind industry leaders.

AI accelerator, GPU, Maia
5 min read
Java Tech Enthusiast
Oct 6, 2025 · Artificial Intelligence

How China’s New GPU Startup Moore Threads Is Accelerating the AI Race

Amid US export restrictions, China’s five‑year‑old GPU pioneer Moore Threads is racing to fill the high‑end GPU gap, detailing the technology’s role in AI, its ecosystem strategy, and the significance of its fast‑track IPO for the domestic semiconductor and AI compute landscape.

AI computing, China, GPU
10 min read
Fighter's World
Oct 3, 2025 · Industry Insights

What Jensen Huang Revealed About Nvidia’s Bold “Sun Strategy” in the BG2 Interview

The article dissects Jensen Huang’s BG2 interview to explain Nvidia’s shift from a pure GPU supplier to an AI‑Factory architect, detailing the double‑exponential AI demand growth, token‑based economics, technical and ecosystem moats, sovereign AI initiatives, open‑link strategies, and the long‑term vision of physical AI.

AI Factory, AI Market, GPU
27 min read
Data Party THU
Sep 30, 2025 · Backend Development

Ray Serve vs Celery: Which Is Best for GPU‑Intensive Parallel Workloads?

This article compares Ray Serve and Celery, explaining their design philosophies, scaling models, GPU‑aware scheduling, operational trade‑offs, and real‑world case studies to help engineers choose the right tool for high‑throughput online inference or large‑scale batch processing.

Distributed Systems, GPU, Model Serving
9 min read
AI Cyberspace
Sep 28, 2025 · Artificial Intelligence

How to Set Up WSL2 GPU Acceleration and Profile CUDA on Windows 11

This guide walks through configuring Windows 11 with WSL2 and Ubuntu 22.04 for GPU‑accelerated CUDA development, installing NVIDIA drivers and CUDA libraries, setting up SSH and firewall rules, running a CUDA stress‑test program, and using Nsight Systems, Nsight Compute, and NVIDIA DCGM for performance profiling and monitoring.

CUDA, GPU, Linux
39 min read
Raymond Ops
Sep 17, 2025 · Cloud Native

Enable GPU Acceleration in Docker and Kubernetes with NVIDIA Toolkit

This guide walks through checking the system environment, installing the NVIDIA Docker plugin, configuring Docker to use the NVIDIA runtime, verifying GPU access, deploying the NVIDIA device plugin in a Kubernetes cluster, creating a GPU‑enabled pod, and testing GPU‑accelerated video processing with FFmpeg.

Container Toolkit · Docker · GPU
0 likes · 12 min read
Refining Core Development Skills
Sep 11, 2025 · Fundamentals

How Kepler Boosted GPU Performance: Architecture, Specs, and Compute Power

This article examines NVIDIA's Kepler GPU architecture, highlighting its 28 nm process, increased transistor count, expanded CUDA core count, PCIe 3.0 support, enhanced memory hierarchy, new compute units, scheduling improvements like Hyper‑Q, and performance metrics of the Tesla K20X, illustrating the substantial gains over previous generations.

CUDA · Compute · GPU
0 likes · 13 min read
Alibaba Cloud Developer
Sep 8, 2025 · Fundamentals

How to Profile GPU Kernels with PTX Probes: From CUDA Basics to Custom Instrumentation

This article walks through GPU performance analysis, starting with CUDA architecture fundamentals, demonstrating matrix multiplication optimization, explaining PTX assembly, and introducing the Neutrino framework for programmable GPU probes that enable fine‑grained, custom instrumentation and detailed timing measurements of kernel execution.

CUDA · GPU · Neutrino
0 likes · 45 min read
Architects' Tech Alliance
Sep 5, 2025 · Artificial Intelligence

Why Nvidia’s Custom B30A Chip Commands $24,000 in China

A Reuters report reveals that Nvidia’s China‑specific B30A GPU may cost up to $24,000, double the price of the current H20 chip, while Chinese AI developers continue to favor Nvidia despite government pushes for domestic alternatives and shifting US export policies.

AI chips · B30A · China
0 likes · 4 min read
Architects' Tech Alliance
Aug 31, 2025 · Artificial Intelligence

Why the Last Decade Became the Golden Age of AI Chip Architecture

The article traces the evolution of AI hardware over the past ten years, outlining three key phases—from early chip limitations that sidelined neural networks, through CPU advances that still fell short, to the rise of GPUs and specialized AI chips that finally unlocked rapid AI deployment, while also highlighting the parallel impact of algorithmic breakthroughs and massive data growth.

AI hardware · Big Data · GPU
0 likes · 5 min read
Ops Development Stories
Aug 29, 2025 · Cloud Native

How to Build a GPU Spot‑Pool Operator on Kubernetes with Kubebuilder

This guide walks through creating a Kubernetes Operator using Kubebuilder to manage a GPU spot‑pool on Tencent Cloud, covering CRD design, controller logic, code generation, and deployment steps, enabling automated scaling of GPU resources for AI workloads while illustrating core Cloud‑Native concepts.

GPU · Kubebuilder · Kubernetes
0 likes · 19 min read
Alibaba Cloud Native
Aug 21, 2025 · Cloud Native

How Higress AI Gateway Optimizes LLM Load Balancing with Global, Prefix, and GPU‑Aware Algorithms

This article explains why traditional load‑balancing methods fall short for large language model services and introduces Higress AI Gateway's three specialized algorithms—global minimum‑request, prefix‑matching, and GPU‑aware load balancing—detailing their design, Redis‑based implementation, deployment steps, and performance gains.

GPU · LLM · load balancing
0 likes · 11 min read
AI Cyberspace
Aug 4, 2025 · Artificial Intelligence

From Tesla to Hopper: How NVIDIA GPU Architectures Powered the AI Revolution

This article traces the evolution of NVIDIA GPU architectures—from the early Tesla series through Fermi, Kepler, Maxwell, Pascal, Volta, Turing, Ampere, Hopper, and up to the upcoming Blackwell—explaining their hardware innovations, CUDA programming model, and how each generation enabled breakthroughs in high‑performance computing, deep learning, and AI applications.

AI · CUDA · GPU
0 likes · 67 min read
Architecture Development Notes
Jul 21, 2025 · Artificial Intelligence

Why Rust’s Burn Framework Is Redefining Deep Learning Performance

Burn, a native Rust deep learning framework by Tracel AI, combines extreme flexibility, high computational efficiency, and cross‑platform portability through a modular backend abstraction, type‑safe tensor operations, asynchronous execution, and extensive tooling, offering a performance‑competitive alternative to Python‑based frameworks for both training and inference.

Burn · Deep Learning · GPU
0 likes · 23 min read
Open Source Linux
Jul 16, 2025 · Artificial Intelligence

How Huawei’s New AI Chip Aims to Rival Nvidia and AMD GPUs

Huawei is developing a new AI‑focused GPU‑style chip that mirrors Nvidia and AMD architectures, aiming to ease Chinese developers’ shift from Nvidia hardware, but still faces software compatibility hurdles due to reliance on CUDA and ongoing U.S. export restrictions.

AI Chip · CUDA · Chip Design
0 likes · 3 min read
Architects' Tech Alliance
Jul 13, 2025 · Artificial Intelligence

How Huawei’s New AI Chip Aims to Rival Nvidia’s GPUs

Huawei is developing a new AI chip that functions more like a general‑purpose GPU, aiming to match Nvidia and AMD architectures and simplify the transition for Chinese AI developers, while still facing challenges such as adapting CUDA‑based software and overcoming export restrictions.

AI Chip · CUDA · GPU
0 likes · 3 min read
Tencent Technical Engineering
Jul 8, 2025 · Artificial Intelligence

Why GPUs Power Large‑Model Inference: From Graphics to GPGPU

This article explains how modern GPUs evolved from graphics rendering to general‑purpose computing, details the CPU‑GPU heterogeneous architecture, walks through a CUDA demo that adds two billion‑element arrays, compares CPU and GPU performance, and describes the compilation, loading, and execution pipeline of CUDA kernels.

AI inference · CUDA · GPGPU
0 likes · 33 min read
Tencent Cloud Developer
Jul 8, 2025 · Artificial Intelligence

How GPUs Power AI: From Graphics to GPGPU Explained

This article explores how GPUs evolved from graphics accelerators to general‑purpose processors for AI, detailing the CPU‑GPU heterogeneous architecture, the CUDA programming workflow, compilation into fat binaries, kernel launch mechanics, hardware components, and the differences between SIMD and SIMT models, with performance comparisons and code examples.

AI · CUDA · GPGPU
0 likes · 31 min read
JavaEdge
Jun 28, 2025 · Backend Development

How Java Developers Can Harness CUDA on NVIDIA A100 GPUs

This guide explains why Java architects should understand CUDA, describes the GPU programming model, compares CPU and GPU designs, and details three practical ways—JNI, JCuda, and TornadoVM—to integrate CUDA acceleration into Java applications, with tips for using A100 GPUs effectively.

A100 · CUDA · GPU
0 likes · 15 min read