Tagged articles

536 articles

Dec 27, 2023 · Cloud Computing

One‑Click Deployment of LLMs to Alibaba Cloud Function Compute with SwingDeploy

This guide explains how to quickly select a ModelScope open‑source LLM, deploy it to Alibaba Cloud Function Compute using the SwingDeploy one‑click feature, enable reserved idle billing, and evaluate the cost savings compared with traditional GPU provisioning.

Cost Optimization · Function Compute · GPU

0 likes · 11 min read

Java Architecture Diary

Dec 22, 2023 · Artificial Intelligence

Boosting LLM Inference on Java: Vector API, Project Panama & TornadoVM Performance

This article evaluates the performance of large language model (LLM) inference on the Java platform, examining llama2.java implementations that leverage Java Streams, the Vector API, Project Panama, and TornadoVM GPU acceleration, and compares them against native C versions across various model sizes.

GPU · Java · Project Panama

0 likes · 13 min read

Architects' Tech Alliance

Dec 22, 2023 · Artificial Intelligence

AI Server Architecture, Market Trends, and Competitive Landscape in 2023

An in‑depth overview of AI server components, market growth, AIGC‑driven demand, heterogeneous computing architectures, major vendors, and future trends, highlighting hardware composition, cost breakdown, competitive rankings, and the impact of GPU, CPU, and emerging AI accelerators on the industry.

AI servers · CPU · GPU

0 likes · 14 min read

Python Programming Learning Circle

Dec 21, 2023 · Artificial Intelligence

Introducing Streamlit: A Free Open‑Source Framework for Building Machine‑Learning Apps with Python

Streamlit is a free, open‑source Python framework that lets machine‑learning engineers quickly turn scripts into interactive web apps, featuring top‑to‑bottom script execution, widget‑as‑variable handling, caching, GPU support, and seamless integration with tools like Git.

App Development · GPU · Python

0 likes · 9 min read

Architects' Tech Alliance

Dec 3, 2023 · Artificial Intelligence

Overview of the AI Chip Market: Architectures, Companies, and Performance Comparisons

The rapidly growing multi‑billion‑dollar AI chip market in 2023 is categorized by architecture (GPGPU, FPGA, ASIC, compute‑in‑memory) and deployment location (cloud, edge, terminal), with Chinese vendors advancing training and inference chips but still lagging behind leading Nvidia products in performance and bandwidth.

AI chips · ASIC · China AI

0 likes · 8 min read

OPPO Kernel Craftsman

Nov 27, 2023 · Mobile Development

Understanding Android HWUI, Skia, and OpenGL Rendering Pipeline

The article explains Android’s graphics pipeline by detailing how HWUI and Skia translate view operations into OpenGL ES commands, describing RenderThread stages such as synchronization, dirty‑region calculation, buffer handling, and drawing, and comparing mobile GPU architectures like TBR, TBDR, and IMR.

Android · GPU · OpenGL

0 likes · 13 min read

Architects' Tech Alliance

Nov 19, 2023 · Artificial Intelligence

NVIDIA H100 vs L40S: AI‑Focused GPU Comparison and Practical Alternatives

This article compares NVIDIA's high‑end AI GPUs—H100, A100, and the newer L40S—detailing their specifications, performance trade‑offs, pricing, availability, and suitability for training and inference workloads, while highlighting why L40S can be a cost‑effective alternative for many enterprises.

AI · GPU · H100

0 likes · 10 min read

Open Source Linux

Nov 15, 2023 · Fundamentals

How to Use the 2023 GPU Ladder to Choose the Right Desktop Graphics Card

This guide explains the 2023 desktop GPU ladder, clarifies the differences between NVIDIA (N‑cards) and AMD (A‑cards), distinguishes reference ("public") models from custom partner ("non‑public") models, and recommends specific graphics cards for each performance tier from entry‑level to high‑end gaming.

AMD · GPU · Graphics Card

0 likes · 6 min read

Architects' Tech Alliance

Oct 24, 2023 · Fundamentals

Understanding CPU, GPU, and Graphics Card Performance Ladder Charts (2023)

This article explains how CPU core count, clock speed, and cache, as well as GPU shader count, frequency, memory bandwidth, and graphics‑card memory size and interface affect performance, and shows 2023 desktop and laptop performance ladder charts to help users compare and choose the right hardware.

CPU · GPU · Graphics Card

0 likes · 5 min read

Architects' Tech Alliance

Oct 15, 2023 · Fundamentals

2023 GPU Graphics Card Industry Report: Market Overview, Mining Impact, and Future Trends

The 2023 GPU graphics‑card report provides a comprehensive overview of China's graphics‑card industry, detailing current development, OEM product lines, the evolution of mining, post‑Ethereum‑merge market shifts, full‑chain analysis, and forecasts for downstream demand across gaming, AI, data‑center, and automotive applications.

AI · China · GPU

0 likes · 15 min read

Architects' Tech Alliance

Oct 15, 2023 · Fundamentals

What You Need to Know About Server CPUs, GPUs, and Memory

This article provides a concise technical overview of server hardware, covering CPU architecture and platform options, GPU evolution and key specifications, and DDR4 memory compatibility rules, helping readers understand the essential components for building or upgrading a server.

CPU · GPU · Memory

0 likes · 6 min read

php Courses

Oct 10, 2023 · Artificial Intelligence

Microsoft to Unveil Its Own AI Chip "Athena" at Ignite Conference

Microsoft plans to announce its self‑developed AI processor, codenamed Athena, at the Ignite developer conference in mid‑November, aiming to reduce reliance on Nvidia GPUs and strengthen its AI services such as Azure AI, Bing Chat, and Copilot.

AI Chip · GPU · Hardware

0 likes · 2 min read

php Courses

Oct 7, 2023 · Artificial Intelligence

Microsoft to Unveil Its First AI‑Focused Chip ‘Athena’ at Ignite Conference

Microsoft plans to launch its first AI‑specific chip, codenamed Athena, at the upcoming Ignite conference, aiming to compete with Nvidia's H100 GPU, reduce reliance on Nvidia, and provide Azure and other cloud customers with a home‑grown AI accelerator for data‑center workloads.

AI Chip · Azure · Data center

0 likes · 6 min read

Architects' Tech Alliance

Oct 6, 2023 · Fundamentals

High Bandwidth Memory (HBM) Technology Overview and Its Integration in Modern Processors

High Bandwidth Memory (HBM), introduced in 2014 using TSV stacking, has evolved through HBM2, HBM2e, and HBM3 standards and is now integrated into CPUs, GPUs, and accelerators from AMD, NVIDIA, Intel, and others, with advanced interconnects like CoWoS, EMIB, and Foveros enabling high‑capacity, high‑bandwidth packaging.

CPU · GPU · HBM

0 likes · 16 min read

Architects' Tech Alliance

Sep 29, 2023 · Artificial Intelligence

AI Compute Landscape: GPUs, Networking, and Storage

The article analyzes the AI compute ecosystem, highlighting GPUs as the core engine, network bandwidth as a bottleneck, and the memory wall in storage, while also promoting comprehensive server and storage e‑books for deeper technical insight.

AI · Compute · E‑book

0 likes · 4 min read

DaTaobao Tech

Sep 27, 2023 · Artificial Intelligence

FlashAttention-2: Efficient Attention Algorithm for Transformer Acceleration and AIGC Applications

FlashAttention‑2 is an IO‑aware exact attention algorithm that cuts GPU HBM traffic through tiling and recomputation, optimizes non‑matmul FLOPs, expands sequence‑parallelism and warp‑level work distribution, delivering up to 2× speedup over FlashAttention, near‑GEMM efficiency, and enabling longer‑context Transformer training and inference for AIGC with fastunet and negligible accuracy loss.

AIGC · Attention optimization · Deep Learning

0 likes · 20 min read

Architects' Tech Alliance

Sep 17, 2023 · Fundamentals

FPGA Overview: Architecture, Memory Hierarchy, and NoC Advantages

This article provides a comprehensive overview of FPGA technology, detailing its programmable logic cells, input/output blocks, switch matrices, historical evolution, flexibility versus ASIC and GPU, memory hierarchy including on‑chip and HBM2e, and the benefits of Network‑on‑Chip architectures for performance, power and design modularity.

ASIC · FPGA · GPU

0 likes · 12 min read

Baobao Algorithm Notes

Sep 12, 2023 · Artificial Intelligence

Why RTX 4090 Beats H100 for LLM Inference but Fails at Training

The article analyses the performance, memory, bandwidth and cost of NVIDIA H100, A100 and RTX 4090 GPUs, explains why the 4090 cannot handle large‑model training due to communication and memory limits, and shows how its high compute‑to‑price ratio makes it attractive for inference, backed by detailed parallelism calculations and cost‑per‑token estimates.

Cost · GPU · LLM

0 likes · 46 min read

Architects' Tech Alliance

Sep 7, 2023 · Artificial Intelligence

Global AI Accelerator Chip Market Overview and Emerging Chinese Vendors (2023)

The article provides a comprehensive analysis of the AI accelerator chip market, highlighting the dominant position of overseas leaders like Nvidia, AMD and Intel, detailing market share data, and examining the rapid development and competitive strategies of emerging Chinese GPU, GPGPU, and ASIC manufacturers.

AI · Accelerator · Chip

0 likes · 15 min read

Architects' Tech Alliance

Sep 4, 2023 · Artificial Intelligence

Overview of AI Chip Types, Architectures, and Market Trends

The article explains the various AI‑capable chips such as CPUs, GPUs, FPGAs, NPUs, and TPUs, compares their performance and efficiency, describes heterogeneous CPU+xPU solutions, and provides market share data while highlighting the growing adoption of specialized AI accelerators.

AI acceleration · AI chips · CPU

0 likes · 7 min read

Architects' Tech Alliance

Sep 2, 2023 · Artificial Intelligence

NVIDIA L40S GPU Overview and Its Impact on Generative AI and Optical Modules

The NVIDIA L40S GPU, built on the Ada Lovelace architecture with 48 GB GDDR6 memory and 864 GB/s bandwidth, delivers over 1.45 PFLOPS tensor performance and superior FP16/FP32 efficiency for generative AI training and inference, while its lower power and GDDR6 design may influence demand for mid‑range optical modules in data centers.

Data center · GPU · L40S

0 likes · 8 min read

Open Source Linux

Aug 24, 2023 · Fundamentals

How to Read CPU, GPU, and Graphics Card Performance Ladder Charts (2023)

This article explains the key performance metrics of CPUs, GPUs, and graphics cards, shows how ladder charts are built from benchmark data, and provides practical tips for choosing the right hardware for desktops and laptops based on these rankings.

CPU · GPU · Graphics Card

0 likes · 5 min read

Architects' Tech Alliance

Aug 20, 2023 · Fundamentals

Understanding CPU, GPU, and Graphics Card Performance Rankings (2023)

This article explains how CPU core count, frequency, cache, GPU shader count, memory bandwidth, and graphics card VRAM affect performance, and presents 2023 desktop and laptop performance ladder charts to help users compare and choose suitable hardware for their needs.

Benchmark · CPU · GPU

0 likes · 6 min read

JD Tech

Aug 4, 2023 · Artificial Intelligence

Deploying and Evaluating the Vicuna Open‑Source Large Language Model on a Single Machine

This article details a step‑by‑step guide to deploying the Vicuna open‑source LLM on a single server, covering model preparation, environment setup, dependency installation, GPU and CUDA configuration, inference commands, performance evaluation, and attempted fine‑tuning, while sharing practical observations and results.

Fine‑tuning · GPU · Inference

0 likes · 16 min read

Alibaba Cloud Native

Aug 3, 2023 · Cloud Native

How Koordinator + KubeDL Revolutionize AI Model Training on Kubernetes

This article explains how the open‑source Koordinator scheduler, combined with KubeDL, tackles the resource‑intensive demands of large‑scale AI and LLM training on Kubernetes by introducing heterogeneous resource management, elastic quota, coscheduling, and fine‑grained GPU & RDMA allocation.

AI training · GPU · Koordinator

0 likes · 17 min read

JD Tech

Jul 31, 2023 · Artificial Intelligence

Local Deployment, Fine‑tuning, and Inference of the Open‑source Alpaca‑LoRA Model on GPU Servers

This article details the step‑by‑step process of installing GPU drivers, setting up a Python environment, deploying the open‑source Alpaca‑LoRA large language model, fine‑tuning it with Chinese data on a multi‑GPU server, and running inference, while discussing practical challenges and performance observations.

Alpaca · Fine-tuning · GPU

0 likes · 14 min read

Architects' Tech Alliance

Jul 29, 2023 · Artificial Intelligence

AI Server Market Overview and Technical Architecture

The article provides a comprehensive analysis of the AI server market, detailing server hardware components, cost distribution, logical architecture, firmware, rapid market growth, competitive landscape, AI-driven heterogeneous computing, and future industry trends, while highlighting key vendors and deployment configurations.

AI servers · Cloud providers · GPU

0 likes · 10 min read

Liangxu Linux

Jul 20, 2023 · Fundamentals

How to Choose the Right Desktop PC Components for Your Needs

This guide explains how to select desktop computer parts—including CPU, GPU, motherboard, memory, storage, power supply, and cooling—by evaluating usage, performance tiers, specifications, brand options, and compatibility, while also noting which components can be safely bought second‑hand.

CPU · GPU · Memory

0 likes · 9 min read

Architects' Tech Alliance

Jul 10, 2023 · Fundamentals

Aligning the PCI‑Express Roadmap with the Cadence of Compute Engines and Networks

The article argues that PCI‑Express specifications, controllers, and switches must adopt a coordinated two‑year release cadence that matches CPU, GPU, and accelerator roadmaps, urging the PCI‑SIG to accelerate to PCI‑Express 7.0 to meet the bandwidth demands of modern data‑center and AI workloads.

CPU · Data center · GPU

0 likes · 13 min read

Architects' Tech Alliance

Jul 9, 2023 · Industry Insights

China’s GPU Landscape: Architecture, Performance, and Market Outlook

The report builds a comprehensive GPU research framework evaluating performance through micro‑architecture, process, core count and frequency, examines ecosystem dominance of CUDA, dissects NVIDIA Fermi and Hopper designs, analyzes competitive histories of Nvidia and AMD, and forecasts domestic GPU market opportunities in AI data centers, autonomous vehicles, and gaming.

AI · China · GPU

0 likes · 5 min read

MaGe Linux Operations

Jul 6, 2023 · Fundamentals

Boost Python Performance: Top Tools and Techniques for Faster Code

This article surveys a range of Python acceleration tools—from NumPy, SciPy, and Pandas for efficient array operations to JIT compilers like PyPy and Pyston, GPU libraries such as PyCUDA, and C‑extension generators like Cython—explaining how each can dramatically speed up single‑processor or parallel code while balancing memory usage.

Cython · GPU · numba

0 likes · 6 min read

58 Tech

Jul 6, 2023 · Artificial Intelligence

Design and Optimization of a Kaldi‑Based Speech Recognition Backend at 58.com

This article details the evolution from the initial Kaldi‑based speech recognition architecture (version 1.0) to a re‑engineered version 2.0, describing business background, service components, identified shortcomings, and a series of performance, concurrency, GPU, I/O, GC, and dispatch optimizations that dramatically improve resource utilization, latency, and reliability for large‑scale voice processing at 58.com.

AI · Backend Architecture · GPU

0 likes · 15 min read

Open Source Linux

Jul 3, 2023 · Fundamentals

Why DPUs Are Revolutionizing Data Center Efficiency Over CPUs and GPUs

This article explains how Data Processing Units (DPUs) provide a low‑power, cost‑effective alternative to CPUs and GPUs for data‑centric workloads in modern data centers, detailing their architecture, programmability, and the performance and TCO benefits they bring.

CPU · DPU · Data center

0 likes · 9 min read

Volcano Engine Developer Services

Jun 30, 2023 · Cloud Native

Deploy Langchain‑ChatGLM on Volcengine VKE: A Step‑by‑Step Cloud‑Native Guide

This tutorial walks you through preparing a VKE cluster, pulling the Langchain‑ChatGLM container image, creating the necessary Deployment and Service resources, and adding a local knowledge base, enabling you to run a Langchain‑based ChatGLM service with GPU support on Volcengine’s cloud‑native platform.

AI deployment · ChatGLM · GPU

0 likes · 6 min read

58 Tech

Jun 21, 2023 · Artificial Intelligence

GPU Hotword Enhancement for WeNet End-to-End Speech Recognition

This article explains the design, implementation, and experimental evaluation of hot‑word augmentation in WeNet's GPU runtime, detailing how character‑ and word‑based language model scoring are extended to boost recognition of rare proper nouns in both streaming and non‑streaming ASR services.

ASR · CTC decoder · GPU

0 likes · 12 min read

Architects' Tech Alliance

Jun 20, 2023 · Fundamentals

Introducing NVIDIA DOCA GPUNetIO: GPU‑Initiated Communication for Real‑Time Packet Processing

NVIDIA's new DOCA GPUNetIO library enables GPU‑initiated communication, allowing packets to be received directly into GPU memory, processed by CUDA kernels, and sent without CPU involvement, offering lower latency, higher scalability, and detailed pipeline examples including IP checksum, HTTP filtering, traffic forwarding, and 5G Aerial SDK integration.

5G · CUDA · DOCA

0 likes · 19 min read

Architects' Tech Alliance

Jun 14, 2023 · Artificial Intelligence

AI Compute Landscape: GPUs, Network, and Storage as Core Engines

The article analyzes how large language models like ChatGPT are reshaping the software ecosystem by positioning AI compute—driven by GPUs, high‑speed networking, and advanced storage solutions such as HBM and 3D‑stacked memory—as the foundational engine for future information systems, highlighting current market trends and technical challenges.

AI · Compute · GPU

0 likes · 4 min read

Python Programming Learning Circle

Jun 13, 2023 · Fundamentals

Python Performance Optimization Tools and Techniques

This article introduces a variety of Python optimization tools—including NumPy, SciPy, Pandas, JIT compilers like PyPy, GPU libraries, Cython, Numba, and interfacing utilities—explaining how they can make code more concise, faster, and better suited for single‑processor or multi‑processor execution.

GPU · Python · libraries

0 likes · 8 min read

Volcano Engine Developer Services

Jun 13, 2023 · Artificial Intelligence

Deploy Stable Diffusion on Volcengine Cloud: A Step‑by‑Step Guide

Learn how to deploy your own Stable Diffusion text‑to‑image model on Volcengine Cloud by setting up a VKE Kubernetes cluster, configuring storage, GPU resources, container images, and exposing the service via ALB or API Gateway, while leveraging mGPU sharing and serverless GPU options.

AI · GPU · Kubernetes

0 likes · 14 min read

Ctrip Technology

Jun 8, 2023 · Frontend Development

Optimizing CSS Animation Performance: Techniques and Best Practices

This article explains how to improve CSS animation performance by understanding the browser rendering pipeline, using GPU‑accelerated transforms, avoiding costly properties and complex selectors, leveraging will‑change and requestAnimationFrame, and preferring CSS over JavaScript for smoother, lower‑latency visual effects.

CSS · GPU · frontend

0 likes · 13 min read

Architects' Tech Alliance

Jun 1, 2023 · Fundamentals

Overview of the Xinchuang (Information Technology Innovation) Industry: CPU, GPU, and Storage Fundamentals

This article provides a comprehensive overview of the Xinchuang industry, detailing the fundamental concepts, architectures, and classifications of CPUs, GPUs, and storage devices, and explains how these core hardware components support the goal of achieving self‑controlled, secure information technology in China.

CPU · GPU · Hardware

0 likes · 6 min read

Open Source Linux

May 29, 2023 · Fundamentals

What Is a GPU? Understanding Its Role in Graphics, AI, and Computing

This article explains what a GPU (Graphics Processing Unit) is, how it differs from a CPU, its architecture and performance characteristics, and why it powers everything from real‑time rendering to AI inference, using examples like the NVIDIA RTX 3090.

CPU comparison · GPU · Graphics Processing Unit

0 likes · 8 min read

Alibaba Cloud Big Data AI Platform

May 29, 2023 · Artificial Intelligence

How PAI‑Blade Supercharges Stable Diffusion Inference on GPUs

This article explains how PAI‑Blade, built on the BladeDISC compiler and BlaDNN library, dramatically reduces latency and memory usage for Stable Diffusion inference, provides step‑by‑step usage examples with code, shows performance gains on A100 and A10 GPUs, and outlines future optimization directions.

GPU · Inference Optimization · PAI-Blade

0 likes · 9 min read

Tencent Cloud Developer

May 24, 2023 · Artificial Intelligence

Deploying Stable Diffusion on Tencent Cloud: A Step‑by‑Step Guide

Deploy Stable Diffusion on Tencent Cloud by building a Docker image, pushing it to TCR, creating a GPU‑enabled TKE cluster with CFS storage, configuring qGPU sharing, exposing the service via Cloud Native API Gateway, optimizing inference with TACO Kit, storing results in COS, and applying content moderation.

AI deployment · GPU · Kubernetes

0 likes · 19 min read

JD Retail Technology

May 18, 2023 · Artificial Intelligence

Local Deployment, Inference, and Fine‑tuning of the Vicuna‑7B Large Language Model

This article details the step‑by‑step process of preparing the environment, merging weights, installing dependencies, running inference, evaluating Vicuna‑7B against other models, and attempting fine‑tuning, while highlighting performance results, encountered issues, and future work for large language model deployment.

Fine-tuning · GPU · Inference

0 likes · 11 min read

JD Retail Technology

May 16, 2023 · Artificial Intelligence

Deploying and Fine‑Tuning the Alpaca‑LoRA Large Language Model on a Multi‑GPU Server

This guide details the end‑to‑end process of installing GPU drivers, setting up a Python environment, deploying the open‑source Alpaca‑LoRA model, fine‑tuning it with Chinese data on a multi‑GPU server, and performing inference, while highlighting practical challenges and performance observations.

Alpaca-LoRA · Deep Learning · Fine-tuning

0 likes · 11 min read

ByteDance SYS Tech

May 12, 2023 · Fundamentals

Inside Intel GPU Render Engine: How 3D Rendering Works at the Hardware Level

This article explains the architecture and workflow of Intel's GPU render engine, covering the 3D pipeline, command streamer, fixed‑function units, execution units, URB handling, thread dispatch, shader stages, sampler state, and the Mesa driver implementation that translates OpenGL commands into hardware instructions.

GPU · Graphics Pipeline · Intel

0 likes · 39 min read

Architects' Tech Alliance

May 11, 2023 · Fundamentals

Overview of ARM Processor Architectures and Their Evolution

This article provides a comprehensive overview of ARM's processor architectures—including the A, R, and M profiles—detailing the evolution of major Cortex‑A series CPUs, the big.LITTLE concept, and the Mali GPU families, while also offering references to related technical reports and resources.

ARM · CPU · GPU

0 likes · 13 min read

MaGe Linux Operations

May 9, 2023 · Fundamentals

What Makes GPUs So Powerful? A Beginner’s Guide to Graphics Processors

This article explains what a GPU is, compares its architecture to CPUs, explores its strengths in graphics rendering and AI inference, and outlines key specifications such as cores, cache, and memory, providing a clear overview for anyone curious about modern graphics processors.

AI · GPU · Graphics

0 likes · 7 min read

Architects' Tech Alliance

Apr 30, 2023 · Industry Insights

Why Data Centers Need DPU: Comparing CPUs, GPUs, and Data Processing Units

The article explains how DPUs, as low‑power, high‑efficiency data‑processing units, complement CPUs and GPUs in modern data centers, reducing total cost of ownership while handling data movement, security, and analytics tasks more effectively than traditional processors.

CPU · DPU · Data center

0 likes · 9 min read

Alibaba Cloud Native

Apr 24, 2023 · Artificial Intelligence

Deploy Stable Diffusion WebUI on Alibaba Cloud Function Compute in One Command

This guide walks you through deploying the open‑source Stable Diffusion WebUI on Alibaba Cloud Function Compute using Serverless Devs, covering prerequisites, a single‑line deployment command, configuration details, access URL, and practical tips for handling GPU rendering and cold‑start latency.

AI deployment · Cloud Native · Function Compute

0 likes · 5 min read

Top Architect

Apr 21, 2023 · Artificial Intelligence

Fine‑Tuning LLaMA‑7B with Alpaca‑LoRA to Build a Chinese ChatGPT

This article explains why and how to fine‑tune the LLaMA‑7B model using the cheap Alpaca‑LoRA approach, covering hardware requirements, dataset preparation, LoRA training, optional model merging and quantization, and provides ready‑to‑run code snippets for single‑ and multi‑GPU setups.

Alpaca-LoRA · Fine-tuning · GPU

0 likes · 10 min read

Alibaba Cloud Native

Apr 18, 2023 · Artificial Intelligence

How to Deploy a CPU‑Based Stable Diffusion Service on Alibaba Cloud ACK

This guide walks you through the prerequisites, step‑by‑step console and kubectl procedures, YAML configuration, and post‑deployment verification needed to run a CPU‑only Stable Diffusion model on Alibaba Cloud Container Service (ACK) and optionally switch to a GPU‑enabled version.

ACK · AI Model Deployment · CPU

0 likes · 7 min read

ByteFE

Apr 12, 2023 · Frontend Development

Design and Refactoring of the xGis 3D Map Event System and Picking Engine

This article details the background, problems, and comprehensive refactoring plan for the xGis web‑based 3D map library, covering event classification, API design, layer interaction proxy, CPU/GPU picking implementations, performance trade‑offs, and future optimization directions.

3D mapping · CPU · GPU

0 likes · 22 min read

Full-Stack DevOps & Kubernetes

Apr 5, 2023 · Cloud Native

Enable GPU Acceleration in Kubernetes with NVIDIA Device Plugin

This guide explains how to set up NVIDIA drivers, install the NVIDIA device plugin, and create a Kubernetes pod that requests GPU resources, providing step‑by‑step commands and a sample YAML manifest for GPU‑enabled workloads.

Cloud Native · Device Plugin · GPU

0 likes · 4 min read

Architects' Tech Alliance

Mar 29, 2023 · Fundamentals

Stream Multiprocessor (SM) Architecture and Execution Pipeline in GPUs

This article provides a comprehensive overview of GPU stream multiprocessors, detailing their micro‑architecture, instruction fetch‑decode‑execute pipeline, SIMT/SIMD organization, warp scheduling, scoreboard mechanisms, and techniques for handling thread divergence and deadlock in GPGPU designs.

GPU · SIMT · Scoreboard

0 likes · 16 min read

DataFunSummit

Mar 12, 2023 · Artificial Intelligence

PaddleBox and FeaBox: GPU‑Based Large‑Scale Sparse Model Training and Integrated Feature Extraction Frameworks at Baidu

The article introduces PaddleBox and FeaBox, two GPU‑driven frameworks designed for massive sparse DNN training and unified feature extraction, detailing their architecture, performance advantages, hardware‑software co‑design challenges, and successful deployment across Baidu's advertising systems.

FeaBox · GPU · PaddleBox

0 likes · 24 min read

Python Programming Learning Circle

Mar 7, 2023 · Fundamentals

Accelerating Python with Numba: JIT Compilation, Decorators, and GPU Support

This article introduces Numba, a Just‑in‑Time compiler for Python that transforms functions into fast machine code using LLVM, explains why it lets you stay in pure Python, demonstrates basic @jit/@njit usage, advanced decorators, GPU execution with CUDA, and interoperability with C/C++ libraries.

CUDA · GPU · JIT

0 likes · 11 min read

Baidu Geek Talk

Feb 17, 2023 · Artificial Intelligence

How PGLBox Achieves 27× Faster GPU‑Powered Large‑Scale Graph Learning

PGLBox, Baidu’s GPU‑based large‑scale graph training framework, delivers up to 27× speedup over CPU clusters by fully GPU‑accelerating storage, sampling, and training, supporting billions of nodes, advanced GNN algorithms, multi‑level storage, and seamless integration of massive pretrained models.

GPU · Large-Scale Training · PGLBox

0 likes · 7 min read

Meituan Technology Team

Feb 9, 2023 · Backend Development

Efficient Deployment Architecture for Visual Inference Services: GPU Utilization Optimization

Meituan Visual's engineering team tackled the common low‑GPU‑utilization bottleneck in online inference services by splitting model structures and adopting micro‑service deployment, raising GPU usage from 40% to 100% and more than tripling QPS, and then generalized the approach for other GPU‑based services.

GPU · Microservices · Performance Optimization

0 likes · 21 min read

Architects' Tech Alliance

Jan 27, 2023 · Artificial Intelligence

Challenges and Future Directions of GPU in AI Computing: A Comparison with TPU and FPGA

The article analyzes how GPUs, once dominant in accelerating AI workloads, now face limitations in precision, energy efficiency, and on‑chip networking, prompting a shift toward specialized accelerators such as Google's TPU and FPGA‑based solutions, while also exploring emerging GPU‑friendly scenarios such as VR/AR, cloud gaming, and military applications.

FPGA · GPU · TPU

0 likes · 11 min read

Architects' Tech Alliance

Jan 9, 2023 · Fundamentals

GPU Overview: Principles, Use Cases, Limitations, and Market Landscape

This article explains GPU fundamentals, describing the GPU's role as a graphics‑oriented co‑processor, the reasons for using GPUs and other accelerators, the tasks they excel at and those they cannot handle, and outlines current market trends and architectural trade‑offs.

GPU · co‑processor · hardware architecture

0 likes · 9 min read

Watermelon Frontend Tech Team

Jan 5, 2023 · Frontend Development

How WebGL Boosted Our Canvas Editor’s Performance and Cut Memory Use

Facing memory leaks and high CPU load in a Canvas‑based avatar editor, the team switched to WebGL, leveraging GPU parallelism, off‑screen framebuffers, and blending techniques to dramatically reduce memory consumption and CPU usage while adding features like image layering and color blending.

Canvas · GPU · Performance Optimization

0 likes · 11 min read

DataFunSummit

Jan 5, 2023 · Artificial Intelligence

GPU Acceleration Techniques for Large AI Models: Parallelism, Fusion, and Simplification

These notes explain how GPUs address the massive data, serial dependencies, and high computational complexity of modern AI by employing three acceleration strategies—parallelism, operator fusion, and simplification—illustrated with Megatron-LM, MoE models, and practical compression techniques such as quantization, distillation, and pruning.

AI · GPU · Megatron

0 likes · 16 min read

DataFunTalk

Jan 4, 2023 · Artificial Intelligence

GPU Acceleration Techniques for Large AI Models: Parallelism, Fusion, and Simplification

This article explains how GPUs address the massive data, serial dependencies, and high computational complexity of modern AI by employing three acceleration strategies—parallelism, operator fusion, and simplification—detailing methods such as model, pipeline, and tensor parallelism, Megatron framework, MoE models, and various model compression techniques.

AI · GPU · Megatron

0 likes · 17 min read

Python Programming Learning Circle

Dec 17, 2022 · Fundamentals

Accelerating Python Code with Taichi: Prime Counting, LCS, and Reaction‑Diffusion Examples

This article demonstrates how importing the Taichi library into Python can dramatically accelerate compute‑intensive tasks, showcasing prime counting, longest common subsequence, and reaction‑diffusion simulations with speedups up to 120× and GPU support, while providing installation and usage guidance.

GPU · High‑performance computing · Python

0 likes · 6 min read

Architects' Tech Alliance

Dec 11, 2022 · Fundamentals

Fundamentals of CPU, GPU, and Storage in the Xinchuang Industry

This article provides a comprehensive overview of the Xinchuang industry’s hardware fundamentals, detailing CPU architecture and operation, instruction set classifications, GPU concepts and workflows, storage categories, and the distinction between discrete and integrated GPUs, while also noting related promotional resources.

CPU · GPU · Hardware

0 likes · 8 min read

Architects' Tech Alliance

Nov 29, 2022 · Artificial Intelligence

In‑Depth Overview of NVIDIA Grace Hopper Superchip Architecture

The article provides a comprehensive technical overview of NVIDIA's Grace Hopper Superchip, detailing its heterogeneous CPU‑GPU design, high‑bandwidth NVLink‑C2C interconnect, performance advantages for HPC and AI workloads, programming model, and the architectural innovations that enable unprecedented scalability and productivity.

AI · CPU · GPU

0 likes · 15 min read

Tencent Cloud Developer

Nov 29, 2022 · Game Development

GPU Rendering Pipeline and Hardware Architecture Overview

The article surveys GPU rendering pipelines and hardware architectures for desktop and mobile, explains classic stages, compares Immediate Mode, Tile‑Based and Tile‑Based Deferred rendering, details PowerVR, Mali and Adreno components, and offers optimization advice on draw calls, depth pre‑passes, shader efficiency, and render ordering.

GPU · Graphics · Mobile GPU

0 likes · 66 min read

Python Programming Learning Circle

Nov 15, 2022 · Fundamentals

A Comprehensive Guide to Using Numba for Python JIT Compilation

This article introduces Numba, a just-in-time compiler for Python, explains why it is advantageous over alternatives, demonstrates how to apply its decorators such as @jit, @njit, @vectorize, and @cuda.jit for CPU and GPU acceleration, and provides practical code examples and tips for optimal performance.

CUDA · GPU · JIT

0 likes · 10 min read

Architects' Tech Alliance

Nov 7, 2022 · Artificial Intelligence

FastDeploy: One-Click AI Model Deployment Across GPUs, CPUs, and Edge Devices

FastDeploy is an open‑source toolkit that standardizes AI model APIs and enables developers to deploy vision, NLP, and speech models on diverse hardware—including GPUs, CPUs, Jetson, ARM, and various NPUs—using just three lines of code or a single command, while delivering end‑to‑end performance optimizations.

AI deployment · CPU · Edge Computing

0 likes · 11 min read

Architects' Tech Alliance

Nov 1, 2022 · Databases

2022 China Database Industry Report: Emerging Hardware and Architectural Innovations

The September 2022 China Database Industry Analysis report highlights a wave of hardware‑driven innovations—including multi‑core CPUs, heterogeneous GPUs/TPUs/DPUs, programmable FPGAs, CXL‑DDR5, persistent memory, NVMe‑oF, and RDMA‑based storage—that enable massive data storage and high‑concurrency real‑time computing across a range of novel database architectures and products.

GPU · Hardware acceleration · OLTP

0 likes · 10 min read

Baidu Geek Talk

Oct 31, 2022 · Artificial Intelligence

PaddleBox: A GPU‑Based Ultra‑Large‑Scale Sparse DNN Training Framework

PaddleBox is Baidu’s GPU‑based ultra‑large‑scale sparse DNN training framework that combines a three‑tier hierarchical parameter server (SSD, DRAM, HBM) with pipelined scheduling and multi‑machine multi‑GPU communication, delivering 5–40× cost‑performance gains over traditional CPU solutions and powering Baidu’s advertising services.

Deep Learning · GPU · PaddleBox

0 likes · 15 min read

OPPO Kernel Craftsman

Oct 28, 2022 · Artificial Intelligence

ShaderNN: A GPU Shader‑Based Lightweight Inference Engine for Mobile AI Applications

ShaderNN is an open‑source, sub‑2 MB GPU‑shader inference engine that runs TensorFlow, PyTorch and ONNX models directly on mobile graphics textures via OpenGL fragment and compute shaders, delivering real‑time, low‑power AI for image‑heavy tasks while eliminating third‑party dependencies and achieving up to 90% speed gains.

GPU · Inference Engine · Mobile AI

0 likes · 11 min read

Alibaba Cloud Native

Oct 10, 2022 · Cloud Native

What’s New in Koordinator v0.7? Enhanced Coscheduling, ElasticQuota, and Fine‑Grained GPU Sharing

Koordinator v0.7 adds major cloud‑native scheduling features—including enhanced gang (coscheduling) with Strict/NonStrict modes, multi‑hierarchy ElasticQuota management, fine‑grained GPU resource protocols, richer diagnostic APIs, and safer descheduling—targeting machine‑learning and big‑data workloads on Kubernetes.

Cloud Native · Coscheduling · ElasticQuota

0 likes · 25 min read

IT Services Circle

Sep 29, 2022 · Game Development

Building a Fully Functional 3D Minecraft Computer Inside Vanilla Minecraft

A team of three spent over ten months constructing a 1 Hz CPU, custom GPU, and complete game logic inside unmodded Minecraft using only redstone, enabling a playable 3D version of Minecraft within Minecraft itself.

CPU · GPU · Game Development

0 likes · 11 min read

Rare Earth Juejin Tech Community

Sep 29, 2022 · Fundamentals

Understanding OpenGL Buffer Objects and VBO Optimization

This article explains the concept of OpenGL objects, focuses on common buffer objects such as VBO, VAO, and EBO, describes how they reduce CPU‑GPU transfer costs, and provides detailed code examples for creating, configuring, and rendering with Vertex Buffer Objects to improve graphics performance.

Buffer Objects · GPU · OpenGL

0 likes · 18 min read

ELab Team

Sep 28, 2022 · Frontend Development

Master WebGL & Three.js: From Basics to 3D Rendering in the Browser

This article guides beginners through the fundamentals of computer graphics, explaining OpenGL, WebGL, GLSL, and the rendering pipeline, then demonstrates practical Three.js code for setting up scenes, cameras, lights, materials, and textures to create interactive 3D web experiences.

3D rendering · GPU · Graphics

0 likes · 20 min read

Architects' Tech Alliance

Sep 10, 2022 · Fundamentals

Overview of NVIDIA DOCA and SmartNIC/DPU Technologies

This article provides a comprehensive overview of NVIDIA's DOCA framework, BlueField DPU architecture, SDK components, programming models, and related technologies such as RDMA, RoCE, and GPUDirect RDMA, highlighting their roles in modern data‑center acceleration and security.

DOCA · DPU · GPU

0 likes · 8 min read

Architects' Tech Alliance

Jul 21, 2022 · Artificial Intelligence

The Evolution of CPU and Heterogeneous Computing Architecture in the AI Era

This article surveys the rapid growth of data‑center capacity, the rise of AI and big‑data workloads, and how emerging accelerators such as GPUs, DPUs, SmartNICs and heterogeneous CPU designs from Intel, AMD, Arm and Apple are reshaping server hardware and driving a new wave of performance and efficiency competition.

AI · CPU · Data center

0 likes · 12 min read

NetEase Cloud Music Tech Team

Jul 6, 2022 · Industry Insights

Inside NetEase Cloud Music’s MLOps: Scaling AI with VK, ECI, and Ceph

This article details NetEase Cloud Music’s four‑layer machine‑learning platform architecture, covering resource provisioning with Virtual Kubelet and Alibaba Cloud ECI, Ceph storage optimizations, TensorFlow migration, large‑scale graph neural network support, and end‑to‑end workflow tooling that together enable efficient, cost‑effective AI development and deployment.

Ceph · GPU · Graph Neural Network

0 likes · 24 min read

Youku Technology

Jun 9, 2022 · Mobile Development

Design and Architecture of the Cross-Platform Multimedia Rendering Engine OPR

The OPR engine provides a cross‑platform, GPU‑accelerated rendering framework that unifies audio‑video pre‑ and post‑processing, native UI‑driven danmaku rendering, and real‑time visual effects such as human‑body recognition, using a modular command‑stream architecture, C++ core, monitoring tools, and extensibility for future Vulkan, VR, and plugin integration.

GPU · Native UI · Real-Time

0 likes · 15 min read

Youku Technology

Jun 8, 2022 · Mobile Development

How Youku Achieves Real-Time Bullet‑Screen Pass‑Through on Mobile

This article details Youku's technical approach to rendering bullet‑screen pass‑through on mobile devices, covering cloud‑based and on‑device segmentation pipelines, GPU‑accelerated rendering steps, performance optimizations, and engineering challenges to deliver seamless immersive viewing.

GPU · Metal · OpenGL

0 likes · 11 min read

Shopee Tech Team

Jun 2, 2022 · Backend Development

Applying GPU Technology for High‑Throughput Image Rendering in Shopee Off‑Platform Ads

The Shopee Off‑Platform Ads team built a GPU‑accelerated Creative Rendering System that uses a four‑layer architecture, CGO‑bridged C/C++ kernels, and template caching to process billions of product images daily, achieving roughly ten‑fold speedup at half the cost and with far less rack space while handling high concurrency.

Advertising · CUDA · GPU

0 likes · 23 min read

Architects' Tech Alliance

May 31, 2022 · Fundamentals

AMD’s Next‑Gen Navi 31 GPU Is Likely a Single‑Chip Design, Not a Multi‑Chiplet Monster

Recent analysis suggests that AMD’s upcoming top‑tier RDNA 3 GPU, the Navi 31, will abandon the rumored multi‑chiplet architecture in favor of a single, powerful compute die, reducing shader count and TFLOP ratings while still promising strong performance for gaming and data‑center workloads.

AMD · GPU · Gaming

0 likes · 7 min read

Baidu Geek Talk

May 30, 2022 · Mobile Development

Advanced OpenCL Optimization Techniques for Qualcomm Adreno GPUs on Mobile Devices

The article presents advanced OpenCL optimization techniques for Qualcomm Adreno mobile GPUs, explaining the programming model, profiling methods, bottleneck identification, and kernel‑level strategies such as fast math, fp16, vectorized memory accesses, and hardware‑specific features to improve compute‑ and memory‑bound performance on Android devices.

Adreno · GPU · Mobile Computing

0 likes · 12 min read

Architects' Tech Alliance

May 23, 2022 · Industry Insights

GPU Wars in the Data Center: How Nvidia, AMD, and Intel Compete for AI and HPC Dominance

The article examines how GPUs have evolved from gaming accelerators to essential data‑center processors for AI, HPC, and scientific workloads, and compares the latest server‑grade offerings from Nvidia, AMD, and Intel—including performance specs, memory technologies, interconnects, and software ecosystems—highlighting the fierce competition shaping the future of compute.

AI · AMD · Data center

0 likes · 12 min read

MaGe Linux Operations

May 20, 2022 · Operations

Why NVIDIA’s Open‑Source Linux GPU Kernel Driver Is a Game‑Changer

NVIDIA has finally open‑sourced its Linux GPU kernel driver, a landmark move that promises tighter OS integration, easier debugging, and broader support for Turing and Ampere GPUs, while also reshaping the relationship between proprietary drivers, the Nouveau project, and major Linux distributions.

CUDA · GPU · Nouveau

0 likes · 9 min read

ByteFE

May 18, 2022 · Frontend Development

Understanding WebGL: GPU Basics, Shaders, and Practical Code Examples

This article introduces WebGL fundamentals for frontend developers, explaining GPU versus CPU, GLSL shaders, and how JavaScript prepares data, followed by step‑by‑step code examples of fragment and vertex shaders, custom primitives, and using the gl‑renderer library to render graphics.

GPU · Graphics · JavaScript

0 likes · 11 min read

Alibaba Terminal Technology

May 17, 2022 · Frontend Development

Unlock 20‑30× GPU Speed: WebGPU in Three.js, Babylon.js, and TensorFlow.js

This article introduces WebGPU—a powerful yet still experimental web graphics API—showing how major frameworks like Three.js and Babylon.js adopt it for high‑performance 3D rendering, how TensorFlow.js leverages it for massive deep‑learning speedups, and provides hands‑on code examples from framework usage to raw WebGPU programming.

Babylon.js · GPU · Graphics

0 likes · 17 min read

Tencent Cloud Developer

May 12, 2022 · Backend Development

Practical Guide to PyTorch Distributed Training: DP, DDP, Groups, and IO Considerations

This guide explains PyTorch’s distributed training, contrasting single‑node DataParallel with multi‑node DistributedDataParallel, detailing essential parameters, group communication setup, proper use of DistributedSampler for data loading, handling IO bottlenecks, and avoiding common pitfalls such as memory imbalance, unsynchronized buffers, and unused‑parameter errors.

DDP · DataParallel · Distributed Training

0 likes · 15 min read

Architects' Tech Alliance

May 10, 2022 · Industry Insights

What to Expect from Nvidia’s RTX 4000 and AMD’s RDNA 3 GPUs in Late 2022?

Based on recent leaks from 3DCenter.org and two prominent Twitter insiders, this analysis predicts launch windows, product codes, performance tiers, pricing pressures and power‑draw concerns for Nvidia’s upcoming RTX 4000 series and AMD’s RDNA 3 GPUs, while warning that the information remains speculative.

AMD · GPU · Lovelace

0 likes · 7 min read

Architects' Tech Alliance

May 4, 2022 · Industry Insights

What the Next‑Gen Nvidia and AMD GPUs Could Mean for the 2022‑2023 Market

Based on recent leaks from 3DCenter.org and Twitter insiders Kopite7kimi and Greymon55, the article forecasts Nvidia's Lovelace RTX 4000 series and AMD's RDNA 3 Navi 33/32 GPUs to launch between September 2022 and early 2023, analyzes their expected specifications, pricing dynamics, and potential market impact, and notes Intel's upcoming Arc cards as a wildcard.

AMD · GPU · Lovelace

0 likes · 7 min read

Code DAO

Apr 25, 2022 · Artificial Intelligence

How to Build a GPU‑Accelerated Jupyter Notebook Server with Docker for Google Colab

This guide walks through setting up Docker on a Windows or Linux host, enabling Nvidia GPU support via the Container Toolkit, pulling a TensorFlow GPU image, launching a Jupyter Notebook server inside the container, and connecting it to Google Colab for deep‑learning training.

Docker · GPU · Google Colab

0 likes · 11 min read

Architects' Tech Alliance

Apr 24, 2022 · Artificial Intelligence

Analysis of NVIDIA Ada Lovelace GPU Leaked Specifications and Architecture

This article provides a detailed technical analysis of the leaked specifications for NVIDIA's Ada Lovelace GPU series, covering chip dimensions, L2 cache innovations, performance expectations, manufacturing node implications, and cost considerations while comparing them to AMD's offerings.

Ada Lovelace · Chip Size · GPU

0 likes · 15 min read

Architects' Tech Alliance

Apr 19, 2022 · Artificial Intelligence

Overview of AI Chip Development, Architectures, and Market Trends in China (2022)

The article provides a comprehensive overview of AI chip technology, describing the dependence on mathematical models and semiconductor integration, classifying chips by architecture (GPU, FPGA, ASIC, SoC, brain‑like), deployment (cloud, edge, terminal), and outlining current challenges, market trends, and future research directions such as in‑memory and neuromorphic computing.

AI Chip · ASIC · FPGA

0 likes · 11 min read

IT Services Circle

Apr 8, 2022 · Fundamentals

The Rise of Domestic GPUs in China: IP Licensing, Imagination Technologies, and Market Dynamics

Chinese domestic GPU development has accelerated rapidly, driven by fast‑track product launches, strategic IP licensing from firms like Imagination Technologies, and supportive policies, while industry players navigate challenges of patents, design complexity, and market competition to bring full‑function GPUs to market.

China · Chip Design · GPU

0 likes · 12 min read

DataFunSummit

Apr 7, 2022 · Artificial Intelligence

Optimizing Distributed Machine Learning Training on Google Cloud Vertex AI: Fast Socket and Reduction Server

This article explains how Google Cloud Vertex AI improves large‑scale distributed machine learning training performance by addressing the memory‑wall challenge with Fast Socket network stack enhancements for NCCL and a Reduction Server that accelerates gradient aggregation, delivering higher throughput and lower TCO for AI workloads.

Cloud AI · Distributed Training · Fast Socket

0 likes · 19 min read

Baidu App Technology

Apr 1, 2022 · Fundamentals

Mastering Mobile OpenCL on Qualcomm Adreno: Architecture & Performance Tips

This article explains OpenCL fundamentals, the Qualcomm Adreno GPU architecture, compatibility considerations, and practical optimization techniques—including profiling, bottleneck identification, and CPU‑to‑GPU conversion tips—to help developers write high‑performance mobile OpenCL code.

Adreno · GPU · Mobile Computing

0 likes · 13 min read

Python Programming Learning Circle

Mar 31, 2022 · Artificial Intelligence

Comprehensive PyTorch Code Snippets: Configuration, Tensor Operations, Model Definition, Training, and Best Practices

This article provides a thorough collection of commonly used PyTorch code snippets covering environment setup, reproducibility, GPU configuration, tensor manipulation, model building, data preprocessing, training and evaluation loops, custom loss functions, regularization techniques, learning‑rate scheduling, checkpointing, and practical tips for efficient deep‑learning development.

Deep Learning · GPU · Model Training

0 likes · 37 min read
