Tagged articles
536 articles
Python Programming Learning Circle
Dec 21, 2023 · Artificial Intelligence

Introducing Streamlit: A Free Open‑Source Framework for Building Machine‑Learning Apps with Python

Streamlit is a free, open‑source Python framework that lets machine‑learning engineers quickly turn scripts into interactive web apps, featuring top‑to‑bottom script execution, widget‑as‑variable handling, caching, GPU support, and seamless integration with tools like Git.

App Development · GPU · Python
0 likes · 9 min read
Architects' Tech Alliance
Dec 3, 2023 · Artificial Intelligence

Overview of the AI Chip Market: Architectures, Companies, and Performance Comparisons

The rapidly growing multi‑billion‑dollar AI chip market in 2023 is categorized by architecture (GPGPU, FPGA, ASIC, compute‑in‑memory) and deployment location (cloud, edge, terminal), with Chinese vendors advancing training and inference chips but still lagging behind leading Nvidia products in performance and bandwidth.

AI chips · ASIC · China AI
0 likes · 8 min read
OPPO Kernel Craftsman
Nov 27, 2023 · Mobile Development

Understanding Android HWUI, Skia, and OpenGL Rendering Pipeline

The article explains Android’s graphics pipeline by detailing how HWUI and Skia translate view operations into OpenGL ES commands, describing RenderThread stages such as synchronization, dirty‑region calculation, buffer handling, and drawing, and comparing mobile GPU architectures like TBR, TBDR, and IMR.

Android · GPU · OpenGL
0 likes · 13 min read
Architects' Tech Alliance
Oct 15, 2023 · Fundamentals

2023 GPU Graphics Card Industry Report: Market Overview, Mining Impact, and Future Trends

The 2023 GPU graphics‑card report provides a comprehensive overview of China's graphics‑card industry, detailing current development, OEM product lines, the evolution of mining, post‑Ethereum‑merge market shifts, full‑chain analysis, and forecasts for downstream demand across gaming, AI, data‑center, and automotive applications.

AI · China · GPU
0 likes · 15 min read
Architects' Tech Alliance
Oct 15, 2023 · Fundamentals

What You Need to Know About Server CPUs, GPUs, and Memory

This article provides a concise technical overview of server hardware, covering CPU architecture and platform options, GPU evolution and key specifications, and DDR4 memory compatibility rules, helping readers understand the essential components for building or upgrading a server.

CPU · GPU · Memory
0 likes · 6 min read
php Courses
Oct 10, 2023 · Artificial Intelligence

Microsoft to Unveil Its Own AI Chip "Athena" at Ignite Conference

Microsoft plans to announce its self‑developed AI processor, codenamed Athena, at the Ignite developer conference in mid‑November, aiming to reduce reliance on Nvidia GPUs and strengthen its AI services such as Azure AI, Bing Chat, and Copilot.

AI Chip · GPU · Hardware
0 likes · 2 min read
Architects' Tech Alliance
Sep 29, 2023 · Artificial Intelligence

AI Compute Landscape: GPUs, Networking, and Storage

The article analyzes the AI compute ecosystem—highlighting GPUs as the core engine, network bandwidth as a bottleneck, and storage memory walls—while also promoting comprehensive server and storage e‑books for deeper technical insight.

AI · Compute · E‑book
0 likes · 4 min read
DaTaobao Tech
Sep 27, 2023 · Artificial Intelligence

FlashAttention-2: Efficient Attention Algorithm for Transformer Acceleration and AIGC Applications

FlashAttention‑2 is an IO‑aware exact attention algorithm that cuts GPU HBM traffic through tiling and recomputation, reduces non‑matmul FLOPs, and improves sequence‑level parallelism and warp‑level work distribution. It delivers up to 2× speedup over FlashAttention with near‑GEMM efficiency, enabling longer‑context Transformer training and inference for AIGC with fastunet and negligible accuracy loss.

AIGC · Attention optimization · Deep Learning
0 likes · 20 min read
Architects' Tech Alliance
Sep 17, 2023 · Fundamentals

FPGA Overview: Architecture, Memory Hierarchy, and NoC Advantages

This article provides a comprehensive overview of FPGA technology, detailing its programmable logic cells, input/output blocks, switch matrices, historical evolution, flexibility versus ASIC and GPU, memory hierarchy including on‑chip and HBM2e, and the benefits of Network‑on‑Chip architectures for performance, power and design modularity.

ASIC · FPGA · GPU
0 likes · 12 min read
Baobao Algorithm Notes
Sep 12, 2023 · Artificial Intelligence

Why RTX 4090 Beats H100 for LLM Inference but Fails at Training

The article analyses the performance, memory, bandwidth and cost of NVIDIA H100, A100 and RTX 4090 GPUs, explains why the 4090 cannot handle large‑model training due to communication and memory limits, and shows how its high compute‑to‑price ratio makes it attractive for inference, backed by detailed parallelism calculations and cost‑per‑token estimates.

Cost · GPU · LLM
0 likes · 46 min read
Architects' Tech Alliance
Sep 4, 2023 · Artificial Intelligence

Overview of AI Chip Types, Architectures, and Market Trends

The article explains the various AI‑capable chips such as CPUs, GPUs, FPGAs, NPUs, and TPUs, compares their performance and efficiency, describes heterogeneous CPU+xPU solutions, and provides market share data while highlighting the growing adoption of specialized AI accelerators.

AI acceleration · AI chips · CPU
0 likes · 7 min read
Architects' Tech Alliance
Sep 2, 2023 · Artificial Intelligence

NVIDIA L40S GPU Overview and Its Impact on Generative AI and Optical Modules

The NVIDIA L40S GPU, built on the Ada Lovelace architecture with 48 GB GDDR6 memory and 864 GB/s bandwidth, delivers over 1.45 PFLOPS tensor performance and superior FP16/FP32 efficiency for generative AI training and inference, while its lower power and GDDR6 design may influence demand for mid‑range optical modules in data centers.

Data center · GPU · L40S
0 likes · 8 min read
JD Tech
Aug 4, 2023 · Artificial Intelligence

Deploying and Evaluating the Vicuna Open‑Source Large Language Model on a Single Machine

This article details a step‑by‑step guide to deploying the Vicuna open‑source LLM on a single server, covering model preparation, environment setup, dependency installation, GPU and CUDA configuration, inference commands, performance evaluation, and attempted fine‑tuning, while sharing practical observations and results.

Fine‑tuning · GPU · Inference
0 likes · 16 min read
JD Tech
Jul 31, 2023 · Artificial Intelligence

Local Deployment, Fine‑tuning, and Inference of the Open‑source Alpaca‑LoRA Model on GPU Servers

This article details the step‑by‑step process of installing GPU drivers, setting up a Python environment, deploying the open‑source Alpaca‑LoRA large language model, fine‑tuning it with Chinese data on a multi‑GPU server, and running inference, while discussing practical challenges and performance observations.

Alpaca · Fine-tuning · GPU
0 likes · 14 min read
Architects' Tech Alliance
Jul 29, 2023 · Artificial Intelligence

AI Server Market Overview and Technical Architecture

The article provides a comprehensive analysis of the AI server market, detailing server hardware components, cost distribution, logical architecture, firmware, rapid market growth, competitive landscape, AI-driven heterogeneous computing, and future industry trends, while highlighting key vendors and deployment configurations.

AI servers · Cloud providers · GPU
0 likes · 10 min read
Liangxu Linux
Jul 20, 2023 · Fundamentals

How to Choose the Right Desktop PC Components for Your Needs

This guide explains how to select desktop computer parts—including CPU, GPU, motherboard, memory, storage, power supply, and cooling—by evaluating usage, performance tiers, specifications, brand options, and compatibility, while also noting which components can be safely bought second‑hand.

CPU · GPU · Memory
0 likes · 9 min read
Architects' Tech Alliance
Jul 9, 2023 · Industry Insights

China’s GPU Landscape: Architecture, Performance, and Market Outlook

The report builds a comprehensive GPU research framework that evaluates performance through micro‑architecture, process, core count and frequency, examines the ecosystem dominance of CUDA, dissects NVIDIA's Fermi and Hopper designs, analyzes the competitive histories of NVIDIA and AMD, and forecasts domestic GPU market opportunities in AI data centers, autonomous vehicles, and gaming.

AI · China · GPU
0 likes · 5 min read
MaGe Linux Operations
Jul 6, 2023 · Fundamentals

Boost Python Performance: Top Tools and Techniques for Faster Code

This article surveys a range of Python acceleration tools—from NumPy, SciPy, and Pandas for efficient array operations to JIT compilers like PyPy and Pyston, GPU libraries such as PyCUDA, and C‑extension generators like Cython—explaining how each can dramatically speed up single‑processor or parallel code while balancing memory usage.

Cython · GPU · numba
0 likes · 6 min read
58 Tech
Jul 6, 2023 · Artificial Intelligence

Design and Optimization of a Kaldi‑Based Speech Recognition Backend at 58.com

This article details the evolution from the initial Kaldi‑based speech recognition architecture (version 1.0) to a re‑engineered version 2.0, describing business background, service components, identified shortcomings, and a series of performance, concurrency, GPU, I/O, GC, and dispatch optimizations that dramatically improve resource utilization, latency, and reliability for large‑scale voice processing at 58.com.

AI · Backend Architecture · GPU
0 likes · 15 min read
Volcano Engine Developer Services
Jun 30, 2023 · Cloud Native

Deploy Langchain‑ChatGLM on Volcengine VKE: A Step‑by‑Step Cloud‑Native Guide

This tutorial walks you through preparing a VKE cluster, pulling the Langchain‑ChatGLM container image, creating the necessary Deployment and Service resources, and adding a local knowledge base, enabling you to run a Langchain‑based ChatGLM service with GPU support on Volcengine’s cloud‑native platform.

AI deployment · ChatGLM · GPU
0 likes · 6 min read
58 Tech
Jun 21, 2023 · Artificial Intelligence

GPU Hotword Enhancement for WeNet End-to-End Speech Recognition

This article explains the design, implementation, and experimental evaluation of hot‑word augmentation in WeNet's GPU runtime, detailing how character‑ and word‑based language model scoring are extended to boost recognition of rare proper nouns in both streaming and non‑streaming ASR services.

ASR · CTC decoder · GPU
0 likes · 12 min read
Architects' Tech Alliance
Jun 20, 2023 · Fundamentals

Introducing NVIDIA DOCA GPUNetIO: GPU‑Initiated Communication for Real‑Time Packet Processing

NVIDIA's new DOCA GPUNetIO library enables GPU‑initiated communication, allowing packets to be received directly into GPU memory, processed by CUDA kernels, and sent without CPU involvement, offering lower latency, higher scalability, and detailed pipeline examples including IP checksum, HTTP filtering, traffic forwarding, and 5G Aerial SDK integration.

5G · CUDA · DOCA
0 likes · 19 min read
Architects' Tech Alliance
Jun 14, 2023 · Artificial Intelligence

AI Compute Landscape: GPUs, Network, and Storage as Core Engines

The article analyzes how large language models like ChatGPT are reshaping the software ecosystem by positioning AI compute—driven by GPUs, high‑speed networking, and advanced storage solutions such as HBM and 3D‑stacked memory—as the foundational engine for future information systems, highlighting current market trends and technical challenges.

AI · Compute · GPU
0 likes · 4 min read
Python Programming Learning Circle
Jun 13, 2023 · Fundamentals

Python Performance Optimization Tools and Techniques

This article introduces a variety of Python optimization tools—including NumPy, SciPy, Pandas, JIT compilers like PyPy, GPU libraries, Cython, Numba, and interfacing utilities—explaining how they can make code more concise, faster, and better suited for single‑processor or multi‑processor execution.

GPU · Python · libraries
0 likes · 8 min read
Ctrip Technology
Jun 8, 2023 · Frontend Development

Optimizing CSS Animation Performance: Techniques and Best Practices

This article explains how to improve CSS animation performance by understanding the browser rendering pipeline, using GPU‑accelerated transforms, avoiding costly properties and complex selectors, leveraging will‑change and requestAnimationFrame, and preferring CSS over JavaScript for smoother, lower‑latency visual effects.

CSS · GPU · frontend
0 likes · 13 min read
Architects' Tech Alliance
Jun 1, 2023 · Fundamentals

Overview of the Xinchuang (Information Technology Innovation) Industry: CPU, GPU, and Storage Fundamentals

This article provides a comprehensive overview of the Xinchuang industry, detailing the fundamental concepts, architectures, and classifications of CPUs, GPUs, and storage devices, and explains how these core hardware components support the goal of achieving self‑controlled, secure information technology in China.

CPU · GPU · Hardware
0 likes · 6 min read
Open Source Linux
May 29, 2023 · Fundamentals

What Is a GPU? Understanding Its Role in Graphics, AI, and Computing

This article explains what a GPU (Graphics Processing Unit) is, how it differs from a CPU, its architecture and performance characteristics, and why it powers everything from real‑time rendering to AI inference, using examples like the NVIDIA RTX 3090.

CPU comparison · GPU · Graphics Processing Unit
0 likes · 8 min read
Alibaba Cloud Big Data AI Platform
May 29, 2023 · Artificial Intelligence

How PAI‑Blade Supercharges Stable Diffusion Inference on GPUs

This article explains how PAI‑Blade, built on the BladeDISC compiler and BlaDNN library, dramatically reduces latency and memory usage for Stable Diffusion inference, provides step‑by‑step usage examples with code, shows performance gains on A100 and A10 GPUs, and outlines future optimization directions.

GPU · Inference Optimization · PAI-Blade
0 likes · 9 min read
Tencent Cloud Developer
May 24, 2023 · Artificial Intelligence

Deploying Stable Diffusion on Tencent Cloud: A Step‑by‑Step Guide

Deploy Stable Diffusion on Tencent Cloud by building a Docker image, pushing it to TCR, creating a GPU‑enabled TKE cluster with CFS storage, configuring qGPU sharing, exposing the service via Cloud Native API Gateway, optimizing inference with TACO Kit, storing results in COS, and applying content moderation.

AI deployment · GPU · Kubernetes
0 likes · 19 min read
JD Retail Technology
May 18, 2023 · Artificial Intelligence

Local Deployment, Inference, and Fine‑tuning of the Vicuna‑7B Large Language Model

This article details the step‑by‑step process of preparing the environment, merging weights, installing dependencies, running inference, evaluating Vicuna‑7B against other models, and attempting fine‑tuning, while highlighting performance results, encountered issues, and future work for large language model deployment.

Fine-tuning · GPU · Inference
0 likes · 11 min read
JD Retail Technology
May 16, 2023 · Artificial Intelligence

Deploying and Fine‑Tuning the Alpaca‑LoRA Large Language Model on a Multi‑GPU Server

This guide details the end‑to‑end process of installing GPU drivers, setting up a Python environment, deploying the open‑source Alpaca‑LoRA model, fine‑tuning it with Chinese data on a multi‑GPU server, and performing inference, while highlighting practical challenges and performance observations.

Alpaca-LoRA · Deep Learning · Fine-tuning
0 likes · 11 min read
ByteDance SYS Tech
May 12, 2023 · Fundamentals

Inside Intel GPU Render Engine: How 3D Rendering Works at the Hardware Level

This article explains the architecture and workflow of Intel's GPU render engine, covering the 3D pipeline, command streamer, fixed‑function units, execution units, URB handling, thread dispatch, shader stages, sampler state, and the Mesa driver implementation that translates OpenGL commands into hardware instructions.

GPU · Graphics Pipeline · Intel
0 likes · 39 min read
Architects' Tech Alliance
May 11, 2023 · Fundamentals

Overview of ARM Processor Architectures and Their Evolution

This article provides a comprehensive overview of ARM's processor architectures—including the A, R, and M profiles—detailing the evolution of major Cortex‑A series CPUs, the big.LITTLE concept, and the Mali GPU families, while also offering references to related technical reports and resources.

ARM · CPU · GPU
0 likes · 13 min read
Alibaba Cloud Native
Apr 24, 2023 · Artificial Intelligence

Deploy Stable Diffusion WebUI on Alibaba Cloud Function Compute in One Command

This guide walks you through deploying the open‑source Stable Diffusion WebUI on Alibaba Cloud Function Compute using Serverless Devs, covering prerequisites, a single‑line deployment command, configuration details, access URL, and practical tips for handling GPU rendering and cold‑start latency.

AI deployment · Cloud Native · Function Compute
0 likes · 5 min read
Top Architect
Apr 21, 2023 · Artificial Intelligence

Fine‑Tuning LLaMA‑7B with Alpaca‑LoRA to Build a Chinese ChatGPT

This article explains why and how to fine‑tune the LLaMA‑7B model using the cheap Alpaca‑LoRA approach, covering hardware requirements, dataset preparation, LoRA training, optional model merging and quantization, and provides ready‑to‑run code snippets for single‑ and multi‑GPU setups.

Alpaca-LoRA · Fine-tuning · GPU
0 likes · 10 min read
Alibaba Cloud Native
Apr 18, 2023 · Artificial Intelligence

How to Deploy a CPU‑Based Stable Diffusion Service on Alibaba Cloud ACK

This guide walks you through the prerequisites, step‑by‑step console and kubectl procedures, YAML configuration, and post‑deployment verification needed to run a CPU‑only Stable Diffusion model on Alibaba Cloud Container Service (ACK) and optionally switch to a GPU‑enabled version.

ACK · AI Model Deployment · CPU
0 likes · 7 min read
ByteFE
Apr 12, 2023 · Frontend Development

Design and Refactoring of the xGis 3D Map Event System and Picking Engine

This article details the background, problems, and comprehensive refactoring plan for the xGis web‑based 3D map library, covering event classification, API design, layer interaction proxy, CPU/GPU picking implementations, performance trade‑offs, and future optimization directions.

3D mapping · CPU · GPU
0 likes · 22 min read
DataFunSummit
Mar 12, 2023 · Artificial Intelligence

PaddleBox and FeaBox: GPU‑Based Large‑Scale Sparse Model Training and Integrated Feature Extraction Frameworks at Baidu

The article introduces PaddleBox and FeaBox, two GPU‑driven frameworks designed for massive sparse DNN training and unified feature extraction, detailing their architecture, performance advantages, hardware‑software co‑design challenges, and successful deployment across Baidu's advertising systems.

FeaBox · GPU · PaddleBox
0 likes · 24 min read
Baidu Geek Talk
Feb 17, 2023 · Artificial Intelligence

How PGLBox Achieves 27× Faster GPU‑Powered Large‑Scale Graph Learning

PGLBox, Baidu’s GPU‑based large‑scale graph training framework, delivers up to 27× speedup over CPU clusters by fully GPU‑accelerating storage, sampling, and training, supporting billions of nodes, advanced GNN algorithms, multi‑level storage, and seamless integration of massive pretrained models.

GPU · Large-Scale Training · PGLBox
0 likes · 7 min read
Meituan Technology Team
Feb 9, 2023 · Backend Development

Efficient Deployment Architecture for Visual Inference Services: GPU Utilization Optimization

Meituan Visual's engineering team tackled the common low‑GPU‑utilization bottleneck in online inference services by splitting model structures and adopting micro‑service deployment, raising GPU usage from 40% to 100% and more than tripling QPS, and then generalized the approach for other GPU‑based services.

GPU · Microservices · Performance Optimization
0 likes · 21 min read
Architects' Tech Alliance
Jan 27, 2023 · Artificial Intelligence

Challenges and Future Directions of GPU in AI Computing: A Comparison with TPU and FPGA

The article analyzes how GPUs, once dominant in accelerating AI workloads, now face limitations in precision, energy efficiency, and on‑chip networking, prompting a shift toward specialized accelerators like Google's TPU and FPGA solutions, while also exploring emerging GPU‑friendly scenarios such as VR/AR, cloud gaming, and military applications.

FPGA · GPU · TPU
0 likes · 11 min read
DataFunSummit
Jan 5, 2023 · Artificial Intelligence

GPU Acceleration Techniques for Large AI Models: Parallelism, Fusion, and Simplification

These notes explain how GPUs address the massive data, serial dependencies, and high computational complexity of modern AI by employing three acceleration strategies—parallelism, operator fusion, and simplification—illustrated with Megatron-LM, MoE models, and practical compression techniques such as quantization, distillation, and pruning.

AI · GPU · Megatron
0 likes · 16 min read
DataFunTalk
Jan 4, 2023 · Artificial Intelligence

GPU Acceleration Techniques for Large AI Models: Parallelism, Fusion, and Simplification

This article explains how GPUs address the massive data, serial dependencies, and high computational complexity of modern AI by employing three acceleration strategies—parallelism, operator fusion, and simplification—detailing methods such as model, pipeline, and tensor parallelism, Megatron framework, MoE models, and various model compression techniques.

AI · GPU · Megatron
0 likes · 17 min read
Python Programming Learning Circle
Dec 17, 2022 · Fundamentals

Accelerating Python Code with Taichi: Prime Counting, LCS, and Reaction‑Diffusion Examples

This article demonstrates how importing the Taichi library into Python can dramatically accelerate compute‑intensive tasks, showcasing prime counting, longest common subsequence, and reaction‑diffusion simulations with speedups up to 120× and GPU support, while providing installation and usage guidance.

GPU · High‑performance computing · Python
0 likes · 6 min read
Architects' Tech Alliance
Dec 11, 2022 · Fundamentals

Fundamentals of CPU, GPU, and Storage in the Xinchuang Industry

This article provides a comprehensive overview of the Xinchuang industry’s hardware fundamentals, detailing CPU architecture and operation, instruction set classifications, GPU concepts and workflows, storage categories, and the distinction between independent and integrated GPUs, while also noting related promotional resources.

CPU · GPU · Hardware
0 likes · 8 min read
Architects' Tech Alliance
Nov 29, 2022 · Artificial Intelligence

In‑Depth Overview of NVIDIA Grace Hopper Superchip Architecture

The article provides a comprehensive technical overview of NVIDIA's Grace Hopper Superchip, detailing its heterogeneous CPU‑GPU design, high‑bandwidth NVLink‑C2C interconnect, performance advantages for HPC and AI workloads, programming model, and the architectural innovations that enable unprecedented scalability and productivity.

AI · CPU · GPU
0 likes · 15 min read
Tencent Cloud Developer
Nov 29, 2022 · Game Development

GPU Rendering Pipeline and Hardware Architecture Overview

The article surveys GPU rendering pipelines and hardware architectures for desktop and mobile, explains classic stages, compares Immediate Mode, Tile‑Based and Tile‑Based Deferred rendering, details PowerVR, Mali and Adreno components, and offers optimization advice on draw calls, depth pre‑passes, shader efficiency, and render ordering.

GPU · Graphics · Mobile GPU
0 likes · 66 min read
Architects' Tech Alliance
Nov 7, 2022 · Artificial Intelligence

FastDeploy: One-Click AI Model Deployment Across GPUs, CPUs, and Edge Devices

FastDeploy is an open‑source toolkit that standardizes AI model APIs and enables developers to deploy vision, NLP, and speech models on diverse hardware—including GPUs, CPUs, Jetson, ARM, and various NPUs—using just three lines of code or a single command, while delivering end‑to‑end performance optimizations.

AI deployment · CPU · Edge Computing
0 likes · 11 min read
Architects' Tech Alliance
Nov 1, 2022 · Databases

2022 China Database Industry Report: Emerging Hardware and Architectural Innovations

The September 2022 China Database Industry Analysis report highlights a wave of hardware‑driven innovations, including multi‑core CPUs, heterogeneous GPUs/TPUs/DPUs, programmable FPGAs, CXL‑DDR5, persistent memory, NVMe‑oF, and RDMA‑based storage, that enable massive data storage and high‑concurrency real‑time computing across a range of novel database architectures and products.

GPU · Hardware acceleration · OLTP
0 likes · 10 min read
Baidu Geek Talk
Oct 31, 2022 · Artificial Intelligence

PaddleBox: A GPU‑Based Ultra‑Large‑Scale Sparse DNN Training Framework

PaddleBox is Baidu’s GPU‑based ultra‑large‑scale sparse DNN training framework that combines a three‑tier hierarchical parameter server (SSD, DRAM, HBM) with pipelined scheduling and multi‑machine multi‑GPU communication, delivering 5–40× cost‑performance gains over traditional CPU solutions and powering Baidu’s advertising services.

Deep Learning · GPU · PaddleBox
0 likes · 15 min read
OPPO Kernel Craftsman
Oct 28, 2022 · Artificial Intelligence

ShaderNN: A GPU Shader‑Based Lightweight Inference Engine for Mobile AI Applications

ShaderNN is an open‑source, sub‑2 MB GPU‑shader inference engine that runs TensorFlow, PyTorch and ONNX models directly on mobile graphics textures via OpenGL fragment and compute shaders, delivering real‑time, low‑power AI for image‑heavy tasks while eliminating third‑party dependencies and achieving up to 90 % speed gains.

GPU · Inference Engine · Mobile AI
0 likes · 11 min read
Alibaba Cloud Native
Oct 10, 2022 · Cloud Native

What’s New in Koordinator v0.7? Enhanced Coscheduling, ElasticQuota, and Fine‑Grained GPU Sharing

Koordinator v0.7 adds major cloud‑native scheduling features—including enhanced gang (coscheduling) with Strict/NonStrict modes, multi‑hierarchy ElasticQuota management, fine‑grained GPU resource protocols, richer diagnostic APIs, and safer descheduling—targeting machine‑learning and big‑data workloads on Kubernetes.

Cloud Native · Coscheduling · ElasticQuota
0 likes · 25 min read
Rare Earth Juejin Tech Community
Sep 29, 2022 · Fundamentals

Understanding OpenGL Buffer Objects and VBO Optimization

This article explains the concept of OpenGL objects, focuses on common buffer objects such as VBO, VAO, and EBO, describes how they reduce CPU‑GPU transfer costs, and provides detailed code examples for creating, configuring, and rendering with Vertex Buffer Objects to improve graphics performance.

Buffer Objects · GPU · OpenGL
0 likes · 18 min read
ELab Team
Sep 28, 2022 · Frontend Development

Master WebGL & Three.js: From Basics to 3D Rendering in the Browser

This article guides beginners through the fundamentals of computer graphics, explaining OpenGL, WebGL, GLSL, and the rendering pipeline, then demonstrates practical Three.js code for setting up scenes, cameras, lights, materials, and textures to create interactive 3D web experiences.

3D renderingGPUGraphics
0 likes · 20 min read
Architects' Tech Alliance
Sep 10, 2022 · Fundamentals

Overview of NVIDIA DOCA and SmartNIC/DPU Technologies

This article provides a comprehensive overview of NVIDIA's DOCA framework, BlueField DPU architecture, SDK components, programming models, and related technologies such as RDMA, RoCE, and GPUDirect RDMA, highlighting their roles in modern data‑center acceleration and security.

DOCA, DPU, GPU
0 likes · 8 min read
NetEase Cloud Music Tech Team
Jul 6, 2022 · Industry Insights

Inside NetEase Cloud Music’s MLOps: Scaling AI with VK, ECI, and Ceph

This article details NetEase Cloud Music’s four‑layer machine‑learning platform architecture, covering resource provisioning with Virtual Kubelet (VK) and Alibaba Cloud ECI, Ceph storage optimizations, TensorFlow migration, large‑scale graph neural network support, and end‑to‑end workflow tooling that together enable efficient, cost‑effective AI development and deployment.

Ceph, GPU, Graph Neural Network
0 likes · 24 min read
Youku Technology
Jun 9, 2022 · Mobile Development

Design and Architecture of the Cross-Platform Multimedia Rendering Engine OPR

The OPR engine provides a cross‑platform, GPU‑accelerated rendering framework that unifies audio‑video pre‑ and post‑processing, native UI‑driven danmaku rendering, and real‑time visual effects such as human‑body recognition, using a modular command‑stream architecture, C++ core, monitoring tools, and extensibility for future Vulkan, VR, and plugin integration.

GPU, Native UI, Real-Time
0 likes · 15 min read
Youku Technology
Jun 8, 2022 · Mobile Development

How Youku Achieves Real-Time Bullet‑Screen Pass‑Through on Mobile

This article details Youku's technical approach to rendering bullet‑screen pass‑through on mobile devices, covering cloud‑based and on‑device segmentation pipelines, GPU‑accelerated rendering steps, performance optimizations, and engineering challenges to deliver seamless immersive viewing.

GPU, Metal, OpenGL
0 likes · 11 min read
Shopee Tech Team
Jun 2, 2022 · Backend Development

Applying GPU Technology for High‑Throughput Image Rendering in Shopee Off‑Platform Ads

The Shopee Off‑Platform Ads team built a GPU‑accelerated Creative Rendering System that uses a four‑layer architecture, CGO‑bridged C/C++ kernels, and template caching to process billions of product images daily, achieving a roughly ten‑fold speedup at about half the cost and with far less rack space while handling high concurrency.

Advertising, CUDA, GPU
0 likes · 23 min read
Baidu Geek Talk
May 30, 2022 · Mobile Development

Advanced OpenCL Optimization Techniques for Qualcomm Adreno GPUs on Mobile Devices

The article presents advanced OpenCL optimization techniques for Qualcomm Adreno mobile GPUs, explaining the programming model, profiling methods, bottleneck identification, and kernel‑level strategies such as fast math, fp16, vectorized memory accesses, and hardware‑specific features to improve compute‑ and memory‑bound performance on Android devices.

Adreno, GPU, Mobile Computing
0 likes · 12 min read
Architects' Tech Alliance
May 23, 2022 · Industry Insights

GPU Wars in the Data Center: How Nvidia, AMD, and Intel Compete for AI and HPC Dominance

The article examines how GPUs have evolved from gaming accelerators to essential data‑center processors for AI, HPC, and scientific workloads, and compares the latest server‑grade offerings from Nvidia, AMD, and Intel—including performance specs, memory technologies, interconnects, and software ecosystems—highlighting the fierce competition shaping the future of compute.

AI, AMD, Data center
0 likes · 12 min read
ByteFE
May 18, 2022 · Frontend Development

Understanding WebGL: GPU Basics, Shaders, and Practical Code Examples

This article introduces WebGL fundamentals for frontend developers, explaining GPU versus CPU, GLSL shaders, and how JavaScript prepares data, followed by step‑by‑step code examples of fragment and vertex shaders, custom primitives, and using the gl‑renderer library to render graphics.

GPU, Graphics, JavaScript
0 likes · 11 min read
Alibaba Terminal Technology
May 17, 2022 · Frontend Development

Unlock 20‑30× GPU Speed: WebGPU in Three.js, Babylon.js, and TensorFlow.js

This article introduces WebGPU—a powerful yet still experimental web graphics API—showing how major frameworks like Three.js and Babylon.js adopt it for high‑performance 3D rendering, how TensorFlow.js leverages it for massive deep‑learning speedups, and provides hands‑on code examples from framework usage to raw WebGPU programming.

Babylon.js, GPU, Graphics
0 likes · 17 min read
Tencent Cloud Developer
May 12, 2022 · Backend Development

Practical Guide to PyTorch Distributed Training: DP, DDP, Groups, and IO Considerations

This guide explains PyTorch’s distributed training, contrasting single‑node DataParallel with multi‑node DistributedDataParallel, detailing essential parameters, group communication setup, proper use of DistributedSampler for data loading, handling IO bottlenecks, and avoiding common pitfalls such as memory imbalance, unsynchronized buffers, and unused‑parameter errors.

DDP, DataParallel, Distributed Training
0 likes · 15 min read
Architects' Tech Alliance
May 4, 2022 · Industry Insights

What the Next‑Gen Nvidia and AMD GPUs Could Mean for the 2022‑2023 Market

Based on recent leaks from 3DCenter.org and Twitter insiders Kopite7kimi and 暴龙兽55, the article forecasts Nvidia's Lovelace RTX 4000 series and AMD's RDNA 3 Navi 33/32 GPUs to launch between September 2022 and early 2023, analyzes their expected specifications, pricing dynamics, and potential market impact, and notes Intel's upcoming Arc cards as a wildcard.

AMD, GPU, Lovelace
0 likes · 7 min read
Architects' Tech Alliance
Apr 19, 2022 · Artificial Intelligence

Overview of AI Chip Development, Architectures, and Market Trends in China (2022)

The article provides a comprehensive overview of AI chip technology, describing the dependence on mathematical models and semiconductor integration, classifying chips by architecture (GPU, FPGA, ASIC, SoC, brain‑like), deployment (cloud, edge, terminal), and outlining current challenges, market trends, and future research directions such as in‑memory and neuromorphic computing.

AI Chip, ASIC, FPGA
0 likes · 11 min read
IT Services Circle
Apr 8, 2022 · Fundamentals

The Rise of Domestic GPUs in China: IP Licensing, Imagination Technologies, and Market Dynamics

Chinese domestic GPU development has accelerated rapidly, driven by fast‑track product launches, strategic IP licensing from firms like Imagination Technologies, and supportive policies, while industry players navigate challenges of patents, design complexity, and market competition to bring full‑function GPUs to market.

China, Chip Design, GPU
0 likes · 12 min read
DataFunSummit
Apr 7, 2022 · Artificial Intelligence

Optimizing Distributed Machine Learning Training on Google Cloud Vertex AI: Fast Socket and Reduction Server

This article explains how Google Cloud Vertex AI improves large‑scale distributed machine learning training performance by addressing the memory‑wall challenge with Fast Socket network stack enhancements for NCCL and a Reduction Server that accelerates gradient aggregation, delivering higher throughput and lower TCO for AI workloads.

Cloud AI, Distributed Training, Fast Socket
0 likes · 19 min read
Python Programming Learning Circle
Mar 31, 2022 · Artificial Intelligence

Comprehensive PyTorch Code Snippets: Configuration, Tensor Operations, Model Definition, Training, and Best Practices

This article provides a thorough collection of commonly used PyTorch code snippets covering environment setup, reproducibility, GPU configuration, tensor manipulation, model building, data preprocessing, training and evaluation loops, custom loss functions, regularization techniques, learning‑rate scheduling, checkpointing, and practical tips for efficient deep‑learning development.

Deep Learning, GPU, Model Training
0 likes · 37 min read