Tagged articles
536 articles
Page 5 of 6
IT Services Circle
Mar 24, 2022 · Artificial Intelligence

NVIDIA Unveils H100 GPU with Hopper Architecture: Massive Performance Gains for AI

At the recent GTC event, NVIDIA introduced the H100 GPU, built on the Hopper architecture with a TSMC 4nm process. It features 80 billion transistors, 16,896 CUDA cores, up to 700 W of power, 3 TB/s of memory bandwidth, and a dedicated Transformer Engine that accelerates large‑model training by up to six times. NVIDIA also unveiled the Grace CPU Superchip and new AI supercomputing systems.

AI · GPU · Grace CPU
11 min read
JD Retail Technology
Mar 24, 2022 · Mobile Development

Understanding Offscreen Rendering and Its Performance Impact in iOS

Offscreen rendering, in which the GPU or CPU renders content into a separate buffer rather than the current screen framebuffer, adds overhead from buffer creation and context switches. The article explains its principles, common trigger scenarios, and strategies for avoiding it in iOS development.

CoreAnimation · GPU · iOS
11 min read
IT Architects Alliance
Mar 10, 2022 · Industry Insights

What Drives the AI Chip Market? Types, Trends, and Future Outlook

The article provides a comprehensive overview of AI chips, explaining their broad and narrow definitions, core architectures such as GPU, FPGA, and ASIC, deployment scenarios from cloud to edge, training versus inference roles, current market dynamics, major vendors, and emerging application domains like autonomous driving and smart security.

AI chips · ASIC · Edge Computing
9 min read
Architects' Tech Alliance
Mar 6, 2022 · Artificial Intelligence

Overview of AI Chip Technologies and Market Trends in China

The article provides a comprehensive overview of AI chips—including GPUs, FPGAs, and ASICs—their architectural distinctions, cloud and edge deployment models, market dynamics in China, and key application scenarios such as autonomous driving, smart security, and IoT devices.

AI chips · ASIC · China
7 min read
Meituan Technology Team
Mar 3, 2022 · Artificial Intelligence

GPU Optimization Practices for Meituan Delivery Search and Recommendation Model Inference

Meituan’s delivery search and recommendation service migrated from separate CPU‑only models to a unified multi‑task model running on a heterogeneous CPU‑GPU architecture, applying system‑level placement, All‑On‑GPU lookup, FP16 mixed precision, operator fusion, TensorRT and TVM compilation, which together delivered roughly a four‑fold increase in inference throughput while maintaining cost.
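FP16 stores each value in 2 bytes instead of FP32's 4, which halves memory traffic but reduces precision. The minimal sketch below uses only Python's standard `struct` module, whose `'e'` format code packs IEEE 754 half precision, to show both effects. It illustrates the number format only, not Meituan's actual mixed-precision pipeline; the helper names are hypothetical.

```python
import struct

def round_trip(x: float, fmt: str) -> float:
    """Pack x with the given struct format and unpack it, exposing the rounding applied."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

def to_fp16(x: float) -> float:
    return round_trip(x, "<e")  # IEEE 754 binary16 (half precision)

def to_fp32(x: float) -> float:
    return round_trip(x, "<f")  # IEEE 754 binary32 (single precision)

if __name__ == "__main__":
    # Storage per element: 2 bytes for FP16 versus 4 bytes for FP32.
    print(struct.calcsize("<e"), struct.calcsize("<f"))  # 2 4
    # Rounding error grows as precision drops.
    print(abs(to_fp32(0.1) - 0.1), abs(to_fp16(0.1) - 0.1))
    # FP16 has an 11-bit significand, so not every integer above 2048 is representable.
    print(to_fp16(2049.0))  # 2048.0
```

In practice, mixed-precision schemes keep precision-sensitive operations in FP32; which operations need that depends on the model.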

GPU · TVM · TensorFlow
24 min read
Architects' Tech Alliance
Feb 16, 2022 · Fundamentals

Key Technology Trends Shaping High‑Performance Computing (HPC)

The article outlines major trends influencing high‑performance computing, including AI integration, GPU/TPU advancements, flexibility in processor architectures, HPC‑as‑a‑Service, hybrid cloud solutions, democratization, the emergence of exascale systems, and micro‑architectural improvements, while providing links to related reports and resources.

Exascale · GPU · HPC
9 min read
DataFunTalk
Jan 25, 2022 · Cloud Native

Model Deployment Challenges and a Seldon‑Based Cloud‑Native Solution

This article analyzes the complexities of deploying machine‑learning models in production, outlines the limitations of the existing ABox architecture, and details a comprehensive cloud‑native redesign using Seldon on Kubernetes—including custom HDFS initializers, GPU management, logging, and resource monitoring—to streamline operations and enable unified CPU/GPU model serving.

Cloud Native · GPU · Kubernetes
12 min read
Youzan Coder
Jan 17, 2022 · Artificial Intelligence

Model Deployment Challenges and a Seldon‑Based Cloud‑Native Solution

The team replaced the cumbersome ABox deployment stack with Seldon‑based cloud‑native serving on Kubernetes, unifying TensorFlow and other framework models, adding GPU sharing, automated CRUD, per‑model ingress, monitoring, and log collection, achieving scalable, fault‑tolerant, zero‑downtime model deployment.

AI serving · Cloud Native · GPU
11 min read
Alimama Tech
Dec 22, 2021 · Artificial Intelligence

Performance Optimization of Advertising Deep Learning Systems: Algorithm, System, and Hardware Co‑Design

The paper presents a holistic algorithm‑system‑hardware co‑design for advertising deep‑learning inference, combining model pruning, approximate computing, kernel fusion, scheduling and PCIe transfer optimizations with GPU and NPU upgrades, achieving up to five‑fold speed‑up and significantly higher latency‑bounded QPS for large‑scale ad services.

Algorithmic Optimization · GPU · NPU
24 min read
58 Tech
Dec 21, 2021 · Artificial Intelligence

dl_inference: Open‑Source Deep Learning Inference Service with TensorRT and MKL Acceleration

dl_inference is an open‑source, production‑grade deep learning inference platform that supports TensorFlow, PyTorch and Caffe models, offering GPU and CPU deployment, TensorRT and MKL acceleration, multi‑node load balancing, and extensive Q&A on model conversion, hardware requirements, INT8 quantization, and performance gains.
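INT8 quantization maps floating-point values onto 8-bit integers through a scale factor. TensorRT chooses its INT8 scales through calibration; the minimal sketch below instead uses a generic symmetric, per-tensor scheme with the scale taken from the largest absolute value. It is an illustration under that assumption, not dl_inference's implementation, and the function names are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map [-max|x|, max|x|] onto [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the signed 8-bit range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate floating-point values from the integer codes."""
    return [x * scale for x in q]

weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# Each restored value is within half a quantization step (scale / 2) of the original.
print(q, [round(r, 4) for r in restored])
```

Small values such as 0.003 collapse to zero here, which is the kind of accuracy loss calibration tries to keep in check.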

CPU · GPU · Inference
8 min read
Code DAO
Dec 17, 2021 · Artificial Intelligence

How to Scale XGBoost with Ray for Distributed Multi‑GPU Training

XGBoost‑Ray provides a fault‑tolerant, multi‑node, multi‑GPU backend for XGBoost that integrates seamlessly with Ray Tune, supports distributed data loading, and requires only three code changes to adopt, enabling scalable training and inference on large clusters.

Distributed Training · GPU · Ray
8 min read
Architects' Tech Alliance
Dec 11, 2021 · Fundamentals

2021 China Integrated Circuit Market Research Report Overview

The 2021 China Integrated Circuit Market Research Report analyzes trends over the past three years, showing rising shares of MPU and logic chips, declining DRAM, and stable analog and MCU, and details the market status, growth forecasts, and challenges for CPU, GPU, FPGA, ASIC, and storage technologies.

AI chips · CPU · FPGA
11 min read
Architects' Tech Alliance
Nov 16, 2021 · Fundamentals

2021 China Integrated Circuit Market Research Report Overview

The 2021 China Integrated Circuit Market Research Report analyzes trends over the past three years, showing rising shares for MPU and logic chips, declining DRAM, and stable analog and MCU, while detailing the market positions, growth rates, and challenges of CPU, GPU, FPGA, ASIC, and flash storage technologies.

ASIC · CPU · China
11 min read
Architects' Tech Alliance
Nov 5, 2021 · Artificial Intelligence

GPU Architecture in the AI Era: From Domain‑Specific Designs to 3D/AI Fusion

The article analyzes how GPU architecture, originally designed for 3D graphics, is being reshaped by AI demands through domain‑specific designs, hardware/software interfaces, tensor acceleration, and 3D/AI convergence, and argues that GPUs will remain the central compute platform in the new golden age of computer architecture.

3D rendering · GPU · Tensor Acceleration
14 min read
Architects' Tech Alliance
Oct 28, 2021 · Artificial Intelligence

GPU Technology Overview: Architecture, Market Landscape, and Key Application Directions

This article provides a comprehensive overview of GPU technology, covering its multi‑core architecture, the market oligopoly of Intel, NVIDIA and AMD, the classification into integrated and discrete GPUs, and three major application trends: gaming performance, artificial intelligence/deep learning, and autonomous driving.

GPU · Gaming · Hardware
14 min read
Kuaishou Tech
Oct 25, 2021 · Fundamentals

Noise Techniques for Short Video Effects and Their Generation Algorithms

This article explores how various noise algorithms—including value, gradient, simplex, cellular, and FBM—are applied to short video visual effects, compares random number generators for GPU rendering, and provides GLSL code examples to illustrate implementation and performance trade‑offs.
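Value noise assigns pseudo-random values to integer lattice points and smoothly interpolates between them; FBM sums several octaves of that noise at doubling frequency and halving amplitude. The article's examples are GLSL; the sketch below ports the idea to one-dimensional Python for illustration, with a hypothetical integer hash standing in for whatever hash the shaders use.

```python
import math

def hash01(i: int) -> float:
    """Deterministic pseudo-random value in [0, 1) for an integer lattice point.

    Illustrative integer hash (constants chosen for mixing, not from the article),
    masked to 32 bits to mimic unsigned shader arithmetic.
    """
    h = (i * 374761393) & 0xFFFFFFFF
    h = ((h ^ (h >> 13)) * 1274126177) & 0xFFFFFFFF
    h ^= h >> 16
    return h / 2**32

def value_noise(x: float) -> float:
    """1D value noise: random values at integers, smoothly interpolated in between."""
    i = math.floor(x)
    f = x - i
    u = f * f * (3.0 - 2.0 * f)  # smoothstep fade curve, zero slope at lattice points
    a, b = hash01(i), hash01(i + 1)
    return a + (b - a) * u

def fbm(x: float, octaves: int = 5, lacunarity: float = 2.0, gain: float = 0.5) -> float:
    """Fractal Brownian motion: octaves of noise with rising frequency, falling amplitude."""
    total, amplitude, frequency, norm = 0.0, 1.0, 1.0, 0.0
    for _ in range(octaves):
        total += amplitude * value_noise(x * frequency)
        norm += amplitude
        amplitude *= gain
        frequency *= lacunarity
    return total / norm  # normalized back into [0, 1)

# Sample a short 1D "texture" strip.
print([round(fbm(x / 10), 3) for x in range(10)])
```

More octaves add finer detail at extra cost per sample, which is the main performance knob for FBM in a shader.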

GPU · Graphics · noise
17 min read
Kuaishou Large Model
Oct 22, 2021 · Fundamentals

How Noise Powers Real‑Time Short‑Video Effects: Algorithms, Samples & GPU RNG Comparison

This article explains how various noise algorithms—value, gradient, simplex, cellular, and FBM—are applied to short‑video visual effects, showcases shader implementations and image examples, and compares GPU random‑number generators to help developers choose the right balance of performance and visual quality.
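Shaders have no built-in random number generator, so effects usually hash coordinates instead. The sketch below compares two common choices, ported from GLSL to Python: the classic `fract(sin(dot(...)) * 43758.5453)` one-liner and a PCG-style integer hash. These are typical examples assumed here rather than taken from the article, and the constants are the ones commonly quoted for them.

```python
import math

MASK32 = 0xFFFFFFFF

def sin_hash(x: float, y: float) -> float:
    """Classic shader hash: fract(sin(dot(p, vec2(12.9898, 78.233))) * 43758.5453)."""
    v = math.sin(x * 12.9898 + y * 78.233) * 43758.5453
    return v - math.floor(v)  # GLSL fract()

def pcg_hash(n: int) -> int:
    """PCG-style integer hash, emulating 32-bit unsigned wraparound with masks."""
    state = (n * 747796405 + 2891336453) & MASK32
    word = (((state >> ((state >> 28) + 4)) ^ state) * 277803737) & MASK32
    return (word >> 22) ^ word

def pcg01(n: int) -> float:
    """Map the 32-bit hash to a float in [0, 1)."""
    return pcg_hash(n) / 2**32

print(round(sin_hash(0.5, 0.25), 4), round(pcg01(7), 4))
```

The sine-based hash is cheap but depends on the GPU's `sin` precision, so its output can differ between devices and degrade for large inputs; the integer hash is bit-exact wherever 32-bit unsigned arithmetic is available.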

GPU · Random Number Generation · Shader
17 min read
Xianyu Technology
Oct 21, 2021 · Mobile Development

Flutter iOS GPU Background Crash Analysis and Solution

The article analyzes why Flutter crashes on iOS when accessing the GPU in the background, explains the official SyncSwitch fix for ImageDecoder, and details Xianyu’s additional patches for MultipleFrameCodec, EncodeImage, and Rasterizer::DrawToSurface that together, via PR #28383, fully resolve the GPU‑background crash.

Crash · Flutter · GPU
11 min read
21CTO
Oct 2, 2021 · Artificial Intelligence

How PyTorch Lightning Can Make Your Deep Learning Pipeline 10× Faster

This article explains six practical techniques—parallel data loading, distributed multi‑GPU training, mixed precision, early stopping, sharded training, and inference optimizations—using PyTorch Lightning to dramatically accelerate deep‑learning pipelines, turning days‑long experiments into minute‑scale runs.
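Early stopping ends training once the validation metric stops improving for a set number of checks (the patience). PyTorch Lightning provides this as its `EarlyStopping` callback; the framework-free sketch below only reproduces the patience logic, with hypothetical names and made-up loss values.

```python
class EarlyStopping:
    """Signal a stop when the loss has not improved by more than min_delta for `patience` checks."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_checks = 0

    def step(self, val_loss: float) -> bool:
        """Record one validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss      # improvement: remember it and reset the counter
            self.bad_checks = 0
        else:
            self.bad_checks += 1      # no improvement this check
        return self.bad_checks >= self.patience

def first_stop_epoch(losses, patience=3):
    """Return the epoch at which training would stop, or None if it never triggers."""
    stopper = EarlyStopping(patience=patience)
    for epoch, loss in enumerate(losses):
        if stopper.step(loss):
            return epoch
    return None

print(first_stop_epoch([0.90, 0.70, 0.65, 0.66, 0.64, 0.64, 0.65, 0.66]))
```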

Deep Learning · GPU · PyTorch Lightning
7 min read
Meituan Technology Team
Sep 9, 2021 · Artificial Intelligence

GPU Optimization Practices for CTR Models at Meituan

Meituan accelerates CTR model inference by fusing operators with TVM, optimizing CPU‑GPU data transfers, hand‑tuning high‑frequency subgraphs, and dynamically offloading workloads. On Tesla T4 GPUs this yields up to ten‑fold throughput gains with stable latency, which rises only modestly beyond 128 QPS; compilation remains slow, and large‑model support still needs improvement.

CTR · Deep Learning · GPU
16 min read
Liangxu Linux
Aug 17, 2021 · Cloud Native

How to Enable GPU Acceleration in Docker on Linux

This guide walks you through installing NVIDIA drivers, CUDA, and nvidia-docker2 on a Linux host, configuring Docker to access the GPU, and verifying the setup with commands and sample TensorFlow/PyTorch code, enabling deep‑learning workloads inside containers.

CUDA · Deep Learning · Docker
7 min read
DataFunSummit
Aug 16, 2021 · Artificial Intelligence

Scaling Deep Learning Models: From Depth to Width and Parallelism Strategies

The article reviews how deep learning models have grown deeper and wider, discusses the memory and bandwidth limits of single GPUs, and explains pipeline and sharding techniques—including GPU clusters and TPU pods—to efficiently train large‑scale models in industrial settings.
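With pipeline parallelism, stages sit idle while the pipeline fills and drains. For a simple GPipe-style forward schedule with p equal-cost stages and m micro-batches, the run takes m + p - 1 time slots, so each stage idles for a fraction (p - 1) / (m + p - 1); splitting a batch into more micro-batches shrinks this bubble. The sketch below computes the formula and cross-checks it by simulating the schedule; equal stage times are a simplifying assumption.

```python
def pipeline_bubble_fraction(stages: int, micro_batches: int) -> float:
    """Idle fraction per stage in a fill-and-drain pipeline with equal stage times."""
    total_slots = micro_batches + stages - 1
    return (stages - 1) / total_slots

def simulate_idle_fraction(stages: int, micro_batches: int) -> float:
    """Count busy (slot, stage) cells: stage s works on micro-batch t - s at slot t."""
    total_slots = micro_batches + stages - 1
    busy = sum(
        1
        for t in range(total_slots)
        for s in range(stages)
        if 0 <= t - s < micro_batches
    )
    return 1 - busy / (total_slots * stages)

for m in (1, 4, 32):
    print(m, round(pipeline_bubble_fraction(4, m), 3))
```

With 4 stages, a single micro-batch leaves each stage idle three quarters of the time, while 32 micro-batches cut the bubble below 10%.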

GPU · Mixture of Experts · Model Parallelism
6 min read
Architects' Tech Alliance
Jul 16, 2021 · Artificial Intelligence

AI Chip Landscape: GPUs, FPGAs, and ASICs for Deep Learning

The article explains how artificial intelligence relies on algorithms, compute and data, compares engineering and simulation methods, and details the roles, architectures, performance and energy characteristics of GPUs, FPGAs, and ASICs as the primary hardware accelerators for modern deep‑learning applications.

ASIC · Chip Design · Deep Learning
14 min read
Architects' Tech Alliance
Apr 26, 2021 · Artificial Intelligence

GPU Market Overview and Industry Applications

The article provides a comprehensive overview of GPU technology, its architecture, rapid market growth, segmentation by type, device and industry, cloud deployment trends, competitive landscape, and diverse applications ranging from high‑performance computing and AI to automotive, AR/VR, and IoT.

GPU · High‑Performance Computing · Market analysis
9 min read
JD Cloud Developers
Apr 26, 2021 · Artificial Intelligence

Top Tech Highlights: Open‑Source Mars Drone, AI‑Powered GPUs, Cloud Growth & More

This week’s developer newsletter spotlights NASA’s open‑source‑based Ingenuity helicopter soaring on Mars, JD’s ESG report and green cloud initiatives, NVIDIA’s record‑breaking AI inference GPUs, rapid growth of China’s public‑cloud market, Tsinghua’s new chip academy, Hugging Face’s Accelerate library for multi‑GPU training, plus cutting‑edge research on GAN IP protection and hierarchical task learning presented at CVPR and ICLR.

AI · GPU · open source
5 min read
Architects' Tech Alliance
Mar 15, 2021 · Artificial Intelligence

Evolution of NVIDIA GPU Architectures from Fermi to Ampere

This article provides a comprehensive overview of NVIDIA's GPU architecture evolution—covering Fermi, Kepler, Maxwell, Pascal, Volta, Turing, and Ampere—detailing compute capabilities, SM structures, specialized units such as Tensor Cores, and their impact on AI and high‑performance computing workloads.

AI · CUDA · GPU
19 min read
Architects' Tech Alliance
Mar 13, 2021 · Artificial Intelligence

Industry Deep Report: GPU Research Framework

This report analyzes the evolution of processor chips, the rise of heterogeneous computing, and provides a comprehensive GPU investment logic framework, detailing GPU architecture, market competition, global industry landscape, and the challenges and prospects of domestic GPU development in China.

AI · Chip · GPU
5 min read
MaGe Linux Operations
Mar 11, 2021 · Artificial Intelligence

What’s New in PyTorch 1.8? Key Features, APIs, and Performance Boosts

PyTorch 1.8, released by the PyTorch team, bundles over 3,000 commits since 1.7, introducing AMD ROCm support, enhanced Python function conversion, stable FFT and linear‑algebra APIs, complex‑tensor autograd, distributed‑training improvements, new mobile tutorials, performance tools, and several prototype features.

Deep Learning · GPU · Mobile
6 min read
Architects' Tech Alliance
Mar 7, 2021 · Fundamentals

Understanding the Linux Graphics Stack from a GPU Perspective

This article explains the role of GPUs in computing, traces the evolution of graphics standards and GPU architectures, and details the development of the Linux graphics stack from early X11 to modern Wayland, providing a comprehensive overview for developers and hardware enthusiasts.

GPU · Graphics Stack · Open standards
3 min read
ITPUB
Mar 7, 2021 · Blockchain

Can You Mine Ethereum on an Apple M1 Mac? A Hands‑On Test and Results

This article documents a developer’s attempt to run Ethereum mining software on an M1‑based MacBook Air, detailing the required patches, compilation steps, observed hash rates, daily earnings, and how the performance compares with traditional GPU miners.

Blockchain · Ethereum · GPU
9 min read
360 Tech Engineering
Mar 1, 2021 · Artificial Intelligence

Deploying BERT as an Online Service: Challenges and Optimizations at 360 Search

This article details the engineering challenges of serving a large BERT model in real‑time for 360 Search and describes a series of optimizations—including TensorRT‑based kernel fusion, model quantization, knowledge distillation, multi‑stream execution, caching, and dynamic sequence handling—that together achieve low latency, high throughput, and stable deployment on GPU clusters.
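Padding every request to a fixed maximum length wastes GPU work on pad tokens. One common way to handle variable sequence lengths, shown here as an illustrative assumption rather than 360 Search's exact method, is to group requests of similar length so each batch is padded only to its own longest sequence. The sketch below counts the token slots computed under each strategy; the lengths are made-up examples and the helper names are hypothetical.

```python
def pad_cost(batches):
    """Token slots computed when each batch is padded to its own longest sequence."""
    return sum(len(batch) * max(batch) for batch in batches)

def chunk(lengths, batch_size):
    """Split a list of sequence lengths into consecutive batches."""
    return [lengths[i:i + batch_size] for i in range(0, len(lengths), batch_size)]

def bucket_by_length(lengths, batch_size):
    """Sort by length first so that each batch holds sequences of similar length."""
    return chunk(sorted(lengths), batch_size)

lengths = [12, 128, 15, 20, 120, 9, 64, 70]    # hypothetical request lengths in tokens
useful = sum(lengths)                          # real (non-pad) tokens
fixed_max = len(lengths) * 128                 # pad everything to a fixed max length
arrival = pad_cost(chunk(lengths, 4))          # batch in arrival order
bucketed = pad_cost(bucket_by_length(lengths, 4))
print(useful, fixed_max, arrival, bucketed)
```

In an online service, regrouping requests means waiting to collect enough of them, so the saved compute is traded against added queueing latency.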

BERT · GPU · Model Optimization
10 min read
Architects' Tech Alliance
Jan 5, 2021 · Operations

Understanding Data Centers: Architecture, Technologies, and Operational Considerations

This article explains what data centers are, outlines their core components—compute, storage, and networking—covers architectural decisions, industry standards, and emerging technologies such as edge computing, micro‑data centers, cloud integration, SDN, HCI, containers, NVMe, and GPU acceleration, highlighting their impact on modern enterprise operations.

Edge Computing · GPU · HCI
11 min read
Architects' Tech Alliance
Dec 30, 2020 · Artificial Intelligence

Understanding GPUs, AI Accelerators, and Market Trends

The article explains GPU evolution, its integration with CPUs, interconnect technologies like PCIe and NVLink, market shares of NVIDIA, AMD and Intel, AI accelerator types (GPU, FPGA, ASIC), and the roles of training and inference in cloud AI, while also promoting a paid 182‑page PPT resource.

AI accelerator · GPU · HPC
7 min read
21CTO
Dec 22, 2020 · Artificial Intelligence

Explore tinygrad: A Minimalist Deep Learning Framework Under 1000 Lines

tinygrad, an open‑source autograd tensor library by George Hotz, offers a compact PyTorch‑like experience in fewer than 1000 lines, with easy installation, GPU support via PyOpenCL, full EfficientNet inference, and extensible optimizers for rapid neural‑network prototyping.
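The core of an autograd library like tinygrad is reverse-mode automatic differentiation: each operation records its inputs and a local derivative rule, and `backward()` walks the graph from the output, applying the chain rule. The scalar-only sketch below illustrates that idea in plain Python; it is not tinygrad's code or API, and the `Value` class is hypothetical.

```python
class Value:
    """A scalar that records how it was computed so gradients can flow backwards."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # local chain-rule step, set by the op that made it

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward():
            self.grad += out.grad           # d(a+b)/da = 1
            other.grad += out.grad          # d(a+b)/db = 1
        out._backward = backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward():
            self.grad += other.data * out.grad  # d(a*b)/da = b
            other.grad += self.data * out.grad  # d(a*b)/db = a
        out._backward = backward
        return out

    def backward(self):
        # Topologically order the graph, then apply each local rule from output to inputs.
        order, seen = [], set()
        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

x, y = Value(3.0), Value(4.0)
z = x * y + x          # dz/dx = y + 1, dz/dy = x
z.backward()
print(z.data, x.grad, y.grad)
```

Gradients accumulate with `+=`, which is what makes a value used twice (like `x` above) receive both contributions.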

AI · Autograd · Deep Learning
6 min read
Programmer DD
Dec 17, 2020 · Artificial Intelligence

Can Huang’s Law Double AI Performance Every Two Years? NVIDIA GTC 2020 Insights

At NVIDIA’s GTC China 2020, chief scientist Bill Dally highlighted the “Huang’s Law” predicting GPU-driven AI performance to double biennially, introduced projects like MAGNet, optical interconnects, and the Legate programming model, and discussed the broader implications for AI ecosystem development and industry adoption.

AI Performance · GPU · Huang's Law
8 min read
Architects' Tech Alliance
Dec 16, 2020 · Artificial Intelligence

AI Chip Landscape: Architecture, Trends, and Market Players

This article provides a comprehensive overview of the AI chip ecosystem, covering the evolution of GPU, FPGA, ASIC and neuromorphic chips, their performance trade‑offs, key industry players, and the rapid growth of China’s domestic chip manufacturers in the context of deep‑learning demands.

AI chips · ASIC · FPGA
11 min read
DataFunSummit
Dec 14, 2020 · Artificial Intelligence

LightSeq: High‑Performance Open‑Source Inference Engine for Transformers, GPT and Other NLP Models

This article introduces LightSeq, an open‑source, GPU‑accelerated inference engine that dramatically speeds up Transformer‑based models such as BERT and GPT by up to 14× over TensorFlow, supports multiple decoding strategies, integrates seamlessly with major deep‑learning frameworks, and provides detailed performance benchmarks and technical optimizations.

Deep Learning · GPU · Inference
15 min read
Architects' Tech Alliance
Dec 6, 2020 · Operations

Understanding Data Centers: Architecture, Reliability, and Emerging Technologies

This article explains what a data center is, its core components of compute, storage, and networking, the operational and architectural considerations for reliability and security, and reviews industry standards and emerging technologies such as edge computing, cloud integration, SDN, HCI, containers, NVMe, and GPU acceleration.

Edge Computing · GPU · Infrastructure
12 min read
Programmer DD
Dec 6, 2020 · Cloud Native

Enable GPU Support in Kubernetes with Containerd and NVIDIA Runtime

This guide walks through installing NVIDIA drivers, CUDA toolkit, nvidia-container-runtime, configuring Containerd, deploying the NVIDIA device plugin, and testing GPU access inside Kubernetes pods, providing a complete solution for GPU workloads on containerd‑based clusters.
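Once NVIDIA's device plugin is running, a Pod asks for GPUs through the extended resource name `nvidia.com/gpu`, set under `limits`. The sketch below builds such a Pod manifest as a Python dict and prints it as JSON, which `kubectl apply -f -` accepts. The Pod and container names are hypothetical, and the image is a placeholder for whatever CUDA base image your cluster can pull.

```python
import json

def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a Pod manifest that requests whole GPUs via the nvidia.com/gpu extended resource."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "cuda-check",
                    "image": image,
                    # nvidia-smi lists the GPUs visible inside the container.
                    "command": ["nvidia-smi"],
                    # GPUs go under limits; by default containers do not share GPUs.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }

manifest = gpu_pod_manifest("gpu-smoke-test", image="REPLACE_WITH_CUDA_IMAGE")
print(json.dumps(manifest, indent=2))  # e.g. pipe the output into: kubectl apply -f -
```

If the Pod stays Pending, the node is usually not advertising `nvidia.com/gpu`, which points back at the driver, runtime, or device-plugin setup.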

CUDA · Device Plugins · GPU
11 min read
Efficient Ops
Sep 3, 2020 · Operations

What Recent Cloud and Data Center Incidents Reveal About Industry Risks?

A roundup of recent tech news covering a Cisco sabotage case, a London data‑center fire, Linux's 29th anniversary, Gartner's China ICT trends, major cloud investments, Windows 95 milestones, Didi's GPU server launch, Hainan's DNS project, Dell’Oro's market report, executive share reductions, and an upcoming global operations conference.

Data center · GPU · Operations
10 min read
Tencent Tech
Aug 26, 2020 · Artificial Intelligence

How Tencent Engineers Shattered the 128‑GPU ImageNet Training Record in 2m31s

Tencent engineers broke the world record for training ImageNet with 128 V100 GPUs in just 2 minutes 31 seconds, detailing a suite of optimizations—including a new Light distributed training framework, single‑machine speed boosts, multi‑machine communication enhancements, and advanced batch convergence techniques—that together dramatically cut training time while maintaining high accuracy.
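The summary does not spell out Tencent's batch-convergence techniques. One widely used ingredient for large-batch ImageNet training, assumed here for illustration, is the linear scaling rule with gradual warmup: the learning rate grows in proportion to the batch size and is ramped up over the first steps. The function name and parameter values below are hypothetical.

```python
def learning_rate(step: int, base_lr: float, base_batch: int, batch: int, warmup_steps: int) -> float:
    """Linear scaling rule with linear warmup.

    The target rate scales with batch size (base_lr * batch / base_batch) and is
    ramped up linearly over the first warmup_steps steps to avoid early divergence.
    """
    target = base_lr * batch / base_batch
    if step < warmup_steps:
        return target * ((step + 1) / warmup_steps)
    return target

# Reference setting: lr 0.1 at batch 256, scaled to a batch of 65,536 with 500 warmup steps.
for step in (0, 249, 499, 5000):
    print(step, round(learning_rate(step, 0.1, 256, 65536, 500), 4))
```

At very large batch sizes, linear scaling alone is often not enough, which is why large-batch records typically combine it with further techniques.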

GPU · ImageNet · Tencent Cloud
9 min read
Alibaba Cloud Developer
Jun 18, 2020 · Artificial Intelligence

How to Build a GPU‑Accelerated Distributed ML Platform for VM Migration Prediction

This article explains how to design and implement a GPU‑accelerated, distributed machine‑learning system on Alibaba Cloud to predict virtual‑machine workload and hot‑migration downtime, covering architecture, components, message‑queue design, data handling, GPU acceleration, and model deployment.

CloudComputing · DistributedML · GPU
13 min read
TAL Education Technology
May 14, 2020 · Artificial Intelligence

An Introduction to GPU Computing and CUDA Architecture

This article provides a concise overview of GPU computing fundamentals, covering GPU hardware components, memory hierarchy, parallel execution models, and the CUDA programming framework, illustrating how CPUs and GPUs cooperate in heterogeneous computing environments.
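A CUDA kernel typically maps each thread to one element with the global index `blockIdx.x * blockDim.x + threadIdx.x`, launches enough blocks to cover the data, and guards against the overhang in the last block. The Python sketch below emulates that mapping sequentially for a vector add, so it shows the indexing scheme, not GPU parallelism; the function names are hypothetical.

```python
def launch_config(n: int, block_dim: int = 256) -> int:
    """Number of blocks so that blocks * block_dim >= n (ceiling division)."""
    return (n + block_dim - 1) // block_dim

def simulate_vector_add(a, b, block_dim: int = 256):
    """Emulate, one thread at a time, the work of a 1D CUDA vector-add kernel."""
    n = len(a)
    out = [0.0] * n
    grid_dim = launch_config(n, block_dim)
    for block_idx in range(grid_dim):            # plays the role of blockIdx.x
        for thread_idx in range(block_dim):      # plays the role of threadIdx.x
            i = block_idx * block_dim + thread_idx
            if i < n:                            # bounds guard for the partial last block
                out[i] = a[i] + b[i]
    return out

print(launch_config(1000), simulate_vector_add([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
```

For 1,000 elements and 256 threads per block, 4 blocks are launched and the last 24 threads do nothing because of the guard.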

CUDA · CUDA programming · GPU
16 min read
Architects' Tech Alliance
May 10, 2020 · Fundamentals

Server CPU, GPU, and Memory Basics

This article introduces the essential components of a server—CPU, GPU, and memory—explaining their roles, characteristics, and common configurations, especially for video processing and artificial intelligence workloads, while providing visual diagrams and further reading suggestions.

CPU · GPU · Hardware
4 min read
Architects' Tech Alliance
Feb 6, 2020 · Fundamentals

How Computer Memory Evolved: From SDRAM to DDR4 and Modern GPU Memory

This article explains the historical shift from early north‑bridge memory buses to integrated CPU memory controllers, details the progression of SDRAM to DDR4—including voltage, prefetch and feature changes—covers future trends in capacity, voltage and frequency, and compares system memory bandwidth with GPU memory technologies such as GDDR5 and HBM.
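Peak memory bandwidth follows from the transfer rate and the bus width: transfers per second times bytes per transfer. The sketch below applies that formula to system memory and GPU memory. The example configurations (dual-channel DDR4-3200, a 256-bit GDDR5 bus at 8 Gbps per pin, four HBM2 stacks at 2 Gbps per pin) are assumptions chosen for illustration, and sustained bandwidth in practice is lower than these theoretical peaks.

```python
def peak_bandwidth_gbs(transfer_rate_mts: float, bus_width_bits: int, channels: int = 1) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes per second).

    transfer_rate_mts: mega-transfers per second per pin (3200 for DDR4-3200).
    bus_width_bits:    data width per channel or stack (64 for a DDR4 channel).
    channels:          number of independent channels (or HBM stacks).
    """
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer * channels / 1e9

print("DDR4-3200, 2 channels:", peak_bandwidth_gbs(3200, 64, channels=2))
print("GDDR5, 8 Gbps, 256-bit:", peak_bandwidth_gbs(8000, 256))
print("HBM2, 2 Gbps, 4 x 1024-bit:", peak_bandwidth_gbs(2000, 1024, channels=4))
```

The comparison shows why GPU memory relies on very wide buses (HBM) or very high per-pin rates (GDDR) rather than the narrow 64-bit channels of system DRAM.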

DDR · GPU · Hardware
11 min read
Alibaba Cloud Native
Jan 13, 2020 · Cloud Native

How to Manage GPU Resources in Kubernetes: From Containers to Device Plugins

This article explains why managing GPUs with Kubernetes improves cost efficiency and deployment speed, and details how to containerize GPU workloads, build appropriate images, configure NVIDIA drivers, and use Kubernetes Device Plugins and Extended Resources to schedule and monitor GPU resources, while also discussing current limitations and community solutions.

Device Plugin · GPU · Kubernetes
18 min read
Architects' Tech Alliance
Dec 28, 2019 · Artificial Intelligence

Understanding CPU vs GPU, GPU Parameters, and NVIDIA Architectures for AI and High‑Performance Computing

The article explains how CPUs and GPUs differ in architecture and workload handling, details key GPU specifications such as CUDA cores, memory bandwidth and floating‑point precision, reviews NVIDIA's product families and architectural evolution, and highlights the role of GPUs in deep learning training and inference while also mentioning a related technical ebook promotion.

AI · CPU · CUDA
13 min read
Architects' Tech Alliance
Dec 21, 2019 · Fundamentals

GPU Overview, Usage Methods, and Virtualization Technologies

This article explains the definition and history of GPUs, why dedicated graphics processors are needed, how they are accessed through graphics libraries and vendor APIs such as OpenGL, DirectX, CUDA and OpenCL, and describes various GPU virtualization techniques including virtual graphics cards, passthrough, and vCUDA with their client‑server‑manager architecture.

CUDA · Compute · GPU
20 min read
360 Quality & Efficiency
Dec 6, 2019 · Artificial Intelligence

Accelerating OpenCV Image Matching with GPU (CUDA) in Python

This article demonstrates how compiling OpenCV 3.2 with CUDA 8.0 enables GPU‑accelerated template matching in Python, reducing average processing time from 0.299 seconds on CPU to 0.181 seconds on GPU—a 39.4% performance gain for automated testing image‑recognition APIs.
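The reported figure is a reduction in processing time, which is a different number from the speedup factor. Recomputing both from the rounded averages in the summary gives a time reduction of about 39.5% (the small gap from 39.4% presumably comes from rounding of the underlying timings) and a speedup of roughly 1.65×. The helper names below are hypothetical.

```python
def time_reduction_pct(cpu_s: float, gpu_s: float) -> float:
    """Percentage of processing time removed by moving from CPU to GPU."""
    return (cpu_s - gpu_s) / cpu_s * 100

def speedup(cpu_s: float, gpu_s: float) -> float:
    """How many times faster the GPU path runs."""
    return cpu_s / gpu_s

print(round(time_reduction_pct(0.299, 0.181), 1))
print(round(speedup(0.299, 0.181), 2))
```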

CUDA · GPU · OpenCV
3 min read
360 Quality & Efficiency
Dec 6, 2019 · Artificial Intelligence

Deploying YOLO V3 with TensorFlow Serving: Environment Setup, Model Conversion, Service Deployment, and Performance Comparison

This article explains how to prepare the Docker environment, install TensorFlow Serving (CPU and GPU versions), convert a YOLO V3 checkpoint to SavedModel, deploy the model as a service, warm‑up and manage versions, invoke it via gRPC and HTTP, and compare CPU versus GPU inference performance.
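TensorFlow Serving's REST API accepts predictions at `POST /v1/models/<model_name>:predict` (port 8501 by default) with a JSON body such as `{"instances": [...]}`. The sketch below only builds such a request with the standard library and does not send it. The model name `yolov3` and the dummy input are hypothetical; a real YOLO V3 request must match the SavedModel's input signature.

```python
import json
import urllib.request

def build_predict_request(host: str, model_name: str, instances, port: int = 8501):
    """Build (but do not send) a TensorFlow Serving REST predict request."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# A tiny dummy input; a real request would carry a preprocessed image tensor.
req = build_predict_request("localhost", "yolov3", [[[0.0, 0.0, 0.0]]])
print(req.get_method(), req.full_url)
# response = urllib.request.urlopen(req)  # send once a serving container is running
```

The gRPC endpoint (port 8500 by default) avoids JSON encoding overhead, which matters for image-sized inputs.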

AI · Docker · GPU
9 min read
Snowball Engineer Team
Oct 17, 2019 · Artificial Intelligence

GPU-Accelerated Model Training Optimizations for Snowball Feed Recommendation System

This article describes the challenges of large‑scale model training for Snowball’s feed recommendation, and details a series of engineering optimizations—including GPU acceleration, multi‑threaded data preparation, TFRecord conversion, compression, and batch‑map reordering—that increased training throughput from 6 k to over 20 k samples per second while reducing CPU and I/O bottlenecks.

GPU · Model Training · TFRecord
15 min read
Architects' Tech Alliance
Oct 14, 2019 · Industry Insights

From ECU CPUs to ASICs: The Evolution of Automotive Chips for Autonomous Driving

This article traces the development of automotive electronic control units from early CPU‑centric ECUs to centralized domain controllers, examines the rise of GPU‑based AI accelerators for assisted driving, and explains why ASICs are expected to dominate future autonomous‑driving chips, while profiling key industry players and their strategies.

AI Accelerators · ASIC · FPGA
21 min read
Architects' Tech Alliance
Sep 20, 2019 · Industry Insights

Why Heterogeneous Parallel Computing Is the Future of High‑Performance Computing

The article explains how heterogeneous parallel computing, which distributes tasks across CPUs, GPUs, FPGAs and other accelerators, has become essential since Moore’s law began to plateau. It details its principles, hardware and software perspectives, classification of architectures, processing stages, user‑guided versus compiler‑guided methods, and its relevance to AI, cloud and industry workloads.

CPU · FPGA · GPU
15 min read
Tencent Cloud Developer
Sep 20, 2019 · Artificial Intelligence

Architecture of Tencent Cloud AI Platform (YunZhiTianshu) and AI Practices on Kubernetes

The article details Tencent Cloud’s YunZhiTianshu AI platform architecture—spanning Docker/Kubernetes infrastructure, storage, six micro‑service layers and API/message gateways—while explaining core module designs, unified algorithm packaging, device and data abstraction, and practical Kubernetes deployment techniques for GPU‑accelerated AI workloads, monitoring, scaling, and security.

AI Platform · GPU · Kubernetes
15 min read
Architects' Tech Alliance
Sep 6, 2019 · Fundamentals

Understanding the Differences Between CPU and GPU Architectures

CPU and GPU serve distinct roles in computing: the CPU, as a versatile general‑purpose processor, handles complex logic and varied data types, while the GPU, built with many simple cores and long pipelines, excels at parallel processing of uniform, large‑scale data such as graphics and AI workloads.
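How much a GPU helps depends on how much of a workload is actually parallel. Amdahl's law, included here as standard background rather than something the article states, gives the overall speedup when a fraction p of the runtime is accelerated by a factor s: S = 1 / ((1 - p) + p / s). The sketch below evaluates it for a few illustrative values; the function name is hypothetical.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the runtime runs s times faster.

    The remaining (1 - p) stays on the CPU and is not accelerated.
    """
    if not 0.0 <= p <= 1.0 or s <= 0:
        raise ValueError("p must be in [0, 1] and s must be positive")
    return 1.0 / ((1.0 - p) + p / s)

for p in (0.5, 0.9, 0.99):
    print(p, round(amdahl_speedup(p, 10), 2), round(amdahl_speedup(p, 1000), 2))
```

Even with an unlimited accelerator speedup, a workload that is 90% parallel cannot run more than 10× faster, which is why the CPU's serial share still matters.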

AI · CPU · GPU
10 min read
Architects' Tech Alliance
Sep 2, 2019 · Databases

The Relationship Between Databases and Emerging Hardware Technologies

This article examines how recent hardware advances such as multi‑core processors, large memory, SSDs, NVM, GPUs and FPGAs have reshaped database system design, outlines the stages from pure academic research to productization, and surveys current database products and research directions leveraging these new devices.

FPGA · GPU · NVM
11 min read
Alibaba Cloud Developer
Jul 17, 2019 · Artificial Intelligence

How Alibaba Halved BERT Latency for Real‑Time Search

This article details Alibaba's technical challenges with BERT's high resource consumption in online search, analyzes memory and compute bottlenecks using TensorFlow profiling, and presents both TensorFlow‑based tweaks and a custom CUDA implementation that together double throughput and cut latency by about 50%.

Alibaba · BERT · GPU
9 min read
360 Tech Engineering
May 10, 2019 · Artificial Intelligence

Distributed Training with MXNet: Data Parallel on Single and Multi‑Node GPUs and Integration with Kubeflow

This article explains how MXNet supports data‑parallel training on single‑machine multi‑GPU and multi‑machine multi‑GPU setups, describes KVStore modes, outlines the worker‑server‑scheduler architecture, and shows how to launch large‑scale distributed training using Kubeflow and the mxnet‑operator.
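In synchronous data parallelism, each worker computes gradients on its own slice of the batch, the gradients are aggregated (the role a KVStore or parameter server plays), and every replica applies the same update. The plain-Python sketch below shows that data flow for a one-parameter linear model; it is not MXNet's KVStore API, the function names are hypothetical, and it assumes the batch divides evenly across workers.

```python
def local_gradient(w, xs, ys):
    """Gradient of mean squared error for y ≈ w * x on one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def data_parallel_step(w, xs, ys, num_workers, lr):
    """One synchronous data-parallel SGD step over equal-sized shards."""
    if len(xs) % num_workers:
        raise ValueError("batch size must be divisible by the number of workers")
    shard = len(xs) // num_workers
    # Each worker computes a gradient on its shard ("push").
    grads = [
        local_gradient(w, xs[k * shard:(k + 1) * shard], ys[k * shard:(k + 1) * shard])
        for k in range(num_workers)
    ]
    # Aggregation averages the gradients; every worker receives the same value ("pull").
    avg_grad = sum(grads) / num_workers
    return w - lr * avg_grad

xs = [float(i) for i in range(1, 9)]
ys = [3.0 * x for x in xs]          # true weight is 3
w = 0.0
for _ in range(50):
    w = data_parallel_step(w, xs, ys, num_workers=4, lr=0.01)
print(round(w, 6))
```

With equal shards, the averaged gradient equals the full-batch gradient, so four workers produce the same update as one worker processing the whole batch.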

Data Parallel · Deep Learning · Distributed Training
11 min read
360 Zhihui Cloud Developer
May 9, 2019 · Artificial Intelligence

Master Distributed MXNet Training with Kubeflow: A Step‑by‑Step Guide

Learn how to perform single‑machine multi‑GPU and multi‑node multi‑GPU training with MXNet, understand KVStore modes, configure workers, servers, and schedulers, and deploy large‑scale distributed training on Kubernetes using Kubeflow, including operator installation, task creation, and performance considerations.

Distributed Training · GPU · Kubeflow
11 min read
Architects' Tech Alliance
Apr 21, 2019 · Fundamentals

Differences Between CPU and GPU Architectures and the Relationship Between OpenCL and CUDA

This article explains the fundamental architectural differences between CPUs and GPUs, their design goals and performance characteristics, and compares OpenCL and CUDA, highlighting OpenCL’s cross‑platform flexibility versus CUDA’s NVIDIA‑specific optimization, while illustrating how each fits various parallel computing tasks.

CPU · CUDA · GPU
7 min read