Tagged articles

536 articles

Page 5 of 6

Mar 25, 2022 · Fundamentals

Understanding High‑Performance Computing (HPC): Principles, Architecture, and Terminology

This article explains the fundamentals of high‑performance computing (HPC), covering its serial and parallel processing models, CPU and GPU roles, heterogeneous architectures, FLOPS performance metrics, market trends, and key terminology needed to grasp why HPC is essential for scientific and engineering simulations.

FLOPS, GPU, HPC

0 likes · 6 min read
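
The FLOPS metric named in this summary can be illustrated with a rough back-of-the-envelope sketch. The formula (cores × clock × floating-point operations per core per cycle) is the usual theoretical-peak estimate; the function name `peak_flops` and the hardware figures below are hypothetical and not taken from the article.

```python
def peak_flops(cores: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak: cores x clock rate x floating-point ops per core per cycle."""
    return cores * clock_hz * flops_per_cycle


# Hypothetical 64-core CPU at 2.0 GHz issuing 16 FLOPs per core per cycle.
cpu_peak = peak_flops(cores=64, clock_hz=2.0e9, flops_per_cycle=16)
print(f"{cpu_peak / 1e12:.3f} TFLOPS")  # prints "2.048 TFLOPS"
```

Real workloads rarely reach this figure, since memory bandwidth and communication usually limit sustained throughput.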


IT Services Circle

Mar 24, 2022 · Artificial Intelligence

NVIDIA Unveils H100 GPU with Hopper Architecture: Massive Performance Gains for AI

At the recent GTC event, NVIDIA introduced the H100 GPU, built on the Hopper architecture and TSMC's 4 nm process, featuring 80 billion transistors, 16,896 CUDA cores, up to 700 W power draw, 3 TB/s memory bandwidth, and a specialized Transformer Engine that accelerates large‑model training by up to six times, alongside the Grace CPU Superchip and new AI supercomputing systems.

AI, GPU, Grace CPU

0 likes · 11 min read

JD Retail Technology

Mar 24, 2022 · Mobile Development

Understanding Offscreen Rendering and Its Performance Impact in iOS

Offscreen rendering, in which the GPU or CPU renders content into a separate buffer rather than the current screen framebuffer, incurs overhead from buffer creation and context switches. The article explains its principles, common triggers, and strategies for avoiding it in iOS development.

CoreAnimation, GPU, iOS

0 likes · 11 min read

Architects' Tech Alliance

Mar 21, 2022 · Industry Insights

What Powers Supercomputers? A Deep Dive into High‑Performance Computing

This article explains the fundamentals of high‑performance computing (HPC), covering serial and parallel processing, CPU vs GPU roles, heterogeneous architectures, FLOPS performance metrics, system design challenges, and why HPC is essential for large‑scale scientific and engineering simulations.

FLOPS, GPU, HPC

0 likes · 6 min read

IT Architects Alliance

Mar 10, 2022 · Industry Insights

What Drives the AI Chip Market? Types, Trends, and Future Outlook

The article provides a comprehensive overview of AI chips, explaining their broad and narrow definitions, core architectures such as GPU, FPGA, and ASIC, deployment scenarios from cloud to edge, training versus inference roles, current market dynamics, major vendors, and emerging application domains like autonomous driving and smart security.

AI chips, ASIC, Edge Computing

0 likes · 9 min read

Architects' Tech Alliance

Mar 6, 2022 · Artificial Intelligence

Overview of AI Chip Technologies and Market Trends in China

The article provides a comprehensive overview of AI chips—including GPUs, FPGAs, and ASICs—their architectural distinctions, cloud and edge deployment models, market dynamics in China, and key application scenarios such as autonomous driving, smart security, and IoT devices.

AI chips, ASIC, China

0 likes · 7 min read

Meituan Technology Team

Mar 3, 2022 · Artificial Intelligence

GPU Optimization Practices for Meituan Delivery Search and Recommendation Model Inference

Meituan’s delivery search and recommendation service migrated from separate CPU‑only models to a unified multi‑task model running on a heterogeneous CPU‑GPU architecture. The team applied system‑level placement, All‑On‑GPU lookup, FP16 mixed precision, operator fusion, and TensorRT and TVM compilation, which together delivered roughly a four‑fold increase in inference throughput while holding cost steady.

GPU, TVM, TensorFlow

0 likes · 24 min read

Architects' Tech Alliance

Feb 16, 2022 · Fundamentals

Key Technology Trends Shaping High‑Performance Computing (HPC)

The article outlines major trends influencing high‑performance computing, including AI integration, GPU/TPU advancements, flexibility in processor architectures, HPC‑as‑a‑Service, hybrid cloud solutions, democratization, the emergence of exascale systems, and micro‑architectural improvements, while providing links to related reports and resources.

Exascale, GPU, HPC

0 likes · 9 min read

DataFunTalk

Jan 25, 2022 · Cloud Native

Model Deployment Challenges and a Seldon‑Based Cloud‑Native Solution

This article analyzes the complexities of deploying machine‑learning models in production, outlines the limitations of the existing ABox architecture, and details a comprehensive cloud‑native redesign using Seldon on Kubernetes—including custom HDFS initializers, GPU management, logging, and resource monitoring—to streamline operations and enable unified CPU/GPU model serving.

Cloud Native, GPU, Kubernetes

0 likes · 12 min read

Youzan Coder

Jan 17, 2022 · Artificial Intelligence

Model Deployment Challenges and a Seldon‑Based Cloud‑Native Solution

The team replaced the cumbersome ABox deployment stack with Seldon‑based cloud‑native serving on Kubernetes, unifying TensorFlow and other framework models, adding GPU sharing, automated CRUD, per‑model ingress, monitoring, and log collection, achieving scalable, fault‑tolerant, zero‑downtime model deployment.

AI serving, Cloud Native, GPU

0 likes · 11 min read

Architects' Tech Alliance

Jan 13, 2022 · Fundamentals

Overview of China's Domestic X86, ADC, Automotive, and GPU Chip Landscape

The article provides a comprehensive analysis of China's semiconductor challenges and progress, covering the status of domestic X86 processors, high‑speed ADC technology, automotive chip development, and GPU advancements, while highlighting market dynamics, licensing issues, and future prospects.

ADC, GPU, automotive

0 likes · 22 min read

Alimama Tech

Dec 22, 2021 · Artificial Intelligence

Performance Optimization of Advertising Deep Learning Systems: Algorithm, System, and Hardware Co‑Design

The paper presents a holistic algorithm‑system‑hardware co‑design for advertising deep‑learning inference, combining model pruning, approximate computing, kernel fusion, scheduling and PCIe transfer optimizations with GPU and NPU upgrades, achieving up to five‑fold speed‑up and significantly higher latency‑bounded QPS for large‑scale ad services.

Algorithmic Optimization, GPU, NPU

0 likes · 24 min read

58 Tech

Dec 21, 2021 · Artificial Intelligence

dl_inference: Open‑Source Deep Learning Inference Service with TensorRT and MKL Acceleration

dl_inference is an open‑source, production‑grade deep learning inference platform that supports TensorFlow, PyTorch and Caffe models, offering GPU and CPU deployment, TensorRT and MKL acceleration, multi‑node load balancing, and extensive Q&A on model conversion, hardware requirements, INT8 quantization, and performance gains.

CPU, GPU, Inference

0 likes · 8 min read

Python Programming Learning Circle

Dec 20, 2021 · Artificial Intelligence

Monitoring Python CPU and GPU Memory Usage with memory_profiler and Pytorch‑Memory‑Utils

This article introduces the Python libraries memory_profiler and Pytorch‑Memory‑Utils, demonstrates how to measure line‑by‑line CPU memory consumption and GPU memory usage in notebooks and scripts, and explains the additional overhead introduced by PyTorch during model loading.

GPU, Python, memory profiling

0 likes · 7 min read

Code DAO

Dec 17, 2021 · Artificial Intelligence

How to Scale XGBoost with Ray for Distributed Multi‑GPU Training

XGBoost‑Ray provides a fault‑tolerant, multi‑node, multi‑GPU backend for XGBoost that integrates seamlessly with Ray Tune and supports distributed data loading. It can be adopted with only three code changes and allows training and inference to scale across large clusters.

Distributed Training, GPU, Ray

0 likes · 8 min read

Architects' Tech Alliance

Dec 11, 2021 · Fundamentals

2021 China Integrated Circuit Market Research Report Overview

The 2021 China Integrated Circuit Market Research Report analyzes recent three‑year trends showing rising shares of MPU and logic chips, declining DRAM, stable analog and MCU, and details the market status, growth forecasts, and challenges for CPU, GPU, FPGA, ASIC, and storage technologies.

AI chips, CPU, FPGA

0 likes · 11 min read

Youku Technology

Dec 8, 2021 · Frontend Development

How Youku Achieves 60fps Danmaku Rendering on Mobile: Architecture and Performance Tricks

This article reveals the technical design of Youku's high‑performance danmaku rendering engine, covering its background, industry landscape, GPU‑based rendering pipeline, multi‑layer architecture, performance‑boosting techniques, real‑world benchmarks, and the innovative effects it enables.

GPU, Mobile, OpenGL

0 likes · 17 min read

Architects' Tech Alliance

Nov 16, 2021 · Fundamentals

2021 China Integrated Circuit Market Research Report Overview

The 2021 China Integrated Circuit Market Research Report analyzes recent three‑year trends, showing rising shares for MPU and logic chips, declining DRAM, stable analog and MCU, while detailing the market positions, growth rates, and challenges of CPU, GPU, FPGA, ASIC, and flash storage technologies.

ASIC, CPU, China

0 likes · 11 min read

Architects' Tech Alliance

Nov 5, 2021 · Artificial Intelligence

GPU Architecture in the AI Era: From Specific‑Domain Designs to 3D/AI Fusion

The article analyzes how GPU architecture, originally designed for 3D graphics, is being reshaped by AI demands through specific‑domain designs, hardware/software interfaces, tensor acceleration, and 3D/AI convergence, ultimately arguing that GPUs will remain the central compute platform in the new golden age of computer architecture.

3D rendering, GPU, Tensor Acceleration

0 likes · 14 min read

Architects' Tech Alliance

Oct 28, 2021 · Artificial Intelligence

GPU Technology Overview: Architecture, Market Landscape, and Key Application Directions

This article provides a comprehensive overview of GPU technology, covering its multi‑core architecture, the market oligopoly of Intel, NVIDIA and AMD, the distinction between integrated and discrete GPUs, and the three major application trends of gaming performance, artificial intelligence/deep learning, and autonomous driving.

GPU, Gaming, Hardware

0 likes · 14 min read

Kuaishou Tech

Oct 25, 2021 · Fundamentals

Noise Techniques for Short Video Effects and Their Generation Algorithms

This article explores how various noise algorithms—including value, gradient, simplex, cellular, and FBM—are applied to short video visual effects, compares random number generators for GPU rendering, and provides GLSL code examples to illustrate implementation and performance trade‑offs.

GPU, Graphics, noise

0 likes · 17 min read
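
Two of the techniques named in this summary, value noise and FBM (fractal Brownian motion), can be sketched briefly. The article itself reportedly uses GLSL; the version below is a CPU-side Python illustration only, and the hash constants, function names, and default octave settings are assumptions chosen for the example rather than anything taken from the article.

```python
import math


def hash01(ix: int, iy: int) -> float:
    """Deterministic pseudo-random value in [0, 1) for an integer lattice point."""
    h = (ix * 374761393 + iy * 668265263) & 0xFFFFFFFF
    h = ((h ^ (h >> 13)) * 1274126177) & 0xFFFFFFFF
    h ^= h >> 16
    return h / 2**32


def smooth(t: float) -> float:
    """Smoothstep fade curve, so the noise has no visible creases at cell borders."""
    return t * t * (3.0 - 2.0 * t)


def value_noise(x: float, y: float) -> float:
    """Bilinearly blend the random values stored at the four surrounding lattice points."""
    ix, iy = math.floor(x), math.floor(y)
    u, v = smooth(x - ix), smooth(y - iy)
    a, b = hash01(ix, iy), hash01(ix + 1, iy)
    c, d = hash01(ix, iy + 1), hash01(ix + 1, iy + 1)
    row0 = a + (b - a) * u
    row1 = c + (d - c) * u
    return row0 + (row1 - row0) * v


def fbm(x: float, y: float, octaves: int = 4,
        lacunarity: float = 2.0, gain: float = 0.5) -> float:
    """Sum several octaves of value noise, each at higher frequency and lower amplitude."""
    total, norm, amplitude, frequency = 0.0, 0.0, 1.0, 1.0
    for _ in range(octaves):
        total += amplitude * value_noise(x * frequency, y * frequency)
        norm += amplitude
        amplitude *= gain
        frequency *= lacunarity
    return total / norm  # normalised back into [0, 1)
```

Gradient, simplex, and cellular noise differ in what is stored at or derived from the lattice, but they can be fed through the same octave-summing loop.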


360 Quality & Efficiency

Oct 22, 2021 · Artificial Intelligence

Troubleshooting CUDA Availability for PyTorch: Installation and Version Compatibility Guide

The article walks through diagnosing why PyTorch cannot access the GPU, reinstalling CUDA, selecting matching PyTorch builds, adjusting versions, and verifying that CUDA becomes available for accelerated training.

CUDA, GPU, Installation

0 likes · 4 min read

Kuaishou Large Model

Oct 22, 2021 · Fundamentals

How Noise Powers Real‑Time Short‑Video Effects: Algorithms, Samples & GPU RNG Comparison

This article explains how various noise algorithms—value, gradient, simplex, cellular, and FBM—are applied to short‑video visual effects, showcases shader implementations and image examples, and compares GPU random‑number generators to help developers choose the right balance of performance and visual quality.

GPU, Random Number Generation, Shader

0 likes · 17 min read

Xianyu Technology

Oct 21, 2021 · Mobile Development

Flutter iOS GPU Background Crash Analysis and Solution

The article analyzes why Flutter crashes on iOS when accessing the GPU in the background, explains the official SyncSwitch fix for ImageDecoder, and details Xianyu’s additional patches for MultipleFrameCodec, EncodeImage, and Rasterizer::DrawToSurface that together, via PR #28383, fully resolve the GPU‑background crash.

Crash, Flutter, GPU

0 likes · 11 min read

21CTO

Oct 2, 2021 · Artificial Intelligence

How PyTorch Lightning Can Make Your Deep Learning Pipeline 10× Faster

This article explains six practical techniques—parallel data loading, distributed multi‑GPU training, mixed precision, early stopping, sharded training, and inference optimizations—using PyTorch Lightning to dramatically accelerate deep‑learning pipelines, turning days‑long experiments into minute‑scale runs.

Deep Learning, GPU, PyTorch Lightning

0 likes · 7 min read

Architects' Tech Alliance

Sep 23, 2021 · Fundamentals

Understanding High‑Performance Computing (HPC): Principles, Architecture, and Applications

The article explains high‑performance computing (HPC) concepts, including serial and parallel processing, supercomputer performance measured in FLOPS, real‑world scientific applications such as drug discovery and weather forecasting, and the hardware architectures that enable these massive computational capabilities.

FLOPS, GPU, HPC

0 likes · 7 min read

Meituan Technology Team

Sep 9, 2021 · Artificial Intelligence

GPU Optimization Practices for CTR Models at Meituan

Meituan accelerates CTR model inference by fusing operators with TVM, optimizing CPU‑GPU data transfers, manually tuning high‑frequency subgraphs, and dynamically offloading workloads, achieving up to ten‑fold throughput gains on Tesla T4 GPUs with stable latency that rises only modestly beyond 128 QPS. Compilation remains slow, however, and large‑model support needs improvement.

CTR, Deep Learning, GPU

0 likes · 16 min read

Architects' Tech Alliance

Aug 29, 2021 · Fundamentals

GPU Overview: History, Architecture, Processing Workflow, and Acceleration Technologies (CUDA & OpenCL)

This article provides a comprehensive overview of GPUs, covering their history, architecture, processing workflow, and acceleration technologies such as CUDA and OpenCL, while comparing GPU and CPU designs and offering resources for further study.

CUDA, GPU, OpenCL

0 likes · 14 min read

Python Programming Learning Circle

Aug 27, 2021 · Artificial Intelligence

An Introduction to JAX: Features, Installation, and Comparison with TensorFlow and PyTorch

This article introduces Google’s JAX library, covering its origins, core features such as automatic differentiation, JIT compilation, parallel and vectorized mapping, installation steps, code examples, and a comparative overview with TensorFlow and PyTorch for deep‑learning practitioners.

Deep Learning, GPU, JAX

0 likes · 11 min read

Liangxu Linux

Aug 17, 2021 · Cloud Native

How to Enable GPU Acceleration in Docker on Linux

This guide walks you through installing NVIDIA drivers, CUDA, and nvidia-docker2 on a Linux host, configuring Docker to access the GPU, and verifying the setup with commands and sample TensorFlow/PyTorch code, enabling deep‑learning workloads inside containers.

CUDA, Deep Learning, Docker

0 likes · 7 min read

DataFunSummit

Aug 16, 2021 · Artificial Intelligence

Scaling Deep Learning Models: From Depth to Width and Parallelism Strategies

The article reviews how deep learning models have grown deeper and wider, discusses the memory and bandwidth limits of single GPUs, and explains pipeline and sharding techniques—including GPU clusters and TPU pods—to efficiently train large‑scale models in industrial settings.

GPU, Mixture of Experts, Model Parallelism

0 likes · 6 min read

Architects' Tech Alliance

Jul 16, 2021 · Artificial Intelligence

AI Chip Landscape: GPUs, FPGAs, and ASICs for Deep Learning

The article explains how artificial intelligence relies on algorithms, compute and data, compares engineering and simulation methods, and details the roles, architectures, performance and energy characteristics of GPUs, FPGAs, and ASICs as the primary hardware accelerators for modern deep‑learning applications.

ASIC, Chip Design, Deep Learning

0 likes · 14 min read

Python Programming Learning Circle

Jul 15, 2021 · Artificial Intelligence

Accelerating NumPy with CuPy: GPU Speedup Demonstrations

This article explains how the CuPy library leverages NVIDIA CUDA GPUs to replace NumPy for array operations, provides installation steps, and presents benchmark code showing up to 700× speed improvements for large‑scale matrix computations compared to CPU‑based NumPy.

CuPy, GPU, NumPy

0 likes · 6 min read

Architects' Tech Alliance

Jun 29, 2021 · Artificial Intelligence

Evolution and Future Trends of Automotive Chips for Autonomous Driving

The article reviews the historical shift from CPU‑based ECUs to GPU‑centric and ASIC‑centric automotive processors, analyzes current GPU dominance, examines key industry players, and discusses why ASICs are expected to become the primary solution for future autonomous‑driving chips.

AI hardware, ASIC, GPU

0 likes · 16 min read

Tencent Advertising Technology

May 18, 2021 · Cloud Computing

TI-ONE Platform FAQ: Managing Notebook Instances, Switching Resources, and Resolving Common Issues

This FAQ summarizes practical guidance for participants of the 2021 Tencent Advertising Algorithm Competition on using the TI-ONE cloud notebook platform, covering instance initialization, resource switching, data handling, common errors, and Git integration.

AI competition, Data Management, FAQ

0 likes · 5 min read

Python Programming Learning Circle

Apr 28, 2021 · Fundamentals

Getting Started with Numba: Python JIT Compilation and GPU Acceleration

This article introduces Numba, a Python just‑in‑time compiler, explains why it’s advantageous over alternatives, and provides detailed guidance on using its decorators such as @jit, @njit, @vectorize, and @cuda.jit, including code examples for CPU and GPU acceleration.

CUDA, GPU, JIT

0 likes · 12 min read

Architects' Tech Alliance

Apr 26, 2021 · Artificial Intelligence

GPU Market Overview and Industry Applications

The article provides a comprehensive overview of GPU technology, its architecture, rapid market growth, segmentation by type, device and industry, cloud deployment trends, competitive landscape, and diverse applications ranging from high‑performance computing and AI to automotive, AR/VR, and IoT.

GPU, High‑Performance Computing, Market analysis

0 likes · 9 min read

JD Cloud Developers

Apr 26, 2021 · Artificial Intelligence

Top Tech Highlights: Open‑Source Mars Drone, AI‑Powered GPUs, Cloud Growth & More

This week’s developer newsletter spotlights NASA’s open‑source‑based Ingenuity helicopter soaring on Mars, JD’s ESG report and green cloud initiatives, NVIDIA’s record‑breaking AI inference GPUs, rapid growth of China’s public‑cloud market, Tsinghua’s new chip academy, Hugging Face’s Accelerate library for multi‑GPU training, plus cutting‑edge research on GAN IP protection and hierarchical task learning presented at CVPR and ICLR.

AI, GPU, open source

0 likes · 5 min read

Kuaishou Tech

Mar 25, 2021 · Fundamentals

GPU Text Rendering Techniques: Image Text, Triangulated Text, Vector Text, and Signed Distance Fields

This article explains how short‑video apps render text on the GPU by converting characters into GPU‑friendly primitives, covering image‑based text, triangulated glyphs, vector rendering extensions, and signed distance field methods, along with their advantages, drawbacks, and practical libraries.

GPU, Graphics, signed distance field

0 likes · 11 min read
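
The signed distance field idea in this summary can be illustrated with a small sketch: each pixel stores its distance to the shape's edge (negative inside, positive outside), and a smooth threshold around zero turns that distance into antialiased opacity. A circle stands in for a glyph outline here, since real SDF text samples precomputed per-glyph distance textures; the function names and the `softness` parameter are assumptions made for this example.

```python
import math


def circle_sdf(px: float, py: float, cx: float, cy: float, radius: float) -> float:
    """Signed distance to a circle's edge: negative inside, zero on the edge, positive outside."""
    return math.hypot(px - cx, py - cy) - radius


def smoothstep(edge0: float, edge1: float, x: float) -> float:
    """Clamped cubic ramp from 0 at edge0 to 1 at edge1 (same shape as GLSL smoothstep)."""
    t = min(max((x - edge0) / (edge1 - edge0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)


def coverage(distance: float, softness: float = 1.0) -> float:
    """Map a signed distance to opacity: 1 well inside, 0 well outside, soft across the edge."""
    return 1.0 - smoothstep(-softness, softness, distance)


# Opacity along one scanline through the centre of a radius-5 circle at (8, 0).
row = [coverage(circle_sdf(x, 0.0, 8.0, 0.0, 5.0)) for x in range(17)]
```

Because the threshold is evaluated per pixel at draw time, the same distance data stays sharp when the shape is scaled, which is the main appeal of the approach for text.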


Architects' Tech Alliance

Mar 25, 2021 · Fundamentals

The Evolution of Compute Power: From CPUs and GPUs to DPUs and Future Data‑Center Architectures

This article examines how computing power has become a key production factor, detailing the shift from traditional CPUs and GPUs to specialized processors like DPUs, and explores emerging paradigms such as in‑memory, near‑memory, and edge computing that reshape data‑center architectures.

ASIC, CPU, Compute

0 likes · 17 min read

Architects' Tech Alliance

Mar 15, 2021 · Artificial Intelligence

Evolution of NVIDIA GPU Architectures from Fermi to Ampere

This article provides a comprehensive overview of NVIDIA's GPU architecture evolution—covering Fermi, Kepler, Maxwell, Pascal, Volta, Turing, and Ampere—detailing compute capabilities, SM structures, specialized units such as Tensor Cores, and their impact on AI and high‑performance computing workloads.

AI, CUDA, GPU

0 likes · 19 min read

Architects' Tech Alliance

Mar 13, 2021 · Artificial Intelligence

Industry Deep Report: GPU Research Framework

This report analyzes the evolution of processor chips, the rise of heterogeneous computing, and provides a comprehensive GPU investment logic framework, detailing GPU architecture, market competition, global industry landscape, and the challenges and prospects of domestic GPU development in China.

AI, Chip, GPU

0 likes · 5 min read

MaGe Linux Operations

Mar 11, 2021 · Artificial Intelligence

What’s New in PyTorch 1.8? Key Features, APIs, and Performance Boosts

PyTorch 1.8, released by the PyTorch team, bundles over 3,000 commits since 1.7, introducing AMD ROCm support, enhanced Python function conversion, stable FFT and linear‑algebra APIs, complex‑tensor autograd, distributed‑training improvements, new mobile tutorials, performance tools, and several prototype features.

Deep Learning, GPU, Mobile

0 likes · 6 min read

Architects' Tech Alliance

Mar 7, 2021 · Fundamentals

Understanding the Linux Graphics Stack from a GPU Perspective

This article explains the role of GPUs in computing, traces the evolution of graphics standards and GPU architectures, and details the development of the Linux graphics stack from early X11 to modern Wayland, providing a comprehensive overview for developers and hardware enthusiasts.

GPU, Graphics Stack, Open standards

0 likes · 3 min read

ITPUB

Mar 7, 2021 · Blockchain

Can You Mine Ethereum on an Apple M1 Mac? A Hands‑On Test and Results

This article documents a developer’s attempt to run Ethereum mining software on an M1‑based MacBook Air, detailing the required patches, compilation steps, observed hash rates, daily earnings, and how the performance compares with traditional GPU miners.

Blockchain, Ethereum, GPU

0 likes · 9 min read

360 Tech Engineering

Mar 1, 2021 · Artificial Intelligence

Deploying BERT as an Online Service: Challenges and Optimizations at 360 Search

This article details the engineering challenges of serving a large BERT model in real‑time for 360 Search and describes a series of optimizations—including TensorRT‑based kernel fusion, model quantization, knowledge distillation, multi‑stream execution, caching, and dynamic sequence handling—that together achieve low latency, high throughput, and stable deployment on GPU clusters.

BERT, GPU, Model Optimization

0 likes · 10 min read

Python Programming Learning Circle

Feb 25, 2021 · Big Data

Parallel Computing and Python Multiprocessing: Concepts, Models, and Practical Examples

This article explains the fundamentals of parallel computing in the big‑data era, compares parallelism and concurrency, outlines GPU and distributed‑computing solutions, and provides a detailed guide to Python’s multiprocessing module with code examples, performance tests, and practical tips.

Big Data, GPU, Python

0 likes · 18 min read

Open Source Linux

Feb 8, 2021 · Operations

How to Set Up Docker with NVIDIA GPU for Deep Learning on Linux

This guide walks you through installing NVIDIA drivers, CUDA, and nvidia-docker2 on a Linux host and configuring Docker containers to access the GPU, including verification steps and sample TensorFlow and PyTorch commands.

CUDA, Deep Learning, Docker

0 likes · 8 min read

Architects' Tech Alliance

Jan 9, 2021 · Artificial Intelligence

Heterogeneous Computing: Why, Standards, and Performance Comparison of CPU, GPU, FPGA, and ASIC

The article examines the rapid growth of data‑center workloads, explains why heterogeneous accelerators such as CPUs, GPUs, FPGAs and ASICs are needed, outlines evaluation standards, compares their compute performance and power efficiency, and discusses practical deployment cases and future trends.

ASIC, CPU, FPGA

0 likes · 22 min read

Architects' Tech Alliance

Jan 5, 2021 · Operations

Understanding Data Centers: Architecture, Technologies, and Operational Considerations

This article explains what data centers are, outlines their core components—compute, storage, and networking—covers architectural decisions, industry standards, and emerging technologies such as edge computing, micro‑data centers, cloud integration, SDN, HCI, containers, NVMe, and GPU acceleration, highlighting their impact on modern enterprise operations.

Edge Computing, GPU, HCI

0 likes · 11 min read

Architects' Tech Alliance

Dec 30, 2020 · Artificial Intelligence

Understanding GPUs, AI Accelerators, and Market Trends

The article explains GPU evolution, its integration with CPUs, interconnect technologies like PCIe and NVLink, market shares of NVIDIA, AMD and Intel, AI accelerator types (GPU, FPGA, ASIC), and the roles of training and inference in cloud AI, while also promoting a paid 182‑page PPT resource.

AI accelerator, GPU, HPC

0 likes · 7 min read

21CTO

Dec 22, 2020 · Artificial Intelligence

Explore tinygrad: A Minimalist Deep Learning Framework Under 1000 Lines

tinygrad, an open‑source autograd tensor library by George Hotz, offers a compact PyTorch‑like experience in fewer than 1000 lines, with easy installation, GPU support via PyOpenCL, full EfficientNet inference, and extensible optimizers for rapid neural‑network prototyping.

AI, Autograd, Deep Learning

0 likes · 6 min read

Programmer DD

Dec 17, 2020 · Artificial Intelligence

Can Huang’s Law Double AI Performance Every Two Years? NVIDIA GTC 2020 Insights

At NVIDIA’s GTC China 2020, chief scientist Bill Dally highlighted “Huang’s Law”, which predicts that GPU‑driven AI performance will double every two years, introduced projects such as MAGNet, optical interconnects, and the Legate programming model, and discussed the broader implications for AI ecosystem development and industry adoption.

AI Performance, GPU, Huang's Law

0 likes · 8 min read

Architects' Tech Alliance

Dec 16, 2020 · Artificial Intelligence

AI Chip Landscape: Architecture, Trends, and Market Players

This article provides a comprehensive overview of the AI chip ecosystem, covering the evolution of GPU, FPGA, ASIC and neuromorphic chips, their performance trade‑offs, key industry players, and the rapid growth of China’s domestic chip manufacturers in the context of deep‑learning demands.

AI chips, ASIC, FPGA

0 likes · 11 min read

DataFunSummit

Dec 14, 2020 · Artificial Intelligence

LightSeq: High‑Performance Open‑Source Inference Engine for Transformers, GPT and Other NLP Models

This article introduces LightSeq, an open‑source, GPU‑accelerated inference engine that dramatically speeds up Transformer‑based models such as BERT and GPT by up to 14× over TensorFlow, supports multiple decoding strategies, integrates seamlessly with major deep‑learning frameworks, and provides detailed performance benchmarks and technical optimizations.

Deep Learning, GPU, Inference

0 likes · 15 min read

Architects' Tech Alliance

Dec 6, 2020 · Operations

Understanding Data Centers: Architecture, Reliability, and Emerging Technologies

This article explains what a data center is, its core components of compute, storage, and networking, the operational and architectural considerations for reliability and security, and reviews industry standards and emerging technologies such as edge computing, cloud integration, SDN, HCI, containers, NVMe, and GPU acceleration.

Edge Computing, GPU, Infrastructure

0 likes · 12 min read

Programmer DD

Dec 6, 2020 · Cloud Native

Enable GPU Support in Kubernetes with Containerd and NVIDIA Runtime

This guide walks through installing NVIDIA drivers, CUDA toolkit, nvidia-container-runtime, configuring Containerd, deploying the NVIDIA device plugin, and testing GPU access inside Kubernetes pods, providing a complete solution for GPU workloads on containerd‑based clusters.

CUDADevice PluginsGPU

0 likes · 11 min read

Sohu Tech Products

Nov 18, 2020 · Game Development

Best Practices for Metal on Apple Silicon: Architecture Migration, GPU Changes, and Optimization Techniques

This article explains how Apple Silicon affects Metal applications, outlines migration steps from Intel to Apple Silicon, describes new GPU architectures and API features, and provides practical best‑practice guidelines to achieve optimal performance and correctness on the new platform.

Apple Silicon · GPU · Metal

0 likes · 11 min read

Efficient Ops

Sep 3, 2020 · Operations

What Recent Cloud and Data Center Incidents Reveal About Industry Risks?

A roundup of recent tech news covering a Cisco sabotage case, a London data‑center fire, Linux's 29th anniversary, Gartner's China ICT trends, major cloud investments, Windows 95 milestones, Didi's GPU server launch, Hainan's DNS project, Dell’Oro's market report, executive share reductions, and an upcoming global operations conference.

Data center · GPU · Operations

0 likes · 10 min read

Tencent Tech

Aug 26, 2020 · Artificial Intelligence

How Tencent Engineers Shattered the 128‑GPU ImageNet Training Record in 2m31s

Tencent engineers set a world record by training ImageNet on 128 V100 GPUs in just 2 minutes 31 seconds. The article details the optimizations behind it, including a new Light distributed training framework, single‑machine speed boosts, multi‑machine communication enhancements, and advanced batch convergence techniques, which together cut training time sharply while maintaining high accuracy.

GPU · ImageNet · Tencent Cloud

0 likes · 9 min read

Didi Tech

Aug 19, 2020 · Cloud Computing

Optimizing GPU Virtual Machine Instance Creation Time in Public Cloud Environments

DiDi Cloud reduced GPU VM provisioning latency by over 90% through kernel‑level pre‑zeroing of idle pages, transparent huge pages, optimized VFIO DMA mapping, and boot‑sequence streamlining, making GPU instance creation faster than that of CPU‑only VMs and meeting strict latency demands.

GPU · Linux kernel · QEMU

0 likes · 10 min read

Tencent Cloud Developer

Jul 7, 2020 · Artificial Intelligence

Remote Development Guide on Tencent Cloud GPU Instances: Driver, CUDA, cuDNN Installation and PyCharm/Jupyter Integration

This guide walks researchers through selecting a Tencent Cloud GN7 GPU instance, installing NVIDIA drivers, CUDA 10.2, cuDNN, setting up PyTorch and Jupyter, and configuring remote development with PyCharm, enabling efficient, cost‑effective AI development on a Tesla T4 GPU server.

AI · CUDA · GPU

0 likes · 12 min read

Architects' Tech Alliance

Jul 3, 2020 · Industry Insights

Why AI Chips Are Powering the Next Tech Surge: Architectures, Trends, and Key Players

This article surveys the rapid rise of AI chips, explains why traditional CPUs fall short for deep learning, compares GPU, FPGA, and ASIC designs, outlines market dynamics in the US and China, and highlights emerging opportunities for specialized ASICs in mobile and edge applications.

AI chips · ASIC · China semiconductor

0 likes · 12 min read

Alibaba Cloud Developer

Jun 18, 2020 · Artificial Intelligence

How to Build a GPU‑Accelerated Distributed ML Platform for VM Migration Prediction

This article explains how to design and implement a GPU‑accelerated, distributed machine‑learning system on Alibaba Cloud to predict virtual‑machine workload and hot‑migration downtime, covering architecture, components, message‑queue design, data handling, GPU acceleration, and model deployment.

CloudComputing · DistributedML · GPU

0 likes · 13 min read

Architects' Tech Alliance

Jun 8, 2020 · Fundamentals

Overview of ARM Architecture, Business Strategy, and Recent Product Developments

This article provides a comprehensive overview of ARM's history, core architecture, business strategies, recent CPU and GPU releases, and its positioning in AIoT, cloud computing, and emerging markets, highlighting both technical details and market-oriented initiatives.

AIoT · ARM · Business strategy

0 likes · 5 min read

Bitu Technology

Jun 5, 2020 · Cloud Native

Building Tubi Data Runtime on JupyterHub: Architecture, Authentication, Storage, GPU Support, and Autoscaling

This article details how Tubi built the Tubi Data Runtime platform on JupyterHub using Kubernetes, covering authentication with Okta SSO, custom Docker images, shared EFS storage, multi‑service support, GPU enablement, node affinity, cluster autoscaling, and monitoring with Prometheus.

AWS · Cloud Native · Docker

0 likes · 17 min read

TAL Education Technology

May 14, 2020 · Artificial Intelligence

An Introduction to GPU Computing and CUDA Architecture

This article provides a concise overview of GPU computing fundamentals, covering GPU hardware components, memory hierarchy, parallel execution models, and the CUDA programming framework, illustrating how CPUs and GPUs cooperate in heterogeneous computing environments.

CUDA · CUDA programming · GPU

0 likes · 16 min read

Architects' Tech Alliance

May 10, 2020 · Fundamentals

Server CPU, GPU, and Memory Basics

This article introduces the essential components of a server—CPU, GPU, and memory—explaining their roles, characteristics, and common configurations, especially for video processing and artificial intelligence workloads, while providing visual diagrams and further reading suggestions.

CPU · GPU · Hardware

0 likes · 4 min read

Architects' Tech Alliance

May 5, 2020 · Fundamentals

Why Heterogeneous Computing Is the Future: CPUs, GPUs, FPGAs, and More Explained

The article provides a comprehensive overview of heterogeneous computing, detailing its definition, real‑world system examples, performance advantages, key programming frameworks such as OpenCL and CUDA, industry trends like SoC integration, and a comparative analysis of CPUs, GPUs, FPGAs and ASICs.

CPU · CUDA · FPGA

0 likes · 9 min read

Architects' Tech Alliance

Apr 18, 2020 · Artificial Intelligence

Choosing the Right Compute Core for Edge AI: CPU, GPU, FPGA, ASIC, VPU & TPU Compared

This article analyzes how system architects can select the optimal heterogeneous compute cores—CPU, GPU, FPGA, ASIC, VPU, or TPU—for edge AI deployments, weighing performance, size, weight, power, and cost to maximize inference efficiency and security.

AI edge computing · ASIC · CPU

0 likes · 7 min read

Architects' Tech Alliance

Mar 28, 2020 · Artificial Intelligence

Heterogeneous Computing: Overview of CPU, GPU, FPGA, ASIC, and NPU

This article explains heterogeneous computing and compares major processing units—CPU, GPU, FPGA, ASIC, and NPU—highlighting their architectures, strengths, and typical use cases, especially in deep‑learning and AI workloads.

ASIC · CPU · Deep Learning

0 likes · 10 min read

Architects' Tech Alliance

Feb 6, 2020 · Fundamentals

How Computer Memory Evolved: From SDRAM to DDR4 and Modern GPU Memory

This article explains the historical shift from early north‑bridge memory buses to integrated CPU memory controllers, details the progression of SDRAM to DDR4—including voltage, prefetch and feature changes—covers future trends in capacity, voltage and frequency, and compares system memory bandwidth with GPU memory technologies such as GDDR5 and HBM.

DDR · GPU · Hardware

0 likes · 11 min read

Alibaba Cloud Native

Jan 13, 2020 · Cloud Native

How to Manage GPU Resources in Kubernetes: From Containers to Device Plugins

This article explains why managing GPUs with Kubernetes improves cost efficiency and deployment speed, details how to containerize GPU workloads, build appropriate images, configure NVIDIA drivers, and use Kubernetes Device Plugins and Extended Resources to schedule and monitor GPU resources, while also discussing current limitations and community solutions.

Device Plugin · GPU · Kubernetes

0 likes · 18 min read

Architects' Tech Alliance

Dec 29, 2019 · Artificial Intelligence

Overview of AI Chip Development Paths: CPU, GPU, FPGA, ASIC, and Neuromorphic Chips

The article reviews the evolution of artificial‑intelligence hardware, comparing traditional CPUs with parallel GPUs, reconfigurable FPGAs, fully custom ASICs, and emerging neuromorphic chips, highlighting their architectures, performance trade‑offs, power consumption, and current industry adoption.

AI chips · ASIC · CPU

0 likes · 12 min read

Architects' Tech Alliance

Dec 28, 2019 · Artificial Intelligence

Understanding CPU vs GPU, GPU Parameters, and NVIDIA Architectures for AI and High‑Performance Computing

The article explains how CPUs and GPUs differ in architecture and workload handling, details key GPU specifications such as CUDA cores, memory bandwidth and floating‑point precision, reviews NVIDIA's product families and architectural evolution, and highlights the role of GPUs in deep learning training and inference while also mentioning a related technical ebook promotion.

AI · CPU · CUDA

0 likes · 13 min read

Architects' Tech Alliance

Dec 27, 2019 · Fundamentals

Survey of GPU-Accelerated HPC Applications Across Scientific Domains

This article surveys the rapid growth of GPU-accelerated high‑performance computing (HPC) applications driven by NVIDIA's ecosystem, detailing the most common scientific fields, the proportion of GPU‑supported tools, and the emerging role of AI as a primary growth engine.

AI · CUDA · GPU

0 likes · 8 min read

Architects' Tech Alliance

Dec 21, 2019 · Fundamentals

GPU Overview, Usage Methods, and Virtualization Technologies

This article explains the definition and history of GPUs, why dedicated graphics processors are needed, how they are accessed through graphics libraries and vendor APIs such as OpenGL, DirectX, CUDA and OpenCL, and describes various GPU virtualization techniques including virtual graphics cards, passthrough, and vCUDA with their client‑server‑manager architecture.

CUDA · Compute · GPU

0 likes · 20 min read

Architects' Tech Alliance

Dec 16, 2019 · Fundamentals

CPU vs GPU: Architectural Differences and Their Roles in Computing and AI

The article explains the structural differences between CPUs and GPUs, their respective design goals, and why GPUs excel at parallel image and AI workloads, while also noting a New Year promotional bundle of technical e‑books priced at 168 yuan.

CPU · GPU · artificial intelligence

0 likes · 8 min read

360 Quality & Efficiency

Dec 6, 2019 · Artificial Intelligence

Accelerating OpenCV Image Matching with GPU (CUDA) in Python

This article demonstrates how compiling OpenCV 3.2 with CUDA 8.0 enables GPU‑accelerated template matching in Python, cutting average processing time from 0.299 seconds on CPU to 0.181 seconds on GPU, a reduction of about 39% for the image‑recognition APIs used in automated testing.

CUDA · GPU · OpenCV

0 likes · 3 min read

360 Quality & Efficiency

Dec 6, 2019 · Artificial Intelligence

Deploying YOLO V3 with TensorFlow Serving: Environment Setup, Model Conversion, Service Deployment, and Performance Comparison

This article explains how to prepare the Docker environment, install TensorFlow Serving (CPU and GPU versions), convert a YOLO V3 checkpoint to SavedModel, deploy the model as a service, warm‑up and manage versions, invoke it via gRPC and HTTP, and compare CPU versus GPU inference performance.

AI · Docker · GPU

0 likes · 9 min read

360 Quality & Efficiency

Dec 6, 2019 · Artificial Intelligence

Technical Research on Anime4K: Real‑Time Super‑Resolution Algorithm for Anime Images

Anime4K is a GPU‑accelerated super‑resolution algorithm tailored for animated images that achieves single‑digit millisecond latency, and the article explains its underlying residual‑based principle, practical results, and performance limitations on both GPU and CPU platforms.

Anime4K · GPU · Image Processing

0 likes · 3 min read

360 Quality & Efficiency

Dec 6, 2019 · Artificial Intelligence

Technical Research on Anime4K: Real-Time Super-Resolution Algorithm for Animated Images

Anime4K is a GPU-accelerated super‑resolution algorithm designed for animated images, achieving sub‑10 ms latency by enhancing low‑resolution frames with edge‑sharpened residuals, and the article details its principles, visual results, and real‑time performance limitations on typical CPUs.

Anime4K · GPU · Image Processing

0 likes · 2 min read

Architects' Tech Alliance

Nov 30, 2019 · Fundamentals

ARM, Intel, and AMD Reveal 2020 CPU and GPU Roadmaps for Mobile and Desktop Devices

This article summarizes the latest 2020 processor announcements from ARM, Intel, and AMD, highlighting ARM's Cortex‑A77 and Mali‑G77 designs, Intel's 10 nm Ice Lake and Athena initiatives for thin laptops, and AMD's Ryzen 3000 7 nm lineup, while comparing their differing strategies for the future of computing.

AMD · ARM · CPU

0 likes · 9 min read

Architects' Tech Alliance

Oct 19, 2019 · Artificial Intelligence

Overview of AI Chip Development Paths: CPU, GPU, FPGA, ASIC, and Neuromorphic Designs

The article surveys the evolution of artificial‑intelligence chips, comparing traditional CPU architectures with parallel accelerators such as GPUs and FPGAs, fully custom ASICs, and emerging neuromorphic chips, highlighting their structures, performance trade‑offs, and application scenarios.

AI chips · ASIC · CPU

0 likes · 11 min read

Snowball Engineer Team

Oct 17, 2019 · Artificial Intelligence

GPU-Accelerated Model Training Optimizations for Snowball Feed Recommendation System

This article describes the challenges of large‑scale model training for Snowball’s feed recommendation, and details a series of engineering optimizations—including GPU acceleration, multi‑threaded data preparation, TFRecord conversion, compression, and batch‑map reordering—that increased training throughput from 6 k to over 20 k samples per second while reducing CPU and I/O bottlenecks.

GPU · Model Training · TFRecord

0 likes · 15 min read

Architects' Tech Alliance

Oct 14, 2019 · Industry Insights

From ECU CPUs to ASICs: The Evolution of Automotive Chips for Autonomous Driving

This article traces the development of automotive electronic control units from early CPU‑centric ECUs to centralized domain controllers, examines the rise of GPU‑based AI accelerators for assisted driving, and explains why ASICs are expected to dominate future autonomous‑driving chips, while profiling key industry players and their strategies.

AI Accelerators · ASIC · FPGA

0 likes · 21 min read

Architects' Tech Alliance

Oct 12, 2019 · Fundamentals

Understanding GPUs: History, Architecture, and Acceleration Technologies (CUDA & OpenCL)

This article explains the history, architecture, and operation of GPUs, and introduces major acceleration frameworks such as CUDA and OpenCL, highlighting their roles in parallel computing and modern graphics processing for scientific and AI workloads.

CUDA · GPU · Graphics Processing Unit

0 likes · 13 min read

360 Zhihui Cloud Developer

Sep 26, 2019 · Artificial Intelligence

Turn AI Into a Gallery‑Ready Painter with Docker and Style Transfer

Learn how to turn an AI robot into a gallery‑ready painter by setting up Docker and nvidia‑docker, installing GPU drivers, and running a neural‑style transfer pipeline that blends photographs with Monet’s brushstrokes, producing high‑quality artwork in minutes.

AI art · Deep Learning · Docker

0 likes · 8 min read

Architects' Tech Alliance

Sep 20, 2019 · Industry Insights

Why Heterogeneous Parallel Computing Is the Future of High‑Performance Computing

The article explains how heterogeneous parallel computing, which distributes tasks across CPUs, GPUs, FPGAs and other accelerators, has become essential as Moore’s law has plateaued. It details the principles, hardware and software perspectives, classification of architectures, processing stages, user‑guided versus compiler‑guided methods, and relevance to AI, cloud and industry workloads.

CPU · FPGA · GPU

0 likes · 15 min read

Tencent Cloud Developer

Sep 20, 2019 · Artificial Intelligence

Architecture of Tencent Cloud AI Platform (YunZhiTianshu) and AI Practices on Kubernetes

The article details Tencent Cloud’s YunZhiTianshu AI platform architecture—spanning Docker/Kubernetes infrastructure, storage, six micro‑service layers and API/message gateways—while explaining core module designs, unified algorithm packaging, device and data abstraction, and practical Kubernetes deployment techniques for GPU‑accelerated AI workloads, monitoring, scaling, and security.

AI Platform · GPU · Kubernetes

0 likes · 15 min read

Architects' Tech Alliance

Sep 6, 2019 · Fundamentals

Understanding the Differences Between CPU and GPU Architectures

CPUs and GPUs serve distinct roles in computing: the CPU, a versatile general‑purpose processor, handles complex logic and varied data types, while the GPU, built with many simple cores and long pipelines, excels at parallel processing of uniform, large‑scale data such as graphics and AI workloads.

AI · CPU · GPU

0 likes · 10 min read

Architects' Tech Alliance

Sep 5, 2019 · Fundamentals

GPU Origin, Architecture, and Acceleration Technologies (CUDA & OpenCL)

This article explains the history and origin of GPUs, compares CPU and GPU architectures, describes the GPU processing pipeline, and introduces acceleration technologies such as CUDA and OpenCL, highlighting their programming models, supported languages, and key performance metrics.

CUDA · GPU · Graphics Processing

0 likes · 14 min read

Architects' Tech Alliance

Sep 2, 2019 · Databases

The Relationship Between Databases and Emerging Hardware Technologies

This article examines how recent hardware advances such as multi‑core processors, large memory, SSDs, NVM, GPUs and FPGAs have reshaped database system design, outlines the stages from pure academic research to productization, and surveys current database products and research directions leveraging these new devices.

FPGA · GPU · NVM

0 likes · 11 min read

Alibaba Cloud Developer

Jul 17, 2019 · Artificial Intelligence

How Alibaba Halved BERT Latency for Real‑Time Search

This article details Alibaba's technical challenges with BERT's high resource consumption in online search, analyzes memory and compute bottlenecks using TensorFlow profiling, and presents both TensorFlow‑based tweaks and a custom CUDA implementation that together double throughput and cut latency by about 50%.

Alibaba · BERT · GPU

0 likes · 9 min read

MaGe Linux Operations

May 15, 2019 · Cloud Computing

Unlock Free GPU Power: Master Google Colab for Python & Data Science

This guide walks you through getting started with Google Colab, covering setup, basic notebook usage, useful configurations, Google Drive mounting, and how the platform supports machine‑learning teaching and GPU acceleration, all without any local installation.

GPU · Google Colab · Google Drive

0 likes · 8 min read

360 Tech Engineering

May 10, 2019 · Artificial Intelligence

Distributed Training with MXNet: Data Parallel on Single and Multi‑Node GPUs and Integration with Kubeflow

This article explains how MXNet supports data‑parallel training on single‑machine multi‑GPU and multi‑machine multi‑GPU setups, describes KVStore modes, outlines the worker‑server‑scheduler architecture, and shows how to launch large‑scale distributed training using Kubeflow and the mxnet‑operator.

Data Parallel · Deep Learning · Distributed Training

0 likes · 11 min read

360 Zhihui Cloud Developer

May 9, 2019 · Artificial Intelligence

Master Distributed MXNet Training with Kubeflow: A Step‑by‑Step Guide

Learn how to perform single‑machine multi‑GPU and multi‑node multi‑GPU training with MXNet, understand KVStore modes, configure workers, servers, and schedulers, and deploy large‑scale distributed training on Kubernetes using Kubeflow, including operator installation, task creation, and performance considerations.

Distributed Training · GPU · Kubeflow

0 likes · 11 min read

Architects' Tech Alliance

Apr 27, 2019 · Fundamentals

Why GPUs Outperform CPUs: Core Parameters and Architecture Explained

This article explains the fundamental differences between CPUs and GPUs, outlines key GPU specifications such as CUDA cores, memory capacity, bandwidth, and floating‑point precision, and reviews NVIDIA's major GPU series and architectural evolution for high‑performance and AI workloads.

CPU · Deep Learning · GPU

0 likes · 11 min read

Architects' Tech Alliance

Apr 21, 2019 · Fundamentals

Differences Between CPU and GPU Architectures and the Relationship Between OpenCL and CUDA

This article explains the fundamental architectural differences between CPUs and GPUs, their design goals and performance characteristics, and compares OpenCL and CUDA, highlighting OpenCL’s cross‑platform flexibility versus CUDA’s NVIDIA‑specific optimization, while illustrating how each fits various parallel computing tasks.

CPU · CUDA · GPU

0 likes · 7 min read

Architects' Tech Alliance

Apr 18, 2019 · Fundamentals

What Powers Modern Graphics? A Deep Dive into GPU History and Architecture

This article traces the evolution of GPUs from early graphics chips to modern parallel processors, explains their internal pipeline, compares CPU and GPU architectures, and introduces key acceleration frameworks like CUDA and OpenCL for general‑purpose computing.

CUDA · GPU · GPU architecture

0 likes · 13 min read
