GPU Origin, Architecture, and Acceleration Technologies (CUDA & OpenCL)
This article explains the history and origin of GPUs, compares CPU and GPU architectures, describes the GPU processing pipeline, and introduces acceleration technologies such as CUDA and OpenCL, highlighting their programming models, supported languages, and key performance metrics.
1. Origin of the GPU
GPU stands for Graphics Processing Unit and is widely used in embedded systems, mobile phones, PCs, workstations, and gaming solutions. Modern GPUs are highly parallel, giving them an advantage over general‑purpose CPUs for large‑scale data‑parallel algorithms.
ATI was founded in August 1985 and soon shipped its first graphics chips and cards built with ASIC technology; by 1992 its Mach32 integrated graphics acceleration. The term “GPU” itself was coined by NVIDIA in 1999 with the launch of the GeForce 256; ATI was later acquired by AMD in 2006.
2. Working Principle
2.1 GPU Workflow Overview
The GPU graphics pipeline typically includes:
Vertex processing – reading vertex data and constructing the scene’s 3D geometry, performed by a vertex shader.
Rasterization – converting vector geometry into pixel fragments.
Texture mapping – applying images to polygon surfaces via the Texture Mapping Unit (TMU).
Pixel processing – computing final pixel attributes using a Pixel Shader and outputting through the ROP.
Before GPUs, CPUs handled all computation. Their largely serial execution model is inefficient for multimedia workloads, which require high compute density, massive concurrency, and frequent memory access.
GPUs consist of thousands of small, efficient cores designed for parallel tasks, while CPUs contain fewer, more complex cores optimized for control flow and caching.
3. GPU Acceleration Technologies
3.1 CUDA
In 2006 NVIDIA introduced CUDA (Compute Unified Device Architecture), a general‑purpose parallel computing platform and programming model that lets developers write C‑based code for GPUs. CUDA provides an instruction set architecture (the PTX virtual ISA), a parallel execution engine, and libraries such as cuFFT and cuBLAS for FFT and BLAS operations.
The runtime environment offers APIs for data types, memory management, device access, and kernel launch. Code runs as host code on the CPU and device code on the GPU.
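The host/device split described above can be sketched with a minimal CUDA C vector‑addition program. This is an illustrative sketch, not a production implementation: error checking is omitted, and the sizes and launch configuration are arbitrary example values.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Device code: each GPU thread adds one pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];
}

// Host code: allocate, copy to device, launch the kernel, copy back.
int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *ha = malloc(bytes), *hb = malloc(bytes), *hc = malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    int threads = 256;                         // threads per block (example value)
    int blocks = (n + threads - 1) / threads;  // enough blocks to cover n
    vecAdd<<<blocks, threads>>>(da, db, dc, n);

    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", hc[0]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```

The `<<<blocks, threads>>>` launch syntax is where the CPU hands work to the GPU; everything inside `vecAdd` executes as device code across thousands of threads in parallel.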
CUDA supports C, C++, Fortran, and can be accessed through other frameworks like OpenCL, DirectCompute, OpenGL Compute Shaders, and C++ AMP. Third‑party bindings exist for Python, Java, Ruby, Haskell, MATLAB, etc.
3.2 OpenCL
OpenCL (Open Computing Language) is an open, cross‑vendor standard for heterogeneous computing on CPUs, GPUs, DSPs, and FPGAs. Unlike CUDA, which runs only on NVIDIA hardware, OpenCL targets any parallel processor.
An OpenCL program consists of kernel code that runs on the device and a host API that controls platform resources. It supports both task‑parallel and data‑parallel models, expanding GPU use beyond graphics to general computation.
OpenCL is maintained by the Khronos Group and provides a unified programming language, API, libraries, and runtime for heterogeneous systems.
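The kernel/host split in OpenCL can be illustrated with the same vector addition written in OpenCL C (device side only). In a real program the host would create a context and command queue, compile this source at runtime with clBuildProgram, and dispatch it with clEnqueueNDRangeKernel; those steps are omitted here for brevity.

```opencl
// OpenCL C kernel: each work-item handles one array element.
__kernel void vec_add(__global const float *a,
                      __global const float *b,
                      __global float *c,
                      const int n) {
    int i = get_global_id(0);   // this work-item's global index
    if (i < n) c[i] = a[i] + b[i];
}
```

Note the parallel with CUDA: `get_global_id(0)` plays the same role as `blockIdx.x * blockDim.x + threadIdx.x`, but the same kernel source can run on any vendor’s OpenCL device.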
3.3 Key GPU Metrics
CUDA cores – determine parallel processing capability; more cores generally mean higher performance for AI/ML workloads.
Memory capacity – size of VRAM for storing input and output data.
Memory bandwidth – determined by the width and effective clock of the memory interface; governs how fast data moves between VRAM and the compute cores.
Specialized units – Tensor Cores, RT Cores, etc., for specific workloads like deep learning or ray tracing.
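Peak memory bandwidth follows directly from the interface width and effective data rate. As an illustrative calculation (the figures are example values, matching a typical 256‑bit GDDR6 configuration):

```latex
\text{bandwidth} = \frac{\text{bus width (bits)}}{8} \times \text{effective data rate}
                 = \frac{256}{8}\,\text{B} \times 14\,\text{Gbps} = 448\,\text{GB/s}
```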
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.