
Understanding GPUs: History, Architecture, and Acceleration Technologies (CUDA & OpenCL)

This article explains the history, architecture, and operation of GPUs, and introduces major acceleration frameworks such as CUDA and OpenCL, highlighting their roles in parallel computing and modern graphics processing for scientific and AI workloads.

Architects' Tech Alliance

1. Origin of GPU

ATI was founded in August 1985 and soon used ASIC technology to develop its first graphics chip and card. In April 1992 ATI released the Mach32, a graphics card with hardware acceleration. ATI initially marketed its processors as VPUs (Visual Processing Units); after AMD acquired ATI in 2006, the GPU term was adopted.

In 1999 NVIDIA introduced the GeForce 256 and coined the term GPU (Graphics Processing Unit). By handling tasks such as 3D geometry processing on-chip, it reduced dependence on the CPU. Its key technologies included hardware transform and lighting (T&L), texture mapping, texture compression, bump mapping, and a 256‑bit rendering engine; hardware T&L in particular became a hallmark of GPUs.

2. Working Principle

2.1 GPU Pipeline Overview

The GPU graphics pipeline typically performs:

Vertex processing: reads vertex data, determines shape and position, builds the skeleton of 3D objects, often implemented by a hardware Vertex Shader.

Rasterization: converts geometric primitives into pixel fragments.

Texture mapping: applies images to polygon surfaces using a Texture Mapping Unit (TMU).

Pixel processing: computes final pixel attributes via a Pixel Shader and outputs through the raster operation processor (ROP) to the frame buffer.

Before GPUs, CPUs performed most computations, but a CPU's largely serial architecture is inefficient for media‑intensive, highly parallel workloads. A CPU devotes much of its die area to control logic and cache and runs relatively few threads at once, whereas a GPU consists of thousands of simpler cores optimized for massive data parallelism.

3. GPU Acceleration Technologies

3.1 CUDA

In 2006 NVIDIA released CUDA (Compute Unified Device Architecture), a general‑purpose parallel computing platform that lets developers write C‑based programs for GPUs. CUDA includes an instruction set architecture, a parallel execution engine, and libraries such as cuFFT and cuBLAS for fast Fourier transforms and BLAS linear‑algebra operations.

CUDA programs consist of host code running on the CPU and device code (kernels) running on the GPU. The runtime provides APIs for memory management, device access, and kernel launch, while a lower‑level driver API exposes a hardware abstraction layer beneath the runtime.
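The host/device split can be sketched with the classic vector‑add example. This is a minimal illustration under common CUDA conventions, not code from the article; the kernel name `vecAdd` and the block size of 256 are arbitrary choices:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

/* Device code: each GPU thread computes one output element. */
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                    /* guard: grid may overshoot n */
        c[i] = a[i] + b[i];
}

int main(void)
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;

    /* Host code: the runtime API manages device-visible memory
     * (unified memory here, for brevity)... */
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    /* ...and launches the kernel over a grid of thread blocks. */
    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();      /* wait for the GPU to finish */

    printf("c[0] = %.1f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The `<<<blocks, threads>>>` launch syntax is what replaces the explicit CPU loop: the grid of threads covers the index space, and each thread does one iteration's work.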

3.2 OpenCL

OpenCL (Open Computing Language) is an open, cross‑vendor framework for heterogeneous computing on CPUs, GPUs, DSPs, FPGAs, etc. Unlike CUDA, which is limited to NVIDIA hardware, OpenCL aims to be portable across many devices.

OpenCL programs also have a kernel part that runs on the device and a host API that controls execution. The standard is maintained by the Khronos Group and provides both task‑parallel and data‑parallel programming models.
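For comparison, the same element‑wise addition written as an OpenCL C kernel. This is a hypothetical sketch; on the host side, the Khronos C API would compile this source at runtime (via `clCreateProgramWithSource` and `clBuildProgram`) and enqueue it over an N‑item index space with `clEnqueueNDRangeKernel`:

```c
/* OpenCL C device code: each work-item handles one element.
 * The host compiles this source at runtime and enqueues it
 * over an n-item global index space (NDRange). */
__kernel void vec_add(__global const float *a,
                      __global const float *b,
                      __global float *c,
                      const int n)
{
    int i = get_global_id(0);  /* this work-item's global index */
    if (i < n)
        c[i] = a[i] + b[i];
}
```

Note the near one‑to‑one mapping to the CUDA model: a work‑item corresponds to a CUDA thread, and `get_global_id(0)` plays the role of `blockIdx.x * blockDim.x + threadIdx.x` — which is why the same data‑parallel algorithms port readily between the two frameworks.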


Written by Architects' Tech Alliance

Sharing project experiences and insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.