From 2D Cards to AI Powerhouses: The Evolution of GPUs
This article traces the GPU's journey from early 2D graphics cards to modern GPGPUs powering AI and HPC, explains core hardware components, compares GPU and CPU architectures, and details the 3D rendering pipeline that underlies graphics and parallel computation.
GPU Development History
1981: earliest graphics cards – IBM PC MDA and CGA 2D acceleration cards.
1988: first‑generation VGA card – display output only; graphics were still rendered by the CPU.
1991: second‑generation graphics card – dedicated chip for Windows GUI, offloading graphics from the CPU.
1994: third‑generation video card – added video codec acceleration.
1996–1999: fourth‑generation 3D accelerators – an era culminating in NVIDIA's RIVA TNT and TNT2 series.
1999: fifth‑generation GPU – NVIDIA GeForce 256 introduced a full 3D engine (transform, lighting, setup, rendering).
Present: sixth‑generation GPGPU – used widely in AI, HPC and beyond.
Note that a modern graphics card consists of the GPU itself plus video memory, a PCIe interface, the PCB, a RAMDAC, a BIOS, display connectors and a cooling system.
Basic GPU Classification
NVIDIA introduced the GPU concept with the GeForce 256, moving PCs from integrated graphics to dedicated acceleration.
NVIDIA dominates the discrete GPU market with its GeForce line, including the GTX and RTX series.
Fundamental Differences Between GPU and CPU
Design Purpose (General vs Specialized Computing)
CPU : Designed for general‑purpose, latency‑sensitive computing: a few cores at high clock speeds, complex control units, powerful ALUs, large multi‑level caches (L1–L3), and features such as branch prediction and out‑of‑order execution.
GPU : Designed for specialized parallel computing: many cores delivering high aggregate throughput, many simple ALUs, minimal control logic, and small caches that serve to keep thousands of threads fed rather than to minimize latency.
By analogy, the CPU is a versatile leader, while the GPU is a worker with massive compute power operating under the CPU's scheduling.
The GPU excels at workloads with massive parallelism and high memory throughput, delivering 10–100× speedups over CPUs for graphics rendering, deep learning and similar data‑parallel tasks.
Computing Model (Serial vs Parallel)
The GPU follows a parallel programming model, unlike the CPU's serial model; many CPU algorithms cannot be mapped directly onto the GPU and must be redesigned for parallel execution.
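As an illustrative sketch (not real GPU code), the snippet below writes the same SAXPY computation two ways: a CPU‑style serial loop, and a per‑element "kernel" applied across the data, mimicking the GPU's data‑parallel model. The thread pool only mimics the scheduling style; it does not deliver GPU‑class speedups.

```python
from concurrent.futures import ThreadPoolExecutor

def saxpy_serial(a, x, y):
    # CPU-style: a single thread walks the whole array in order.
    return [a * xi + yi for xi, yi in zip(x, y)]

def saxpy_kernel(args):
    # GPU-style "kernel": each invocation handles exactly ONE element,
    # so every element can be computed independently, in any order.
    a, xi, yi = args
    return a * xi + yi

def saxpy_parallel(a, x, y):
    # Launch one kernel instance per element (here via a thread pool,
    # purely to illustrate the model).
    with ThreadPoolExecutor() as pool:
        return list(pool.map(saxpy_kernel, [(a, xi, yi) for xi, yi in zip(x, y)]))

x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
print(saxpy_serial(2.0, x, y))    # [12.0, 24.0, 36.0, 48.0]
print(saxpy_parallel(2.0, x, y))  # same result, element-parallel formulation
```

The redesign step the text describes is exactly the move from the first function to the second: the loop disappears, and each element's computation becomes an independent unit of work.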
Memory (System RAM vs Video Memory)
CPU‑memory bandwidth is typically tens of GB/s (e.g., the Intel Xeon E5‑2699 v3 at about 68 GB/s). GPU memory bandwidth can reach hundreds of GB/s (e.g., the Tesla P40 at 346 GB/s), though memory bandwidth remains a common performance bottleneck for GPU workloads.
Basic GPU 3D Rendering Pipeline
Vertex Processing : Transform 3D vertex positions into 2D screen space via matrix transforms (model, view, projection); vertex shaders run these transforms in parallel across vertices.
Primitive Assembly : Group transformed vertices into primitives (typically triangles) and clip them against the view volume.
Rasterization : Convert vector primitives into pixel‑aligned fragments.
Fragment Shading / Texture Mapping : Compute each fragment's color, sampling textures through the texture mapping units.
Pixel Operations : Per‑fragment tests and merging, such as depth/stencil testing and alpha blending.
Final Output : The raster operations units (ROPs) write the final pixel colors to the frame buffer for display.
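The first and third stages above can be sketched in miniature. The following pure‑Python toy (the function names and the 100×100 "screen" are illustrative choices, not a real graphics API) implements a perspective vertex transform and an edge‑function rasterizer:

```python
def project_vertex(v, screen_w, screen_h, focal=1.0):
    # Vertex processing: perspective projection (divide by depth), then
    # map from normalized device coordinates [-1, 1] to pixel coordinates.
    x, y, z = v
    ndc_x, ndc_y = focal * x / z, focal * y / z
    sx = (ndc_x + 1.0) * 0.5 * screen_w
    sy = (1.0 - ndc_y) * 0.5 * screen_h   # flip y: screen space grows downward
    return sx, sy

def edge(a, b, p):
    # Signed-area edge function used by rasterizers: its sign tells
    # which side of the directed edge a->b the point p lies on.
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(tri, screen_w, screen_h):
    # Rasterization: a pixel center is inside the triangle when all three
    # edge functions agree in sign (vertices assumed consistently wound).
    a, b, c = tri
    fragments = []
    for py in range(screen_h):
        for px in range(screen_w):
            p = (px + 0.5, py + 0.5)
            if edge(a, b, p) >= 0 and edge(b, c, p) >= 0 and edge(c, a, p) >= 0:
                fragments.append((px, py))
    return fragments

# A point straight ahead of the camera lands at the screen center:
print(project_vertex((0.0, 0.0, 2.0), 100, 100))  # (50.0, 50.0)
# Count the fragments a screen-space triangle covers:
print(len(rasterize(((10.0, 10.0), (80.0, 10.0), (10.0, 80.0)), 100, 100)))
```

On real hardware both loops are the parallel part: every vertex and every pixel is evaluated by its own thread, which is why the pipeline maps so well onto the GPU's many simple cores.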