Fundamentals · 10 min read

From 2D Cards to AI Powerhouses: The Evolution of GPUs

This article traces the GPU's journey from early 2D graphics cards to modern GPGPUs powering AI and HPC, explains core hardware components, compares GPU and CPU architectures, and details the 3D rendering pipeline that underlies graphics and parallel computation.


GPU Development History

1981: earliest graphics cards – the IBM PC MDA and CGA display adapters (2D only).

1988: first‑generation VGA Card – display‑only, graphics still processed by the CPU.

1991: second‑generation graphics card – dedicated chip for Windows GUI, offloading graphics from the CPU.

1994: third‑generation video card – added video codec acceleration.

1996–1999: fourth‑generation 3D accelerators – the era of dedicated 3D chips, including NVIDIA's RIVA TNT (1998) and TNT2 (1999).

1999: fifth‑generation GPU – NVIDIA GeForce 256 introduced a full 3D engine (transform, lighting, setup, rendering).

Present: sixth‑generation GPGPU – used widely in AI, HPC and beyond.

Note that a modern graphics card consists of a GPU, video memory (VRAM), a PCIe bus interface, the PCB, a RAMDAC (on cards with analog output), a BIOS, display connectors and a cooling system.

GPU components diagram

Basic GPU Classification

NVIDIA introduced the GPU concept with the GeForce 256, moving PCs from integrated graphics to dedicated acceleration.

NVIDIA dominates the discrete GPU market with its GeForce line, including the GTX and RTX series.

Fundamental Differences Between GPU and CPU

Design Purpose (General vs Specialized Computing)

CPU : Designed for general‑purpose, low‑latency computing – a complex control unit, a few cores running at high clock speeds, powerful ALUs, large multi‑level caches (L1–L3), and features such as branch prediction and out‑of‑order execution.

GPU : Designed for specialized parallel computing – many cores delivering high throughput, many ALUs, simple control logic, and small caches that mainly stage data for large numbers of threads.

By analogy, the CPU is a versatile manager, while the GPU is a large workforce of raw compute power operating under the CPU's scheduling.

GPUs excel at workloads with massive parallelism and regular, high‑bandwidth memory access, and can deliver 10–100× speedups over CPUs for graphics rendering, deep learning and other parallel tasks.

CPU vs GPU architecture

Computing Model (Serial vs Parallel)

GPU follows a parallel programming model, unlike the CPU’s serial model; many CPU algorithms cannot be directly mapped to GPU and must be redesigned for parallel execution.
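To make the serial‑versus‑parallel distinction concrete, here is a minimal sketch (names and functions are my own, not from any GPU API) of SAXPY, a canonical data‑parallel kernel. The serial form walks one loop iteration at a time, as a CPU would; the parallel form rewrites the same computation as an independent per‑element kernel, which is exactly what a GPU would hand to one thread per element.

```python
# SAXPY (y = a*x + y): a canonical data-parallel kernel.

# Serial CPU-style formulation: one iteration after another.
def saxpy_serial(a, x, y):
    return [a * xi + yi for xi, yi in zip(x, y)]

# Data-parallel formulation: each output element is independent,
# so on a GPU every index i would be computed by its own thread.
def saxpy_kernel(i, a, x, y):
    return a * x[i] + y[i]

def saxpy_parallel(a, x, y):
    # This map over indices is what GPU hardware parallelizes;
    # here it is modeled sequentially for illustration.
    return [saxpy_kernel(i, a, x, y) for i in range(len(x))]
```

The key point is that the parallel version contains no loop‑carried dependency, so the iterations can run in any order or all at once — the property an algorithm must have (or be redesigned to have) before it maps well to a GPU.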

Memory (System RAM vs Video Memory)

CPU memory bandwidth is typically tens of GB/s (e.g., the Intel Xeon E5‑2699 v3 at 68 GB/s), while GPU memory bandwidth reaches hundreds of GB/s (e.g., the NVIDIA Tesla P40 at 346 GB/s). Even so, memory bandwidth remains a common performance bottleneck for GPU workloads.

Memory bandwidth comparison
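A back‑of‑envelope calculation shows what those bandwidth figures mean in practice: the time to stream a 1 GiB buffer once at each quoted rate, assuming pure sequential streaming and ignoring latency and caching effects.

```python
# Time to move 1 GiB once at the bandwidths quoted above
# (68 GB/s for the CPU, 346 GB/s for the GPU).
GIB = 1024 ** 3  # bytes in 1 GiB

def stream_time_ms(bytes_moved, bandwidth_gb_per_s):
    # bandwidth is in decimal GB/s (1e9 bytes per second)
    return bytes_moved / (bandwidth_gb_per_s * 1e9) * 1e3

cpu_ms = stream_time_ms(GIB, 68)    # ~15.8 ms
gpu_ms = stream_time_ms(GIB, 346)   # ~3.1 ms
```

Roughly a 5× gap per pass over the data — which compounds quickly for bandwidth‑bound kernels that stream large arrays many times.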

Basic GPU 3D Rendering Pipeline

Vertex Processing : Transform 3D vertex coordinates to 2D screen space using linear algebra; performed in parallel by the vertex shader.

Vertex processing
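The vertex stage's "linear algebra" boils down to a 4×4 matrix multiply in homogeneous coordinates, a perspective divide, and a viewport mapping. A minimal sketch, with all names and the identity matrix chosen for illustration (a real pipeline would supply a model‑view‑projection matrix):

```python
# Sketch of the vertex stage: transform a 3D point to 2D screen
# coordinates via a 4x4 matrix in homogeneous coordinates.

def mat_vec(m, v):
    # 4x4 matrix times a 4-vector.
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def project_vertex(mvp, vertex, width, height):
    x, y, z, w = mat_vec(mvp, [*vertex, 1.0])
    # Perspective divide: clip space -> normalized device coords.
    ndc_x, ndc_y = x / w, y / w
    # Viewport transform: NDC [-1, 1] -> pixel coordinates.
    sx = (ndc_x + 1) * 0.5 * width
    sy = (1 - ndc_y) * 0.5 * height  # flip y: screen origin is top-left
    return sx, sy

IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
```

With the identity matrix, the origin lands at the screen center; on a GPU, the vertex shader runs this transform for thousands of vertices in parallel.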

Primitive Assembly : Reconstruct mesh from vertices, perform clipping, and generate triangles.

Primitive assembly

Rasterization : Convert vector primitives into pixel fragments.

Rasterization
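The conversion from vector primitives to pixel fragments can be sketched with edge functions — signed areas that GPU rasterizers evaluate for many pixels in parallel. This toy version (function names are my own) tests each pixel center against a counter‑clockwise triangle:

```python
# Minimal rasterizer sketch: find which pixel centers fall
# inside a triangle using edge functions (signed areas).

def edge(ax, ay, bx, by, px, py):
    # >= 0 when point p lies on or to the left of edge a->b.
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def rasterize(tri, width, height):
    (x0, y0), (x1, y1), (x2, y2) = tri
    covered = []
    for py in range(height):
        for px in range(width):
            cx, cy = px + 0.5, py + 0.5  # sample at the pixel center
            if (edge(x0, y0, x1, y1, cx, cy) >= 0 and
                    edge(x1, y1, x2, y2, cx, cy) >= 0 and
                    edge(x2, y2, x0, y0, cx, cy) >= 0):
                covered.append((px, py))
    return covered
```

Each pixel's test is independent of every other pixel's — the same data‑parallel structure the GPU exploits throughout the pipeline.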

Fragment Shader / Texture Mapping : Compute color and texture for each fragment using the texture mapping unit.

Fragment shading
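At its simplest, texture mapping looks up a texel for each fragment's UV coordinates. A nearest‑neighbor sketch (names illustrative; real texture units also perform bilinear/trilinear filtering in hardware):

```python
# Fragment-stage sketch: nearest-neighbor texture sampling.
# UV coordinates in [0, 1] are mapped onto a texel grid.

def sample_nearest(texture, u, v):
    h, w = len(texture), len(texture[0])
    # Clamp-to-edge addressing, one of several common wrap modes.
    tx = min(w - 1, max(0, int(u * w)))
    ty = min(h - 1, max(0, int(v * h)))
    return texture[ty][tx]
```

The fragment shader then combines the sampled texel with lighting and material terms to produce the fragment's color.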

Pixel Operations : Depth testing, texture sampling, alpha blending, etc.

Pixel operations
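Two of these operations can be sketched together: a depth test that keeps only the closest fragment, followed by classic "source over" alpha blending, out = src·α + dst·(1 − α). Buffer layout and function names here are illustrative:

```python
# Pixel-operations sketch: depth test, then alpha blending.

def blend_pixel(src_rgb, src_alpha, dst_rgb):
    # "Source over" blending: out = src*a + dst*(1 - a).
    return tuple(s * src_alpha + d * (1 - src_alpha)
                 for s, d in zip(src_rgb, dst_rgb))

def write_fragment(depth_buf, color_buf, x, y, frag_depth, rgb, alpha):
    # Depth test: keep the fragment only if it is closer
    # than what the depth buffer already holds.
    if frag_depth < depth_buf[y][x]:
        depth_buf[y][x] = frag_depth
        color_buf[y][x] = blend_pixel(rgb, alpha, color_buf[y][x])
```

Fragments that fail the depth test are discarded without touching either buffer, which is why far‑away geometry never overwrites nearer surfaces.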

Final Output : Raster operation units (ROPs) write the final pixel colors to the frame buffer for display.

Final output
Tags: Parallel Computing, GPU, Computer Architecture, Rendering Pipeline, Graphics Processing Unit
Written by

AI Cyberspace

AI, big data, cloud computing, and networking.
